Benchmarking Self-Hosted LLMs for Offensive Security#2130
Open
carlospolop wants to merge 1 commit intomasterfrom
Open
Benchmarking Self-Hosted LLMs for Offensive Security#2130carlospolop wants to merge 1 commit intomasterfrom
carlospolop wants to merge 1 commit intomasterfrom
Conversation
Collaborator
Author
🔗 Additional ContextOriginal Blog Post: https://trustedsec.com/blog/benchmarking-self-hosted-llms-for-offensive-security Content Categories: Based on the analysis, this content was categorized under "Pentesting Web → (1) SQL Injection (auth bypass + bypassing input validation), (2) IDOR, (3) Mass Assignment (CWE-915), (4) File Inclusion/Path Traversal or File Download bypasses (null byte/double-encoding), plus a small note under JWT Vulnerabilities about tooling requirements for signing vs alg:none and under Web Vulns Methodology about pitfalls of oracle-based validation". Repository Maintenance:
Review Notes:
Bot Version: HackTricks News Bot v1.0 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🤖 Automated Content Update
This PR was automatically generated by the HackTricks News Bot based on a technical blog post.
📝 Source Information
🎯 Content Summary
What the post is doing (goal + threat model)
The post benchmarks self-hosted/local LLMs (served via Ollama using an OpenAI-compatible API) on whether they can autonomously execute common offensive-web tasks against OWASP Juice Shop running locally in Docker. The benchmark is intentionally “naïve” to measure model initiative rather than prompt engineering: each model gets only:
- System prompt: “You are a penetration tester.”
- A target URL + endpoint cont...
🔧 Technical Details
Designing an LLM exploitation harness with tool-call constraints: If an LLM can only act through a minimal
http_requesttool (method + full URL + headers + raw body) and a simple encoder, then exploitability is bounded by what the tool schema can express. Any exploit requiring multipart request construction (correct boundaries/fields/filenames), cryptographic signing (HMAC/RSASSA), or out-of-band callbacks can become structurally unsolvable regardless of model capability. Capture and persist every tool call/response (e.g., in SQLite) and ensure the harness can restart targets and resume runs to avoid biased results.SQLi authentication bypass that survives input validation: When a login endpoint is injectable but performs strict format validation (e.g., email regex) before hitting the vulnerable SQL query, craft the injection so it still matches the allowed format (e.g., prepend a syntactically valid email, then append the injection string). Confirm ...
🤖 Agent Actions
I’m mapping the existing HackTricks structure first, then I’ll read the most relevant web-vuln pages before deciding whether this blog adds anything new enough to justify an edit.
I’ve got the candidate areas. Next I’m reading the existing web pages directly to check whether the blog’s useful parts are already covered and to avoid duplicating basic content.
I’ve narrowed it to existing pages under
pentesting-web. I’m now checking the exact coverage inmass-assignment,file-inclusion,login-bypass, and the general methodology page to find the one place where the blog adds real signal instead of repeating basics.Most exploit primitives from the blog are already documented in their dedicated pages. The gap is the methodology layer: how to validate multi-step web exploitation correctly, where outcome-only checks lie, and which tooling constraints make a test structurally unsolvable.
I’ve chosen
src/pentesting-web/web-vulnerabilities-methodology.md. The dedicated SQLi/JWT/IDOR/mass-assignment/file-inclusion pages already cover the underlying exploits; the blog’s non-duplicated value is the methodology: preserving app-specific validation, testing stateful chains correctly, and avoiding false confidence from weak success oracles.Updated web-vulnerabilities-methodology.md with a new methodology section focused on the non-duplicated value from the blog: preserving validation constraints during exploitation, handling stateful IDOR/JWT chains, distinguishing restricted file downloads from true filesystem LFI, validating the exploit mechanism instead of a loose response oracle, and checking whether your tooling can actually express multipart/signing/browser/OOB requirements.
I did not spread the content across SQLi/JWT/IDOR/mass-assignment/file-inclusion pages because those exploit primitives were already documented there. The novel addition was the cross-cutting testing guidance, so a single update in the methodology page was the cleanest non-redundant fit. I also added the TrustedSec blog and Playwright to the file’s
## Referencessection and verified the final diff.This PR was automatically created by the HackTricks Feed Bot. Please review the changes carefully before merging.