The flag is usually hidden in plain sight
You opened your first web challenge. The page loads, there is a title, maybe a login box, maybe nothing at all. There is no obvious place to type a flag, no big red SOLVE ME button. So where is it? In the overwhelming majority of beginner web challenges, the flag (or the next clue to it) is already on the page or one short request away. You just have to look where the browser does not show you by default.
Here is the entire entry-level recon loop, in order. Run it on every web challenge before you do anything clever:
1. View source (Ctrl+U). Read every HTML comment and <script> tag.
2. Open DevTools (F12). Check the Network tab and the Sources tab.
3. Fetch /robots.txt and /sitemap.xml.
4. Probe the obvious files: /.git/, /backup.zip, /index.php.bak, /admin.
5. Look at cookies and localStorage in the Application tab.
6. If nothing yet, brute-force directories with ffuf or gobuster.
That is the whole post in six lines. Everything below explains why each step works, what you are actually looking at, and the exact commands to run. If you are brand new to CTFs, start with the picoCTF Beginners Guide for the lay of the land, then come back here for the web-specific moves.
What is the recon mindset, and why does it win?
A web server hands your browser a pile of files: HTML, JavaScript, CSS, images, and the occasional configuration leak. The browser renders a tidy page out of that pile and keeps the messy parts out of view. Recon is the habit of refusing to accept the rendered page as the whole truth. The flag lives in the pile, not the picture.
The rendered page is what the author wanted you to see. Recon is reading everything they forgot to hide.
Three questions drive every web recon session. Keep them taped to your monitor:
- What did the server actually send? Not the rendered page, the raw bytes. Comments, hidden fields, and dead code all survive in the source.
- What else is on this server that I was not linked to? Backup files, admin panels, old endpoints, and version-control directories sit at predictable paths.
- What is the client storing or sending on my behalf? Cookies, tokens, and local storage are a recon surface the page never advertises.
Beginners lose time because they treat a web challenge like a puzzle to be reasoned out from the visible page. Strong players treat it like a search problem: enumerate everything the server exposes, then read it. The challenge picoCTF 2019 dont-use-client-side is the canonical lesson here: the password check runs entirely in JavaScript that was shipped to your browser, so reading the source hands you the answer.
How do I read the page source and find hidden comments?
The fastest recon move on the planet is Ctrl+U (or right-click and choose "View Page Source"). It shows the raw HTML the server sent, before JavaScript rewrote anything. Authors of intro challenges love to tuck the flag, a hint, or a hidden path into an HTML comment that never renders on screen:
    <!-- TODO: remove before launch -->
    <!-- flag is at /super_secret_admin_page.html -->
    <!-- picoCTF{c0mm3nts_4r3_n0t_s3cr3t} -->
    <input type="hidden" name="debug" value="0">
Read the whole thing. Comments, hidden <input> fields, base64 blobs in data- attributes, and linked .js files are all fair game. If you prefer the terminal, curl fetches the raw source so you can grep it:
    # Dump the raw HTML the server sends
    curl -s http://example.com:8080/

    # Pull out just the comments and hidden fields
    curl -s http://example.com:8080/ | grep -E '<!--|hidden|picoCTF'

    # List every script the page loads so you can read each one
    curl -s http://example.com:8080/ | grep -oE 'src="[^"]+\.js"'
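One limit of the grep approach is that it works line by line, so a comment that spans several lines only shows its first line. The following is a minimal sketch of a workaround, assuming the HTML arrives on standard input; extract_comments is a hypothetical helper name, not a standard tool.

```shell
# Hypothetical helper: print the body of every HTML comment read from stdin,
# including comments that span several lines.
extract_comments() {
    # Join all lines into one record so multi-line comments stay together,
    # then walk the text and print whatever sits between each <!-- and -->.
    tr '\n' ' ' | awk '{
        s = $0
        while ((start = index(s, "<!--")) > 0) {
            rest = substr(s, start + 4)
            stop = index(rest, "-->")
            if (stop == 0) break
            print substr(rest, 1, stop - 1)
            s = substr(rest, stop + 3)
        }
    }'
}
```

A typical use would be curl -s http://example.com:8080/ | extract_comments, which prints one comment per line.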
Two picoCTF challenges drill this exact skill. picoCTF 2022 Inspect HTML hides the flag in the markup, and picoCTF 2022 Search Source buries it inside one of several linked source files so you have to grep across all of them. Solve both and the reflex sticks.
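Grepping across several files is easy to script once the page's HTML, CSS, and JavaScript are saved locally. This sketch assumes they sit in one directory and that the flag uses the picoCTF{...} format; find_flags is a hypothetical helper name.

```shell
# Hypothetical helper: search every file under a directory for picoCTF-style flags.
find_flags() {
    # -r recurse into the directory, -h omit file names,
    # -o print only the matching text, -E use extended regex syntax.
    grep -rhoE 'picoCTF\{[^}]*\}' "$1" | sort -u
}
```

Calling find_flags ./saved_site prints each distinct match once. It only finds a flag that appears whole in a single file, so a flag split across files still has to be pieced together by hand.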
What do the DevTools Network and Sources tabs reveal?
Press F12 (or Ctrl+Shift+I) to open Developer Tools. It is the single most important web recon instrument in the browser, and it is free. Four tabs matter for CTF recon:
- Elements shows the live DOM after JavaScript has run. Use it when the page builds content dynamically and view-source comes up empty.
- Network records every request the page made: the HTML, every script, every image, and crucially every background API call (often labeled XHR or Fetch). Reload with the tab open and read the list top to bottom.
- Sources lists every JavaScript file the page loaded, fully readable and pretty-printable. This is where client-side logic, hardcoded endpoints, and the occasional plaintext credential live.
- Application shows cookies, localStorage, and sessionStorage: the client-side state you can read and edit.
The Network tab is the one beginners under-use. Many challenges fetch data from a /api/ endpoint that is never linked anywhere on the visible page. Reload the page with Network open, click each request, and read its Response. You will often see the endpoint that holds the next clue:
    GET /                    200  document    (the page)
    GET /style.css           200  stylesheet
    GET /app.js              200  script      <- read this in Sources
    GET /api/v1/user?id=1    200  xhr         <- try id=2, id=0, id=admin
    GET /api/v1/flag         403  xhr         <- interesting, why forbidden?
In the Sources tab, click the { } pretty-print button to un-minify a script before reading it. Search across all loaded scripts with Ctrl+Shift+F for strings like flag, password, admin, secret, and /api. For a deeper tour of the tooling, the official MDN guide to browser developer tools is the authoritative reference.
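The same string search works from the terminal once a script has been downloaded. This is a rough sketch that assumes endpoints appear in the JavaScript as quoted strings starting with a slash; list_endpoints is a hypothetical helper name, and it will miss paths that are built by string concatenation.

```shell
# Hypothetical helper: list quoted path-like strings found in JavaScript on stdin.
list_endpoints() {
    # Match a quote, a string beginning with "/", and a closing quote;
    # then strip the quotes and remove duplicates.
    grep -oE "[\"'](/[A-Za-z0-9_./?=&-]*)[\"']" | tr -d "\"'" | sort -u
}
```

For example, curl -s http://example.com:8080/app.js | list_endpoints would print every distinct path the script mentions, one per line.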
What can robots.txt and sitemap.xml leak?
robots.txt is a plaintext file at the web root that tells search-engine crawlers which paths to skip. The irony is delicious: to tell a crawler to stay out of /admin, the site has to publicly name /admin. For a CTF, the Disallow list is a free map of the paths the author considered sensitive enough to hide:
    $ curl -s http://example.com:8080/robots.txt
    User-agent: *
    Disallow: /admin/
    Disallow: /backup/
    Disallow: /secret_flag_directory/    <- well, thanks
    Disallow: /api/internal/
Every path in that Disallow list is somewhere you should immediately visit. sitemap.xml is the companion file: it lists URLs the site wants indexed, and sometimes it includes old or forgotten pages that are not linked from the homepage. Fetch both on every challenge:
    curl -s http://example.com:8080/robots.txt
    curl -s http://example.com:8080/sitemap.xml
    curl -s http://example.com:8080/security.txt
    curl -s http://example.com:8080/.well-known/
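Pulling the Disallow paths out of robots.txt gives you a ready-made list of pages to visit. The sketch below assumes the file arrives on standard input; disallowed_paths is a hypothetical helper name.

```shell
# Hypothetical helper: print each distinct Disallow path from a robots.txt on stdin.
disallowed_paths() {
    # Strip carriage returns, keep only Disallow lines (any letter case),
    # drop the field name, trailing comments, and trailing whitespace,
    # then discard empty entries and duplicates.
    tr -d '\r' \
        | grep -iE '^[[:space:]]*Disallow:' \
        | sed -E 's/^[^:]*:[[:space:]]*//; s/[[:space:]]*#.*$//; s/[[:space:]]+$//' \
        | grep -v '^$' \
        | sort -u
}
```

Piping curl -s http://example.com:8080/robots.txt into disallowed_paths prints one path per line, ready to append to the base URL and fetch.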
This is exactly the lesson of picoCTF 2019 Where are the robots, whose entire solution is reading robots.txt and following the disallowed path it reveals. It is also a stop on the longer scavenger hunt in picoCTF 2021 Scavenger Hunt, where pieces of the flag are scattered across the page source, the CSS, the robots.txt, and other recon surfaces, one clue pointing to the next.
How do exposed .git directories and backup files give the game away?
Developers deploy by copying a folder to the server, and that folder often still contains the things that were never meant to ship. The two richest finds are exposed version control and stray backup files.
A .git directory left in the web root is a full history of the source code. If http://target/.git/HEAD returns content instead of a 404, you can reconstruct the entire repository, including files that were deleted in later commits but still live in history:
    # Is .git exposed? If this returns 'ref: refs/heads/...' you are in business
    curl -s http://example.com:8080/.git/HEAD

    # Dump the whole repo from an exposed .git directory
    pip install git-dumper
    git-dumper http://example.com:8080/.git/ ./loot

    # Then read the history for anything deleted or secret
    cd loot && git log --all --oneline
    git show <commit>
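A server that answers every path with the same catch-all page can make /.git/HEAD look exposed when it is not, so it helps to check the body and not only the status. A minimal sketch, assuming the response body arrives on standard input; looks_like_git_head is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if stdin looks like a real .git/HEAD file.
looks_like_git_head() {
    # A genuine HEAD is either a symbolic ref ("ref: refs/...") or,
    # for a detached HEAD, a bare 40-character hexadecimal commit hash.
    grep -qE '^(ref: refs/|[0-9a-f]{40}$)'
}
```

It can be chained as curl -s http://example.com:8080/.git/HEAD | looks_like_git_head && echo exposed, so the dump only starts when the file is genuine.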
Backup and editor-leftover files are the other staple. Editors leave copies behind when someone edits index.php on the server: Vim writes a .index.php.swp swap file, and editors such as Emacs save an index.php~ backup. Configuration and archive files get forgotten in the web root too. Probe the predictable names directly:
    # Print the status code for each predictable backup or leftover file.
    # One curl call per path, so -o /dev/null discards every body, not just the first.
    for path in index.php.bak 'index.php~' .index.php.swp backup.zip config.php.old .env; do
        curl -s -o /dev/null -w '%{http_code} %{url_effective}\n' "http://example.com:8080/$path"
    done
A 200 status code means the file exists and you can fetch it. A 403 means it exists but access is forbidden, which is itself a strong signal that something is there worth fighting for. A 404 means keep looking. Always read the status code, not just the page body.
How do I brute-force hidden directories with ffuf and gobuster?
When manual probing runs dry, automate it. Directory brute-forcing throws a wordlist of common file and folder names at the server and reports which ones exist. Two tools dominate: ffuf and gobuster. They do the same job; pick whichever you have installed.
You need a wordlist. The community standard is SecLists; a great starting list is directory-list-2.3-medium.txt or, for a quick first pass, the smaller common.txt. Here is gobuster:
    # gobuster: dir mode, one wordlist, show found paths
    gobuster dir \
        -u http://example.com:8080 \
        -w /usr/share/seclists/Discovery/Web-Content/common.txt \
        -t 40

    # Add common extensions so it finds files, not just folders
    gobuster dir -u http://example.com:8080 \
        -w common.txt -x php,html,txt,bak,zip
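To make the extension option concrete, here is a rough sketch of the candidate list such a run works through: each wordlist entry on its own, then once more per extension. It illustrates the idea and is not gobuster's actual implementation; candidates is a hypothetical helper name, and the wordlist is assumed to arrive on standard input.

```shell
# Hypothetical helper: expand a wordlist on stdin into candidate URLs.
# $1 is the base URL; any further arguments are extensions without the dot.
candidates() {
    base=${1%/}   # drop one trailing slash so paths join cleanly
    shift
    while IFS= read -r word; do
        # Skip blank lines and the comment lines some wordlists start with.
        case $word in '' | '#'*) continue ;; esac
        printf '%s/%s\n' "$base" "$word"
        for ext in "$@"; do
            printf '%s/%s.%s\n' "$base" "$word" "$ext"
        done
    done
}
```

Running candidates http://example.com:8080 php bak < common.txt shows why adding extensions multiplies the request count: every extra extension adds one more request per word.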
And the same idea with ffuf. The FUZZ keyword marks where each wordlist entry gets substituted:
    # ffuf: FUZZ is replaced by each line of the wordlist
    ffuf -u http://example.com:8080/FUZZ \
        -w /usr/share/seclists/Discovery/Web-Content/common.txt

    # Filter out the noise: match only the status codes worth a look
    ffuf -u http://example.com:8080/FUZZ -w common.txt -mc 200,301,302,403

    # Hide responses of a boring size (e.g. a default 'not found' page)
    ffuf -u http://example.com:8080/FUZZ -w common.txt -fs 1234
The skill in brute-forcing is filtering. Servers often return 200 for everything (a catch-all page), so a naive run reports thousands of false hits. Use -mc (match status code) and -fs (filter by response size) in ffuf, or --status-codes and --exclude-length in gobuster, to cut the wall of noise down to the handful of paths that are genuinely different.
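The size filter can be sketched by hand to show what it is doing. The example assumes input lines of the form "status size url", which curl can produce with -w '%{http_code} %{size_download} %{url_effective}\n'. It treats the most frequent size as the catch-all page and drops it; drop_common_size is a hypothetical helper name.

```shell
# Hypothetical helper: read "status size url" lines on stdin and print only
# the rows whose size differs from the most frequent size.
drop_common_size() {
    awk '{ rows[NR] = $0; size[NR] = $2; count[$2]++ }
         END {
             # Find the size that occurs most often (the likely catch-all page).
             for (s in count) if (count[s] > max) { max = count[s]; common = s }
             # Print every row that is not that size, in the original order.
             for (i = 1; i <= NR; i++) if (size[i] != common) print rows[i]
         }'
}
```

The heuristic assumes most probed paths do not exist, which is normally true for a wordlist run. If two sizes tie for most frequent, only one of them is dropped.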
The reward for finding a hidden path is the next half of the challenge, which is often access control. picoCTF 2022 Forbidden Paths is a clean follow-on: once you know a file exists, the puzzle becomes reaching it through a path that the server tried to restrict.
When should I stop using the browser and open Burp Suite?
Everything above runs in a browser, curl, and two command-line tools. That covers most beginner web challenges. You hit the ceiling of those tools when the challenge needs you to intercept and modify requests mid-flight, replay a request dozens of times with small changes, or work through a multi-step flow where each request depends on the last.
That is when you reach for an intercepting proxy. The standard is Burp Suite (the free Community Edition is plenty for CTF). Burp sits between your browser and the server, so you can pause any request, edit headers or parameters the browser would never let you touch, and send it on. Reach for it when you need to:
- Change a request method, header, or body that the page hardcodes (for example, flip a POST field the form will not let you edit).
- Replay one request many times with tweaks, using Burp Repeater, to test how the server responds to each variation.
- Automate a parameter sweep (Burp Intruder) such as trying every user ID or every value of a token.
- Inspect or strip headers that the browser adds automatically and the challenge cares about.
The handoff is natural: recon with the browser and command line tells you where the interesting endpoints are, and Burp lets you manipulate the requests to them. The dedicated Burp Suite for CTF post walks through setup, the proxy, Repeater, and Intruder from zero.
Which picoCTF challenges teach pure recon?
picoCTF is the best place to drill these reflexes because its web track is dense with recon-only challenges. Work through these roughly in order and each step of the loop above gets its own dedicated practice:
- picoCTF 2022 Inspect HTML and picoCTF 2022 Search Source are pure view-source and grep-the-scripts practice.
- picoCTF 2019 Where are the robots is the robots.txt challenge by name.
- picoCTF 2019 dont-use-client-side teaches that anything running in your browser is yours to read.
- picoCTF 2021 Scavenger Hunt chains every recon surface together, one clue pointing to the next.
- picoCTF 2021 Cookies introduces client-side state as a recon and tampering surface.
- picoCTF 2022 Forbidden Paths is where recon (finding a path) hands off to exploitation (reaching it).
Once recon feels automatic, the natural next steps are the injection families. The SQL Injection for CTF post and the Command Injection for CTF post both start from an endpoint you found during recon and turn it into a flag.
Quick reference: the recon checklist
Run this on every web challenge, in order
- Ctrl+U: view source. Read every comment, hidden field, and <script> tag.
- F12: DevTools. Network tab for background API calls, Sources tab to read every .js, Elements tab for the live DOM.
- Fetch /robots.txt, /sitemap.xml, and /.well-known/. Visit every Disallow path.
- Probe exposed files: /.git/HEAD, /.env, /backup.zip, /index.php.bak. Watch the status code.
- Application tab: read and tamper with cookies, localStorage, sessionStorage.
- Stuck? Brute-force with ffuf -u http://target/FUZZ -w common.txt -mc 200,301,403 or gobuster dir -u http://target -w common.txt -x php,bak,zip.
- Need to forge or replay requests? Open Burp Suite and switch to the proxy workflow.
curl recon one-liners
    # Raw source with headers
    curl -s -i http://target/

    # Just the status code of a path (great for probing files)
    curl -s -o /dev/null -w '%{http_code}\n' http://target/.git/HEAD

    # Follow redirects and keep cookies across requests
    curl -s -L -c jar.txt -b jar.txt http://target/

    # Send a tampered cookie
    curl -s http://target/ --cookie 'admin=1'
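Before sending a tampered cookie, it helps to see which cookies the server sets. This minimal sketch reads the output of curl -s -i on standard input and prints each name=value pair; list_cookies is a hypothetical helper name.

```shell
# Hypothetical helper: print the name=value part of every Set-Cookie header on stdin.
list_cookies() {
    # Strip carriage returns, keep Set-Cookie headers (any letter case),
    # drop the header name, and cut off attributes such as Path or HttpOnly.
    tr -d '\r' \
        | grep -i '^set-cookie:' \
        | sed -E 's/^[^:]*:[[:space:]]*//' \
        | cut -d ';' -f 1
}
```

For instance, curl -s -i http://target/ | list_cookies prints one cookie per line, and any of those values can then be altered and sent back with --cookie.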
The whole discipline reduces to one habit: never trust the rendered page to be the whole story. View the source, watch the network, read the cookies, and knock on every door the server forgot to lock. Do that first, every time, and most beginner web challenges solve themselves.