Description
Can you find the robots? A website URL is provided -- figure out where they are hiding.
Setup
Open the provided challenge website URL in your browser.
Solution
- Step 1: Check robots.txt

  robots.txt is a public file served from the root of a website that instructs search engine crawlers which paths not to index. Ironically, it reveals those "secret" paths to any human visitor. Navigate to /robots.txt and read the Disallow entries.

  curl http://<challenge-server>/robots.txt
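A minimal sketch of "read the Disallow entries" in Python; the robots.txt content below is a made-up sample, not the challenge's actual file:

```python
# Hypothetical robots.txt body, as returned by curl.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /477ce.html
"""

def disallowed_paths(text):
    """Collect the paths listed after 'Disallow:' directives."""
    paths = []
    for line in text.splitlines():
        # Drop trailing comments and whitespace; match the field case-insensitively.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths

print(disallowed_paths(ROBOTS_TXT))  # ['/admin', '/477ce.html']
```

Each path printed here is a candidate URL to visit directly in the next step.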
Learn more
robots.txt is a plain text file placed at the root of a web server (/robots.txt) that implements the Robots Exclusion Standard. It tells well-behaved web crawlers (like Googlebot and Bingbot) which URLs they should not visit or index. The format specifies User-agent (which crawler a rule applies to) and Disallow (paths that crawler should skip).

The critical security flaw: robots.txt is completely public. Any human -- or malicious bot -- can navigate directly to /robots.txt and read every disallowed path. This means that listing a path as Disallow actively advertises its existence to anyone curious enough to look. Common mistakes developers make include listing paths like:

- /admin -- admin panels
- /backup -- backup files
- /api/v1/internal -- internal API endpoints
- /staging -- staging environment
- /.git -- version control directories
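Python's standard library includes the same rule-matching logic polite crawlers use, which makes the asymmetry easy to demonstrate. The rules below are hypothetical; note that the parser only advises a crawler -- it cannot block anyone from requesting a disallowed path directly:

```python
import urllib.robotparser

# Feed sample rules straight into the stdlib parser (no network needed).
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin",
    "Disallow: /477ce.html",
])

# A well-behaved crawler consults these rules and skips disallowed paths...
print(rp.can_fetch("Googlebot", "/admin"))      # False
print(rp.can_fetch("*", "/public/index.html"))  # True
# ...but a human with a browser never asks; the "protection" is purely voluntary.
```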
robots.txt is always one of the first places checked during a web application penetration test or bug bounty reconnaissance phase. Security tools like gobuster, dirbuster, and feroxbuster automatically fetch robots.txt as part of their directory enumeration process. The proper way to protect sensitive paths is through authentication and authorization -- not by relying on crawlers to ignore them.

- Step 2: Visit the disallowed path

  robots.txt will list a disallowed path like /477ce.html. Navigate directly to that URL in your browser -- the flag is displayed on that page.
Learn more
Once a disallowed path is found in robots.txt, navigating to it is trivial -- just append the path to the base URL. This step highlights the core lesson: obscurity is not security. The path is "hidden" only from search engine indexes, not from direct access. Anyone who knows the URL can visit it freely.
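"Append the path to the base URL" is one stdlib call; the base URL below is a placeholder, not the real challenge server:

```python
from urllib.parse import urljoin

BASE = "http://challenge-server.example"  # hypothetical base URL

# Turn each path advertised in robots.txt into a directly visitable URL.
# urljoin handles the slash bookkeeping between base and path.
for path in ["/admin", "/477ce.html"]:
    print(urljoin(BASE, path))
```

Paste the resulting URL into a browser (or pass it to curl) and the "hidden" page is served like any other.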
This is sometimes called security through obscurity -- the misguided belief that keeping implementation details secret provides security. Bruce Schneier and other security experts have long argued that security systems must be secure even if everything about the system except the key is public knowledge (Kerckhoffs's principle). Applying this to web security: every URL on your server should be assumed publicly known, and authorization must be enforced server-side for every request.
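The server-side alternative can be sketched in a few lines. The user store and request handler here are hypothetical stand-ins for whatever framework is actually in use; the point is that access hinges on an authorization check, not on the URL staying unknown:

```python
# Minimal sketch: every URL is assumed publicly known, and access is
# decided per request by an authorization check (hypothetical user store).
AUTHORIZED_USERS = {"alice"}

def handle_request(path, user):
    """Return an HTTP status code for a request to `path` by `user`."""
    if path.startswith("/admin") and user not in AUTHORIZED_USERS:
        return 403  # Forbidden -- even though the attacker knows the URL
    return 200

print(handle_request("/admin/panel", "mallory"))  # 403
print(handle_request("/admin/panel", "alice"))    # 200
```

With this in place, listing /admin in robots.txt (or leaking it any other way) reveals nothing exploitable.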
In bug bounty programs, robots.txt enumeration is a standard first step. Hunters have found admin panels, debug endpoints, internal APIs, and sensitive files this way on major websites. Google itself publishes its full robots.txt disallow list for google.com, which reveals interesting internal path structures even though all those paths require authentication to access.
Flag
picoCTF{...}
robots.txt is meant to guide search engine crawlers but is fully public -- listing a path as Disallow reveals it to any human who thinks to look.