where are the robots

Published: April 2, 2026

Description

Can you find the robots? A website URL is provided -- figure out where they are hiding.

Web

Open the provided challenge website URL in your browser.

Solution

  1. Step 1: Check robots.txt
    robots.txt is a public file conventionally served at the root of a website that instructs search engine crawlers which paths not to index. Ironically, it reveals those 'secret' paths to any human visitor. Navigate to /robots.txt and read the Disallow entries.
    curl http://<challenge-server>/robots.txt

    robots.txt is a plain text file placed at the root of a web server (/robots.txt) that implements the Robots Exclusion Standard. It tells well-behaved web crawlers (like Googlebot, Bingbot) which URLs they should not visit or index. The format specifies User-agent (which crawler the rule applies to) and Disallow (paths that crawler should skip).
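    As a sketch, a minimal robots.txt using these directives might look like the following (the paths are illustrative, not from the challenge):

    ```
    # Applies to all crawlers
    User-agent: *
    Disallow: /admin
    Disallow: /secret-page.html

    # A rule targeting one specific crawler
    User-agent: Googlebot
    Disallow: /staging
    ```

    Well-behaved crawlers skip the listed paths; everyone else can read the file and visit them directly.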

    The critical security flaw: robots.txt is completely public. Any human -- or malicious bot -- can navigate directly to /robots.txt and read every disallowed path. This means that listing a path as Disallow actively advertises its existence to anyone curious enough to look. Common mistakes developers make include listing paths like:

    • /admin -- admin panels
    • /backup -- backup files
    • /api/v1/internal -- internal API endpoints
    • /staging -- staging environment
    • /.git -- version control directories

    robots.txt is always one of the first places checked during a web application penetration test or bug bounty reconnaissance phase. Security tools like gobuster, dirbuster, and feroxbuster automatically fetch robots.txt as part of their directory enumeration process. The proper way to protect sensitive paths is through authentication and authorization -- not by relying on crawlers to ignore them.
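    The enumeration those tools perform can be sketched in a few lines. This is a minimal example of parsing Disallow entries from a robots.txt body; fetching the file (e.g. with curl or urllib) is left to the caller, since the challenge server URL is a placeholder:

    ```python
    # Sketch: pull every Disallow path out of a robots.txt body.
    def parse_disallowed(robots_txt: str) -> list[str]:
        """Return every path listed after a Disallow: directive."""
        paths = []
        for line in robots_txt.splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if line.lower().startswith("disallow:"):
                path = line.split(":", 1)[1].strip()
                if path:  # an empty Disallow means "allow everything"
                    paths.append(path)
        return paths

    sample = """User-agent: *
    Disallow: /secret-page.html
    Disallow: /admin
    """
    print(parse_disallowed(sample))  # ['/secret-page.html', '/admin']
    ```

    Each returned path is then appended to the base URL and requested directly, which is exactly what directory-enumeration tools automate.
    
    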

  2. Step 2: Visit the disallowed path
    robots.txt will list a disallowed path like /477ce.html. Navigate directly to that URL in your browser -- the flag is displayed on that page.

    Once a disallowed path is found in robots.txt, navigating to it is trivial -- just append the path to the base URL. This step highlights the core lesson: obscurity is not security. The path is "hidden" only from search engine indexes, not from direct access. Anyone who knows the URL can visit it freely.
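    Once the page loads, the flag can be spotted by eye or extracted programmatically. A small sketch, assuming the standard picoCTF{...} flag format (the page body here is invented for illustration):

    ```python
    # Sketch: find a picoCTF-style flag in a fetched page body.
    import re

    def extract_flag(html: str) -> str | None:
        """Return the first picoCTF{...} token in the page, if any."""
        match = re.search(r"picoCTF\{[^}]*\}", html)
        return match.group(0) if match else None

    page = "<html><body>Flag: picoCTF{example_flag}</body></html>"
    print(extract_flag(page))  # picoCTF{example_flag}
    ```
    
    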

    This is sometimes called security through obscurity -- the misguided belief that keeping implementation details secret provides security. Bruce Schneier and other security experts have long argued that security systems must be secure even if everything about the system except the key is public knowledge (Kerckhoffs's principle). Applying this to web security: every URL on your server should be assumed publicly known, and authorization must be enforced server-side for every request.
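    The fix the text describes, per-request server-side authorization, can be sketched as follows. The handler shape and field names are illustrative, not from any specific framework:

    ```python
    # Sketch of the principle: authorization is checked on every request,
    # so knowing a "hidden" URL grants nothing by itself.
    def handle_request(path: str, session: dict) -> tuple[int, str]:
        """Serve a path only if the session is authorized for it."""
        protected = {"/admin", "/api/v1/internal"}
        if path in protected and not session.get("is_admin"):
            return 403, "Forbidden"  # denied no matter how the URL was discovered
        return 200, f"Contents of {path}"

    print(handle_request("/admin", {}))                  # (403, 'Forbidden')
    print(handle_request("/admin", {"is_admin": True}))  # (200, 'Contents of /admin')
    ```

    Contrast this with the challenge page, which serves its content to anyone who requests the path.
    
    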

    In bug bounty programs, robots.txt enumeration is a standard first step. Hunters have found admin panels, debug endpoints, internal APIs, and sensitive files this way on major websites. Google itself publishes its full robots.txt disallow list for google.com, which reveals interesting internal path structures even though all those paths require authentication to access.

Flag

picoCTF{...}

robots.txt is meant to guide search engine crawlers but is fully public -- listing a path as Disallow reveals it to any human who thinks to look.
