Secret of the Polyglot picoCTF 2024 Solution

Published: April 3, 2024

Description

The Network Operations Center (NOC) of your local institution picked up a suspicious file, they're getting conflicting information on what type of file it is. They've brought you in as an external expert to examine the file. Can you extract all the information from this strange file? Download the suspicious file here.

Polyglot analysis

Download flag2of2-final.pdf locally.

Install pdftotext (poppler-utils) and an OCR tool such as gocr.

bash
wget https://artifacts.picoctf.net/c_titan/9/flag2of2-final.pdf && \
sudo apt install poppler-utils gocr
The Introduction to Steganography Tools covers binwalk and other forensics tools useful for polyglot file analysis.
  1. Step 1: Confirm the polyglot
    A quick hex dump shows the file is both PNG and PDF: PNG magic at byte 0, %PDF- further in. See the hex dumps for CTF guide for more.
    bash
    xxd flag2of2-final.pdf | head
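    You should see the PNG signature at offset 0, which xxd prints as 8950 4e47 0d0a 1a0a, and a %PDF- string later in the dump. If %PDF- does not show up in the first ten lines, drop head and search the full output. Together the two markers suggest that both a PNG decoder and a PDF reader will accept the file.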
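    The same check can be scripted instead of read off the dump. This is a minimal sketch, assuming GNU grep (for -a, -b, -o) and a head that supports -c; the helper names png_sig_ok and first_offset are made up for illustration.

```shell
# Hypothetical helpers, not part of any tool used in this writeup.

# Succeeds if the first 8 bytes of file $1 are the PNG signature.
png_sig_ok() {
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# Prints the 0-based byte offset of the first occurrence of the
# fixed string $2 in file $1 (prints nothing if it is absent).
first_offset() {
  LC_ALL=C grep -aboF -e "$2" "$1" | head -n 1 | cut -d: -f1
}

# Assumed usage:
#   png_sig_ok flag2of2-final.pdf && echo "PNG signature at byte 0"
#   first_offset flag2of2-final.pdf '%PDF-'
```

    A non-empty offset from first_offset together with a passing png_sig_ok is the scripted equivalent of spotting both markers in the hex dump.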
  2. Step 2: Extract the PDF half
    pdftotext seeks the %PDF- marker and ignores the PNG bytes that come before it, so the embedded PDF reads cleanly. The output holds the second half of the flag.
    bash
    pdftotext flag2of2-final.pdf && cat flag2of2-final.txt
    Learn more

    A polyglot file is a single file that is simultaneously valid in two or more different formats. Because most file parsers only read as much of a file as their format requires, you can construct files where the PDF parser sees a valid PDF and the PNG parser sees a valid PNG, each extracting different content from the same byte stream.

    pdftotext (part of poppler-utils) converts a PDF's text content to a plain text file. PDF readers are generally lenient about the header: rather than requiring %PDF- at byte 0, many accept it anywhere in roughly the first 1024 bytes of the file. They then read the startxref offset that sits just before %%EOF at the end of the file to locate the cross-reference table (xref). That is why the embedded PDF extracts cleanly even though PNG bytes come first.

    • PDF files contain %PDF- near the start and end with %%EOF; the parser scans for these markers regardless of what precedes them.
    • PNG files begin with an 8-byte magic signature 89 50 4E 47 0D 0A 1A 0A; the PNG parser reads from byte 0.
    • The polyglot is crafted so neither parser is confused by the other format's data.
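
    The end-of-file side of that lookup can be inspected from the shell. The sketch below is an illustration only: the helper names are hypothetical, the 1024-byte window is an assumption based on a common reader convention rather than on how pdftotext is implemented, and files that use bare CR line endings are not handled.

```shell
# Hypothetical helpers, not part of poppler-utils.

# Succeeds if %%EOF appears in the final 1024 bytes of file $1.
pdf_has_eof() {
  tail -c 1024 "$1" | LC_ALL=C grep -aqF '%%EOF'
}

# Prints the number on the line after the last "startxref" keyword
# found in the final 1024 bytes of file $1 (nothing if none is found).
pdf_startxref() {
  tail -c 1024 "$1" \
    | LC_ALL=C grep -a -A 1 '^startxref' \
    | LC_ALL=C grep -a -E '^[0-9]+' \
    | tail -n 1 \
    | tr -d '\r'
}

# Assumed usage:
#   pdf_has_eof flag2of2-final.pdf && pdf_startxref flag2of2-final.pdf
```

    The printed number is the offset a reader would jump to in order to find the xref table, which is why nothing before the PDF header needs to make sense to it.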
  3. Step 3: Treat it as a PNG
    The magic bytes also match a PNG. Copy or rename the file with a .png extension and run OCR on the image to recover the opening characters of the flag (picoCTF{...). OCR can introduce spurious whitespace and confuse 0/O or l/1, so check the result carefully against the picoCTF{...} format.
    Learn more

    Because the file is a polyglot, the same byte stream can satisfy the magic-byte checks for more than one format. Renaming the file and opening it with image tooling reveals content that the PDF viewer path does not show directly.

    OCR is enough here because the embedded image exposes a visible fragment of the flag rather than hiding it with steganography or encryption.
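
    As a rough sketch of the OCR step, the helper below strips the stray whitespace that OCR tends to add. The name clean_ocr is hypothetical, and the usage lines assume gocr can read PNG on your system, which depends on the netpbm converters being installed. It does not correct 0/O or l/1 mix-ups, so those still need a visual check against the image.

```shell
# Hypothetical helper: remove all whitespace from OCR text on stdin.
clean_ocr() {
  tr -d '[:space:]'
}

# Assumed usage (requires gocr; PNG input relies on netpbm):
#   cp flag2of2-final.pdf flag2of2-final.png
#   gocr flag2of2-final.png | clean_ocr
```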

  4. Step 4: Combine halves
    Concatenate the PNG-derived prefix with the PDF-derived suffix to get the full flag picoCTF{f1u3n7_1n_pn9_&_pdf_7f9...}.
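    The concatenation can be done by hand, but a small check helps catch a missing brace or leftover whitespace. This is a minimal sketch with a hypothetical helper name, join_flag; the fragments in the usage comment are placeholders, not the real halves.

```shell
# Hypothetical helper: join two fragments, drop stray whitespace, and
# check that the result looks like picoCTF{...}.
# Prints the joined flag on success; returns 1 otherwise.
join_flag() {
  flag=$(printf '%s%s' "$1" "$2" | tr -d '[:space:]')
  case $flag in
    picoCTF\{*\}) printf '%s\n' "$flag" ;;
    *) printf 'unexpected format: %s\n' "$flag" >&2; return 1 ;;
  esac
}

# Assumed usage with placeholder fragments:
#   join_flag 'picoCTF{first_' 'second}'
```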
    Learn more

    Splitting a secret across two extraction methods is a clever CTF design that tests whether solvers understand that a single file can contain multiple data layers. Neither the image nor the text rendering alone gives the complete flag; you must use both parsers and combine their outputs.

    This mirrors real-world scenarios where malware or hidden data exploits format ambiguity. Security researchers have demonstrated polyglots combining PDF+ZIP, PNG+ZIP, JPEG+HTML, and many other pairings. Some web upload validators can be bypassed this way: a file that passes as an image but also contains active HTML or script content.

    The key insight for all polyglot challenges is to ask: what tool treats this file differently than my first assumption? Trying binwalk, file, strings, and format-specific extractors on every suspicious file is standard forensics methodology.
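
    That habit is easy to wrap in a single function. The sketch below is one possible arrangement, not a standard tool: triage is a hypothetical name, and it only runs the utilities that happen to be installed.

```shell
# Hypothetical helper: run file, strings, and binwalk (whichever are
# installed) against file $1, with a labelled section for each tool.
triage() {
  for tool in file strings binwalk; do
    printf '== %s ==\n' "$tool"
    if command -v "$tool" >/dev/null 2>&1; then
      case $tool in
        # Show only the first 20 printable runs of 8+ characters.
        strings) strings -n 8 "$1" | head -n 20 ;;
        *) "$tool" "$1" || printf '(%s reported an error)\n' "$tool" ;;
      esac
    else
      printf '(not installed)\n'
    fi
  done
}

# Assumed usage:
#   triage flag2of2-final.pdf
```

    Disagreement between the sections, such as file reporting an image while strings shows PDF keywords, is the cue to try a second parser.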

Flag

picoCTF{f1u3n7_1n_pn9_&_pdf_7f9...}

Half PNG + half PDF = full flag.
