Description
The Network Operations Center (NOC) of your local institution picked up a suspicious file and is getting conflicting information on what type of file it is. They've brought you in as an external expert to examine the file. Can you extract all the information from this strange file? Download the suspicious file here.
Setup
Download flag2of2-final.pdf locally.
Install pdftotext (poppler-utils) and an OCR tool such as gocr.
wget https://artifacts.picoctf.net/c_titan/9/flag2of2-final.pdf && \
sudo apt install poppler-utils gocr
Solution
Step 1: Confirm the polyglot
A quick hex dump shows the file is both PNG and PDF: the PNG magic bytes sit at byte 0, and a %PDF- header appears further in. See the hex dumps for CTF guide for more.
xxd flag2of2-final.pdf | head
You should see 89 50 4E 47 (the PNG signature) at offset 0 and a %PDF- string later in the dump. That is a strong hint that both a PNG parser and a PDF parser will accept the file.
Step 2: Extract the PDF half
pdftotext seeks the %PDF- marker and ignores the PNG bytes that come before it, so the embedded PDF reads cleanly. The output holds the second half of the flag.
pdftotext flag2of2-final.pdf && cat flag2of2-final.txt
Learn more
A polyglot file is a single file that is simultaneously valid in two or more different formats. Because most file parsers only read as much of a file as their format requires, you can construct files where the PDF parser sees a valid PDF and the PNG parser sees a valid PNG, each extracting different content from the same byte stream.
pdftotext (part of poppler-utils) converts a PDF's text content to a plain text file. PDF readers tolerate bytes in front of the %PDF- header rather than requiring it at byte 0, and they locate the document structure from the end of the file: the trailer near %%EOF points, via startxref, to the cross-reference table (xref). That is why the embedded PDF extracts cleanly even though PNG bytes come first.
- PDF files contain %PDF- near the start and end with %%EOF; the reader finds these markers even when other data precedes them.
- PNG files begin with an 8-byte magic signature, 89 50 4E 47 0D 0A 1A 0A; a PNG parser reads from byte 0.
- The polyglot is crafted so that neither parser is confused by the other format's data.
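The signature checks described above can be scripted. This is a minimal sketch, assuming bash with GNU grep (for the -a, -b, and -o flags) and od from coreutils; the function names has_png_magic, pdf_header_offsets, and check_polyglot are hypothetical helpers, not part of any tool. It tests for the PNG signature at byte 0 and prints the 0-based offset of each %PDF- marker, wherever it sits in the file.

```shell
#!/usr/bin/env bash
# Minimal sketch: detect a PNG/PDF polyglot by its signatures.
# Assumes GNU grep (-a, -b, -o) and od; function names are hypothetical.

# Succeed (exit 0) only if the first 8 bytes are the PNG signature
# 89 50 4E 47 0D 0A 1A 0A.
has_png_magic() {
  local sig
  sig=$(od -An -tx1 -N8 "$1" | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# Print the 0-based byte offset of every "%PDF-" marker, one per line.
# LC_ALL=C makes grep match raw bytes regardless of the user's locale.
pdf_header_offsets() {
  LC_ALL=C grep -abo '%PDF-' "$1" | cut -d: -f1
}

# Summarise both checks for one file.
check_polyglot() {
  local f=$1 off
  if has_png_magic "$f"; then
    echo "PNG signature at offset 0"
  fi
  off=$(pdf_header_offsets "$f" | head -n 1) || true
  if [ -n "$off" ]; then
    echo "%PDF- header at offset $off"
  fi
}

# Example: check_polyglot flag2of2-final.pdf
```

Unlike xxd | head, which only shows the first 160 bytes, the grep offset search finds the PDF header no matter how far into the file it starts.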
Step 3: Treat it as a PNG
The magic bytes also match a PNG. Rename the file with a .png extension and OCR the image to recover the opening characters, picoCTF{... . OCR can introduce spurious whitespace and confuse 0/O or l/1, so check the result carefully against the picoCTF{...} format.
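The rename-and-OCR step, plus the whitespace cleanup, can be sketched as follows. This assumes gocr reads PNM images and that netpbm's pngtopnm is installed to convert the PNG first; the helper names extract_flag_prefix and ocr_png_half and the output file names are hypothetical. The helper removes whitespace that OCR may insert and keeps the text from picoCTF{ onward, but it deliberately does not "correct" 0/O or l/1, since only a visual check against the image can settle those.

```shell
#!/usr/bin/env bash
# Minimal sketch: OCR the PNG half and pull out the flag prefix.
# Assumes gocr and netpbm's pngtopnm; all names here are hypothetical.

# Read OCR text on stdin, delete all whitespace OCR may have inserted,
# and print everything from "picoCTF{" up to (not including) any "}".
extract_flag_prefix() {
  tr -d '[:space:]' | grep -o 'picoCTF{[^}]*'
}

# Copy the polyglot under a .png name, convert it to PNM, then OCR it.
# Writes suspicious.png and suspicious.pnm in the current directory.
ocr_png_half() {
  local src=$1
  cp -- "$src" suspicious.png
  pngtopnm suspicious.png > suspicious.pnm
  gocr -i suspicious.pnm | extract_flag_prefix
}

# Example: ocr_png_half flag2of2-final.pdf
```

Copying rather than renaming keeps the original .pdf name available for the pdftotext step.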
Learn more
Because the file is a polyglot, the same byte stream can satisfy the magic-byte checks for more than one format. Renaming the file and opening it with image tooling reveals content that a PDF viewer does not display.
OCR is enough here because the embedded image exposes a visible fragment of the flag rather than hiding it with steganography or encryption.
Step 4: Combine halves
Concatenate the PNG-derived prefix with the PDF-derived suffix to get the full flag, picoCTF{f1u3n7_1n_pn9_&_pdf_7f9...}.
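Joining the halves is plain string concatenation, but a format check catches OCR slips such as a dropped brace. This sketch assumes the OCR prefix and the pdftotext suffix are already available as strings; combine_halves and the variable names in the example are hypothetical, and the pattern only checks the overall picoCTF{...} shape, not whether the flag is correct.

```shell
#!/usr/bin/env bash
# Minimal sketch: join the two flag halves and sanity-check the format.
# combine_halves is a hypothetical helper for this write-up.

# Usage: combine_halves PREFIX SUFFIX
# Strips whitespace from both parts, concatenates them, and prints the
# result only if the whole string looks like picoCTF{...}; else returns 1.
combine_halves() {
  local flag
  flag=$(printf '%s%s' "$1" "$2" | tr -d '[:space:]')
  if printf '%s\n' "$flag" | grep -qx 'picoCTF{[^{}]*}'; then
    printf '%s\n' "$flag"
  else
    echo "does not match picoCTF{...}: $flag" >&2
    return 1
  fi
}

# Example with hypothetical variables holding each half:
# combine_halves "$png_prefix" "$pdf_suffix"
```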
Learn more
Splitting a secret across two extraction methods is a clever CTF design that tests whether solvers understand that a single file can contain multiple data layers. Neither the image nor the text rendering alone gives the complete flag; you must use both parsers and combine their outputs.
This mirrors real-world scenarios where malware or hidden data exploits format ambiguity. Security researchers have demonstrated polyglots combining PDF+ZIP, PNG+ZIP, JPEG+HTML, and many other pairings. Some web upload validators, particularly those that check only a file's leading magic bytes, can be bypassed this way: a file passes as an image but also carries active HTML or script content.
The key question in any polyglot challenge is: which tool treats this file differently from my first assumption? Running binwalk, file, strings, and format-specific extractors on every suspicious file is standard forensics methodology.
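That triage habit can be wrapped in a small script. This is a sketch, assuming that file, binwalk, and strings (from binutils) may or may not be installed, so each missing tool is skipped with a note found via command -v; the function name triage is hypothetical. file reports the format implied by the leading magic bytes, while binwalk scans for signatures at any offset, which is what exposes a second format hiding behind the first.

```shell
#!/usr/bin/env bash
# Minimal sketch: run common forensics triage tools on one file.
# Missing tools are skipped; "triage" is a hypothetical name.

triage() {
  local f=$1 tool
  if [ ! -r "$f" ]; then
    echo "cannot read $f" >&2
    return 1
  fi

  for tool in file binwalk strings; do
    echo "== $tool =="
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "(not installed, skipped)"
      continue
    fi
    # A failing tool is reported but does not stop the remaining checks.
    case $tool in
      file)    file "$f" ;;                          # format from leading magic bytes
      binwalk) binwalk "$f" ;;                       # signatures at any offset
      strings) strings -a -n 8 "$f" | head -n 20 ;;  # first printable runs
    esac || echo "($tool exited with an error)"
  done
}

# Example: triage flag2of2-final.pdf
```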
Flag
picoCTF{f1u3n7_1n_pn9_&_pdf_7f9...}
Half PNG + half PDF = full flag.