Redaction gone wrong: picoCTF 2022 Solution

Published: July 20, 2023

Description

Sensitive text in a PDF was only visually redacted. Convert the PDF to text (or copy/paste) to reveal the hidden flag.

Install pdftotext. On Debian/Ubuntu: sudo apt install poppler-utils. On macOS: brew install poppler.

Run pdftotext Financial_Report_for_ABC_Labs.pdf to create a .txt version, or open the PDF in a viewer that supports text selection (Preview.app on macOS, Foxit Reader, evince) and select across the black boxes.

Grep the extracted text for picoCTF.

```bash
sudo apt install poppler-utils                  # Debian/Ubuntu; on macOS: brew install poppler
pdftotext Financial_Report_for_ABC_Labs.pdf     # writes Financial_Report_for_ABC_Labs.txt
grep -oE "picoCTF\{.*\}" Financial_Report_for_ABC_Labs.txt   # print only the flag
```
  1. Step 1: Convert the PDF
    Visual redactions don't remove the underlying text. pdftotext extracts everything, including the supposedly hidden sections.

    A PDF (Portable Document Format) page is drawn from a content stream: a sequence of operators that paint text, shapes, and images in order. A black rectangle painted after the text is a separate graphics element that only covers it on screen; the text-showing operators and their strings remain fully intact in the file. This is fundamentally different from actually deleting or overwriting the text.
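
    As a rough illustration of that drawing order, the snippet below builds a hand-written content-stream fragment and greps it for the text-showing operator. The fragment is hypothetical and uncompressed; real PDFs usually compress streams (typically with FlateDecode) and often encode text as glyph IDs, so raw strings are rarely this readable. The point is only that the string shown by Tj is still stored even though a filled rectangle is painted over the same area afterwards.

    ```bash
    # Hypothetical, uncompressed content-stream fragment:
    #   BT ... ET  begin/end a text object; Tf sets the font, Td moves, Tj shows a string
    #   0 0 0 rg   set the fill color to black
    #   re f       add a rectangle and fill it, painting over the text drawn before it
    stream=$(printf '%s\n' \
      'BT /F1 12 Tf 72 700 Td (sensitive text) Tj ET' \
      '0 0 0 rg' \
      '70 696 140 16 re f')

    # Filling the rectangle changes what is rendered, not what is stored:
    # the Tj operator and its string are still in the stream.
    printf '%s\n' "$stream" | grep -oE '\(.*\) Tj'
    # -> (sensitive text) Tj
    ```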

    The pdftotext tool (from poppler-utils) ignores rectangles and other graphics and outputs only the text, so an overlaid shape has no effect on what it extracts. Even simpler: PDF viewers such as Preview.app, Foxit Reader, and evince let you select and copy the "redacted" text, because it is still present in the text layer beneath the black rectangle.
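
    The convert and search steps can also be run as a single pipeline, since pdftotext writes to standard output when the output file name is "-". A minimal sketch, assuming bash; find_flag is a hypothetical helper name, and the regex uses [^}]* so that a match cannot run past the flag's closing brace.

    ```bash
    # find_flag (hypothetical helper): extract the text layer of the PDF in $1 to
    # stdout ("-") and print every picoCTF{...} string found in it.
    find_flag() {
      local pdf=$1
      if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found: install poppler-utils (apt) or poppler (brew)" >&2
        return 127
      fi
      # [^}]* stops each match at the first closing brace
      pdftotext "$pdf" - | grep -oE 'picoCTF\{[^}]*\}'
    }

    # Usage:
    #   find_flag Financial_Report_for_ABC_Labs.pdf
    ```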

    This is not a theoretical vulnerability - it has caused real-world data breaches. High-profile examples include leaked NSA documents and court filings where sensitive names were "blacked out" using this flawed method. The correct approach is to use purpose-built redaction tools that remove the text from the document, not merely cover it.

  2. Step 2: Search for the flag
    Grep the generated text file for picoCTF to immediately locate the flag string.

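    grep -oE "picoCTF\{.*\}" prints every match of the flag pattern. The -o flag prints only the matching portion rather than the whole line. The -E flag selects extended regular expressions, in which \{ and \} match literal braces; in grep's default basic syntax, \{ would instead start an interval expression such as a\{2\}. The .* in the middle matches any run of characters and is greedy, so it extends to the last } on the line.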
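
    In this challenge the flag sits on its own, so the greedy .* is harmless, but on a line containing a second closing brace it overshoots. A small self-contained demonstration using a made-up line of text (not taken from the challenge PDF), comparing .* with [^}]*, which cannot cross a closing brace:

    ```bash
    # Made-up line of extracted text: a flag followed by another brace pair.
    line='Note: picoCTF{example_flag} see appendix {A}'

    # .* is greedy, so the match runs to the LAST closing brace on the line.
    printf '%s\n' "$line" | grep -oE 'picoCTF\{.*\}'
    # -> picoCTF{example_flag} see appendix {A}

    # [^}]* cannot cross a closing brace, so the match stops at the first one.
    printf '%s\n' "$line" | grep -oE 'picoCTF\{[^}]*\}'
    # -> picoCTF{example_flag}
    ```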

    In forensics and incident response, pattern-matching against extracted text is a core workflow. Tools like bulk_extractor automate this at scale, scanning disk images or raw files for email addresses, URLs, credit card numbers, and other structured data patterns - even across file boundaries in unallocated space.
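
    bulk_extractor ships its own scanners; as a much smaller stand-in for the same idea, plain grep can sweep a directory of extracted text for a structured pattern. This is not bulk_extractor, and the sample files and the simplified email regex below are assumptions for illustration that will not match every valid address.

    ```bash
    # Throwaway directory with two made-up text files.
    dir=$(mktemp -d)
    printf 'contact: alice@example.com\n' > "$dir/a.txt"
    printf 'see https://example.org/x or mail bob@example.net\n' > "$dir/b.txt"

    # -r recurse, -h omit file names, -o print only matches, -E extended regex.
    # Rough email pattern for illustration, not a full RFC 5322 matcher.
    emails=$(grep -rhoE '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$dir" | sort)
    printf '%s\n' "$emails"
    # -> alice@example.com
    # -> bob@example.net

    rm -rf "$dir"
    ```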

    Proper redaction of sensitive material requires tools designed for the purpose, such as Adobe Acrobat's built-in redaction feature (which removes the content rather than covering it), or dedicated solutions used in legal and government contexts that produce a new, sanitized document with the underlying data permanently removed.
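
    A quick way to check a redaction before distributing a file is to run the same extraction an attacker would and confirm the sensitive string is gone. A minimal sketch, assuming bash and pdftotext; check_redacted is a hypothetical helper, and because it only inspects the text layer it will not catch leftovers in images, metadata, annotations, or attachments.

    ```bash
    # check_redacted (hypothetical helper)
    #   $1 = PDF to check, $2 = exact string that must no longer be extractable
    # Returns 0 if absent, 1 if still present, 2 if text extraction failed.
    check_redacted() {
      local pdf=$1 secret=$2 text
      # Fail loudly instead of reporting "clean" when extraction itself fails.
      text=$(pdftotext "$pdf" -) || { echo "pdftotext failed on $pdf" >&2; return 2; }
      # -q: exit status only; -F: treat the secret as a fixed string, not a regex
      if grep -qF -- "$secret" <<<"$text"; then
        echo "LEAK: '$secret' is still in the text layer of $pdf" >&2
        return 1
      fi
      echo "OK: '$secret' not found in the extracted text of $pdf"
    }

    # Usage (file name and string are placeholders):
    #   check_redacted redacted_report.pdf "SECRET-STRING"
    ```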

Flag

picoCTF{C4n_Y0u_S33_m3_f...}

Real-world lesson: always remove sensitive text entirely before distributing redacted documents.