Redaction gone wrong

Published: July 20, 2023

Description

Sensitive text in a PDF was only visually redacted. Convert the PDF to text (or copy/paste) to reveal the hidden flag.

Install `pdftotext` (from poppler-utils or xpdf).

Run `pdftotext Financial_Report_for_ABC_Labs.pdf` to create a .txt version.

Search the text output for `picoCTF` (or simply copy the blacked-out text directly inside a PDF viewer).

pdftotext Financial_Report_for_ABC_Labs.pdf
grep -oE "picoCTF\{.*\}" Financial_Report_for_ABC_Labs.txt
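
The two commands above can also be combined so that no intermediate .txt file is written. The following is a minimal sketch, not the challenge's official method: `extract_flag` is a hypothetical helper name, and it relies on `pdftotext` accepting `-` as the output file to write to stdout.

```shell
#!/bin/sh
# Sketch: print any picoCTF-style flags found in a PDF's text layer.
# extract_flag is a hypothetical helper name, not part of pdftotext.
extract_flag() {
  pdf=$1
  # Fail early if the tool or the input file is missing.
  command -v pdftotext >/dev/null 2>&1 || { echo "pdftotext not found" >&2; return 127; }
  [ -r "$pdf" ] || { echo "cannot read: $pdf" >&2; return 2; }
  # "-" as the output name sends the extracted text to stdout.
  pdftotext "$pdf" - | grep -oE 'picoCTF\{.*\}'
}

# Default to the challenge file name when no argument is given.
extract_flag "${1:-Financial_Report_for_ABC_Labs.pdf}" || echo "no flag extracted" >&2
```

The function returns grep's exit status, so a non-zero result means either the tool or file was missing or no flag-shaped string was present in the extracted text.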

Solution

  1. Step 1: Convert the PDF
    Visual redactions don't remove the underlying text. `pdftotext` extracts everything, including the supposedly hidden sections.

    A PDF (Portable Document Format) file describes each page as a sequence of drawing operations. A black rectangle drawn on top of text is a separate visual element; the original text data remains fully intact in the file's content stream. This is fundamentally different from actually deleting or overwriting the text.

    The `pdftotext` tool (part of the poppler-utils package) strips all visual formatting and extracts the raw text content, bypassing any overlaid shapes. Even simpler: many PDF viewers let you select and copy text that appears visually redacted, because the text layer is still there and selectable.

    This is not a theoretical vulnerability; it has caused real-world data leaks. High-profile examples include leaked NSA documents and court filings (for instance, a 2019 defense filing in the Paul Manafort case) where sensitive text was "blacked out" using this flawed method and could still be copied out. The correct approach is to use purpose-built redaction tools that remove the text from the document, not merely cover it.
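
To make the point from Step 1 concrete, here is a rough sketch of why a drawn rectangle hides nothing from a text extractor. The `stream` value is a hypothetical, heavily simplified page content stream (real PDFs usually compress their streams and encode text in more complicated ways), and `show_text` is an illustrative helper, not a real PDF parser.

```shell
#!/bin/sh
# Hypothetical, simplified content stream: one text object, then a filled
# black rectangle painted over the same area.
stream='BT /F1 12 Tf 72 700 Td (Account number: 12345) Tj ET
0 0 0 rg
70 695 200 16 re
f'

# Print the string argument of every Tj (show text) operator.
# The rectangle operators (rg, re, f) are simply not matched.
show_text() {
  printf '%s\n' "$1" | sed -n 's/.*(\(.*\)) Tj.*/\1/p'
}

show_text "$stream"   # prints: Account number: 12345
```

The rectangle only adds more drawing operators after the text; it never touches the string passed to `Tj`, which is what extraction tools read.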

  2. Step 2: Search for the flag
    Grep the generated text file for picoCTF to immediately locate the flag string.

    `grep -oE "picoCTF\{.*\}"` uses an extended regular expression to match the flag pattern. The `-o` flag prints only the matching portion (not the whole line). The `-E` flag selects extended regex syntax, in which `\{` and `\}` match literal braces, and `.*` matches any run of characters between them.

    In forensics and incident response, pattern-matching against extracted text is a core workflow. Tools like bulk_extractor automate this at scale, scanning disk images or raw files for email addresses, URLs, credit card numbers, and other structured data patterns - even across file boundaries in unallocated space.

    Proper document redaction for sensitive material requires tools certified for the purpose, such as Adobe Acrobat's built-in redaction feature (which actually removes content), or dedicated solutions used in legal and government contexts that produce a new, sanitized document with the underlying data permanently removed.
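
One caveat on the Step 2 search pattern: `.*` is greedy, so if a line contains another closing brace after the flag, the match runs on to that last brace. A minimal sketch of a tighter pattern follows; the sample line is made up for illustration and `find_flags` is a hypothetical helper name.

```shell
#!/bin/sh
# Match "picoCTF{", then any characters except "}", then the first "}".
find_flags() {
  grep -oE 'picoCTF\{[^}]*\}'
}

# Made-up sample line with a second pair of braces after the flag.
sample='report: picoCTF{example_flag} (see appendix {B})'

printf '%s\n' "$sample" | find_flags   # prints: picoCTF{example_flag}
```

For this challenge the original pattern is enough as long as the flag is the only braced text on its line; the bracket expression simply makes the match stop at the flag's own closing brace.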

Flag

picoCTF{C4n_Y0u_S33_m3_f...}

Real-world lesson: always remove sensitive text entirely before distributing redacted documents.
