flag_printer picoCTF 2024 Solution

Published: April 3, 2024

Description

Help! I encoded my flag printer using a really cool encoding scheme but now I can't access it! Can you help me retrieve the flag?

Download the encoded points file the challenge ships. Each line is a pair x y of integers in GF(p) where p = 7514777789.

Install SageMath. The naive Lagrange interpolation is O(n^3) and the dataset has about 1.77 million points. You will not finish without the divide-and-conquer trick.

bash
wget https://artifacts.picoctf.net/c/.../encoded.txt
bash
wc -l encoded.txt              # ~1769611 lines
bash
head -3 encoded.txt            # confirm 'x y' integer pairs
bash
sudo apt install sagemath      # or use the conda-forge sage package
  1. Step 1Recognise the shape: polynomial interpolation
    The challenge ships a transcript of (x, f(x)) pairs over GF(7514777789). A unique polynomial of degree n-1 passes through n distinct points; finding its coefficients is the entire challenge. Once you have them, the bytes of those coefficients spell out the flag.
    Learn more

    Why a polynomial. Plain text is a sequence of bytes; a polynomial is a sequence of coefficients. The challenge author treated the flag-printer's memory image as polynomial coefficients a_0, a_1, ..., a_{n-1} over GF(p), then sampled f(x_i) = sum a_k * x_i^k mod p at n distinct x_i values and shipped the table. Recovering a_0..a_{n-1} is exactly polynomial interpolation.

    Why GF(p). Working in a finite field keeps the coefficients bounded (everything stays under p, which fits in 33 bits). Over the rationals, an interpolating polynomial of degree 1.77 million would have coefficients with billions of digits each, blowing up storage and arithmetic. p = 7514777789 is just over 2^32, large enough to encode any byte sequence as an integer modulo p with no ambiguity, small enough for fast modular arithmetic.

    Why the result is a BMP. When you reconstruct a_0..a_{n-2} and dump them as raw bytes (lowest 8 bits of each, or the first byte of the little-endian encoding), the byte stream starts with 42 4D(BMP magic) and renders as an image of the flag. The challenge name "flag printer" is the giveaway: the polynomial is the flag-printer's firmware, and decoding it prints the flag visually.

  2. Step 2Why naive Lagrange is hopeless
    Use the divide-and-conquer interpolation algorithm (Modern Computer Algebra ch. 10): split the points in half, compute vanishing polynomials Z1, Z2 for each half, recurse, and combine as f1*Z2 + f2*Z1. Total complexity O(n log^2 n).
    Learn more

    The recursion. Given (X, Y) with |X| = n, partition into halves (X1, Y1) and (X2, Y2). Define vanishing polynomials:

    Z1(x) = product over xk in X1 of (x - xk)
    Z2(x) = product over xk in X2 of (x - xk)

    Z1 is zero at every x in X1 and nonzero on X2; symmetrically for Z2. Then the interpolating polynomial is:

    f(x) = f1(x) * Z2(x) + f2(x) * Z1(x)
    
    where f1 interpolates (X1, Y1 / Z2(X1))   <- adjusted Y values
          f2 interpolates (X2, Y2 / Z1(X2))

    On X1 the second term vanishes (because Z1 is zero there) and the first term contributes f1(x) * Z2(x). Pre-dividing Y1 by Z2(X1) exactly cancels the Z2 factor, giving f(x_i) = Y_i. Symmetric on X2.

    Why it ends up O(n log^2 n). Building each Z via the standard subproduct tree is O(n log^2 n). Evaluating one polynomial at many points (multipoint evaluation) is also O(n log^2 n). Recursion depth is log n. Total: O(n log^2 n), roughly n * 21^2 = ~440 * n field ops at this scale, very tractable.

    Library choices. SageMath's R.lagrange_polynomial uses the naive algorithm and is unusable here. FLINT's fmpz_mod_poly_interpolate_fmpz_vec ships the divide-and-conquer version and finishes in around 25 seconds; pure SageMath with a hand-rolled recursion takes ~1 minute. Plain Python or NumPy will not finish.

  3. Step 3Implement the recursion, dump coefficients as BMP
    Run the Sage solver below, pipe coefficients to a BMP file, and open the file in any image viewer. The flag is rendered as image content.
    bash
    sage solve.sage   # ~1 minute on a modern laptop
    bash
    xdg-open output.bmp
    bash
    # or convert to PNG for sharing:
    bash
    convert output.bmp output.png && xdg-open output.png
    Learn more

    solve.sage (saved alongside encoded.txt):

    import multiprocessing as mp
    
    p = 7514777789
    X, Y = [], []
    for line in open('encoded.txt').read().strip().split('\n'):
        x, y = line.split(' ')
        X.append(int(x))
        Y.append(int(y))
    
    K = GF(p)
    R = PolynomialRing(K, 'x')
    
    def compZ(X):
        x = R.gen()
        Z = K(1)
        for xk in X:
            Z *= (x - xk)
        return Z
    
    def comp(X, Y, Xother):
        Z = compZ(Xother)
        Y = [y / Z(x) for x, y in zip(X, Y)]
        return Y, Z
    
    def solve(X, Y):
        n = len(Y)
        if n <= 10:
            return R.lagrange_polynomial(list(zip(X, Y)))
        nhalf = n // 2
        X1, Y1 = X[:nhalf], Y[:nhalf]
        X2, Y2 = X[nhalf:], Y[nhalf:]
        if nhalf > 10000:
            with mp.Pool(2) as pool:
                r1 = pool.apply_async(comp, (X1, Y1, X2))
                r2 = pool.apply_async(comp, (X2, Y2, X1))
                Y1, Z2 = r1.get()
                Y2, Z1 = r2.get()
        else:
            Y1, Z2 = comp(X1, Y1, X2)
            Y2, Z1 = comp(X2, Y2, X1)
        return solve(X1, Y1) * Z2 + solve(X2, Y2) * Z1
    
    f = solve(X, Y)
    coeffs = f.coefficients(sparse=False)
    # Drop the highest-degree coefficient (the leading 1 from the vanishing-polynomial product
    # is the BMP terminator marker the gen script appends), then take the low byte of each.
    open('output.bmp', 'wb').write(bytearray(coeffs[:-1]))
    print(f'wrote {len(coeffs) - 1} bytes')

    Why the bottom byte. The encoder maps each byte of the BMP file to a coefficient by zero-extending it to fit in GF(p). So the recovered coefficients live in [0, 256) exactly; bytearray(coeffs) just packs the integer-valued coefficients back to bytes. If your recovered coeffs contain values outside [0, 256), your interpolation is wrong (off-by-one in the recursion is the usual culprit).

    Sanity-test on a slice first. Before running the full n = 1.77M interpolation, run the recursion on the first 1000 pairs and check f(x_i) == y_i for those points. If the recursion is broken, you find out in 0.1 seconds instead of 60.

    See the Python for CTF post for the small-scale Lagrange building blocks and the FLINT manual for the C-level interface if Sage is too slow.

Alternate Solution

If you have a FLINT-bound language available (Python with python-flint, or Julia's Nemo.jl), call the library's native multipoint interpolation routine directly:

from flint import fmpz_mod_poly_ctx, fmpz_mod_ctx
ctx = fmpz_mod_ctx(7514777789)
polyctx = fmpz_mod_poly_ctx(ctx)
f = polyctx.interpolate(X, Y)   # ~25 seconds at n = 1.77M

That single call replaces the entire recursive Sage script and runs roughly 2.5x faster. The result is the same coefficient list; dump it as bytes the same way.

Flag

picoCTF{...}

The flag is rendered as image content inside output.bmp once the polynomial is interpolated; open the BMP in any viewer to read it. The exact characters depend on the encoded.txt your instance ships, but the methodology (divide-and-conquer interpolation with vanishing polynomials, then dump coefficients as bytes) is what unlocks any instance of this challenge.

Want more picoCTF 2024 writeups?

Useful tools for Cryptography

Related reading

What to try next