Very Smooth picoCTF 2022 Solution

Published: July 20, 2023

Description

An RSA challenge where the prime p was generated with a 'smooth' p-1 value - all prime factors of p-1 are small. This makes N vulnerable to Pollard's p-1 factoring algorithm.

Pollard's p-1 exploits the structure: if p-1 is B-smooth (all prime factors <= B), then for any base a, a^(B!) ≡ 1 (mod p), and gcd(a^(B!) - 1, N) = p.

Download the challenge files to get N and e.

Implement Pollard's p-1 in Python to factor N.

Use p and q to compute the private key d and decrypt the ciphertext.

bash
wget https://artifacts.picoctf.net/c/195/gen.py
bash
cat gen.py
bash
pip install pycryptodome

Solution

Want to try it yourself first?

The guided walkthrough reveals hints one step at a time.

Walk me through it
For more RSA shortcut attacks (small e, common modulus, Wiener, Fermat, plaintext recovery), see the RSA Attacks for CTF guide.
  1. Step 1
    Understand the smooth p-1 weakness
    Observation
    I noticed the challenge is named 'Very Smooth' and gen.py constructs primes where p-1 has only small factors, which directly pointed to Pollard's p-1 as the applicable factoring algorithm since it exploits exactly that smoothness property via Fermat's little theorem.
    If all prime factors of p-1 are small (say <= 10^6), then B! is divisible by p-1. By Fermat's little theorem, a^(p-1) ≡ 1 (mod p), so a^(B!) ≡ 1 (mod p), meaning p divides a^(B!)-1.
    Learn more

    Fermat's Little Theorem: For prime p and any a not divisible by p: a^(p-1) ≡ 1 (mod p).

    A number M is B-smooth if every prime factor of M is at most B. If p-1 is B-smooth, then p-1 divides B! (because B! contains every prime up to B raised to a high power). Therefore a^(B!) ≡ a^(k*(p-1)) ≡ (a^(p-1))^k ≡ 1 (mod p), so p | a^(B!) - 1. As long as q-1 is NOT B-smooth, q does not divide a^(B!) - 1, and gcd(a^(B!) - 1, N) = p exactly.

    Toy walkthrough with N = pq, p = 331, q = 271:
      p - 1 = 330 = 2 * 3 * 5 * 11    (all small -> 11-smooth)
      q - 1 = 270 = 2 * 3^3 * 5       (also smooth, but suppose only p-1 were)
    
      Use bound B = 11. Then 11! is a multiple of 330 = p-1.
      Compute a = 2^(11!) mod N incrementally:
          a = 2
          a = a^2 mod N   (now 2^2)
          a = a^3 mod N   (now 2^6)
          a = a^4 mod N   (now 2^24)
          ...
          a = a^11 mod N  (now 2^(11!) mod N)
    
      At each step, check g = gcd(a - 1, N).
      As soon as the running exponent becomes a multiple of p-1 = 330,
      we have a ≡ 1 (mod p) but a != 1 (mod q),
      so gcd(a - 1, N) returns p (a non-trivial factor of N).

    The crucial efficiency trick: a^(B!) mod N is built incrementally - one short modular exponentiation per j - so we never construct the astronomical number B! itself. Each step is a = pow(a, j, N), which uses square-and-multiply in O(log j * log^2 N) bit operations.

  2. Step 2
    Implement Pollard's p-1
    Observation
    I noticed that once the smoothness weakness was confirmed, the standard Pollard's p-1 algorithm needed to be implemented to accumulate a = 2^(j!) mod N incrementally and check gcd(a-1, N) at each stage, which would yield p as a non-trivial factor of N.
    Start with a=2 and repeatedly compute a = a^j mod N for j=2..B. After each step, check if gcd(a-1, N) is a non-trivial factor.
    python
    python3 -c "
    from math import gcd
    
    # Replace with actual values from the challenge files
    n = 0xYOUR_N_VALUE
    e = 65537
    c = 0xYOUR_CIPHERTEXT
    
    def pollard_p_minus_1(n, B):
        a = 2
        for j in range(2, B + 1):
            a = pow(a, j, n)            # accumulate: after iter j, a = 2^(j!) mod n
            if j % 10000 == 0 or j == B:
                g = gcd(a - 1, n)
                if 1 < g < n:
                    return g
        return None
    
    for B in (10**4, 10**5, 10**6, 10**7):
        print(f'Trying B = {B}')
        p = pollard_p_minus_1(n, B)
        if p is not None:
            q = n // p
            print(f'Found: p = {p}')
            print(f'       q = {q}')
            phi = (p-1)*(q-1)
            d = pow(e, -1, phi)
            flag_int = pow(c, d, n)
            print('Flag:', flag_int.to_bytes((flag_int.bit_length()+7)//8, 'big').decode())
            break
    else:
        print('no smooth factor found up to B = 10^7; switch attacks (Fermat, Wiener, ECM)')
    "
    What didn't work first

    Tried: Use sympy.factorint or factor() from a CAS tool to factor N directly.

    General-purpose integer factorization in sympy falls back to trial division and Pollard's rho, neither of which exploits the smooth p-1 structure. For a 1024-bit N, these methods will run indefinitely. Pollard's p-1 is the targeted algorithm here because it is fast specifically when p-1 is B-smooth - general factoring tools do not apply that shortcut automatically.

    Tried: Set B = 10^7 from the start without checking intermediate gcds.

    Waiting until j = B to check gcd(a - 1, N) for the first time can overshoot: once both p-1 and q-1 divide the accumulated exponent, gcd(a - 1, N) collapses to N and no non-trivial factor is returned. The fix is to check the gcd frequently (e.g., every 10,000 steps) so the loop can stop as soon as one prime factor is isolated before the other one also divides in.

    Learn more

    Why the loop is one accumulator, not pow each time. The standard Pollard p-1 loop computes a = 2^(B!) mod n incrementally: after iteration j, the running value of a equals 2^(j!) mod n, because a = pow(a, j, n) raises the previous 2^((j-1)!) to the j-th power. This matches the toy walkthrough above. Recomputing pow(a, j!, n) from scratch each step would multiply the work by another log(j) factor for nothing.

    Heuristic for choosing B. Start at B = 10^4. If the loop completes without finding a non-trivial factor, double an order of magnitude: 10^5, then 10^6, then 10^7. By B = 10^7 the prime is unlikely to be smooth at all, and you should switch attacks (Fermat for close primes, Wiener for small private exponent, ECM for medium-smooth factors). The script above iterates this ladder for you.

    Non-trivial-factor check. The 1 < g < n guard is essential: gcd(a - 1, n) = 1 means we have not yet hit a multiple of p-1, and gcd(a - 1, n) = n means both p-1 and q-1 are smooth and we wrapped past both factors. The script returns None in those cases instead of failing silently.

    The algorithm runs in O(B * log(B) * log^2(N)) time, which is very fast for small B. Checking gcd every 10,000 steps rather than every step is an optimization: the gcd check is cheap, but checking every iteration still adds visible overhead at large B.

    The pow(a, j, n) call uses Python's built-in three-argument modular exponentiation, which uses fast square-and-multiply and stays within modular bounds. This is far faster than computing the full exponentiation then taking mod.

  3. Step 3
    Decrypt the ciphertext with the recovered factors
    Observation
    I noticed that once Pollard's p-1 returned p, the remaining RSA decryption followed the standard formula: compute q = N/p, derive phi(N) = (p-1)(q-1), find the modular inverse d, and recover the plaintext via c^d mod N.
    With p and q known, compute phi(N) = (p-1)(q-1), then d = e^-1 mod phi(N), then flag = c^d mod N.
    python
    python3 -c "
    from Crypto.Util.number import long_to_bytes
    
    p = 0xYOUR_P
    q = 0xYOUR_Q
    e = 65537
    c = 0xYOUR_CIPHERTEXT
    n = p * q
    
    phi = (p-1) * (q-1)
    d = pow(e, -1, phi)
    m = pow(c, d, n)
    print(long_to_bytes(m).decode())
    "
    What didn't work first

    Tried: Compute phi(N) as N - 1 instead of (p-1)(q-1) because N looks prime.

    N is a semiprime (product of two large primes), not a prime itself. phi(N) = N - 1 only holds for prime N. Using N - 1 as the modulus for the modular inverse yields a completely wrong value of d, and pow(c, d, n) produces garbage. The correct Euler totient for N = p*q is (p-1)*(q-1), which requires knowing p and q - exactly what Pollard's p-1 recovered.

    Tried: Use long_to_bytes(m) without .decode() and pipe to a hex dump to look for the flag.

    The flag bytes are ASCII-printable, so .decode() is the right call and will succeed if decryption is correct. If decryption produced wrong output (wrong d or wrong N), long_to_bytes will return non-ASCII bytes and .decode() will raise UnicodeDecodeError - that error is a signal to recheck the factorization, not to remove the decode call.

    Learn more

    Standard RSA decryption: m = c^d mod N where d = e^-1 mod phi(N) and phi(N) = (p-1)(q-1). Python 3.8+ supports pow(e, -1, phi) for modular inverse directly.

    To prevent Pollard's p-1 attack, RSA key generation must ensure that p-1 (and q-1) have at least one large prime factor - ideally close to the size of p itself. Such primes are called safe primes: p is safe if p = 2q + 1 where q is also prime. OpenSSL's genrsa command generates safe primes by default for this reason.

Interactive tools
  • RSA CalculatorDecrypt RSA ciphertexts, factor n from the sum of primes, or generate key parameters. Handles arbitrarily large BigInt values.

Flag

Reveal flag

picoCTF{...}

gen.py constructs primes p, q where p-1 and q-1 are smooth (each factor is a product of 16-17 bit primes plus 1). Sieve primes up to ~200000, build the prime-power exponent ladder, and Pollard's p-1 finds p in seconds. Standard RSA decryption then yields the flag.

Key takeaway

RSA security depends on the difficulty of factoring large semiprimes, but the structure of p-1 can create a shortcut. When p-1 is B-smooth (all prime factors are small), Pollard's p-1 algorithm factors N in polynomial time by exploiting Fermat's little theorem with a low smoothness bound. Secure RSA implementations use safe primes where p-1 has a large prime factor, preventing this class of algebraic factoring attack. The same smoothness condition also underlies the elliptic curve method (ECM) and the number field sieve, making prime generation quality a critical step in any RSA implementation.

Related reading

Want more picoCTF 2022 writeups?

Useful tools for Cryptography

What to try next