Stream Ciphers in CTFs: LFSR, Vigenere, and Keystream Reuse

Listen for the keystream

Open the source for picoctf-2026-shift-registers and the first thing you see is a comment saying the author just learned about Linear Feedback Shift Registers (LFSRs) in school. The seed is eight bits. The flag is XORed with the LFSR output. That is the entire challenge.

The first time I saw an LFSR in a CTF I went straight to Wikipedia, read about Galois fields and tap polynomials, and lost forty minutes. Then I noticed the seed was 8 bits. There are 256 of those. The whole challenge collapses into a for seed in range(256) loop.

That keeps happening. A challenge looks like serious cryptography, the source mentions GF(2) or polynomial division or some 1990s cipher you have never heard of, and the actual attack is one of three things: the key is too short, the keystream got reused, or the cipher is linear and one matrix solve recovers everything. Stream ciphers in CTFs lose for these three reasons, almost always, across years of picoCTF.

A stream cipher tells you what is wrong with it. The ciphertext is the diagnostic.

This guide is the unifying frame. AES for CTF covered block ciphers and the bugs that come from misusing modes. This is the companion for the other half of CTF crypto: stream ciphers, classical keystream constructions, and everything that is not AES. I will use the term "stream cipher" loosely throughout, the same way picoCTF authors use it. Vigenere is technically a polyalphabetic substitution; OTP is technically not a cipher at all when used correctly. They behave like stream ciphers when they fail, which is the only state we see them in here.

You will leave with three working attacks, a fingerprint table that picks the right one in under a minute, and a clean mental model that connects an 8-bit LFSR, a 6-character Vigenere key, and a reused one-time pad as the same shape of bug.

Stream ciphers in one minute

A stream cipher is a function that turns a short key into an arbitrarily long pseudorandom byte sequence (the keystream), then XORs the keystream with the plaintext to produce the ciphertext.

ciphertext = plaintext  XOR  keystream
plaintext  = ciphertext XOR  keystream

That is the whole shape. Every stream cipher in picoCTF is some variation on "here is how I produce the keystream." Ciphers differ in how complicated that keystream generator is, how much state it has, and how it eats the key. The attack always reduces to the same question: can I recover the keystream from the ciphertext alone?

Three things make that easy:

Short key. If the key is shorter than the message, it has to repeat. Repetition leaks structure.
Reused keystream. If the same keystream encrypts two messages, XORing the ciphertexts cancels the keystream and leaves a XOR of two plaintexts, which is solvable.
Linear keystream generator. If the function that makes the keystream is linear over GF(2), a tiny number of output bits recovers the entire generator.

A CTF stream cipher almost always trips at least one of these. Often two. The sections below are organised around which one trips.

Key insight: The unifying frame: a stream cipher tells you which of the three failures it has. Look at the ciphertext. Is it the same length as a paragraph of English with the alphabet shifted around? You are looking at substitution-style problems. Is it raw bytes with high entropy? You are looking at XOR with something. Does the source of the keystream generator fit on one screen? It is probably linear.

Single-byte XOR: the floor

Before the multi-byte attacks, the floor: single-byte XOR. The whole keystream is one byte, repeated. There are 256 of them. Try them all.

from itertools import cycle
 
ct = bytes.fromhex("1c0111001f010100061a024b53535009181c")
for k in range(256):
    pt = bytes(c ^ k for c in ct)
    if pt.startswith(b"picoCTF{") or b"the" in pt:
        print(k, pt)

That snippet solves a meaningful slice of every CTF's easy-crypto bracket. It is the same engine as Cryptopals set 1 challenge 3 and you will find some flavour of it in picoctf-2019-vault-door-5 and dozens of beginner crypto sets. The reason it matters here is not that the attack is hard; it is that every other attack in this article is a generalisation of it. Repeating-key XOR is single-byte XOR run multiple times in parallel. LFSR cracking is single-byte XOR with a smarter way of guessing the byte. Two-time-pad recovery is single-byte XOR with the alphabet replaced by "letters that produce real English when XORed against your candidate."

Hold onto this floor. The rest of the article is teaching you what to do when there is more than one byte to guess.

Repeating-key XOR and Vigenere: short key, long message

Vigenere is the prototype of every stream-cipher-with-a-short-key bug. The key is a word, say SOLVECRYPTO. The plaintext is a paragraph. To encrypt, you walk the plaintext one character at a time and shift each letter by the next key character's position in the alphabet, looping the key when it runs out. Repeating-key XOR is the same scheme with bytes and XOR instead of letters and addition.

The cipher is a stream cipher. The keystream is the key, repeated forever. The bug is that the key is shorter than the message. Once you know how short, you can attack each repeating column independently with the single-byte technique above. The hard part is figuring out the key length.

Step 1: find the key length

Two classical tricks. Kasiski examination: look for repeated trigrams or longer substrings in the ciphertext. If the same plaintext bigram falls under the same part of the key twice, it produces the same ciphertext. The distance between matching ciphertext substrings is a multiple of the key length. Take the GCD of those distances.

Index of Coincidence (IoC): the probability that two random characters from the text are equal. For English plaintext, IoC is about 0.0667. For uniform random bytes, IoC is about 1/26 = 0.0385. A Vigenere ciphertext with the wrong key length looks more uniform; with the right key length, splitting the ciphertext into L columns and computing the IoC of each column gives you values close to the English IoC. Sweep L from 2 to 20, plot the average column IoC, look for the spike.

def ioc(text):
    n = len(text)
    if n < 2:
        return 0.0
    counts = {}
    for c in text:
        counts[c] = counts.get(c, 0) + 1
    return sum(v * (v - 1) for v in counts.values()) / (n * (n - 1))
 
def best_keylen(ct, max_len=20):
    letters = [c for c in ct.lower() if c.isalpha()]
    scores = []
    for L in range(1, max_len + 1):
        cols = [letters[i::L] for i in range(L)]
        avg = sum(ioc(col) for col in cols) / L
        scores.append((L, avg))
    return sorted(scores, key=lambda x: -x[1])[:5]

Step 2: solve each column

Once L is locked, every L-th character was encrypted under the same key character. That column is a Caesar cipher. Run frequency analysis on it: the most common letter in English is e, so the most common letter in the column probably maps to e, fixing the shift for that column. Do that for all L columns and you have the key. Apply, recover plaintext, find the flag.

For raw-bytes repeating-key XOR (no alphabetic structure), replace "most common letter is e" with a chi-square score against expected English byte frequencies. Try each of 256 key bytes for the column, score, keep the best. This is Cryptopals set 1 challenge 6 line for line.

picoCTF receipts

You will see this attack on:

picoctf-2022-vigenere : explicit Vigenere with a key. dcode.fr or any Kasiski tool ends it in seconds.
picoctf-2019-easy1 : a Vigenere-style tabula recta lookup with the key handed to you. The lesson is structural, not cryptanalytic.
picoctf-2019-java-script-kiddie : repeating-key XOR with the key generated client-side. Read the JS, recover the key.
picoctf-2024-custom-encryption : a homebrew XOR-and-multiply cipher that reduces to repeating-key XOR after the multiplication is undone.

Vigenere's real lesson is that the cipher dies the moment the key is shorter than the message. That is a much wider category than "Vigenere". Any cipher with a periodic keystream falls to this same one-two: find the period, attack each phase independently. Keep that in your head; the LFSR section is a different way of saying the same thing.

Watch out: Frequency analysis assumes the plaintext has structure. If the plaintext is itself a random string (an API key, a password, a UUID), there is nothing to be most-common, and IoC plus Kasiski both go quiet. CTF authors rarely do this; CTF flags have the format picoCTF{...} embedded somewhere, which is structure enough to bootstrap.

LFSR: linear feedback, free keystream

Now the centerpiece. A Linear Feedback Shift Register is a register of n bits where each clock tick shifts everything one position and the new bit at one end is computed as an XOR of selected positions in the register. Those selected positions are the tap polynomial. Output the bit that fell off the other end. That is one keystream bit. Repeat.

Picture a 4-bit LFSR with taps at positions 0 and 3, starting state 0001:

tick  state    output
  0   0001       1
  1   1000       0
  2   0100       0
  3   0010       0
  4   1001       1
  5   1100       0
  6   1110       0
  7   1111       1
  8   0111       1
  9   1011       1
 10   0101       1
 11   1010       0
 12   1101       1
 13   0110       0
 14   0011       1
 15   0001       1   <- back to start

That 4-bit register cycles through all 15 non-zero states before repeating, which is the maximum-length sequence (m-sequence) for n=4. With n=8, the cycle is 255. Pick the right tap polynomial and you get a sequence that looks random, has nice statistical properties, runs in two XOR gates of hardware, and was used as the basis of real ciphers for decades.

It is also completely broken once you know how to look at it.

Why LFSRs lose: linearity

Every output bit is a linear combination of the initial state bits over GF(2). That means: collect 2n output bits, and you have 2n linear equations in n unknowns (the initial state). Solve the system. Done. The Berlekamp-Massey algorithm is the elegant version of this; for small n you do not even need it.

For shift-registers specifically, with n=8 and a known plaintext prefix, the brute force is shorter than the algebra:

from itertools import islice
 
def lfsr_keystream(seed, taps, n_bits):
    state = seed
    for _ in range(n_bits):
        bit = state & 1
        yield bit
        # XOR the tap positions for the new high bit
        new_bit = 0
        for t in taps:
            new_bit ^= (state >> t) & 1
        state = (state >> 1) | (new_bit << 7)
 
ct = bytes.fromhex(open('output.txt').read().strip())
TAPS = [0, 2, 3, 4]   # read these out of chall.py
 
for seed in range(1, 256):  # seed 0 is degenerate
    bits = list(islice(lfsr_keystream(seed, TAPS, len(ct) * 8), len(ct) * 8))
    ks = bytes(
        sum(b << i for i, b in enumerate(bits[8*j:8*j+8]))
        for j in range(len(ct))
    )
    pt = bytes(c ^ k for c, k in zip(ct, ks))
    if pt.startswith(b'picoCTF{'):
        print(seed, pt)
        break

That is the whole solution to picoctf-2026-shift-registers. Read the taps and the bit-output convention out of the provided chall.py (the bit-pack endianness is the only fiddly part), drop them in, run, get the flag in microseconds. The full challenge writeup walks through it step by step.

Berlekamp-Massey: the algebraic version

When the state size is bigger than 8 bits you cannot brute-force, but you also do not have to. The Berlekamp-Massey algorithm recovers the minimum-length LFSR (both the tap polynomial and the initial state) from 2n consecutive keystream bits in O(n^2) time. You do not have to know the taps. You do not have to know the state size. You feed it bits and it tells you both.

A clean Python implementation, courtesy of the Mollard reference collapsed to the essentials:

def berlekamp_massey(s):
    n = len(s)
    C, B = [1] + [0] * n, [1] + [0] * n
    L, m, b = 0, 1, 1
    for N in range(n):
        d = s[N]
        for i in range(1, L + 1):
            d ^= C[i] & s[N - i]
        if d == 1:
            T = C[:]
            for i in range(n + 1 - m):
                C[i + m] ^= B[i]
            if 2 * L <= N:
                L, B, m = N + 1 - L, T, 1
            else:
                m += 1
        else:
            m += 1
    return L, C[:L + 1]
 
# Returns (degree, polynomial coefficients).
# Feed it a list of bits; you need at least 2 * L of them.

To use it on a real challenge: extract enough keystream bits (XOR a known-plaintext prefix against the ciphertext), feed them in, get back the polynomial and length. Now you have the generator. Run it forward to recover the rest of the keystream and decrypt.

For a known-plaintext prefix, eight bytes of picoCTF{ gives you 64 keystream bits. That covers any LFSR up to about n=32 with one-shot recovery and gives plenty of margin for 8- to 16-bit registers. If the challenge does not give you a known plaintext prefix, you can still attack the cipher: guess the first few bytes are ASCII (high bit zero), use that to constrain the keystream, and Berlekamp-Massey will tell you whether your guess was self-consistent.

Real LFSR-based ciphers and why they died

A pure LFSR is too weak to ship. Real designs combine multiple LFSRs through a nonlinear function. A5/1, the GSM stream cipher, uses three LFSRs of lengths 19, 22, and 23, with irregular clocking driven by majority logic. E0, the Bluetooth stream cipher, uses four LFSRs combined through a finite state machine. Both were broken anyway: A5/1 in 2003 by Barkan, Biham, and Keller using ciphertext-only attacks plus the GSM error-correcting redundancy, and E0 by Lu, Meier, and Vaudenay in 2004 with a related-key attack.

The pattern is: linearity is too useful to give up entirely (it is fast, cheap, and well-understood), so designers paste a nonlinear filter on top and hope. Cryptanalysts then find that the filter has its own structure, peel it, and the underlying LFSRs spill out. This is mostly historical now; ChaCha20 ships in TLS, and A5/1 has been replaced by A5/3 (KASUMI) and A5/4 (SNOW 3G). But the historical lesson is the one CTFs teach: do not roll your own anything that uses an LFSR as a primary mixing element.

picoCTF receipts

picoctf-2026-shift-registers : 8-bit LFSR, taps in the source, 256-seed brute force solves it. The marquee challenge for this section.
picoctf-2019-seed-spring : not an LFSR, but a closely related "the keystream comes from a predictable PRNG" challenge. Same shape: small-state generator, predictable output, exhaustive search over the seed.

Key insight: The unifying property is not "LFSR". It is small-state keystream generator with a known structure. An 8-bit LFSR has 256 states. A 32-bit LCG has 4 billion (still tractable for offline attacks). A Mersenne Twister has a 624-word state (not directly brute-forceable, but predictable from 624 consecutive 32-bit outputs, which is its own attack). Every CTF in this family is "the keystream generator is too small or too transparent."

Keystream reuse: two ciphertexts, one stream

The cleanest way to break a stream cipher is for the author to break it for you. Reusing a keystream across two messages does that. The math is one line:

c1 = p1 XOR k
c2 = p2 XOR k
c1 XOR c2 = p1 XOR p2     # k cancels

You now have the XOR of two unknown plaintexts. If both plaintexts are English (or any structured language), you can recover both. The trick is called crib dragging: guess a string that might appear in p1, XOR it across c1 XOR c2 at every position, and look for output that resembles English. When it does, you have found both the position of your guess in p1 and the corresponding fragment of p2 at that same position.

The classic crib is the literal string " the " (with surrounding spaces). It is short, frequent, and produces obvious garbage when wrong and obvious English when right. Once you find " the " in p1 at offset 17, you also know p2[17:22]. Extend the crib in both directions one character at a time. Often you can read off both messages by pure visual inspection in 10 minutes.

def drag(c1c2, crib):
    crib = crib.encode()
    for i in range(len(c1c2) - len(crib) + 1):
        window = c1c2[i:i + len(crib)]
        out = bytes(a ^ b for a, b in zip(window, crib))
        if all(32 <= b < 127 for b in out):
            print(f"offset {i}: {out}")
 
c1 = bytes.fromhex("...")
c2 = bytes.fromhex("...")
c1c2 = bytes(a ^ b for a, b in zip(c1, c2))
drag(c1c2, " the ")
drag(c1c2, " and ")
drag(c1c2, "picoCTF{")

That last crib, picoCTF{, is the gift CTF authors keep giving. If either plaintext is a flag, dragging the flag prefix across c1 XOR c2 reveals where the flag sits in one ciphertext, and the corresponding bytes in the other ciphertext fall out for free.

picoCTF receipts

picoctf-2021-easy-peasy : the canonical reused-OTP challenge. The server uses a 40000-byte key as a one-time pad and tracks an offset. Send 40000 bytes of known plaintext to roll the offset back to zero, then ask the server to encrypt the encrypted flag, which now uses the same keystream the flag was encrypted with. XOR cancels the keystream and the original plaintext appears.
picoctf-2020-mini-competition-otp-implementation : same pattern in a different wrapper. The takeaway is identical: an OTP that is reused stops being a one-time pad.

Two-time pad is the bug that taught the discipline why "one-time" is in the name. The famous VENONA project decrypted Soviet diplomatic cables in the 1940s and 1950s by exploiting exactly this: the Soviets reused some of their one-time pad pages, and Allied cryptanalysts spent decades reading the traffic. Stream cipher reuse remains the highest-value bug class in deployed crypto. Microsoft Office's Word 97 implementation reused RC4 keystreams across document versions; researchers were able to recover edits and revisions for years before it was fixed.

Watch out: ChaCha20 and AES-CTR are both stream ciphers in the formal sense. They are perfectly secure as long as the (key, nonce) pair never repeats. Reuse a nonce, and you immediately have a two-time pad with the same crib-dragging weakness as the classroom example above. There is no "modern" in modern crypto when you reuse the keystream.

Fingerprint the cipher in 30 seconds

When a new crypto challenge drops, the first 30 seconds is fingerprinting. The ciphertext tells you almost everything if you know what to look at. This table is the cheat sheet I wish someone had handed me on day one.

Cipher family	What ciphertext looks like	The tell	Attack	Receipt
Caesar / ROT-N	A-Z text, same length as plaintext, word boundaries preserved	IoC near 0.067; 26 candidates	Try all 26 shifts; flag is in one of them	caesar, 13
Substitution	A-Z text, IoC near 0.067, but no shift solves it	Letter frequency mismatched, structure preserved	Frequency analysis; quipqiup.com or hill-climbing	waves-over-lambda, la-cifra-de
Vigenere / repeating-key XOR	A-Z text or random bytes; IoC drops	IoC of full text near 0.04; IoC of every L-th char near 0.067 for some L	Kasiski + IoC for L; per-column frequency for the key	vigenere, custom-encryption
Single-byte XOR	Random-looking bytes, often hex; same length as plaintext	High entropy; 256-key brute force trivial	Try all 256 bytes; score with English frequency	vault-door-5
LFSR	Random bytes; source provided with shift register code	State size N is small (8-32); known taps	Brute force seed (small N) or Berlekamp-Massey (any N)	shift-registers
Predictable PRNG keystream	Random bytes; source uses random.seed() or time()	Seed is small or predictable (epoch second, PID, etc.)	Brute force the seed; replay the PRNG	seed-spring
Two-time pad / reused OTP	Two ciphertexts of similar length, same key offset	Same length, same source, hint that key is reused	XOR ciphertexts; crib drag	easy-peasy, otp-implementation
Block cipher (ECB/CBC/CTR)	Bytes in fixed-size blocks (16 for AES); ECB shows repeated blocks	Length is multiple of block size; structural patterns at block boundaries	Mode-specific (ECB cut-paste, CBC bit-flip, CTR nonce reuse)	AES for CTF

Two minutes with this table replaces an hour of poking. Compute the IoC. Look at the source. Count the bytes. Decide which row you are in. Then read the corresponding section.

Tip: Keep a small Python file with ioc(), chi_square_english(), a single-byte XOR brute-forcer, a Kasiski search, and the crib-drag from this article. Drop it into ~/.ctf/cipher_kit.pyand import it the moment a crypto challenge appears. Half my crypto solves are now "run the kit, look at the output, write five lines to finish."

Where real stream ciphers diverge

The ciphers in this article all lose because they were built to lose. CTF authors choose them because the failure mode is teachable. Real stream ciphers (RC4, ChaCha20, Salsa20) are different in three concrete ways, and the differences are exactly what the broken ones lack.

Big internal state. RC4 has a 256-byte permutation. ChaCha20 has a 16-word state. There is no exhaustive search over the state, and Berlekamp-Massey does not apply because the state-update is not linear over GF(2).
Nonlinear update.ChaCha20's round function uses 32-bit additions, XORs, and rotations (the "ARX" primitive set). Additions are not linear over GF(2); they introduce carry. That single decision moves the cipher out of reach of the LFSR-style algebraic attacks.
Mandatory nonce. ChaCha20 takes a key and a 96-bit nonce. The keystream is a function of both. The nonce is meant to vary; if it does, the keystream never repeats and the two-time-pad attack does not apply. The cipher also fails immediately if the nonce repeats, which is why TLS, WireGuard, and modern protocols all bake nonce uniqueness into the design rather than trusting the user.

You will rarely break ChaCha20 in a CTF. You will sometimes break a ChaCha20 challenge because someone hardcoded the nonce or used a counter that resets. The bug is then a two-time-pad bug, which lives in the previous section. The cipher itself is fine.

One more, briefly. RC4 is the famous-but-deprecated stream cipher used in WEP, early TLS, and a thousand legacy protocols. RC4's keystream has well-known biases: the first 256 bytes leak information about the key, certain byte positions are non-uniform, and related-key attacks broke WEP entirely. RC4 NOMORE recovered HTTP cookies from RC4-protected TLS sessions in 75 hours. RC4 is not present in current TLS. If a CTF puts RC4 in your hands, the standard move is "drop the first 1024 keystream bytes" (the SkipFirst-N defense from the early 2000s) and treat the rest as a generic stream cipher.

The line that separates broken stream ciphers from secure ones is not big or mysterious. It is "state large enough to make exhaustive search impossible, update function nonlinear over GF(2), and a nonce that the protocol enforces uniqueness on." CTF stream-cipher challenges are almost always missing two of those three. Once you see which two, the attack is on the shelf.

Where to go next

The shape of every stream-cipher challenge in this guide is the same: the keystream is shorter, reused, or linear, and the ciphertext shows you which. Three attacks (Kasiski plus per-column frequency for repeating keys, Berlekamp-Massey or seed brute force for linear generators, crib dragging for reused keystreams) cover every receipt I have linked above. They cover most picoCTF stream-cipher challenges going back to 2019.

What this guide deliberately did not cover: AES modes and the bugs that come from misusing them, RSA attacks for the public-key side of the room, hash cracking when the "cipher" turns out to be a hash, and layered encodings when the challenge is base64-inside-hex-inside-ROT13 wearing a cipher costume. Each of those is its own animal. The sibling posts walk through them.

If you only take one thing from this article, take this: when you next open a crypto challenge, do not start with the algebra. Start with the ciphertext. Compute the IoC. Look at the source. Count the bytes. The cipher will tell you what it is. Listen.

Now go solve shift-registers.

Sources and further reading

Cryptopals Crypto Challenges, Set 1 : single-byte XOR, repeating-key XOR, and the canonical Kasiski-plus-IoC pipeline. Every CTF cryptographer should solve this set once.
Berlekamp-Massey algorithm (Wikipedia) : the cleanest free reference, with pseudocode and worked examples.
Barkan, Biham, and Keller, "Instant Ciphertext-Only Cryptanalysis of GSM Encrypted Communication" (CRYPTO 2003) : the A5/1 break. Worth reading once for the historical picture.
VENONA project (Wikipedia) : the highest-stakes two-time-pad bug in history.
RC4 NOMORE : practical attack on RC4-in-TLS, the work that finally retired RC4 from web crypto.
Index of Coincidence (Wikipedia) : Friedman's 1922 paper, still the cleanest exposition.
quipqiup.com : browser-based hill-climbing solver for monoalphabetic substitutions. Saves time when the cipher is clearly substitution and the key is not in the source.