x86-64 Assembly for CTF: Reading Disassembly From Scratch

What does this disassembly actually say?

Here is the answer a skimmer can keep: assembly is not a language you read like prose, it is a list of one-line moves on a tiny set of numbered boxes called registers. Each line copies a value, does one piece of arithmetic, compares two values, or jumps somewhere. There is no nesting, no scope, no types. If you can track six or seven registers and one stack pointer on a piece of paper, you can read any x86-64 function that Ghidra or GDB puts in front of you.

You already opened the binary. Ghidra gave you a graph, GDB gave you a wall of mov, lea, cmp, and jne, and it all looked like noise. It is not noise. It is the most literal description of the program that exists, more honest than the decompiler's C, because the CPU runs exactly these instructions and nothing else. This guide teaches you to read them from zero.

Assembly has no secrets. Every instruction does one small, fully specified thing. The only skill is patience: one line at a time, watch the registers change.

This guide is the foundation for the rest of the series. The Ghidra guide and the GDB CTF Guide both assume you can read the assembly they display, and the exploitation posts (Buffer Overflow, Shellcode, and ROP without a libc leak) all build on it. By the end you will hand-trace a real picoCTF asm challenge to its exact return value.

Note: We focus on x86-64 (also called amd64), the 64-bit instruction set used by most desktops and servers. The picoCTF asm-series challenges are 32-bit x86, which differs in a few specific ways. We cover x86-64 first, then call out exactly what changes for 32-bit when we trace the challenge, so you can read both.

What are registers, and what is the stack?

A register is a small, fast storage slot inside the CPU. x86-64 gives you sixteen general-purpose registers, each holding 64 bits (8 bytes). That is the entire working memory the CPU can touch instantly. Everything else lives in slower RAM and has to be loaded in and out. When you read assembly you are mostly watching values shuttle between these sixteen boxes.

64-bit	Typical role	32 / 16 / 8-bit name
rax	Return value; scratch; syscall number	eax / ax / al
rdi, rsi, rdx, rcx	First four function arguments	edi / esi / edx / ecx ...
r8, r9	Fifth and sixth arguments	r8d / r9d ...
rbp	Base pointer (frame anchor)	ebp / bp / bpl
rsp	Stack pointer (top of the stack)	esp / sp / spl
rip	Instruction pointer (next instruction)	not directly writable

The smaller names matter. rax is the full 64-bit register; eax is its low 32 bits; ax is the low 16; al is the low 8. They are not separate registers, they are windows onto the same box. When you see mov eax, 5 the CPU writes 5 into the low 32 bits and, on x86-64, zeroes the top 32. So eax and rax are the same storage seen at two widths. Beginners lose hours forgetting this and treating eax as unrelated to rax.
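The "windows onto the same box" idea can be modeled with plain bit masks. This is a minimal Python sketch, not an emulator: the helper names (views, write_eax, write_al) are made up for illustration. It also shows one related fact worth knowing: unlike a 32-bit write, an 8-bit write such as mov al, 5 leaves the rest of the register alone.

```python
MASK64 = (1 << 64) - 1

def views(rax):
    """Return the four overlapping views of one 64-bit register value."""
    return {
        "rax": rax & MASK64,       # all 64 bits
        "eax": rax & 0xFFFFFFFF,   # low 32 bits
        "ax": rax & 0xFFFF,        # low 16 bits
        "al": rax & 0xFF,          # low 8 bits
    }

def write_eax(rax, value):
    """Model `mov eax, value`: a 32-bit write zeroes the upper 32 bits."""
    return value & 0xFFFFFFFF

def write_al(rax, value):
    """Model `mov al, value`: only the low byte changes, the other 56 bits survive."""
    return (rax & (MASK64 ^ 0xFF)) | (value & 0xFF)

rax = 0x1122334455667788
before = views(rax)
after_mov_eax = write_eax(rax, 5)
after_mov_al = write_al(rax, 5)

print({name: hex(value) for name, value in before.items()})
print(hex(after_mov_eax))   # upper half is gone
print(hex(after_mov_al))    # upper bytes are still there
```

Running it shows eax, ax, and al as progressively shorter tails of the same value, which is all the naming scheme means.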

Key insight: Treating rdi, rsi, and rdx as "argument registers" is a software convention, not a hardware rule. The CPU does not know what an argument is. A calling convention (covered below) is just an agreement everyone compiles against so functions can call each other.

The stack is a region of RAM that grows downward: pushing a value subtracts from rsp, popping adds to it. It is where functions keep local variables, saved registers, and the return address that says where to go when the function finishes. Two instructions move it:

push rax     ; rsp -= 8, then store rax at [rsp]   (grows the stack down)
pop  rax     ; load [rsp] into rax, then rsp += 8  (shrinks the stack up)

Square brackets mean "the memory at this address." [rsp] is the 8 bytes sitting at whatever address rsp currently holds. rsp is a pointer; [rsp] is what it points at. That one distinction, register versus the memory a register addresses, is most of what trips people up in their first week.
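If the pointer-versus-contents distinction still feels slippery, a toy model may help. The sketch below is an assumption-laden simplification: memory is a Python dict keyed by address, the Stack class is invented for this example, and the starting address 0x7000 is arbitrary.

```python
class Stack:
    """Toy model of the x86-64 stack: rsp is an address, mem holds the contents."""

    def __init__(self, rsp):
        self.rsp = rsp   # the pointer (a number)
        self.mem = {}    # address -> 8-byte value stored there

    def push(self, value):
        self.rsp -= 8                 # the stack grows toward lower addresses
        self.mem[self.rsp] = value    # store at [rsp]

    def pop(self):
        value = self.mem[self.rsp]    # load from [rsp]
        self.rsp += 8                 # the stack shrinks back up
        return value

stack = Stack(rsp=0x7000)
stack.push(0x1111)
stack.push(0x2222)
rsp_after_pushes = stack.rsp          # the register: an address
top_value = stack.mem[stack.rsp]      # [rsp]: what lives at that address
popped = stack.pop()
rsp_after_pop = stack.rsp

print(hex(rsp_after_pushes), hex(top_value), hex(popped), hex(rsp_after_pop))
```

Here stack.rsp plays the role of rsp and stack.mem[stack.rsp] plays the role of [rsp]; the two are never the same thing.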

How do I read a function prologue and epilogue?

Almost every function begins and ends with the same boilerplate. Once you recognize it you can skip past it and get to the logic. The opening is the prologue:

push   rbp            ; save the caller's frame anchor on the stack
mov    rbp, rsp       ; set rbp to the current stack top: this frame's anchor
sub    rsp, 0x20      ; reserve 0x20 (32) bytes of local variable space

After those three lines, rbp points at a fixed spot for the whole function, and locals are addressed relative to it. You will see mov DWORD PTR [rbp-0x4], edi meaning "store the 32-bit value in edi into the local variable 4 bytes below the frame anchor." Negative offsets from rbp are locals; positive offsets are (in 32-bit) incoming arguments. DWORD PTR just says the access is 4 bytes wide (DWORD = 4, QWORD = 8, WORD = 2, BYTE = 1).

The closing boilerplate is the epilogue:

leave                 ; equivalent to: mov rsp, rbp ; pop rbp
ret                   ; pop the return address into rip and jump there

leave tears down the frame by restoring rsp and rbp to what the caller had. ret pops the return address the call instruction pushed and resumes the caller. Whatever is in rax at ret is the function's return value. That is the single most useful fact for the asm challenges: to find what a function returns, find what is in rax when it hits ret.
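To see that the prologue and epilogue really cancel out, you can replay them on paper or in a few lines of Python. This is a rough sketch under simplifying assumptions: memory is a dict, the starting rsp and rbp values and the return address 0x401234 are invented, and the function body is skipped entirely.

```python
def call_and_return(rsp, rbp, locals_size=0x20, return_address=0x401234):
    """Replay call, prologue, leave, ret and report the register values."""
    mem = {}

    # call: push the return address
    rsp -= 8
    mem[rsp] = return_address

    # prologue: push rbp ; mov rbp, rsp ; sub rsp, locals_size
    rsp -= 8
    mem[rsp] = rbp
    rbp = rsp
    rsp -= locals_size
    inside = {"rsp": rsp, "rbp": rbp}

    # leave: mov rsp, rbp ; pop rbp
    rsp = rbp
    rbp = mem[rsp]
    rsp += 8

    # ret: pop the return address into rip
    rip = mem[rsp]
    rsp += 8

    return inside, {"rsp": rsp, "rbp": rbp, "rip": rip}

inside, after = call_and_return(rsp=0x7000, rbp=0x7040)
print({name: hex(value) for name, value in inside.items()})
print({name: hex(value) for name, value in after.items()})
```

After ret, rsp and rbp are back to the caller's values and rip holds the address call pushed, which is why you can skip the boilerplate once you recognize it.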

Tip: Compilers invoked with optimization enabled (and many modern ones even by default) omit the frame pointer and address locals straight off rsp. If you do not see push rbp ; mov rbp, rsp, the function was compiled without a frame pointer and its locals live at [rsp+N] instead of [rbp-N]. The logic is identical; only the anchor changed.

How do mov, the arithmetic instructions, and lea work?

The workhorse is mov dst, src: copy src into dst. It does not move, it copies, and the destination is written on the left (in Intel syntax, which we use here). The source can be a number (an immediate), a register, or memory; the destination can be a register or memory, but not two memory operands at once.

mov rax, 0x10        ; rax = 0x10            (immediate into register)
mov rax, rbx         ; rax = rbx            (register into register)
mov rax, [rbx]       ; rax = memory at rbx  (load 8 bytes from RAM)
mov [rbx], rax       ; memory at rbx = rax  (store 8 bytes to RAM)

The arithmetic instructions modify their destination in place:

add rax, rbx         ; rax = rax + rbx
sub rax, 5           ; rax = rax - 5
imul rax, rbx        ; rax = rax * rbx   (signed multiply)
xor rax, rax         ; rax = 0           (the standard way to zero a register)
and rax, 0xff        ; rax = rax & 0xff  (keep the low byte)
shl rax, 3           ; rax = rax << 3    (multiply by 8)
inc rax              ; rax = rax + 1

xor rax, rax deserves a note: anything XORed with itself is zero, so this is the compact idiom for "set this register to 0." You will see it constantly. When you spot it, just read it as rax = 0.
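A few of these are easy to sanity-check in Python, as long as you remember that a register holds exactly 64 bits while a Python integer is unbounded. The sketch below masks every result to 64 bits to imitate that; the function names simply mirror the mnemonics and are not a real library.

```python
MASK64 = (1 << 64) - 1

def add(dst, src):
    return (dst + src) & MASK64   # results wrap at 64 bits

def sub(dst, src):
    return (dst - src) & MASK64   # going below zero wraps around to the top

def xor(dst, src):
    return dst ^ src

def shl(dst, count):
    return (dst << count) & MASK64  # bits shifted past bit 63 are lost

zeroed = xor(0xDEADBEEF, 0xDEADBEEF)   # any value XORed with itself
times_eight = shl(5, 3)                # 5 << 3
wrapped = sub(0, 5)                    # 0 - 5 in a 64-bit register
restored = add(wrapped, 5)             # adding 5 back wraps to 0 again

print(zeroed, times_eight, hex(wrapped), restored)
```

The wrapped value is how a register represents -5: the same bits read as a huge unsigned number or a small negative one, depending on which jump the compiler emits later.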

Now the instruction that confuses every beginner: lea, Load Effective Address. It looks like a memory access but it never touches memory. It computes an address and stores the address itself, not the contents at that address.

mov rax, [rbx+rcx*4+8]   ; rax = the VALUE stored in memory at rbx+rcx*4+8
lea rax, [rbx+rcx*4+8]   ; rax = the ADDRESS rbx+rcx*4+8 itself (no memory read)

Because the bracket expression can scale and add, compilers love lea as a fast calculator. lea rax, [rdi+rdi*2] computes rdi * 3 in one instruction with no multiply unit. So when you see lea, ask: is the compiler taking the address of a variable or array element, or is it just doing arithmetic? Both are common. The bracket form is [base + index*scale + displacement], where scale is 1, 2, 4, or 8.

Note: A pure arithmetic lea like lea eax, [rdi+0x3] is exactly the add you would expect: eax = edi + 3. Keep that in your pocket: compilers often emit lea where you would write add, and the asm challenge we trace later produces its answer with the same add-a-small-constant step (written there as add eax, 0x3).
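The difference between mov and lea with the same bracket expression fits in a few lines of Python. This is an illustrative sketch: memory is a dict, the register values and the stored value 0xCAFE are invented, and effective_address is a hypothetical helper that just evaluates base + index*scale + displacement.

```python
MASK64 = (1 << 64) - 1

def effective_address(base, index=0, scale=1, disp=0):
    """Evaluate [base + index*scale + disp] the way the CPU's address unit does."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK64

def lea(base, index=0, scale=1, disp=0):
    """lea: return the computed address itself; memory is never read."""
    return effective_address(base, index, scale, disp)

def mov_load(mem, base, index=0, scale=1, disp=0):
    """mov from memory: compute the same address, then read what is stored there."""
    return mem[effective_address(base, index, scale, disp)]

mem = {0x1010: 0xCAFE}      # pretend RAM: one value stored at address 0x1010
rbx, rcx = 0x1000, 2

lea_result = lea(rbx, rcx, 4, 8)            # lea rax, [rbx+rcx*4+8]
mov_result = mov_load(mem, rbx, rcx, 4, 8)  # mov rax, [rbx+rcx*4+8]
times_three = lea(7, 7, 2)                  # lea rax, [rdi+rdi*2] with rdi = 7

print(hex(lea_result), hex(mov_result), times_three)
```

Both calls compute the same address; only mov_load goes on to look inside memory. The last line is the "fast calculator" use, where no address is involved at all.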

How does control flow work? cmp, test, and the conditional jumps

Assembly has no if or while. It has comparisons that set invisible flag bits, and jumps that read those flags to decide whether to branch. Two instructions do the comparing.

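cmp a, b computes a - b, throws the result away, and keeps only the flags it set. If a == b the Zero Flag is set. If a < b as unsigned numbers, the subtraction borrows and the Carry Flag is set; if a < b as signed numbers, the Sign and Overflow flags end up different from each other. You rarely decode those by hand, because the conditional jump that follows names the condition for you. test a, b does the same but with a bitwise AND instead of a subtraction. The overwhelmingly common idiom test rax, rax ANDs a register with itself, which sets the Zero Flag if and only if the register is zero. Read it as "is rax zero?"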

A conditional jump immediately after the comparison turns the flags into a branch:

Jump	Taken when (after cmp a, b)	Signed?
je / jz	a == b (Zero Flag set)	either
jne / jnz	a != b (Zero Flag clear)	either
jg / jnle	a > b	signed
jl / jnge	a < b	signed
jge / jle	a >= b / a <= b	signed
ja / jb	a > b / a < b	unsigned
jmp	always (unconditional)	n/a

The signed versus unsigned split matters. jg and jl treat the values as signed (they can be negative); ja and jb treat them as unsigned (a stands for "above," b for "below"). Pick the wrong interpretation and a comparison against a value with the high bit set will flip on you. For the asm challenges, watch which mnemonic the compiler emitted and trust it: it knows the original C type.
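To watch a comparison flip between the signed and unsigned readings, here is a small Python model of cmp and a handful of jumps. It is a sketch for intuition, assuming 32-bit operands by default; cmp_flags and taken are names made up for this example, and only the jumps from the table above are included.

```python
def cmp_flags(a, b, bits=32):
    """Model `cmp a, b`: compute a - b and keep only the flags."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,                               # operands were equal
        "SF": bool(result & sign_bit),                   # result looks negative
        "CF": a < b,                                     # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & sign_bit),   # signed overflow
    }

# Each jump is just a test on the flags.
CONDITIONS = {
    "je": lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jg": lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jl": lambda f: f["SF"] != f["OF"],
    "jle": lambda f: f["ZF"] or f["SF"] != f["OF"],
    "ja": lambda f: not f["CF"] and not f["ZF"],
    "jb": lambda f: f["CF"],
}

def taken(jump, a, b, bits=32):
    """Would `jump` be taken immediately after `cmp a, b`?"""
    return CONDITIONS[jump](cmp_flags(a, b, bits))

high_bit = 0xFFFFFFFF                      # -1 if signed, about 4 billion if unsigned
signed_less = taken("jl", high_bit, 1)     # signed view: -1 < 1
unsigned_above = taken("ja", high_bit, 1)  # unsigned view: 0xFFFFFFFF > 1

print(signed_less, unsigned_above)
```

The same two operands satisfy both "less than" (jl) and "above" (ja), which is exactly why you read the mnemonic the compiler chose instead of guessing.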

So a C if like the one in the comment on the first line compiles to the assembly beneath it:

// if (x > 10) y = 1; else y = 2;
    cmp  DWORD PTR [rbp-0x4], 0xa   ; compare x to 10
    jle  .else_branch               ; if x <= 10, go to else
    mov  DWORD PTR [rbp-0x8], 1     ; y = 1
    jmp  .done
.else_branch:
    mov  DWORD PTR [rbp-0x8], 2     ; y = 2
.done:

Notice the compiler inverted the test: the C says x > 10, but the assembly jumps away when x <= 10. That is normal. The branch guards the path you do not want to fall into. Read the jump as "skip the next block if the condition for entering it fails," and the inversion stops being confusing.

Where do function arguments live? The System V x86-64 calling convention

When one function calls another, how does the second one find its arguments? On 64-bit Linux (and macOS) the answer is the System V AMD64 ABI, the contract every compiler on the platform obeys. Memorize this one table and most function calls become readable:

Argument	Register	Example: func(a, b, c)
1st	rdi	a
2nd	rsi	b
3rd	rdx	c
4th	rcx
5th	r8
6th	r9
7th and beyond	on the stack	pushed right to left
return value	rax	what the caller reads back

The mnemonic most people use is "Diane's silk dress costs 89 dollars": the first letters give di, si, d, c, 8, 9, which maps to rdi, rsi, rdx, rcx, r8, r9. So a block like this reads off cleanly:

mov    edi, 0x1            ; arg1 = 1
lea    rsi, [rip+0x2004]   ; arg2 = address of a string (a format or label)
mov    edx, 0x10           ; arg3 = 0x10
call   some_function       ; some_function(1, &string, 0x10)
; ... after the call, rax holds the return value

When you reach a call, glance backward to see which argument registers were just set, and you have reconstructed the call's arguments. When you reach a ret, look at rax for the answer. This is also the backbone of the exploitation posts: a ROP chain is just you setting rdi, rsi, and rdx by hand before forcing a call, and shellcode sets rax to a syscall number and loads the same argument registers.

Warning: The syscall ABI is similar but not identical. A raw syscall instruction takes its number in rax and its arguments in rdi, rsi, rdx, then r10 (not rcx), r8, r9. The fourth argument moving from rcx to r10 is the one difference that bites people writing shellcode.
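The two register orders are easy to mix up, so here is a small Python sketch that assigns integer or pointer arguments to registers under each convention. The helper names are invented, the address 0x402004 and the syscall number 999 are placeholders rather than real values, and floating-point or struct arguments (which follow other rules) are ignored.

```python
FUNCTION_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")
SYSCALL_ARG_REGS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")

def place_function_args(args):
    """System V function call: first six args in registers, the rest on the stack."""
    regs = dict(zip(FUNCTION_ARG_REGS, args))
    stack_args = list(args[len(FUNCTION_ARG_REGS):])
    return regs, stack_args

def place_syscall_args(number, args):
    """Raw syscall: number in rax, args in registers only, r10 instead of rcx."""
    if len(args) > len(SYSCALL_ARG_REGS):
        raise ValueError("a raw syscall takes at most six arguments")
    regs = {"rax": number}
    regs.update(zip(SYSCALL_ARG_REGS, args))
    return regs

# some_function(1, &string, 0x10); 0x402004 stands in for the string's address.
call_regs, call_stack = place_function_args((1, 0x402004, 0x10))

# Eight arguments: the seventh and eighth spill to the stack.
many_regs, many_stack = place_function_args(tuple(range(1, 9)))

# A four-argument syscall; 999 is a placeholder number, not a real syscall.
sys_regs = place_syscall_args(999, (10, 20, 30, 40))

print(call_regs, call_stack)
print(many_regs, many_stack)
print(sys_regs)
```

Compare the fourth slot in many_regs and sys_regs: rcx for a function call, r10 for a syscall.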

The authoritative source is the System V AMD64 ABI document itself, maintained at the x86-64 psABI project. You do not need to read it to solve challenges; the table above is the working subset.

Worked example: tracing a picoCTF asm challenge by hand

Time to do it for real. The picoCTF asm series hands you a small assembly function and asks what it returns for a given input. picoCTF 2019 asm1 asks: what does asm1(0x345) return? We will trace it to the exact value, by hand, no tools.

Note: The asm series is 32-bit x86 using the cdecl convention, not the 64-bit System V ABI we just covered. The difference that matters here: in 32-bit cdecl the argument is not in a register, it is pushed on the stack and read at [ebp+0x8] after the prologue. The return value still comes back in eax. Everything else (cmp, the conditional jumps, lea) reads identically to x86-64. We will flag the 32-bit-specific lines as we hit them.

The function has this shape. Read it top to bottom:

asm1:
    push   ebp
    mov    ebp, esp                 ; prologue: ebp now anchors the frame
    cmp    DWORD PTR [ebp+0x8], 0x3b9   ; compare the argument to 0x3b9
    jg     part_a                   ; if arg > 0x3b9, jump to part_a
    cmp    DWORD PTR [ebp+0x8], 0x342   ; compare the argument to 0x342
    jne    part_b                   ; if arg != 0x342, jump to part_b
    mov    eax, DWORD PTR [ebp+0x8]
    add    eax, 0x60                ; (this path: arg + 0x60)
    jmp    part_done
part_a:
    mov    eax, DWORD PTR [ebp+0x8]
    sub    eax, 0x12                ; (this path: arg - 0x12)
    jmp    part_done
part_b:
    mov    eax, DWORD PTR [ebp+0x8]
    add    eax, 0x3                 ; (this path: arg + 3)
part_done:
    pop    ebp
    ret                             ; return eax

Now trace it with the actual input. Our argument is 0x345, which is 837 in decimal. Keep a running note of two things: where execution is, and what is in eax.

Step	Instruction	What happens with arg = 0x345
1	push ebp / mov ebp, esp	Prologue. The argument now sits at [ebp+0x8].
2	cmp [ebp+0x8], 0x3b9	Compare 0x345 to 0x3b9. 0x345 < 0x3b9.
3	jg part_a	arg is NOT greater, so the jump is not taken. Fall through.
4	cmp [ebp+0x8], 0x342	Compare 0x345 to 0x342. They are not equal.
5	jne part_b	arg != 0x342 is true, so we jump to part_b.
6	mov eax, [ebp+0x8]	eax = 0x345.
7	add eax, 0x3	eax = 0x345 + 3 = 0x348.
8	pop ebp / ret	Return eax = 0x348.

The answer is 0x348. We never ran the program. We followed two comparisons, took the branch each one dictated, did one addition, and read eax at ret. That is the entire method, and it scales to functions ten times this size: the work is always "which branch, then what is in the return register."
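One quick cross-check is to transcribe the listing into Python, branch for branch, and see whether it agrees with the hand trace. This is a hand translation of the function as shown above, not the challenge's own code, and to_signed32 is a helper added here so that the jg comparison is signed, as the mnemonic says.

```python
def to_signed32(value):
    """Reinterpret a 32-bit pattern as a signed integer, as jg does."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def asm1(arg):
    """Python transcription of the asm1 listing; returns what ends up in eax."""
    arg &= 0xFFFFFFFF
    if to_signed32(arg) > 0x3B9:   # cmp [ebp+0x8], 0x3b9 ; jg part_a
        eax = arg - 0x12           # part_a: sub eax, 0x12
    elif arg != 0x342:             # cmp [ebp+0x8], 0x342 ; jne part_b
        eax = arg + 0x3            # part_b: add eax, 0x3
    else:
        eax = arg + 0x60           # fall-through path: add eax, 0x60
    return eax & 0xFFFFFFFF        # eax is 32 bits wide

answer = asm1(0x345)
print(hex(answer))
```

Trying inputs that hit the other two paths, such as 0x342 and 0x400, is a cheap way to confirm you read every branch correctly and not just the one the challenge asks about.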

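Tip: Always verify a hand trace by assembling and running it. For the asm series, one way is to pair the listing with a tiny C driver whose main calls asm1(0x345) and prints the result in hex, then build both as 32-bit code with gcc -m32 -no-pie -o test test.S driver.c and run ./test. Here driver.c is a file you write yourself, and the listing may need light cleanup before it assembles: a .globl asm1 line so the linker can see the function, and .intel_syntax noprefix if it is written in Intel syntax. Building with -m32 also requires the 32-bit multilib packages to be installed. If your trace and the CPU disagree, the CPU is right, and finding where they diverge is the most efficient way to learn. The GDB CTF Guide shows how to single-step the same function and watch the flags change after each cmp.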

The later challenges scale the same skill up. picoCTF 2019 asm2 adds a loop (a backward conditional jump), so you trace the loop body until the exit condition fires instead of just falling through. picoCTF 2019 asm3 works with multiple arguments and sub-register widths, where you must respect that al and ax are windows onto eax. picoCTF 2019 asm4 walks a string and computes an offset, so you track a pointer and an accumulator together. None of them need a new concept. They need the same patient trace.

Why does the same code look different? AT&T vs Intel syntax

You will meet the same instruction written two ways depending on the tool. The difference is purely cosmetic, but it reverses the operand order, so you need to know it cold. The two syntaxes are Intel (what Ghidra, most Windows tools, and the snippets in this post use) and AT&T (the default for objdump and GDB on Linux). Same machine code, different printing.

Trait	Intel	AT&T
Operand order	mov dst, src	mov src, dst (reversed)
Registers	rax	%rax (percent prefix)
Immediates	5	$5 (dollar prefix)
Memory	[rbp-0x4]	-0x4(%rbp)
Size suffix	DWORD PTR [rax]	movl (%rax) (l suffix)

The same line, both ways:

Intel:   mov   eax, DWORD PTR [ebp+0x8]   ; eax = the argument
AT&T:    movl  0x8(%ebp), %eax            ; same thing, source on the left

The single rule that saves you: in Intel, the destination is on the left (it reads like dst = src); in AT&T, the destination is on the right. If you only remember one thing about AT&T, remember that the operands are flipped. Anything that comes straight from gcc -S is AT&T unless you pass -masm=intel, so check which syntax a .S file uses before you start reading it. To make GDB show you Intel instead, run set disassembly-flavor intel, and to make objdump do it, add -M intel.

# objdump in Intel syntax
objdump -d -M intel ./binary | less
# GDB in Intel syntax (put this in ~/.gdbinit to make it permanent)
set disassembly-flavor intel
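If you ever have to read AT&T by hand, the translation rules from the table are mechanical enough to sketch in Python. The converter below is a deliberately tiny illustration, not a real tool: it only handles two-operand instructions whose operands are a register, a plain number, or a [base] / [base+disp] / [base-disp] memory reference, it does not check that a register name is real, and it omits the size suffix, which the assembler can infer when one operand is a register.

```python
import re

MEMORY = re.compile(r"^\[([a-z][a-z0-9]*)(?:([+-])(0x[0-9a-f]+|\d+))?\]$")
IMMEDIATE = re.compile(r"^(0x[0-9a-f]+|\d+)$")
REGISTER = re.compile(r"^[a-z][a-z0-9]*$")

def operand_to_att(operand):
    """Rewrite one Intel-syntax operand in AT&T form."""
    operand = operand.strip()
    memory = MEMORY.match(operand)
    if memory:
        base, sign, disp = memory.groups()
        prefix = "" if disp is None else ("-" if sign == "-" else "") + disp
        return f"{prefix}(%{base})"      # [rbp-0x4] becomes -0x4(%rbp)
    if IMMEDIATE.match(operand):
        return "$" + operand             # 5 becomes $5
    if REGISTER.match(operand):
        return "%" + operand             # rax becomes %rax
    raise ValueError(f"unsupported operand: {operand!r}")

def intel_to_att(line):
    """Convert `op dst, src` (Intel) into `op src, dst` (AT&T)."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [operand_to_att(part) for part in rest.split(",")]
    return f"{mnemonic} " + ", ".join(reversed(operands))

examples = ["mov rax, 5", "add rax, rbx", "mov rax, [rbp-0x8]", "mov [rsp], rdi"]
converted = [intel_to_att(line) for line in examples]
for intel, att in zip(examples, converted):
    print(f"{intel:22} -> {att}")
```

The only step that changes meaning if you forget it is the reversed() call; the prefixes are just decoration.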

Key insight: There is no "correct" syntax. Pick one, set your tools to it everywhere, and stop translating in your head. Most CTF players standardize on Intel because Ghidra and the major exploitation tutorials use it, which keeps your eyes trained on one form.

Quick reference

The reading method, every time

Find the prologue. Skip it. Note where locals and arguments live (32-bit: arg at [ebp+0x8]; 64-bit: args in rdi, rsi, rdx, ...).
Walk one instruction at a time, tracking each register's value on paper.
At every cmp or test, decide whether the following jump is taken, and follow the path that is actually executed.
At ret, read rax (or eax). That is the return value.
Verify by compiling and running, or by single-stepping in GDB.

Instruction cheat sheet

mov  dst, src     ; copy src into dst (Intel: dst on the left)
lea  dst, [expr]  ; dst = the ADDRESS expr, not the memory at it
add/sub/imul      ; dst = dst (+ - *) src
xor  rax, rax     ; rax = 0   (the zeroing idiom)
test rax, rax     ; sets Zero Flag if rax == 0
cmp  a, b         ; compute a - b, keep only the flags
push/pop r        ; move r onto/off the stack (rsp -= 8 / rsp += 8)
call f / ret      ; push return address & jump / pop it & return
je/jne            ; jump if equal / not equal
jg/jl jge/jle     ; signed greater / less (and -or-equal)
ja/jb             ; unsigned above / below
jmp               ; jump unconditionally

Calling convention at a glance

System V x86-64 args:  rdi, rsi, rdx, rcx, r8, r9  (then the stack)
Return value:          rax
syscall:               number in rax; args rdi, rsi, rdx, r10, r8, r9
32-bit cdecl args:     all on the stack, read at [ebp+0x8], [ebp+0xc], ...
32-bit return value:   eax

That is the whole job. Assembly looked like a wall because you were trying to read it like a paragraph; read it like a checklist, one line at a time with your registers on paper, and the wall turns into a recipe you can follow with your eyes closed.

Reading assembly is not a talent, it is a checklist you run one line at a time until the return register tells you the answer.