What does this disassembly actually say?
Here is the answer a skimmer can keep: assembly is not a language you read like prose, it is a list of one-line moves on a tiny set of numbered boxes called registers. Each line copies a value, does one piece of arithmetic, compares two values, or jumps somewhere. There is no nesting, no scope, no types. If you can track six or seven registers and one stack pointer on a piece of paper, you can read any x86-64 function that Ghidra or GDB puts in front of you.
You already opened the binary. Ghidra gave you a graph, GDB gave you a wall of mov, lea, cmp, and jne, and it all looked like noise. It is not noise. It is the most literal description of the program that exists, more honest than the decompiler's C, because the CPU runs exactly these instructions and nothing else. This guide teaches you to read them from zero.
Assembly has no secrets. Every instruction does one small, fully specified thing. The only skill is patience: one line at a time, watch the registers change.
This is a foundational pillar. The Ghidra guide and the GDB CTF Guide both assume you can read the assembly they display, and the exploitation posts (Buffer Overflow, Shellcode, and ROP without a libc leak) all build on it. By the end you will hand-trace a real picoCTF asm challenge to its exact return value.
What are registers, and what is the stack?
A register is a small, fast storage slot inside the CPU. x86-64 gives you sixteen general-purpose registers, each holding 64 bits (8 bytes). That is the entire working memory the CPU can touch instantly. Everything else lives in slower RAM and has to be loaded in and out. When you read assembly you are mostly watching values shuttle between these sixteen boxes.
| 64-bit | Typical role | 32 / 16 / 8-bit name |
|---|---|---|
| rax | Return value; scratch; syscall number | eax / ax / al |
| rdi, rsi, rdx, rcx | First four function arguments | edi / esi / edx / ecx ... |
| r8, r9 | Fifth and sixth arguments | r8d / r9d ... |
| rbp | Base pointer (frame anchor) | ebp / bp / bpl |
| rsp | Stack pointer (top of the stack) | esp / sp / spl |
| rip | Instruction pointer (next instruction) | not directly writable |
The smaller names matter. rax is the full 64-bit register; eax is its low 32 bits; ax is the low 16; al is the low 8. They are not separate registers, they are windows onto the same box. When you see mov eax, 5 the CPU writes 5 into the low 32 bits and, on x86-64, zeroes the top 32. So eax and rax are the same storage seen at two widths. Beginners lose hours forgetting this and treating eax as unrelated to rax.
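One way to convince yourself that these names are windows onto a single box is to model them as bit masks. The Python sketch below is only an illustration; the helper names (read_eax, write_al, and so on) are invented for this example. It also shows a related rule the paragraph above does not spell out: writing al leaves the upper bits of rax alone, while writing eax zeroes them.

```python
MASK64 = (1 << 64) - 1

def read_eax(rax):
    """Low 32 bits of rax."""
    return rax & 0xFFFFFFFF

def read_ax(rax):
    """Low 16 bits of rax."""
    return rax & 0xFFFF

def read_al(rax):
    """Low 8 bits of rax."""
    return rax & 0xFF

def write_eax(rax, value):
    """mov eax, value: a 32-bit write zeroes the top 32 bits of rax."""
    return value & 0xFFFFFFFF

def write_al(rax, value):
    """mov al, value: an 8-bit write keeps the other 56 bits of rax."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)

rax = 0x1122334455667788
print(hex(read_eax(rax)))      # 0x55667788
print(hex(read_ax(rax)))       # 0x7788
print(hex(read_al(rax)))       # 0x88
print(hex(write_eax(rax, 5)))  # 0x5
print(hex(write_al(rax, 5)))   # 0x1122334455667705
```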
Treating rdi, rsi, and rdx as "argument registers" is a software convention, not a hardware rule. The CPU does not know what an argument is. A calling convention (covered below) is just an agreement everyone compiles against so functions can call each other.
The stack is a region of RAM that grows downward: pushing a value subtracts from rsp, popping adds to it. It is where functions keep local variables, saved registers, and the return address that says where to go when the function finishes. Two instructions move it:
    push rax    ; rsp -= 8, then store rax at [rsp] (grows the stack down)
    pop rax     ; load [rsp] into rax, then rsp += 8 (shrinks the stack up)
Square brackets mean "the memory at this address." [rsp] is the 8 bytes sitting at whatever address rsp currently holds. rsp is a pointer; [rsp] is what it points at. That one distinction, register versus the memory a register addresses, is most of what trips people up in their first week.
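If the pointer-versus-contents distinction still feels slippery, a toy model can help. The sketch below is a simplification written for this guide, not a real emulator: the Stack class, its field names, and the starting address 0x1000 are all assumptions chosen for illustration. Memory is a dictionary keyed by address, so self.mem[self.rsp] plays the role of [rsp].

```python
class Stack:
    """Toy model of the x86-64 stack: rsp is an address, mem maps addresses to values."""

    def __init__(self, rsp=0x1000):
        self.rsp = rsp
        self.mem = {}

    def push(self, value):
        # push: rsp -= 8, then store the value at [rsp]
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self):
        # pop: load [rsp], then rsp += 8
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

s = Stack()
s.push(0xAAAA)
s.push(0xBBBB)
print(hex(s.rsp))         # 0xff0  (the pointer)
print(hex(s.mem[s.rsp]))  # 0xbbbb (what it points at)
print(hex(s.pop()))       # 0xbbbb
print(hex(s.rsp))         # 0xff8
```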
How do I read a function prologue and epilogue?
Almost every function begins and ends with the same boilerplate. Once you recognize it you can skip past it and get to the logic. The opening is the prologue:
    push rbp        ; save the caller's frame anchor on the stack
    mov rbp, rsp    ; set rbp to the current stack top: this frame's anchor
    sub rsp, 0x20   ; reserve 0x20 (32) bytes of local variable space
After those three lines, rbp points at a fixed spot for the whole function, and locals are addressed relative to it. You will see mov DWORD PTR [rbp-0x4], edi meaning "store the 32-bit value in edi into the local variable 4 bytes below the frame anchor." Negative offsets from rbp are locals; positive offsets are (in 32-bit) incoming arguments. DWORD PTR just says the access is 4 bytes wide (DWORD = 4, QWORD = 8, WORD = 2, BYTE = 1).
The closing boilerplate is the epilogue:
    leave   ; equivalent to: mov rsp, rbp ; pop rbp
    ret     ; pop the return address into rip and jump there
leave tears down the frame by restoring rsp and rbp to what the caller had. ret pops the return address the call instruction pushed and resumes the caller. Whatever is in rax at ret is the function's return value. That is the single most useful fact for the asm challenges: to find what a function returns, find what is in rax when it hits ret.
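To see that the prologue and epilogue really cancel out, you can replay them on paper or in a few lines of Python. The function below is a sketch under simplifying assumptions: run_frame and its parameters are hypothetical names, the addresses are arbitrary, and the function body between prologue and epilogue is left empty.

```python
def run_frame(rsp, rbp, return_address):
    """Replay call, prologue, leave, and ret; report what each register ends up holding."""
    mem = {}

    # call: push the return address
    rsp -= 8
    mem[rsp] = return_address

    # push rbp
    rsp -= 8
    mem[rsp] = rbp
    # mov rbp, rsp
    rbp = rsp
    # sub rsp, 0x20
    rsp -= 0x20
    frame = (rbp, rsp)  # anchor and stack top while the body would run

    # leave: mov rsp, rbp ; pop rbp
    rsp = rbp
    rbp = mem[rsp]
    rsp += 8

    # ret: pop the return address into rip
    rip = mem[rsp]
    rsp += 8
    return frame, rsp, rbp, rip

frame, rsp, rbp, rip = run_frame(rsp=0x1000, rbp=0x2000, return_address=0x401234)
print([hex(v) for v in frame])    # ['0xff0', '0xfd0']
print(hex(rsp), hex(rbp), hex(rip))  # 0x1000 0x2000 0x401234
```

After ret, rsp and rbp are back to the caller's values and rip holds the return address, which is exactly what the boilerplate exists to guarantee.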
Not every function keeps a frame pointer; some address their locals directly from rsp. If you do not see push rbp ; mov rbp, rsp, the function is frame-pointer-omitted and locals live at [rsp+N] instead of [rbp-N]. The logic is identical; only the anchor changed.
How do mov, the arithmetic instructions, and lea work?
The workhorse is mov dst, src: copy src into dst. It does not move, it copies, and the destination is written on the left (in Intel syntax, which we use here). The source can be a number (an immediate), a register, or memory; the destination can be a register or memory, but not two memory operands at once.
    mov rax, 0x10     ; rax = 0x10 (immediate into register)
    mov rax, rbx      ; rax = rbx (register into register)
    mov rax, [rbx]    ; rax = memory at rbx (load 8 bytes from RAM)
    mov [rbx], rax    ; memory at rbx = rax (store 8 bytes to RAM)
The arithmetic instructions modify their destination in place:
    add rax, rbx     ; rax = rax + rbx
    sub rax, 5       ; rax = rax - 5
    imul rax, rbx    ; rax = rax * rbx (signed multiply)
    xor rax, rax     ; rax = 0 (the standard way to zero a register)
    and rax, 0xff    ; rax = rax & 0xff (keep the low byte)
    shl rax, 3       ; rax = rax << 3 (multiply by 8)
    inc rax          ; rax = rax + 1
xor rax, rax deserves a note: anything XORed with itself is zero, so this is the compact idiom for "set this register to 0." You will see it constantly. When you spot it, just read it as rax = 0.
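When you trace arithmetic by hand, the one thing paper does not do for you is wrap at 64 bits. The sketch below mirrors a handful of arithmetic instructions in Python with an explicit mask; the starting values 7 and 3 and the function name trace_arithmetic are arbitrary choices for illustration, and the instruction order is picked for the demo rather than taken from any real function.

```python
MASK64 = (1 << 64) - 1  # registers hold 64 bits, so every result is masked

def trace_arithmetic(rax=7, rbx=3):
    """Apply a few arithmetic instructions and record rax after each one."""
    steps = []
    rax = (rax + rbx) & MASK64; steps.append(("add rax, rbx", rax))
    rax = (rax - 5) & MASK64;   steps.append(("sub rax, 5", rax))
    rax = (rax * rbx) & MASK64; steps.append(("imul rax, rbx", rax))
    rax = (rax << 3) & MASK64;  steps.append(("shl rax, 3", rax))
    rax = rax & 0xFF;           steps.append(("and rax, 0xff", rax))
    rax = (rax + 1) & MASK64;   steps.append(("inc rax", rax))
    rax = rax ^ rax;            steps.append(("xor rax, rax", rax))
    rax = (rax - 1) & MASK64;   steps.append(("sub rax, 1", rax))  # wraps around
    return steps

for instruction, value in trace_arithmetic():
    print(f"{instruction:<16} rax = {value:#x}")
```

The last step is the one to notice: subtracting 1 from 0 does not give a negative number, it wraps to 0xffffffffffffffff, which is the same bit pattern as -1 when read as signed.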
Now the instruction that confuses every beginner: lea, Load Effective Address. It looks like a memory access but it never touches memory. It computes an address and stores the address itself, not the contents at that address.
    mov rax, [rbx+rcx*4+8]    ; rax = the VALUE stored in memory at rbx+rcx*4+8
    lea rax, [rbx+rcx*4+8]    ; rax = the ADDRESS rbx+rcx*4+8 itself (no memory read)
Because the bracket expression can scale and add, compilers love lea as a fast calculator. lea rax, [rdi+rdi*2] computes rdi * 3 in one instruction with no multiply unit. So when you see lea, ask: is the compiler taking the address of a variable or array element, or is it just doing arithmetic? Both are common. The bracket form is [base + index*scale + displacement], where scale is 1, 2, 4, or 8.
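The difference between mov and lea is easy to state in code. In the sketch below, effective_address is a hypothetical helper that evaluates the bracket expression, and memory is a made-up dictionary standing in for RAM; the register values are arbitrary. mov computes the address and then reads memory there, while lea stops after the address computation.

```python
MASK64 = (1 << 64) - 1

def effective_address(base, index=0, scale=1, disp=0):
    """Evaluate [base + index*scale + disp] the way the CPU's address unit does."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK64

memory = {0x1018: 0xDEADBEEF}  # pretend RAM: one 8-byte value at 0x1018
rbx, rcx = 0x1000, 4

addr = effective_address(rbx, rcx, 4, 8)
mov_result = memory[addr]  # mov rax, [rbx+rcx*4+8]: read the memory at the address
lea_result = addr          # lea rax, [rbx+rcx*4+8]: keep the address itself

print(hex(mov_result))  # 0xdeadbeef
print(hex(lea_result))  # 0x1018

# lea as a calculator: lea rax, [rdi+rdi*2] computes rdi * 3
rdi = 14
print(effective_address(rdi, rdi, 2))  # 42
```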
A simple lea like lea eax, [rdi+0x3] is exactly the add you would expect: eax = edi + 3. Keep that in your pocket, because the asm challenge we trace later produces its answer with the same kind of small constant addition.
How does control flow work? cmp, test, and the conditional jumps
Assembly has no if or while. It has comparisons that set invisible flag bits, and jumps that read those flags to decide whether to branch. Two instructions do the comparing.
cmp a, b computes a - b, throws the result away, and keeps only the flags it set. If a == b the Zero Flag is set. If a < b, the Carry Flag records it for an unsigned comparison, and the Sign and Overflow flags together record it for a signed one. test a, b does the same but with a bitwise AND instead of a subtraction. The overwhelmingly common idiom test rax, rax ANDs a register with itself, which sets the Zero Flag if and only if the register is zero. Read it as "is rax zero?"
A conditional jump immediately after the comparison turns the flags into a branch:
| Jump | Taken when (after cmp a, b) | Signed? |
|---|---|---|
| je / jz | a == b (Zero Flag set) | either |
| jne / jnz | a != b (Zero Flag clear) | either |
| jg / jnle | a > b | signed |
| jl / jnge | a < b | signed |
| jge / jle | a >= b / a <= b | signed |
| ja / jb | a > b / a < b | unsigned |
| jmp | always (unconditional) | n/a |
The signed versus unsigned split matters. jg and jl treat the values as signed (they can be negative); ja and jb treat them as unsigned (a stands for "above," b for "below"). Pick the wrong interpretation and a comparison against a value with the high bit set will flip on you. For the asm challenges, watch which mnemonic the compiler emitted and trust it: it knows the original C type.
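To watch the signed and unsigned jumps disagree, you can compute the flags yourself. The sketch below models cmp on 32-bit operands and then evaluates each jump from the flags using the conditions documented in the Intel manuals; cmp_flags and taken are hypothetical helper names, and only the jumps from the table above are covered.

```python
def cmp_flags(a, b, bits=32):
    """Flags set by cmp a, b: the CPU computes a - b and keeps only these."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,                           # result was zero
        "SF": bool(result & sign),                   # top bit of the result
        "CF": a < b,                                 # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & sign),   # signed overflow
    }

def taken(jump, f):
    """Would this conditional jump be taken with flags f?"""
    conditions = {
        "je":  f["ZF"],
        "jne": not f["ZF"],
        "jg":  (not f["ZF"]) and f["SF"] == f["OF"],
        "jge": f["SF"] == f["OF"],
        "jl":  f["SF"] != f["OF"],
        "jle": f["ZF"] or f["SF"] != f["OF"],
        "ja":  (not f["CF"]) and (not f["ZF"]),
        "jb":  f["CF"],
    }
    return conditions[jump]

# 0xffffffff is -1 when read as signed, 4294967295 when read as unsigned.
flags = cmp_flags(0xFFFFFFFF, 1)
print(taken("jg", flags))  # False: -1 is not greater than 1
print(taken("ja", flags))  # True: 4294967295 is above 1
```

Same bits, same cmp, opposite branches: the only thing that changed is which jump mnemonic reads the flags.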
So this C if:
    // if (x > 10) y = 1; else y = 2;
compiles to this assembly:
    cmp DWORD PTR [rbp-0x4], 0xa    ; compare x to 10
    jle .else_branch                ; if x <= 10, go to else
    mov DWORD PTR [rbp-0x8], 1      ; y = 1
    jmp .done
    .else_branch:
    mov DWORD PTR [rbp-0x8], 2      ; y = 2
    .done:
Notice the compiler inverted the test: the C says x > 10, but the assembly jumps away when x <= 10. That is normal. The branch guards the path you do not want to fall into. Read the jump as "skip the next block if the condition for entering it fails," and the inversion stops being confusing.
Where do function arguments live? The System V x86-64 calling convention
When one function calls another, how does the second one find its arguments? On 64-bit Linux (and macOS) the answer is the System V AMD64 ABI, the contract every compiler on the platform obeys. Memorize this one table and most function calls become readable:
| Argument | Register | Example: func(a, b, c) |
|---|---|---|
| 1st | rdi | a |
| 2nd | rsi | b |
| 3rd | rdx | c |
| 4th | rcx | |
| 5th | r8 | |
| 6th | r9 | |
| 7th and beyond | on the stack | pushed right to left |
| return value | rax | what the caller reads back |
The mnemonic most people use is "Diane's silk dress costs 89 dollars": the first letters give di, si, d, c, 8, 9, which maps to rdi, rsi, rdx, rcx, r8, r9. So a block like this reads off cleanly:
    mov edi, 0x1             ; arg1 = 1
    lea rsi, [rip+0x2004]    ; arg2 = address of a string (a format or label)
    mov edx, 0x10            ; arg3 = 0x10
    call some_function       ; some_function(1, &string, 0x10)
    ; ... after the call, rax holds the return value
When you reach a call, glance backward to see which argument registers were just set, and you have reconstructed the call's arguments. When you reach a ret, look at rax for the answer. This is also the backbone of the exploitation posts: a ROP chain is just you setting rdi, rsi, and rdx by hand before forcing a call, and shellcode sets rax to a syscall number and loads the same argument registers.
The syscall instruction takes its number in rax and its arguments in rdi, rsi, rdx, then r10 (not rcx), r8, r9. The fourth argument moving from rcx to r10 is the one difference that bites people writing shellcode.
The authoritative source is the System V AMD64 ABI document itself, maintained at the x86-64 psABI project. You do not need to read it to solve challenges; the table above is the working subset.
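If you want the two register orders side by side, a lookup table is enough. The sketch below places integer arguments the way the convention describes for function calls and for raw syscalls; place_call_args and place_syscall_args are hypothetical helpers, the argument values are arbitrary, and the syscall number used in the example is a placeholder rather than a claim about any particular syscall. Floating-point and struct arguments follow extra rules that this sketch ignores.

```python
FUNCTION_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SYSCALL_ARG_REGS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]  # r10 replaces rcx

def place_call_args(args):
    """Return (registers, stack_args) for an ordinary function call."""
    registers = dict(zip(FUNCTION_ARG_REGS, args))
    stack_args = list(args[6:])  # 7th argument and beyond go on the stack
    return registers, stack_args

def place_syscall_args(number, args):
    """Return the registers to load before a raw syscall instruction."""
    if len(args) > len(SYSCALL_ARG_REGS):
        raise ValueError("a syscall takes at most six arguments")
    registers = {"rax": number}
    registers.update(zip(SYSCALL_ARG_REGS, args))
    return registers

# some_function(1, &string, 0x10), with a made-up address for the string
print(place_call_args([1, 0x402004, 0x10]))
# ({'rdi': 1, 'rsi': 4202500, 'rdx': 16}, [])

# A four-argument syscall: the fourth argument lands in r10, not rcx
print(place_syscall_args(0, [10, 20, 30, 40]))
# {'rax': 0, 'rdi': 10, 'rsi': 20, 'rdx': 30, 'r10': 40}
```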
Worked example: tracing a picoCTF asm challenge by hand
Time to do it for real. The picoCTF asm series hands you a small assembly function and asks what it returns for a given input. picoCTF 2019 asm1 asks: what does asm1(0x345) return? We will trace it to the exact value, by hand, no tools.
picoCTF 2019 asm1 is 32-bit x86, so its argument arrives on the stack and sits at [ebp+0x8] after the prologue. The return value still comes back in eax. Everything else (cmp, the conditional jumps, lea) reads identically to x86-64. We will flag the 32-bit-specific lines as we hit them.
The function has this shape. Read it top to bottom:
    asm1:
        push ebp
        mov ebp, esp                      ; prologue: ebp now anchors the frame
        cmp DWORD PTR [ebp+0x8], 0x3b9    ; compare the argument to 0x3b9
        jg part_a                         ; if arg > 0x3b9, jump to part_a
        cmp DWORD PTR [ebp+0x8], 0x342    ; compare the argument to 0x342
        jne part_b                        ; if arg != 0x342, jump to part_b
        mov eax, DWORD PTR [ebp+0x8]
        add eax, 0x60                     ; (this path: arg + 0x60)
        jmp part_done
    part_a:
        mov eax, DWORD PTR [ebp+0x8]
        sub eax, 0x12                     ; (this path: arg - 0x12)
        jmp part_done
    part_b:
        mov eax, DWORD PTR [ebp+0x8]
        add eax, 0x3                      ; (this path: arg + 3)
    part_done:
        pop ebp
        ret                               ; return eax
Now trace it with the actual input. Our argument is 0x345, which is 837 in decimal. Keep a running note of two things: where execution is, and what is in eax.
| Step | Instruction | What happens with arg = 0x345 |
|---|---|---|
| 1 | push ebp / mov ebp, esp | Prologue. The argument now sits at [ebp+0x8]. |
| 2 | cmp [ebp+0x8], 0x3b9 | Compare 0x345 to 0x3b9. 0x345 < 0x3b9. |
| 3 | jg part_a | arg is NOT greater, so the jump is not taken. Fall through. |
| 4 | cmp [ebp+0x8], 0x342 | Compare 0x345 to 0x342. They are not equal. |
| 5 | jne part_b | arg != 0x342 is true, so we jump to part_b. |
| 6 | mov eax, [ebp+0x8] | eax = 0x345. |
| 7 | add eax, 0x3 | eax = 0x345 + 3 = 0x348. |
| 8 | pop ebp / ret | Return eax = 0x348. |
The answer is 0x348. We never ran the program. We followed two comparisons, took the branch each one dictated, did one addition, and read eax at ret. That is the entire method, and it scales to functions ten times this size: the work is always "which branch, then what is in the return register."
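One way to double-check a hand trace without leaving your editor is to transcribe the branches into Python. The function below is a direct reading of the listing traced above, not the challenge's official source; to_signed32 is a helper added here because jg is a signed comparison.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer, as jg does."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def asm1(arg):
    """Python transcription of the asm1 listing: same comparisons, same paths."""
    x = to_signed32(arg)
    if x > 0x3B9:          # cmp [ebp+0x8], 0x3b9 ; jg part_a
        eax = x - 0x12     # part_a
    elif x != 0x342:       # cmp [ebp+0x8], 0x342 ; jne part_b
        eax = x + 0x3      # part_b
    else:
        eax = x + 0x60     # fall-through path
    return eax & 0xFFFFFFFF  # eax is 32 bits wide

print(hex(asm1(0x345)))  # 0x348
```

Trying the other two paths is a useful exercise: asm1(0x342) takes the fall-through and asm1(0x3ba) takes part_a.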
To check your trace against the CPU, put the function in an assemblable test.S, add a small C driver (a hypothetical main.c that calls asm1(0x345) and prints the result in hex), and build both as 32-bit with gcc -m32 -no-pie -o test test.S main.c. Loading the result from a 64-bit Python with ctypes will not work, because a 64-bit process cannot load 32-bit code. If your trace and the CPU disagree, the CPU is right, and finding where they diverge is the most efficient way to learn. The GDB CTF Guide shows how to single-step the same function and watch the flags change after each cmp.
The later challenges scale the same skill up. picoCTF 2019 asm2 adds a loop (a backward conditional jump), so you trace the loop body until the exit condition fires instead of just falling through. picoCTF 2019 asm3 works with multiple arguments and sub-register widths, where you must respect that al and ax are windows onto eax. picoCTF 2019 asm4 walks a string and computes an offset, so you track a pointer and an accumulator together. None of them need a new concept. They need the same patient trace.
Why does the same code look different? AT&T vs Intel syntax
You will meet the same instruction written two ways depending on the tool, and the difference is purely cosmetic, but it reverses the operand order, so it must be known cold. The two syntaxes are Intel (what Ghidra, most Windows tools, and the snippets in this post use) and AT&T (the default for objdump and many Linux GDB setups). Same machine code, different printing.
| Trait | Intel | AT&T |
|---|---|---|
| Operand order | mov dst, src | mov src, dst (reversed) |
| Registers | rax | %rax (percent prefix) |
| Immediates | 5 | $5 (dollar prefix) |
| Memory | [rbp-0x4] | -0x4(%rbp) |
| Size suffix | DWORD PTR [rax] | movl (%rax) (l suffix) |
The same line, both ways:
    Intel:  mov eax, DWORD PTR [ebp+0x8]    ; eax = the argument
    AT&T:   movl 0x8(%ebp), %eax            ; same thing, source on the left
The single rule that saves you: in Intel, the destination is on the left (it reads like dst = src); in AT&T, the destination is on the right. If you only remember one thing about AT&T, remember that the operands are flipped. Assembly emitted by gcc -S is AT&T by default, so expect it in .S files generated that way. To make GDB show you Intel instead, run set disassembly-flavor intel, and to make objdump do it, add -M intel.
    # objdump in Intel syntax
    objdump -d -M intel ./binary | less
    # GDB in Intel syntax (put this in ~/.gdbinit to make it permanent)
    set disassembly-flavor intel
Quick reference
The reading method, every time
- Find the prologue. Skip it. Note where locals and arguments live (32-bit: arg at [ebp+0x8]; 64-bit: args in rdi, rsi, rdx, ...).
- Walk one instruction at a time, tracking each register's value on paper.
- At every cmp or test, decide whether the following jump is taken, and follow the path that is actually executed.
- At ret, read rax (or eax). That is the return value.
- Verify by compiling and running, or by single-stepping in GDB.
Instruction cheat sheet
    mov dst, src       ; copy src into dst (Intel: dst on the left)
    lea dst, [expr]    ; dst = the ADDRESS expr, not the memory at it
    add/sub/imul       ; dst = dst (+ - *) src
    xor rax, rax       ; rax = 0 (the zeroing idiom)
    test rax, rax      ; sets Zero Flag if rax == 0
    cmp a, b           ; compute a - b, keep only the flags
    push/pop r         ; move r onto/off the stack (rsp -= 8 / rsp += 8)
    call f / ret       ; push return address & jump / pop it & return
    je/jne             ; jump if equal / not equal
    jg/jl jge/jle      ; signed greater / less (and -or-equal)
    ja/jb              ; unsigned above / below
    jmp                ; jump unconditionally
Calling convention at a glance
    System V x86-64 args:  rdi, rsi, rdx, rcx, r8, r9 (then the stack)
    Return value:          rax
    syscall:               number in rax; args rdi, rsi, rdx, r10, r8, r9
    32-bit cdecl args:     all on the stack, read at [ebp+0x8], [ebp+0xc], ...
    32-bit return value:   eax
That is the whole job. Assembly looked like a wall because you were trying to read it like a paragraph; read it like a checklist, one line at a time with your registers on paper, and the wall turns into a recipe you can follow with your eyes closed.
Reading assembly is not a talent, it is a checklist you run one line at a time until the return register tells you the answer.