Description
Analyze a suspicious Windows executable, identify unique strings, and craft a YARA rule that catches the sample when submitted to the remote harness.
Unzip the sample (password: picoctf). The archive contains suspicious.exe.
Triage the file before unpacking: file suspicious.exe confirms it's a PE32 executable, xxd suspicious.exe | head -1 shows the 4d 5a (MZ) magic bytes, and strings suspicious.exe | grep -i upx turns up the UPX markers that indicate it's UPX-packed.
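The same triage can be scripted when there are several samples to check. The sketch below is only an illustration: has_mz_magic, count_upx_markers, and triage are hypothetical helper names, and the UPX check is a plain text search for the usual UPX0/UPX1/UPX! markers, not a real PE section parse.

```shell
#!/bin/sh
# Triage sketch (hypothetical helpers, not part of the challenge).

# Succeeds if the first two bytes of the file are "MZ" (0x4D 0x5A).
has_mz_magic() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "MZ" ]
}

# Prints the number of lines that contain a UPX marker (UPX0, UPX1, UPX!).
# -a makes grep treat the binary as text; -c counts matching lines.
count_upx_markers() {
    grep -a -c -E 'UPX[01!]' "$1"
}

# Prints a two-line summary for one file.
triage() {
    if has_mz_magic "$1"; then
        echo "MZ header: yes"
    else
        echo "MZ header: no"
    fi
    echo "UPX marker lines: $(count_upx_markers "$1")"
}
```

Calling triage suspicious.exe prints both checks at once. A non-zero marker count is a hint to try upx -d, not proof that the file unpacks cleanly.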
Install strings (shipped in the binutils package), socat, and upx locally so you can inspect, decompress, and submit.
Important: upx -d overwrites the file in place, destroying the packed copy. Keep a backup if you want to compare packed vs unpacked strings later (or want to write rules that match the packed sample specifically).
    sudo apt install socat upx binutils -y
    unzip suspicious.zip
    file suspicious.exe
    xxd suspicious.exe | head -1
    strings suspicious.exe | grep -i upx
    cp suspicious.exe suspicious.exe.bak  # back up the packed binary first
    upx -d suspicious.exe
    strings suspicious.exe > file.txt

Solution
Step 1: Collect indicators
Run strings before and after unpacking. Before: you see "UPX0", "UPX1", and the UPX stub's loader text. After: the original strings appear, including "YaraRules0x100", "NtQuery..." Windows API names, and the phrase "debugger process". These plus the MZ magic become your rule's strings.

    # Packed (before upx -d): only UPX section names and stub text are visible.
    $ strings suspicious.exe.bak | grep -E 'UPX|Yara|NtQuery|debugger'
    UPX0
    UPX1
    UPX!
    $Info: This file is packed with the UPX executable packer http://upx.sf.net $

    # Unpacked (after upx -d): the original program's strings appear.
    $ strings suspicious.exe | grep -E 'UPX|Yara|NtQuery|debugger'
    YaraRules0x100
    NtQueryInformationProcess
    IsDebuggerPresent
    debugger process detected

    # Diff the two for everything UPX hid:
    diff <(strings ./suspicious.exe.bak) <(strings ./suspicious.exe) | head
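If the diff output is noisy, an alternative is to list only the strings that appear after unpacking. The sketch below assumes the two strings dumps have already been saved to text files; only_in_second and the packed.txt / unpacked.txt file names are hypothetical.

```shell
#!/bin/sh
# Print the lines found in the second dump but not in the first.
# comm needs sorted input, so both dumps are sorted and de-duplicated first.
only_in_second() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || return 1
    sort -u "$1" > "$tmp_a"
    sort -u "$2" > "$tmp_b"
    # -1 hides lines unique to the first file, -3 hides lines common to both.
    comm -13 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example usage (hypothetical file names):
#   strings suspicious.exe.bak > packed.txt
#   strings suspicious.exe     > unpacked.txt
#   only_in_second packed.txt unpacked.txt | less
```

Everything this prints was hidden by the packer, which makes it a reasonable shortlist of candidate rule strings.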
YARA is an open-source pattern-matching tool originally developed at VirusTotal and now maintained by the security community. It allows malware analysts to write rules that describe file characteristics (byte sequences, string patterns, and logical conditions) and apply those rules to scan files, memory dumps, or network streams. YARA rules are the lingua franca of malware detection and are used in antivirus engines, SIEM platforms, EDR products, and threat hunting tools.
The strings utility extracts printable character sequences (default minimum length: 4) from any binary file. It is one of the fastest first-pass analysis tools because it reveals hardcoded URLs, registry keys, API calls, error messages, and embedded data without requiring disassembly. Running it before and after unpacking a UPX-compressed executable reveals two different views: the compressed binary shows minimal strings (mostly the UPX stub), while the unpacked binary exposes the original executable's full string table.

UPX (Ultimate Packer for eXecutables) is a legitimate compression tool for reducing binary size, but it is also extremely popular among malware authors for evading signature-based detection. Unpacking with upx -d recovers the original binary. Some malware modifies the UPX headers to prevent automatic unpacking; in those cases, dynamic unpacking (running the sample in a sandbox until it decompresses itself into memory) or manual analysis with a debugger is required. The hex dumps for CTF guide covers reading PE headers in xxd.

Step 2: Write the YARA rule
Combine the indicators into a rule that looks for the MZ header plus YaraRules0x100 and either UPX or NtQuery, or alternatively the entire "debugger process" string. The two-clause or is deliberate, as explained below.

    rule Rule
    {
        strings:
            $mz = {4D 5A}
            $name = "YaraRules0x100"
            $packer = "UPX"
            $ntquery = "NtQuery"
            $phrase = "debugger process" wide ascii
        condition:
            ($mz and $name and ($packer or $ntquery)) or $phrase
    }
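To see how the two clauses of the condition behave, the logic can be approximated locally with grep. This is only a sketch: has and rule_would_match are hypothetical helper names, the search is ASCII-only (it ignores the wide variant of $phrase), and it is no substitute for running real YARA.

```shell
#!/bin/sh
# Succeeds if the file contains the literal string (binary-safe, fixed-string search).
has() {
    grep -a -q -F -- "$2" "$1"
}

# Mirrors: ($mz and $name and ($packer or $ntquery)) or $phrase
# Prints which clause fired; returns non-zero when neither does.
rule_would_match() {
    f=$1
    if has "$f" "MZ" && has "$f" "YaraRules0x100" &&
       { has "$f" "UPX" || has "$f" "NtQuery"; }; then
        echo "match: clause 1"
    elif has "$f" "debugger process"; then
        echo "match: clause 2"
    else
        echo "no match"
        return 1
    fi
}
```

Running rule_would_match against both suspicious.exe.bak and suspicious.exe shows which path each copy takes. If the yara command-line tool is installed, yara -s sample.txt suspicious.exe (with the rule saved as sample.txt) is the more reliable check, since it also prints the strings that matched.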
A YARA rule consists of three sections: meta (optional metadata like author and date), strings (pattern definitions), and condition (boolean logic combining the patterns). String patterns can be plaintext ("text"), regular expressions (/regex/), or hex byte sequences ({4D 5A}). The wide modifier matches UTF-16LE (two bytes per character, common in Windows strings), and ascii matches standard single-byte encoding; using both together covers both formats.

The MZ magic bytes (0x4D 0x5A) mark the start of every Windows PE (Portable Executable) file: DLLs, EXEs, and SYS files all begin with MZ. Using it as an anchor in the condition helps limit false positives to PE files: $mz at 0 enforces the offset strictly, while a bare $mz only requires the two bytes to appear somewhere in the file, which is a much weaker filter. Combining it with application-specific strings like YaraRules0x100 makes the rule highly specific to this particular sample family.

Good YARA rule design balances specificity (low false positive rate) with generality (catching all variants of a malware family). Production threat intelligence teams write rules around behavioral indicators (specific API call sequences, anti-analysis techniques like NtQuery for debugger detection) that persist across recompilations, rather than exact byte sequences that change with each build. Resources like YARA documentation, yarGen (automated rule generator), and CAPE sandbox help accelerate rule development from live samples.

Why the compound condition ($mz and $name and ($packer or $ntquery)) or $phrase? It gives the rule two independent paths to a hit, which is what makes it robust against the harness's sample variations. The first clause matches a PE file that contains both the unique application string YaraRules0x100 and at least one of the anti-analysis indicators (UPX section names if still packed, NtQuery if unpacked). The second clause, $phrase alone, catches any file containing the literal "debugger process" regardless of structure, so it still fires on samples where the harness stripped the MZ header or removed the application name. A sample only has to satisfy one of the two paths to match, which dramatically reduces false negatives compared with a single conjunctive condition.

Step 3: Submit via socat
Save the rule to sample.txt (or any filename) and pipe it to the grading service with socat. If it matches all test cases, the server returns the flag.

    socat -t60 - TCP:standard-pizzas.picoctf.net:59919 < sample.txt
socat (SOcket CAT) is a multipurpose relay tool that creates bidirectional data channels between various types of endpoints: TCP sockets, UDP, files, stdin/stdout, Unix domain sockets, SSL connections, and more. It is the swiss-army knife of network connectivity, frequently used in CTF challenges for connecting to remote services, setting up listeners, and relaying data between protocols. See networking tools for CTF for the full kit.
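For repeated submissions while iterating on a rule, the socat call can be wrapped in a small helper. This is a sketch under assumptions: submit_rule, GRADER_HOST, and GRADER_PORT are hypothetical names, and the host and port defaults are simply the ones used in this write-up, which may differ per challenge instance.

```shell
#!/bin/sh
# Defaults taken from this write-up; override them for your own instance.
GRADER_HOST=${GRADER_HOST:-standard-pizzas.picoctf.net}
GRADER_PORT=${GRADER_PORT:-59919}

# submit_rule RULE_FILE [WAIT_SECONDS]
# Sends the rule file to the grader; WAIT_SECONDS is passed to socat's -t option.
submit_rule() {
    rule_file=$1
    wait_secs=${2:-60}
    if [ ! -r "$rule_file" ]; then
        echo "cannot read rule file: $rule_file" >&2
        return 1
    fi
    case $wait_secs in
        ''|*[!0-9]*) echo "wait must be a whole number of seconds" >&2; return 1 ;;
    esac
    socat "-t${wait_secs}" - "TCP:${GRADER_HOST}:${GRADER_PORT}" < "$rule_file"
}
```

With this in place, submit_rule sample.txt reproduces the command above, and submit_rule sample.txt 120 retries with a longer wait.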
The -t60 flag tells socat to keep the connection open for up to 60 seconds after one side (here, stdin) reaches end of file, instead of the default 0.5 seconds. That matters because the grader runs your rule against a sample corpus, and complex YARA conditions (especially ones with regex strings or many wide ascii modifiers) can take several seconds per file. With the default, socat closes the connection long before the server finishes scoring and you never see the flag. If 60 seconds isn't enough (your rule is heavy, the server is busy, or both), bump it to -t120 and resubmit; the grader doesn't penalize a longer wait. The - endpoint means stdin/stdout, so redirecting a file with < sample.txt sends the file contents as input to the TCP connection.

In real threat hunting workflows, YARA rules are deployed to scanning infrastructure using tools like YARA-X (the Rust rewrite of YARA), run on endpoints through Velociraptor or THOR, or uploaded to services like VirusTotal and MalwareBazaar to monitor for new samples matching the rule. Writing effective YARA rules is a core competency for malware analysts and threat intelligence teams.
Flag
picoCTF{yara_rul35_r0ckzzz_216...}
Any rule that nails at least one unique string plus the PE header works; the combination above passed all server tests.