Description
I wonder what this really is... The file enc displays as seemingly random Unicode characters.
Setup
Download the enc file.
Solution
- Step 1: Identify the encoding scheme
  Open enc in a text editor or print it. It contains Unicode characters with code points well above 127, which is a clue that multiple ASCII bytes were packed together. The encoding is char = chr((ord(a) << 8) + ord(b)), which combines two consecutive ASCII characters into one Unicode code point.
Learn more
Standard ASCII characters have code points from 0 to 127, fitting in 7 bits. A Unicode code point, however, can represent values up to 1,114,111. The encoding used here treats two ASCII bytes as the high byte and low byte of a 16-bit integer, producing a single Unicode character -- effectively compressing two characters into one.
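The packing described above can be sketched as follows. This is a minimal illustration rather than the challenge's actual source: the function name `pack_pairs` and the sample string are hypothetical, and the sketch assumes an even-length ASCII input.

```python
def pack_pairs(text: str) -> str:
    """Pack each pair of ASCII characters into one Unicode code point."""
    if len(text) % 2 != 0:
        raise ValueError("this sketch assumes an even number of characters")
    packed = []
    for i in range(0, len(text), 2):
        high, low = ord(text[i]), ord(text[i + 1])
        # The first character becomes the high byte, the second the low byte.
        packed.append(chr((high << 8) + low))
    return "".join(packed)


sample = "pico"  # hypothetical sample input, not the real flag
encoded = pack_pairs(sample)
print([hex(ord(c)) for c in encoded])  # ['0x7069', '0x636f']
```

Four input characters become two code points: 'p' (0x70) and 'i' (0x69) combine into 0x7069, and 'c' (0x63) and 'o' (0x6f) combine into 0x636f.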
This is not a standard encoding like UTF-16; it is a custom scheme. The main clue is that every code point in enc falls in the range 0x2020–0x7e7e, with both its high byte and its low byte in the printable ASCII range 0x20–0x7e.
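One way to check this clue is to split each code point into two bytes and confirm that both are printable ASCII. The helper name `looks_like_packed_ascii` and the sample string are assumptions made for illustration; with the real file you would pass in the contents of enc instead.

```python
def looks_like_packed_ascii(text: str) -> bool:
    """Return True if every code point splits into two printable ASCII bytes."""
    for ch in text:
        code = ord(ch)
        high, low = code >> 8, code & 0xFF
        if not (0x20 <= high <= 0x7E and 0x20 <= low <= 0x7E):
            return False
    return True


sample = "\u7069\u636f"  # hypothetical sample: "pi" and "co" packed
print(looks_like_packed_ascii(sample))   # True
print(looks_like_packed_ascii("hello"))  # False: plain ASCII has a high byte of 0
```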
- Step 2: Reverse the encoding
  For each Unicode character c in enc, extract the high byte with (ord(c) >> 8) and the low byte with (ord(c) & 0xff). Convert each byte back to a character and concatenate the results to reconstruct the original ASCII string containing the flag.
  python3 -c "enc = open('enc', encoding='utf-8').read().strip(); print(''.join(chr(ord(c) >> 8) + chr(ord(c) & 0xff) for c in enc))"
Learn more
The shift-right operator >> 8 drops the lower 8 bits, leaving the original high byte. The bitwise AND & 0xff masks to only the lower 8 bits, recovering the low byte. These are standard bit manipulation operations for unpacking multi-byte values, used extensively in binary format parsing and network protocol decoding.
The challenge title "Transformation" refers to the encoding transformation applied to the plaintext. Reversing a transformation requires understanding the original operation, which here is a straightforward bit-packing scheme with no randomness or key.
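To make the shift and mask concrete, here is a small worked example on a single code point, followed by a decoder built from the same two operations. The function name `unpack_pairs` and the sample value 0x7069 are illustrative assumptions, not taken from the challenge file.

```python
def unpack_pairs(text: str) -> str:
    """Split each code point back into its high and low ASCII bytes."""
    out = []
    for ch in text:
        code = ord(ch)
        out.append(chr(code >> 8))    # high byte: first original character
        out.append(chr(code & 0xFF))  # low byte: second original character
    return "".join(out)


code = 0x7069              # hypothetical packed value
print(hex(code >> 8))      # 0x70, which is 'p'
print(hex(code & 0xFF))    # 0x69, which is 'i'
print(unpack_pairs("\u7069\u636f"))  # pico
```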
Flag
picoCTF{...}
The encoding pairs consecutive ASCII bytes into a single Unicode code point -- reverse by extracting the high byte (>>8) and low byte (&0xff) separately.