7 min read

PicoCTF transformation

PicoCTF transformation

Breaking into things isn't always as easy as it sounds. In this post, we're going to take a peek at an "easy-ish" reverse engineering challenge that requires some knowledge of bit manipulation.

Why would you ever need to know about bit manipulation? If you wanted to encrypt or encode a message to someone, bit manipulation is a great way to start off.

The challenge

The challenge we're working on today is located at https://play.picoctf.org/practice/challenge/104?page=1. The description of the challenge reads as follows:

I wonder what this really is... enc

Hm... that's a little confusing, right? Well... let's take a look inside the enc file that's included in challenge. It seems to be a bunch of Chinese characters.

λ cat enc
灩捯䍔䙻ㄶ形楴獟楮獴㌴摟潦弸弰㑣〷㘰摽%

What about the python code included?

''.join([chr((ord(flag[i]) << 8) + ord(flag[i + 1])) for i in range(0, len(flag), 2)])

To understand what's happening here, let's break it up into smaller pieces and rewrite it so it's not a crazy list comprehension (that can be hard to read):

s = ""
for i in range(0, len(flag), 2):
  # First character
  code1 = ord(flag[i]) << 8
  char1 = chr(code1)
  # Second character
  code2 = code1 + ord(flag[i + 1])
  char2 = chr(code)
  s += char1 + char2

The original version feels complex to read, so breaking it up into English, the enc file is "encrypted-ish" as the following (in English):

For every 2 characters, shift the first unicode character to the left by 8 bits and append that to the string and then take the 2nd unicode number and add it to the first unicode number and append that character to a string. Phew... that's less easy to understand, eh?

What's going on with that << 8 thing anyway?

Short introduction to bit manipulation

Although it sounds scary, bit manipulation is kind of how computers do most things (kinda), but it's an important component of how computers do most of the things they do.

Starting off with the decimal system, like numbers 1, 10, 134, etc. is the decimal system. The decimal system is a base-10 numbering system.

There are other numbering systems as well. The one we're going to look at for this post is the base-2 system: binary

Let's look at the basics of binary. Using one byte (four bits -- or just four 1s or 0s), let's look at some values:

The number 0 is just four 0s in a row: 0000. The number 1 is three 0s and a 1: 0001. Okay, so what's two?

0010 => 2

How about 3?

0011 => 3

Two more examples, what's 4 and 5?

0100 => 4
0101 => 5

Looking at the decimal system, our usual base-10 system, we can see the same pattern where the placement of the number adds up to the total. Each placement in decimal is an order of 10. So 10010^0 is in the 1s spot, 10110^1 is in the 10s place, 10210^2 is in the 100s place, and so on and so forth:

242 => 200 + 40 + 2

In binary, the placement of each number is an order of the number 2 (base-2). Each placement in the binary system is set at it's 2^0 characteristic.

0001 => 1 + 0 + 0 + 0
0011 => 1 + 2 + 0 + 0
1010 => 0 + 2 + 0 + 8

The first number (in the same direction as binary) is 202^0 => 1. The second is 212^1 => 2, the third is 222^2 => 4. With each successive number, we're just adding to the total.

For instance, how do we express 9?

8 + 1, so we need a 1 in the 8 spot and a 1 in the 1 place: 1001.

Okay, but how about that challenge?

Now that we know what's happening with the underlying numbers, we need to know how computers express the ascii digits (English characters) and unicode digits (way cooler characters, like cyrillic and mandarin characters... even emojis).

  • ASCII characters is a way for computers to store characters as numbers (and it's an old standard)
  • Unicode is a way of storing numbers that express all sorts of different characters

When you see a website that has emojis, it's supporting unicode

This matters because in our challenge, we're taking characters in the Mandarine characters and transforming them from unicode to ascii. In python, we have two different operators that will help us (and are in the hint above):

  • ord() takes a unicode character and returns it's unicode value
  • chr() takes a number and returns the ascii value

In python, converting from ASCII character strings into numbers we can use the ord() function:

python -c "print(ord('A'))" # 65
python -c "print(ord('a'))" # 97

To convert from unicode characters to character values, we have to do a little more work, but we'll use a combination of the ord() function and the chr() function:

λ python -c "print(ord('灩'))" # 28777
λ python -c "print(chr('灩'))" # ERROR TypeError: an integer is required (got type str)

We'll need to somehow convert this unicode character into an ascii-readable character. Using our bit-manipulation technique from above, let's shift the unicode values by 8 over to the left (shrinking the value, because unicode characters are MUCH larger than ascii characters) and then call chr() on the resulting value:

λ python -c "print(ord('灩') >> 8)" # 112
λ python -c "print(chr(ord('灩') >> 8))" # p

Sweet. Now we can take each character in our string and reverse-engineer the original encoding (that's the code we translated above).

Using python, we'll read the file into a variable we'll call flag:

#!/usr/bin/env python3

# Read the file
with open('enc', 'r') as fp:
    flag = fp.read()

Now for each character, we'll convert the first value into an ascii representation:

#!/usr/bin/env python3

# Read the file
with open('enc', 'r') as fp:
    flag = fp.read()

s = ""
for i in range(0, len(flag)):
    f = ord(flag[i]) >> 8
    s += chr(f)

print(s)

Hm... when we run this, the output is too short and doesn't follow the flag pattern starting with picoCTF{}.

λ ./blog.py
pcCF1_isis3do__406d

Ah we forgot that in the original encoding, the second character is added to the ordinal value to create the character encoding. We need to reverse this process and subtract the second character.

#!/usr/bin/env python3

# Read the file
with open('enc', 'r') as fp:
    flag = fp.read()

s = ""
for i in range(0, len(flag)):
    f = ord(flag[i]) >> 8 # first character ascii value
    s += chr(f)
    t = ((ord(flag[i])) - ((ord(flag[i]) >> 8))) # second character ascii value
    s += chr(t)

print(s)

On the line where we're calculating the second value, we're going to get an error. The error we'll get occurs because we're still try to convert a weird (or negative) unicode value. Instead, we need to subtract an ascii value. We'll just need to shift our character back by 8 (unicode to ascii).

#!/usr/bin/env python3

# Read the file
with open('enc', 'r') as fp:
    flag = fp.read()

s = ""
for i in range(0, len(flag)):
    f = ord(flag[i]) >> 8 # first character ascii value
    s += chr(f)
    t = ((ord(flag[i])) - ((ord(flag[i]) >> 8) << 8)) # secondcharacter ascii value
    s += chr(t)

print(s)

Let's run our reverse engineering script:

λ ./blog.py
picoCTF{16_bits_inst34d_of_8_XXXXXXXX} # scrubbed answer

Awesome. We got the flag. That's 20 more points.

Mind sparing a moment?

Is this post confusing? Did I make a mistake? Let me know if you have any feedback and suggestions!