# HTB 2021 Uni CTF Quals - Mechanical madness

Medium hardware

This challenge consists of an assembly program and simulated CPU with the goal of running the program on the CPU. Since we had no way to assemble the program, we first had to write an assembler for it. To help with the reversing we were also given a short program and its bytecode.

``````:start
movl ax, 10
:sub1
movl bx, 1
sub  bx
cmp  ax,  ax
movl bx,  :sub1
jnz
rst
``````
``````v2.0 raw
10000a 100101 10100 130000 100101 e0000 40000
``````

The CPU and some surrounding hardware were implemented as a circuit in logisim evolution. ## Reversing

To start, we tried to decode the instruction by looking at the example program and assembly. Comparing two similar lines gave us an initial idea of the structure.

``````bytecode      assembly       binary
------------------------------------------------------
10000a        movl ax, 10    000100000000000000001010
100101        movl bx, 1     000100000000000100000001

ins  reg  data
--------------
10   00   0a
10   10   01
``````

To learn more about the device, we focused on the other file we got and followed the flow of data inside logisim.

After reading a block of 21 bits from memory (3 bits are discarded), the data is passed to a piece of logic that breaks apart the instruction in 3 parts like we assumed. While we initially thought the other parts to be reg and data, not all instructions use this format. So, we started to think of them as the first and second argument, although they are more commonly called operands when talking about assembly. Then the parts are fed into the CU and we took a look inside. Here we see the IR data line which contains the byescode for the instruction itself. If we look at one of the the multiplexers on the left and follow the trace, we can find out what IR value decodes to what instruction. In this case we can see that the IR value for JMP is 5. Some of the instructions are grouped, so we have to look inside to find out the real encoding. The arithmetic operations are grouped in the ALU. If we go to the ALU view, we see that the instructions are again selected using a multiplexer and that for example, the IR value for SUB is 2. Conditional jumps are also grouped, but selected using a slightly different method, these are using AND and NOT gates. The NOT gates signal a 0 here, so reading up you can see that JG would be encoded as `01000` or 8. We decode the sub-instructions inside the `ALU` and `CJMP` until we cover all instruction used in the program we needed to assemble.

Finally, we used the information we gathered to write a python script that could assemble the instructions. We loaded the resulting bytecode into memory and waited a few minutes without any visible result, then we changed the emulation speed from the default 16Hz to 2048 KHz and slowly got the flag! ## The assembler and assembly

The assembler first prepossesses the assembly to replace labels with absolute addresses and standardizes the operands. After this the assembler only had to put the binary values corresponding to the instructions and operands together and format it like a logisim memory image.

``````registers = {"ax":"0", "bx":"1", "cx":"2", "dx":"3"}

instructions = {'ADD':  '00000', 'SUB':  '00001',
'MUL':  '00010', 'CLR':  '00011',
'RST':  '00100', 'JMP':  '00101',
'LJMP': '00110', 'JLP':  '00111',
'JG':   '01000', 'JGE':  '01001',
'JL':   '01010', 'JLE':  '01011',
'JE':   '01100', 'JZ':   '01101',
'JNZ':  '01110', 'MOV':  '01111',
'MOVL': '10000', 'CALL': '10001',
'RET':  '10010', 'CMP':  '10011',
'PUSH': '10100', 'POP':  '10101',
'DIV':  '10110', 'MMIV': '10111',
'MMOV': '11000', '???':  '11001',
'MSK':  '11010', 'MSKB': '11011'}

def preproces(asm):
p_asm = []
labels = {}
#Step 1: Put instructions in a list and interpret labels
for line in asm:
if line.startswith(":"):
labels[line] = str(len(p_asm))
else:
p_asm.append(line.replace(",", "").strip().split())
#Step 2: Resolve labels and registers
for line in range(len(p_asm)):
for part in range(1, len(p_asm[line])):
if p_asm[line][part].startswith(":"):
for label in labels.keys():
if label in p_asm[line][part]:
p_asm[line][part] = p_asm[line][part].replace(label, labels[label])
elif p_asm[line][part] in registers:
p_asm[line][part] = registers[p_asm[line][part]]
#Step 2: Convert hex values and resolve offsets
for line in range(len(p_asm)):
for part in range(1, len(p_asm[line])):
if not p_asm[line][part].isdigit():
if p_asm[line][part].startswith("0x"):
p_asm[line][part] =  str(int(p_asm[line][part],16))
elif "+" in p_asm[line][part]:
sum = p_asm[line][part].split("+")
p_asm[line][part] = str(int(sum) + int(sum))
elif "-" in p_asm[line][part]:
sum = p_asm[line][part].split("-")
p_asm[line][part] = str(int(sum) + int(sum))
return p_asm

def assemble(asm):
bytecode = []
for line in range(len(asm)):
#Get bits for the instruction
inst = instructions[asm[line].upper()].zfill(8)
for part in range(1, len(asm[line])):
#Get bits for the arguments
inst += "{0:b}".format(int(asm[line][part])).zfill(8)
#Add the hex value to the list, filling unused arguments with zeros
bytecode.append(hex(int(inst.ljust(24, '0'), 2))[2:])
return bytecode

def main():
with open("program.asm", "r") as infile:
p_asm = preproces(asm)
#Print bytecode in logisim evolution memory dump format
print("v2.0 raw")
print(" ".join(assemble(p_asm)))
main()
``````

For people interested in the assembly we needed to assemble, here is what we used as input:

``````:start
movl cx, 10
clr
movl bx,  1
movl dx,  0
mmiv 0x0,  dx
movl bx,  :sub4
call bx,  0
mmov ax, 0x0
movl bx,  :sub5
call bx,  0
:sub1
movl bx,  1
movl dx,  0
push dx,  0
movl bx,  :sub5
call bx,  0
movl bx,  1
sub  bx,  0
cmp  ax, ax
movl bx, :sub1
jnz
movl bx,  :sub4
call bx,  0
:sub2
movl bx,  0
mmiv 0x1, bx
mmiv 0x2, bx
:sub3
pop  ax,  0
movl bx,  1
movl dx,  0
movl bx,  :sub5
call bx,  0
mmov bx, 0x1
msk
mmiv 0x1, bx
mmov bx, 0x2
mskb
mmiv 0x2, bx
movl ax,  0xff
cmp  bx, ax
movl bx, :sub3
jl
movl bx,  0
mmov dx, 0x1
movl cx,  1
movl cx,  0
movl bx,  :sub2
jmp  bx,  0
:sub4
movl ax,  0x05
movl bx,  :sub5
call bx,  0
movl bx,  1
sub  bx,  0
cmp  ax,  ax
movl bx,  :sub4+1
jnz
ret
:sub5
movl cx,  4
movl cx,  0
ret
``````

And this is the output generated by the program:

``````v2.0 raw
10020a 30000 100101 100300 170003 10012e 110100 180000 100137 110100 100101 100300 140300 100137 110100 100101 10100 130000 10010a e0000 10012e 110100 100100 170101 170201 150000 100101 100300 100137 110100 180101 1a0000 170101 180102 1b0000 170201 1000ff 130100 100119 a0000 100100 180301 100201 100200 100116 50100 100005 100137 110100 100101 10100 130000 10012f e0000 120000 100204 100200 120000

``````