Scan of the Month 33 - November 2004 - 0x90.exe Analysis

Table of Contents:

Intro

The purpose of November 2004 SOTM is to analyze a heavily armored binary.

Initial analysis

The file 0x90.exe was downloaded and transferred to the "test machine" were all the reversing took place. The first step of the analysis was to simply look at the binary itself using simple tools like "strings" or a hexeditor. These are the meaningful strings found in the binary

msvcrt.dll
KERNEL32.dll
printf
GetTickCount
GetCommandLineA
ExitProcess
You really thought you would find strings eh? ;-)
Scan of the month coded by Nicolas Brulez / Digital River

Seems that the 0x90.exe author have a lot of sense of humor ^_^''

The absence of strings, the one found are from the import table, is an evidence that the target is probably packed and/or crypted. Using file analysis tools like PeID/GetTyp don't seem to confirm my hypothesis. Maybe the target use a custom packer/encrypter?? To confirm this hypothesis I tried to open the file using a hex editor, Biew, but it shutdown showing a message "Not enough memory. PE initialization". The same happen trying to open the file using the debugger, Ollydbg and reading from Ida Pro board seems that the same happen trying to open 0x90 using Ida.

Both error however showed where was the problem, the PE Header.

Patching the header

To continue the analysis and load the target into our favorite tools we must normalize the PE header. Opening the target in a hex editor (one that doesn't know about PE structure) we see that some fields of the PE header are filled with "uncommon" values; values that confuses common tools as debugger and disassembler. To fully normalize the header many fields should be changed but many of them can be ignored as the most troublesome for the tool and the only one we'll change are:

After applying the above patches the file can be easily opened by Biew and Ollydbg. Opening the file in the disassembler mode of Biew confirmed what we thought; the executable is crypted/packed. This can be readily seen from the amount of "strange code" present, jump into other instruction, fake prefixes, privileged instruction and btw the execution start in a section called 'DATA'.

In the debugger

Loaded the file into OllyDbg we are faced with a classic.

00DE2000 > 60               PUSHAD
00DE2001   E8 00000000      CALL 0x90.00DE2006
00DE2006   5D               POP EBP
00DE2007   8BC5             MOV EAX,EBP
00DE2009   83E8 06          SUB EAX,6
00DE200C   81ED 0620DE00    SUB EBP,0x90.00DE2006

The above code load into the EAX register the code's starting address (0xde2000 in this case) and into the EBP register the difference "delta" from the "normal" expected address. This code is commonly found in virus/packer/crypter stub code that is relocated and need to reference data not placed at absolute addresses.

Stepping after we could see a lot of meaningless code, useless register assignment, jump and the like; that code is just "junk code" used to distract and hinder the analysis. The presence of "junk code" make really difficult to use static analysis tool like disassembler and make "boring" working with a debugger since you have to distinguish between "valid" instruction and junk one. The author however made a big mistake; the code look as follow:

VALID INSTRUCTION BLOCK
PUSHA
.
.
.
A LOT OF JUNK CODE
.
.
.
POPA
NEXT VALID INSTRUCTION BLOCK
PUSHA
.
.
.
A LOT OF JUNK CODE
.
.
.
POPA

The pusha/popa pair save and restore the register status on the stack and are needed to preserve the value of the register between two valid instructions block. This make the "junk code" creation a lot easier for the author since the algorithm used have a simple mean to make "safe/nop code" but make up an evident signature that can be used to distinguish between valid and junk code. So to find the next meaningful instruction one have only to step till the register value are restored by a popa. This seem simple but actually is still a lot of work since the "junk code" blocks are long. We need same kind of automation....

Automating!

OllyDgb is a great debugger; it's free (actually shareware but the author, Oleh Yuschuk, claim is only for copyright reason :) it's stable and have a nice set of feature. It's also have a plugin interface that can be used to extend its feature set. One of the most useful plugin is OllyScript written by SHaG that can be used to automatize the work with Ollydbg using script written in some kind of asmlike language.

Actually what we want to do is really simple; placed on the beginning "pusha" of the "junk code" block, step till the corresponding "popa" is executed and than fill all the block with nop. The following script do the trick.

lin.txt
var cop
var start
var count

eoe exit
eob exit

top:

mov cop, esp
mov start, eip

sti

l:
cmp cop, esp
je endtrash
sti
jmp l

endtrash:
mov count, eip
sub count, start
fill start, count, 90
ret

exit:
ret

I think it's pretty self explanatory; with eip on the "pusha", running the script actually only step (sti) through the junk code till "popa" is executed; the execution of popa is detected via the value of ESP, if ESP is equal to the value previous the execution of "pusha" mean we executed a "popa" and we must stop. The code block executed is then filled (fill) by 0x90 (nop instruction).

Using this script we can move through the code fastly and identify the true behavior of the programm; as a bonus the nopped code can be dumped and inserted in the exe file using a hexeditor eliminating the need to step through junk code more than once and making the executable cleaner and more easily analyzable using for example a disassembler. (a little note we could dump the full exe image as OllyDbg allow that but doing so would dump also some modification that the programm make to "the data"). With the document is available a fully cleaned 0x90.exe file.

0x90 behavior - the first part

After some junk code snipped we meet the first trick:

00DE2289   60               PUSHAD
00DE228A   E8 48000000      CALL 0x90.00DE22D7
00DE228F   8B4C24 0C        MOV ECX,DWORD PTR SS:[ESP+C]
00DE2293   8381 B8000000 02 ADD DWORD PTR DS:[ECX+B8],2
00DE229A   33C0             XOR EAX,EAX
00DE229C   8941 04          MOV DWORD PTR DS:[ECX+4],EAX
00DE229F   8941 08          MOV DWORD PTR DS:[ECX+8],EAX
00DE22A2   8941 0C          MOV DWORD PTR DS:[ECX+C],EAX
00DE22A5   8941 10          MOV DWORD PTR DS:[ECX+10],EAX
00DE22A8   8941 14          MOV DWORD PTR DS:[ECX+14],EAX
00DE22AB   C741 18 55010000 MOV DWORD PTR DS:[ECX+18],155
00DE22B2   8B81 B0000000    MOV EAX,DWORD PTR DS:[ECX+B0]
00DE22B8   50               PUSH EAX
00DE22B9   0FA2             CPUID
00DE22BB   0F31             RDTSC
00DE22BD   2B0424           SUB EAX,DWORD PTR SS:[ESP]
00DE22C0   83C4 04          ADD ESP,4
00DE22C3   3D 00000E00      CMP EAX,0E0000
00DE22C8   77 03            JA SHORT 0x90.00DE22CD
00DE22CA   33C0             XOR EAX,EAX
00DE22CC   C3               RETN
00DE22CD   8381 B8000000 63 ADD DWORD PTR DS:[ECX+B8],63
00DE22D4   33C0             XOR EAX,EAX
00DE22D6   C3               RETN
00DE22D7   33C0             XOR EAX,EAX
00DE22D9   64:FF30          PUSH DWORD PTR FS:[EAX]
00DE22DC   64:8920          MOV DWORD PTR FS:[EAX],ESP
00DE22DF   0FA2             CPUID
00DE22E1   0F31             RDTSC
00DE22E3   33DB             XOR EBX,EBX
00DE22E5   8F03             POP DWORD PTR DS:[EBX]
00DE22E7   64:67:8F06 0000  POP DWORD PTR FS:[0]
00DE22ED   83C4 04          ADD ESP,4
00DE22F0   61               POPAD

During the analysis we'll encounter this code many times. It's simply a check for debugger presence. Before explaining the above code meaning we need to introduce what is called SEH. SEH is the acronym of Structured Exception Handling and it's a way an application running under Windows can cope with code exception (as invalid memory reference, invalid opcode, ecc...). The idea of exception handling is that the application installs a callback function and then, if an exception occurs, the system will call the function to let the application deal with the exception. The system provide to the handler the state of the thread at the exception's moment so it can diagnose the cause of the exception and eventually resume the execution or gracefully abort the program. To install a per thread SEH handler the thread need to modify a pointer in the Thread Information Block (TIB) located at FS:[0] with the address of a ERR structure containing the address of the exception handler and the address of the next ERR structure (SEH can be chained to behave in a way similar to the C++ try/catch block, if an exception cannot be handled by the current handler it's passed to the "containing one").

Coming back to our code it install a SEH handler:

00DE22D9   64:FF30          PUSH DWORD PTR FS:[EAX]
00DE22DC   64:8920          MOV DWORD PTR FS:[EAX],ESP

using a ERR structure builded on the stack by the "CALL 0x90.00DE22D7" and the "PUSH DWORD PTR FS:[EAX]"; the handler address is the "CALL 0x90.00DE22D7" return address so it's first instruction is the "MOV ECX,DWORD PTR SS:[ESP+C]" located after the call. Then it call the RDTSC instruction (Read time-stamp counter into EDX:EAX) and deliberately cause an exception trying to pop into ds:0. The system than call the SEH handler that load into ECX the address of the CONTEXT structure, passed by the system and containing info on the thread state at the time of exception, and modify some value. In detail it add 2 to the EIP value (offset 0xb8 of the CONTEX structure) to skip over the faulting instruction ("POP DWORD PTR DS:[EBX]") it then reset the DRx register (offset 0x4-0x18 in the CONTEX structure) in the attempt to disable some hardware breakpoint, it then access the EAX register (offset 0xb0 of the CONTEX structure) and, after calling again RDTSC, calculate di difference in clock cicle regards to before the exception; if that difference is below 0xe0000 it resume the execution, returning with EAX=0, else before returning it modify again the EIP. This is done to detect the presence of a debugger; in fact running the code under a debugger is more clock cicle intensive than running the code "alone" since the debugger is called by the os in the event of an exception and usually handle the exception for the user (eg: stopping the execution showing the exception address). The value 0xe0000 is of course empirical since in a multitasking environment like Windows there is no way to tell how many clock cicle will pass, in fact some times the above code fail even when running outside a debugger. The above code also contain a bug, the instruction "ADD DWORD PTR DS:[ECX+B8],63" would never alter the EIP value in the CONTEXT structure since the CPUID istruction doesn't preserve the ECX value.

All this just to say that the above code could be skipped without influencing the programm execution.

After a couple more of the above trick the code check for breakpoint in the first 4 byte of some of the imported API:

00DE2950   B8 0D10DE00      MOV EAX,
00DE2C2C   8B40 02          MOV EAX,DWORD PTR DS:[EAX+2]
00DE2EF1   8B00             MOV EAX,DWORD PTR DS:[EAX]
00DE2F5B   8BF8             MOV EDI,EAX
00DE2F5D   B9 04000000      MOV ECX,4
00DE31FA   B8 60060000      MOV EAX,660
00DE31FF   C1E8 03          SHR EAX,3
00DE3202   F2:AE            REPNE SCAS BYTE PTR ES:[EDI]
00DE3204   85C9             TEST ECX,ECX
00DE3206   74 04            JE SHORT 0x90.00DE320C
00DE3208   0F31             RDTSC
00DE320A   50               PUSH EAX
00DE320B   C3               RETN

The address of the above disassembly are not contiguous due the the elimination of the junk code. This snippet of code show another trick the programm use to make analysis more difficult. Immediate constant are frequently obfuscated as 0xcc in this case 0x660 >> 3. This is obviously done to camouflage the "valid" instruction in the junk code.

At this point we understood that this part of the code is only a stub used to decode some other code in fact from now on, using value in some table, part of the 0x90 code are decoded using a xor operation and the execution control is then passed to them.

Stepping through this part of the code is really boring since there are many layer (175); but since the stub are all the same is easy, using OllyScript, to write a script to step through them all:

sc.txt:
var count
var cbase
var boop
var tmpbp
var tmpseh
var i

gmi eip, CODEBASE
mov cbase, $RESULT
log cbase
var csize
gmi eip, CODESIZE
mov csize, $RESULT
log csize

mov count, 0
eoe expt
chk:
run
jmp chk

expt:
inc i
log i
cmp i, af
jne asd
ret
asd:
mov count, esp
mov tmpseh, [count]
cmp tmpseh, 12e000
jb nval
cmp tmpseh, 12fffc
ja nval
add count, 4
sbp:
mov tmpbp, [count]
bp tmpbp
esto
find eip, #3d00000e00#
go $RESULT
sti
mov !cf, 1
bc tmpbp
jmp chk

nval:
add count, 24
jmp sbp
jmp chk

This script simply execute the stubs handling the SEH trick seen above. At the end we find the "real" 0x90 and we can start to analyze it's true behavior.

0x90 behavior - the inner part?

The first thing I expected reaching the "inner" 0x90 was to see "clean" code and no trick but, surprisingly, the code showed the same "style" of the first part; a lot of junk code and SEH trick. Using the script above was pretty easy to step through the code and the first thing I noted was that the code seemed table driven. Value stored in a table are loaded and seem to drive the execution flow, operand loaded and the like; stepping through this kind of code was pretty useless since the code executed seemed unrelated and was a lot difficult to understand. If we want to analyze the behavior of 0x90 we must understand how those tables are structured and used.

Evil Has No Boundaries !

Analyzing the code accessing the tables showed that the "inner" 0x90 is actually some kind of script language interpreter or some kind of little virtual CPU emulator. After a brief initialization (0x90 writes in the memory where are stored the virtual register value the string "Evil Has No Boundaries !") the code starts to load value from a table located at 0x00e1ba3d. This table (CODE) is the script executed by 0x90 or, if you prefer, the code executed by the virtual CPU:

CODE
00E1BA3D  02 02 0A 15 02 51 03 00 01 2E A7 11 53 00 03 66  
00E1BA4D  02 02 4F 13 02 51 02 01 03 02 00 00 02 04 00 02  
00E1BA5D  03 03 04 01 16 00 00 00 02 02 2D 21 CA AD 01 02  
00E1BA6D  01 2F DE 36 02 02 06 20 55 02 04 02 2E 01 00 00  
00E1BA7D  02 08 00 02 01 04 05 45 DC 9B 1D 01 04 04 45 97  
00E1BA8D  51 74 01 05 04 E2 DF 45 AD 01 04 04 EF BE AD DE  
00E1BA9D  01 04 05 6C 6C 65 68 01 05 03 65 41 85 17 01 05  
00E1BAAD  04 69 61 77 41 01 04 04 77 6F 68 73 01 04 05 20  
00E1BABD  73 74 69 01 05 03 20 6F 6E 20 01 04 05 76 69 72  
00E1BACD  64 01 04 04 63 72 65 6D 01 05 04 73 74 75 6E 01  
00E1BADD  05 03 21 21 21 79 01 05 04 21 3F 68 65 01 06 07  
00E1BAED  FF FF FF DF 00 02 45 00 02 46 03 00 3D CE E4 00  
00E1BAFD  02 05 03 04 02 2E 01 00 00 02 04 00 01 04 03 02  
00E1BB0D  00 00 00 02 04 00 02 0B 00 01 00 02 46 00 03 64  
00E1BB1D  00 02 47 00 01 D6 22 08 53 00 03 66 01 04 03 48  
00E1BB2D  85 09 00 02 03 00 01 03 02 01 00 00 03 66 02 0A  
00E1BB3D  00 02 01 04 03 02 00 00 00 02 0A 00 01 01 09 02  
00E1BB4D  05 05 00 86 BB E1 00 47 42 03 00 02 44 00 01 F3  
00E1BB5D  A8 11 53 02 01 23 DE 36 02 00 04 4D 04 00 D7 DC  
00E1BB6D  F6 F2 00 00 11 54 19 37 00 01 DC A8 11 53 02 01  
00E1BB7D  23 DE 36 02 00 04 4D 02 00 02 02 48 13 02 51 01  
00E1BB8D  02 04 01 02 04 01 01 04 04 05 00 00 00 02 03 01  
00E1BB9D  01 05 03 04 00 00 00 01 05 04 5A 00 00 00 01 0A  
00E1BBAD  04 02 04 01 31 01 00 00 02 03 00 02 0A 00 02 01  
00E1BBBD  04 03 02 00 00 00 02 0A 00 01 01 09 02 05 02 04  
00E1BBCD  02 01 05 04 4E 00 00 00 00 01 2D CA 10 53 00 03  
00E1BBDD  66 01 04 03 AC DE 00 00 02 04 00 01 03 02 00 00  
00E1BBED  02 02 45 13 02 51 03 00 01 F3 A8 11 53 00 03 66  
00E1BBFD  02 04 02 01 03 02 00 00 02 04 00 02 03 03 04 01  
00E1BC0D  C2 01 00 00 47

A bit of info on this "virtual architecture". The CPU can address 6 register (which i call r0-r5) whose value are stored by 0x90 in the memory between 0xde8736 and 0xde874a, the CPU also have a "zero flag" located at 0xde874a. The instruction of that virtual CPU are of variable length the first two byte indicate the operation the remaining byte are operand. Instruction are grouped in six groups. The first byte of the opcode indicate the group the second the instruction in the group. The third instruction of the first group (opcode 01 03) have a byte that specify the address type used by the instruction (similar to the modrm byte of x86). 0x90 maintain a table (M) for the groups and 6 table (G0-G5) for each of the instruction groups also use a table (F) for the different 0103 instruction addressing mode. These are the table used with their address inside 0x90:

M
00E1B991  A9 B9 E1 00 BD B9 E1 00 F5 B9 E1 00 25 BA E1 00
00E1B9A1  29 BA E1 00 35 BA E1 00
G0
00E1B9A9  FA A4 DE 00 6A C3 DE 00 B7 73 E1 00 C1 B0 E1 00
00E1B9B9  C1 4C E1 00
G1
00E1B9BD  7B 1A DF 00 2F 44 DF 00 D2 59 DF 00 EF 74 DF 00  
00E1B9CD  A6 81 E1 00 5B A0 E1 00 6D 22 E1 00 28 41 E1 00  
00E1B9DD  ED 16 E0 00 53 53 E0 00 48 B9 DF 00
G2
00E1B9F5  6F E1 DE 00 47 E9 DE 00 A3 DC DF 00 B5 9A E0 00 
00E1BA05  BC D4 E0 00 77 14 E1 00 11 5B E1 00 2B B9 E1 00  
00E1BA15  16 E8 E0 00 C9 C6 E0 00 E7 B5 E0 00 6A 3D E0 00
G3
00E1BA25  DC EF DF 00
G4
00E1BA29  FD 34 E0 00 89 79 E0 00 A7 F8 E0 00
G5
00E1BA35  04 36 E1 00 54 55 E1 00
F
00E1B9E9  CB 7F DF 00 48 90 DF 00 19 A3 DF 00

Analyzing each of the function addressed by the above table, always using the script to step through the junk code, I learned what each instruction was doing. Following is a table showing the all instruction used by the code above; some are absent since, surprisingly, unused even if their code is in 0x90.

Instruction Meaning
00 00 dw dw dw dw push dwdwdwdw ^ 0x3719553F
00 01 dw dw dw dw push dwdwdwdw+0xADD01337
00 02 rn push rn xor 0x47
00 03 rn pop rn xor 0x66
00 04 rn add esp, rn xor 0x45
01 03 rn 02 xor [rn], dwdwdwdw
01 03 rn 01 xor word [rni], rn
01 03 rn 00 xor byte [rni], rn
01 04 rn dw dw dw dw add rn-3, dwdwdwdw
01 05 rn dw dw dw dw sub rn-2, dwdwdwdw
01 06 rn dw dw dw dw and rn-5, dwdwdwdw
01 09 rn1 rn2 add rn1-1, rn2-3
01 0a rn1 rn2 cmp rn1-2, rn2-1
02 00 exit
02 01 dw dw dw dw apicall dwdwdwdw + 0xFEA731DE eax in r0
02 02 dw dw dw dw rn mov rn, dwdwdwdw + 0xAEFDED04
02 03 rn dec rn
02 04 rn inc rn
02 05 rn dw dw dw xor rn, rn
02 06 by wd wd r0 = memchr( r0, by, wdwd )
02 07 int 3
02 08 rn1 rn2 mov rn2, dword [rn1]
02 09 rn bswap rn
02 0a rn1 rn2 mov rn2, byte [rn1]
02 0b rn1 rn2 mov rn2, word [rn1]
03 00 dw dw dw dw if stack1 == stack2 goto dwdwdwdw - 0x31337
04 00 dw dw dw dw jmp dwdwdwdw + 0x0DEADEAD
04 01 dw dw dw dw jne relative dw
04 02 dw dw dw dw je relative dw
05 00 dw dw dw dw call dw dw dw dw
05 01 ret

Using this knowledge it's easy now to understand what the code is trying to do, I've written a small "disassembler" for this CPU; it's output on the code above is:

0x00> mov r3, 0x20e
0x07> push 0xe1ba65
0x0d> pop r0
0x10> mov r2, 0x0
0x17> xor  byte [r0], r2
0x1c> inc r0
0x1f> dec r3
0x22> jne 0x17
0x28> mov r1, 0x5cc80e31
0x2f> r0 = apicall GetCommandLineA
0x35> memchr( r0, 0x20, 0x255 )
0x3a> je 0x132
0x40> mov r2, dword [r0]
0x44> add  r2, 0x1d9bdc45
0x4b> add  r1, 0x74519745
0x52> sub  r2, 0xad45dfe2
0x59> add  r1, 0xdeadbeef
0x60> add  r2, 0x68656c6c
0x67> sub  r1, 0x17854165
0x6e> sub  r2, 0x41776169
0x75> add  r1, 0x73686f77
0x7c> add  r2, 0x69747320
0x83> sub  r1, 0x206e6f20
0x8a> add  r2, 0x64726976
0x91> add  r1, 0x6d657263
0x98> sub  r2, 0x6e757473
0x9f> sub  r1, 0x79212121
0xa6> sub  r2, 0x65683f21
0xad> and  r2, 0xdfffffff
0xb4> push r2
0xb7> push r1
0xba> cmpandgoto 0xc9
0xc0> xor r3, r3
0xc3> je 0x132
0xc9> inc r0
0xcc> add  r0, 0x2
0xd3> inc r0
0xd6> mov r1, word [r0]
0xda> push r1
0xdd> pop r2
0xe0> push r0
0xe3> push 0xd8360d
0xe9> pop r0
0xec> add  r0, 0x98548
0xf3> dec r0
0xf6> xor  word [r0], r2
0xfb> pop r0
0xfe> mov r2, byte [r0]
0x102> add  r0, 0x2
0x109> mov r1, byte [r0]
0x10d> add  r2, r1
0x111> call 0x149
0x117> dec r3
0x11a> push r3
0x11d> push 0xe1bc2a
0x123> r0 = apicall printf
0x129> add esp, 8
0x12c> jmp 0xe42a43ed
0x132> push 0x12e
0x138> push 0xe1bc13
0x13e> r0 = apicall printf
0x144> add esp, 8
0x147> exit
0x149> mov r1, 0x4c
0x150> inc r1
0x153> inc r1
0x156> add  r1, 0x5
0x15d> dec r1
0x160> sub  r1, 0x4
0x167> sub  r2, 0x5a
0x16e> cmp  r2, r1
0x172> jne 0x132
0x178> dec r0
0x17b> mov r2, byte [r0]
0x17f> add  r0, 0x2
0x186> mov r1, byte [r0]
0x18a> add  r2, r1
0x18e> inc r2
0x191> sub  r2, 0x4e
0x198> push 0xe0dd64
0x19e> pop r0
0x1a1> add  r0, 0xdeac
0x1a8> inc r0
0x1ab> xor  byte [r0], r2
0x1b0> mov r3, 0x49
0x1b7> push 0xe1bc2a
0x1bd> pop r0
0x1c0> inc r2
0x1c3> xor  byte [r0], r2
0x1c8> inc r0
0x1cb> dec r3
0x1ce> jne 0x1c3
0x1d4> ret

The above "disassembly" is fixed to show the correct API name; these name can be obtained by simply breakpoint on the apicall function (we know it's address from the tables). Saying the truth the above disassembly refer to the "final version" of the code in fact as you can see from the first few instructions the "virtual CPU" decode, with a xor, part of the instruction it's going to execute.

So what's the behavior of 0x90? It access the command line (via GetCommandLineA api) and find the first argument passed to the programm (not 100% true it find the first space and use what's after, it so fail if in the path of 0x90 are present space) it add/subtract some value from the first 4 character of the first argument and compare the value obtained with one calculated adding and subtracting values from a constant, if they are not equal it print the string "Please Authenticate!" and exit. So we know 0x90 is waiting some kind of string/password on the command line and we can calculate the first four character of that password inverting the sub/address operation made on the constant. I've written a small programm that do the above (a calculator is actually enough) the first 4 character of the password are '1D3N'. The "virtual CPU" then xor the word at 0xe1bb54 with the argument 5-6 character; since the word at 0xe1bb54 is part of the code that the virtual CPU will execute this operation prevent "correct" execution if the user supplied a wrong argument to 0x90. At moment we don't have enough information to calculate the value of the 5-6 character. It then add the 5 and 7 character then add the result with some constant and compare it against another constant, exit if the they are different. The execution continue adding the 6 and 8 character then the result is added with some constant and the resulting byte is put in xor with the last but one byte of the CODE table and, incremented by 1, with the byte between 0xe1bc2a-0xe1bc71.

Now we have enough information to infer the value of the 5-8 character of the password the program is expecting on the command line. One thing to note is that it never directly compare the password against know value and it use the character password to decode part of it's own data and code; this way it's impossibile to find the correct password in memory or patch the programm to continue the execution even if the password is wrong.

Since it xor the sum of the 6 and 8 character, added with some constant, with the last but one character of the CODE table and since the last instruction of that table must be a "ret" (as that part of code is reached using a call and it's the only way the programm can continue execution) we know the expected value of the sum. Similar consideration can be taken in account trying to determine the value of the 5 and 7 character. Small snippet of code available with this document show how to determine that value.

At the end we know that 0x90 is expecting as argument the string '1D3NEAcN'. Using that string the programm decode and print the string:

Welcome...
Exploit for it doesn't matter 1.x Courtesy of Nicolas Brulez

and exit.

Conclusion

After printing that string the program exit gracefully so at the end its only purpose was to print a string if supplied with the correct "password".

Questions

1. Identify and explain any techniques in the binary that protect it from being analyzed or reverse engineered.

The binary's PE header has invalid value in many fields (most notably the value of NumberOfRvaAndSizes and the PhysicalSize of the section named Nicolas) that confuse/crash many analysis tools. Junk code is used to hinder analysis and SEH are used to detect debugger presence and to make "not easy to step" code jump. The main code is encoded by many xor encryption layer. In the code immediate constant, addresses are often obfuscated to camouflage the valid code in the junk one and to prevent static reference analysis.

2. Something uncommon has been used to protect the code from beeing reverse engineered, can you identificate what it is and how it works?

The main code is actually an interpreter or a virtual CPU emulator; the behavior of the executable is controlled by a script/virtual CPU code. A, mostly complete, analysis of the virtual CPU instruction code/architecture is available above.

3. Provide a means to "quickly" analyse this uncommon feature.

A set of script for OllyDbg/OllyScript and a small disassembler for that virtual architecture were developed to quickly analyze 0x90 behaviors.

4. Which tools are the most suited for analysing such binaries, and why?

The presence of junk code and obfuscated constant/address make the use of static analysis tool like disassembler difficult. Live analysis tools like debugger, file/api monitor is best suited for such kind of binaries. Once reversed how the virtual CPU work, dedicated tools like disassembler or emulator can be used to simplify the analysis. However once "cleaned out" of the junk code, using some script/tool for example, even interactive disassembler (PVDasm, BDasm, Ida Pro) can be useful in the analysis work.

5. Identify the purpose (fictitious or not) of the binary.

Maybe some kind of password activated backdoor/tools. Actually it only print a string :)

6. What is the binary waiting from the user? Please detail how you found it.

The binary check for the presence of password passed as argument to the program. (there is however a bug on how it identify the argument) The password is '1D3NEAcN' it print a "greeting" string if the password is correct. It was found simply reading the disassembly of the virtual CPU code.

Bonus Question: What techniques or methods can you think of that would make the binary harder to reverse engineer?

Complexity is the only way to deter reverse engineering. One of the error that made the analysis easier was the easy identifiable structure of the junk code and the repetitivity of the "trick" used. This repetitivity made easy the making of tools to automatize the analysis. I don't believe that the technique used in 0x90 will become anywhere common as making such a binaries involve quite a in deep knowledge of assembler language and x86 architecture and common malware is written in much more high level languages.

About the author

I'm Nicola Cuomo a student of Computer Science (Informatica) at Universita' Federico II di Napoli. I'm currently, still -_-, working on my thesis about the formal analysis of security protocol. I'm reachable at ncuomo(at)studenti.unina.it and you can see my poor web page at http://studenti.unina.it/~ncuomo/.

If you have read this document please forgive my quasi-english.

Acknowledgement

Wim Koomen, Thijs Bosschert and Bas Bosschert who inspired, unknowingly, the style this document use :P

www.honeynet.org and Nicolas Brulez for the funny challenge

Oleh Yuschuk for OllyDbg and SHaG for OllyScript

Tools used

Script/Tools source code

When reading the scripts and tools source code below keep in mind that they are just quick hacks made during the analysis of 0x90.