Scan of the Month 33

Author : Kostya Kortchinsky, kostya [dot] kortchinsky [at] renater [dot] fr

Table of contents

It is highly recommended to read the analysis before going to the answers to the challenge questions.

Analysis
    Introduction
    Rough information on the file
    Going into the details
    Automation of the tasks
    The "p-code" engine
    Breaking the password protection
    Conclusion
Answers

Analysis

Introduction

The purpose of the 33rd scan of the month is to analyze a binary, recovered on a Windows XP system, qualified as "heavily armored". This document will try to uncover and describe the various techniques used to protect the binary from investigation, as well as the behavior of the binary itself. Several methods will be applied in order to achieve that goal, starting from a 'clueless' point of view, then going into the details.

The operating system chosen for this analysis is an up to date Windows XP Pro SP2 system, with the Cygwin environment and a bunch of other freeware tools.

Rough information on the file

After having downloaded the binary from the Honeynet Project websites, a few steps are followed in order to get some details about the file provided. First of all, an integrity check is performed in order to ensure that the downloaded binary is correct :
--
$ md5sum 0x90.exe
7daba3c46a14107fc59e865d654fefe9 *0x90.exe
--
A file format analyzer will inform us on the the nature of the file :
--
$ file 0x90.exe
0x90.exe: MS-DOS executable (EXE), OS/2 or MS Windows
--
Thus we are now sure that the file is indeed a Windows executable (Portable Executable [1]), and should gather some more information about its structure and attributes. To fulfill this, I will use a PE editor [2] :
Figure 1 : LordPE output
PE Editor
Subsystem
Characteristics
Section Table

The binary is designed for the console, the location of its entrypoint (precisely 0x2000, and in section DATA) leads us to think that the program was modified in some way. E0000020 is IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE, meaning the section contains code, can be read, written and executed. C0000040 is IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE meaning the section contains initialized data, can be read and written : it is the typical layout of a packed executable.

Going on with some routine checks, we will be looking into the binary thanks to a strings scanner and an hexadecimal editor [3]
--
$ strings 0x90.exe
MZP
This program must be run under Win32
DarthPE
CODE
DATA
NicolasB
.idata
[...]
msvcrt.dll
KERNEL32.dll
printf
GetTickCount
GetCommandLineA
ExitProcess
You really thought you would find strings eh? ;-)
Scan of the month coded by Nicolas Brulez / Digital River
--
We won't learn much more with that, except that it indeed looks crippled and has only a few visible linked DLLs and imports (described in details on figure 2 thanks to [4]).
Figure 2 : Imports
Imports Viewer
Imports Viewer

We take our chance with a packer detector [4], but it doesn't recognize any common packer (note the entropy check, quite reliable) :
Figure 3 : PEiD output
Extra Information

Launching the program, we get a message inviting us to authenticate :
--
$ ./0x90
Please Authenticate!
--
Trying several parameters wont make any changes.

Going into the details

We now have some general indications on what to expect from the binary, it is high time to have a closer look at the execution flow. In order to do that, I will use a debugger [5]. For the first try, I unchecked all the "Ignore (pass to program) following exceptions" checkboxes in the 'Exceptions' tab of the 'Debugging options' dialog of OllyDbg, so that I will then break whenever an exception occurs. When opening the given file (shortcut F3), an error message box appears stating :
Figure 4 : Error
Error

Well, it doesn't really matter since we know the entry point location of the binary (0xDE2000), I just have to press Ok, set a breakpoint at the address and run the program (shortcut F9), I will land at the very beginning of the program.

We now open the 'Run trace' window, and trace into the program (shortcut ctrl-F11).
Figure 5 : Run trace
Run trace

After only 175 instructions (see figure 5), an exception occurs due to the following set of commands :
--
00DE22E3   33DB             XOR EBX,EBX
00DE22E5   8F03             POP DWORD PTR DS:[EBX]
--
Since the address 0x00000000 isn't in a writeable address space, the write error was likely to happen and was done on purpose. A few commands before, the author has taken care of installing his own exception handler thanks to those lines :
--
00DE22D7   33C0             XOR EAX,EAX
00DE22D9   64:FF30          PUSH DWORD PTR FS:[EAX]
00DE22DC   64:8920          MOV DWORD PTR FS:[EAX],ESP
--
When a debugger is attached to a process, it will catch an occuring exception, thus preventing the execution flow to follow the course envisaged by the author. It is a very common anti-debugging trick that can be dealt with quickly in OllyDbg : check the box 'Memory access violation' (see figure 6) in the dialog described earlier and all these kind of exceptions will now be passed directly to the program.
Figure 6 : Debugging options
Debugging
options

What is a bit less common is the presence of a couple of instructions :
--
00DE22DF   0FA2             CPUID
00DE22E1   0F31             RDTSC
--
The Intel manual says about RDTSC, "This instruction loads the current value of the processor’s time-stamp counter into the EDX:EAX registers. The time-stamp counter is contained in a 64-bit MSR. The high-order 32 bits of the MSR are loaded into the EDX register, and the low-order 32 bits are loaded into the EAX register. The processor increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset." The RDTSC instruction is often used in a benchmarking context, but can also be used in software protection to check for the time elapsed between two portions of code. If it is too high, the program is likely being debugged. Let's dig into that. In the exception handling function for the current context, we find :
--
00DE22B9   0FA2             CPUID
00DE22BB   0F31             RDTSC
00DE22BD   2B0424           SUB EAX,DWORD PTR SS:[ESP]
00DE22C0   83C4 04          ADD ESP,4
00DE22C3   3D 00000E00      CMP EAX,0E0000
00DE22C8   77 03            JA SHORT 0x90.00DE22CD
--
So if there is more than 0xE0000 clock cycles between the first RDTSC and the second one in the exception handler, something happens : just try, the program crashes. In order to avoid this, set a condition to stop trace (shortcut ctrl-T), check 'Command is one of' and add CMP EAX,0E0000 in the edit box (see figure 7).
Figure 7 : Condition to pause run trace
Condition to
pause run trace

Now let's run trace again, when it breaks at the command, we change EAX to 0 and get rid of the test :)

This really lame method (trace, break, change eax, trace) will do for a little while, but it soon becomes really annoying.

Automation of the tasks

Another great thing about OllyDbg is the scripting engine that has been developped as a plugin for the debugger [6]. It will help a lot in order to do some basic automation of the tasks needed to trace through the program. For a first try, we will set our usual trace condition, and program a simple loop that will change EAX to 0 and run a trace again. On a convenient purpose, we will add the breakpoint on the entry point of the program :
-- 0x90.osc --
bp 0DE2000 // break at EP
run
bc eip     // clear the breakpoint
loop:
ti         // trace into - I am not sure how to use ticnd for my purpose
mov eax, 0 // reset EAX
jmp loop   // and loop
--
After running that script, OllyDbg does what it is supposed to do, but then traces a lot without breaking. How come ? Let's pause the trace (shortcut F12), and check in the trace log what takes so long. It takes a second to see that we landed into a XORing loop with a pretty high number of iterations, that corresponds to a layer of decryption of the binary.
--
00E2635B   3008             XOR BYTE PTR DS:[EAX],CL
00E2635D   40               INC EAX
00E2635E   49               DEC ECX
00E2635F   85C9             TEST ECX,ECX
00E26361  ^75 F8            JNZ SHORT 0x90.00E2635B
--
The number of commands that OllyDbg can process when running a trace is quite low [7], which explains why it takes so long. Nonetheless, we can quickly get the address of the last comparison before the XOR : 0xDE4BDC.

In order to study what we traced through, we dump the trace generated by OllyDbg to a file (up to 0xDE4BDC). Basically, what we have is a set of valid instructions lost in a flood of crap. Fortunately we can identify the valid blocks of code thanks to the pair of commands POPAD and PUSHAD. Thanks to a small perl script, we can extract the core instructions of this part. Basically, what is done here is a check on the 4 first bytes of 3 addresses for the presence of the byte 0xCC. This byte - opcode for the INT3 instruction - is used by debuggers to set a breakpoint on a command. These addresses (0x7C812C8D, 0x77C1186A, 0x7C81CAA2) are respectively the ones of the APIs GetCommandLineA, printf and ExitProcess. Once again, this is a common anti-debugging trick : one can't put a breakpoint on the API or the program will crash. An easy way to counter that is to put the breakpoint more than 4 bytes away from the beginning of the function, what we will do from now :) (including the GetTickCount one)

To deal with our previous issue (the large loop) and go on with our analysis, a solution would be to set a breakpoint after the JNZ, run, and trace after you reached the breakpoint and you get directly to the next CMP EAX,0E0000 at 0xE263B2, but our automation scheme falls appart. Moreover, looking at the XOR mechanism, we see that it decrypts the code to be executed in a few cycles, another anti-reverse technique preventing from doing whatever we want with the process.

Going on manually a bit, we realize that this process is repeated again and again - going backward into the code, and have to do some automation again. The good thing is that when we reach a CMP EAX,0E0000, the next one has already been decrypted by the XOR loop. So our new method will imply searching the memory for the next sequence 3D00000E00 (being CMP EAX,0E0000), setting a breakpoint on the address, running, set EAX to 0 when we reach the breakpoint, clear the breakpoint and loop. Since we can't search backward, we will start 400 bytes before actual EIP and search forward. We will also adapt this method to the first part of the code :
-- 0x90.osc --
bp 0DE2000
run
bc eip

var count
mov count, 0
part1:
inc count
findop eip, #3D00000E00#
bp $RESULT
run
bc eip
mov eax, 0
cmp eip, 0DE4BDC
jb part1
log count

bp 0E263B2
run
bc eip
mov eax, 0

var address
mov count, 0
part2:
inc count
mov address, eip
sub address, 400
findop address, #3D00000E00#
cmp $RESULT, eip
je end2
bp $RESULT
run
bc eip
mov eax, 0
jmp part2
end2:
log count
--
The last address matching our criterias is 0xE1C4B7. In addition we now know that there are 8 blocks for the first part, and 175 for the second one (meaning at least 175 XOR decryption layers). Tracing from there (with the usual condition), we find two more comparisons at 0xDE8692 and 0xDE86FF for which with have to clear EAX again.

The "p-code" engine

Starting at offset 0xDE9C9E lies what I call the "p-code" engine. Quite similarly to its visual basic and pascal counterparts, this engine can perform various medium level tasks by reading parameters from a sequence of bytes. This sequence, beginning at address 0xE1BA3D, identifies the task to perfom on two bytes and possibly a parameter (or more) on one to four bytes. Precisely, the first byte is an index in a table of tables at 0xE1B991. The second byte is the index of the address of the task in the table previously referenced

Here is a commented example of the execution of the first sequence of 7 bytes of p-code, 0x02 0x02 0x0A 0x15 0x02 0x51 0x03 :
; First index is 0x02
00DE9C9E Main     MOVZX EAX,BYTE PTR DS:[ESI]               ; EAX=00000002
[...]
; Table is 0xE1B9F5
00DE9F5D Main     MOV EDI,DWORD PTR DS:[EAX*4+E1B991]       ; EDI=00E1B9F5
[...]
; Second index is 0x02
00DEA213 Main     MOVZX EAX,BYTE PTR DS:[ESI+1]             ; EAX=00000002
[...]
; Task address is 0xDFDCA3 - this code is located in an exception handler
00DEA4D2 Main     MOV EAX,DWORD PTR DS:[ECX+B0]             ; EAX=00000002
00DEA4D8 Main     MOV EDI,DWORD PTR DS:[ECX+9C]             ; EDI=00E1B9F5
00DEA4DE Main     MOV EAX,DWORD PTR DS:[EDI+EAX*4]          ; EAX=00DFDCA3
[...]
; First argument is 0x5102150A - 4 bytes - this code is located in the task
00DFDCAD Main     MOV EAX,DWORD PTR DS:[ESI+2]              ; EAX=5102150A
[...]
; Some calculation is done here
00DFDF5E Main     ADD EAX,AEFDED04                          ; EAX=0000020E
[...]
; Second argument is 0x03 - 1 byte
00DFE20E Main     MOVZX EDI,BYTE PTR DS:[ESI+6]             ; EDI=00000003
[...]
; Result is saved in a table
00DFE4DC Main     MOV DWORD PTR DS:[EDI*4+DE8736],EAX
; Command is 7 byte long, go to next command in sequence
00DFE4E3 Main     ADD ESI,7                                 ; ESI=00E1BA44
The aim of the first part of the sequence of bytes is to decrypt the last part of that same sequence, which will be executed later.

Here is the detail of the function table :

Index0123456789ABCD
00E1B9A9000DEA4FA00DEC36A00E173B700E1B0C100E14CC1
00E1B9BD100DF1A7B00DF442F00DF59D200DF74EF00E181A600E1A05B00E1226D00E1412800E016ED00E0535300DFB94800DF7FCB00DF904800DFA319
00E1B9F5200DEE16F00DEE94700DFDCA300E09AB500E0D4BC00E1147700E15B1100E1B92B00E0E81600E0C6C900E0B5E700E03D6A
00E1BA25300DFEFDC
00E1BA29400E034FD00E0798900E0F8A7
00E1BA35500E1360400E15554

Anyway the trace generated by this p-code engine will be dealt with our script as easily :). Going on tracing through the code, we reach the breakpoint we put on GetCommandLineA which happen to be called from 0xDEF94D.

Breaking the password protection

From now on, we can trace through the program and use our script to extract the core instructions for analysis. It is quite obvious that some kind of treatment is applied to the command line to ensure that it verifies various expectations : it is an evolved password check. For the reader to follow, a full trace of this part has been included (post GetCommandLineA). Let's detail the checks.

First of all, the process searches a space character (0x20) in the command line thanks to the command REPNE SCAS BYTE PTR ES:[EDI] at offset 0xE1658B. The part of the command line located after the space will be refered to hereafter as the password. Then the four first bytes of the password are loaded as a 32 bit number, I will call it X, that undergoes several arithmetical operations to give a number Y:
X + 0x1D9BDC45 - 0xAD45DFE2 + 0x68656C6C - 0x41776169 + 0x69747320 + 0x64726976 - 0x6E757473 - 0x65683F21 = Y
that can be summerize to :
X + 0x914D3068 = Y
Concurrently, another number Z is calculated :
0x5CC80E31 + 0x74519745 + 0xDEADBEEF - 0x17854165 + 0x73686F77 - 0x206E6F20 + 0x6D657263 - 0x79212121 = 0xDF807499 = Z 
At offset 0xDFF51B (this script will lead you there), a check verifies that Y is equal to Z meaning that X + 0x914D3068 = 0xDF807499, and so X = 0x4E334431. The first four characters of the password are then '1D3N'.

If the first 4 bytes of the password pass the verification, several other checks will be carried out on the next 4 bytes - that we will name C, D, E and F. The first test (at address 0xDFC98A) verifies that :
C + E - 0x5A = 0x4E
so the first condition on the second part of the password is E = 0xA8 - C. We can update our password accordingly and continue tracing. Then a value X is generated according to the following formula :
(D + F + 1 - 0x4E) ^ 0x47 = X
This byte X is in fact the first byte of a p-code instruction, the second byte being 0x01. According to our table, this byte is such as 0 <= X <= 5. Furthermore, we know that the command will not have any parameter since it is located at the end of the sequence before the 'Please Authenticate!' string, and that it must change ESI for execution flow to continue. After a few tries, we find that for X = 5, the task executes a POP ESI (at 0xE1555E) without taking any parameter and then perfectly fits our needs :) The second condition that has to be matched is F = (0x47 ^ 0x05) + 0x4E - 1 - D, in short F = 0x8F - D.

The previous instruction introduces a last condition since the p-code sequence pointer (ESI) now points to a memory location (OxE1BB54) looking like [Y Z 0x03 0x00 0x02 0x44 ...] with Y = C ^ 0x47 and Z = D ^ 0x42. We now have to find C and D so that this part of the sequence will contain valid p-code opcodes. This file should be compiled and run to generate the passwords for all the couples (X, Y). One solution would be to try them all, another one would be to look a bit closer at the p-code sequence and see that the length of the opcode is apparently 3 with a 1 byte parameter 0x03, and look into the trace for an opcode looking like that. We observe that the couples (2, 3) and (2, 4) verify this (as well as the previously unused one (2, 9)).

We can deduce a bunch of valid passwords :
1D3NEAcN
1D3NEFcI
1D3NEKcD
...

and give them to the program :
--
$ ./0x90 1D3NEKcD
Welcome...
Exploit for it doesn't matter 1.x Courtesy of Nicolas Brulez
--
When all checks are passed, the program does an ExitProcess with 0 as error code, usually meaning everything went well. We are done :)

Conclusion

The binary provided for this Scan of the Month challenge illustrates a good number of ways to protect a program from analysis, reverse-engineering and execution through a password check. Some of them are already widely deployed in the software protection environment (encryption of the binary, anti-debugger checks, obfuscation of the code), others are not that common (p-code engine). Anyway, there is an obvious need for protection of sensitive binaries in the exploit community and this kind of technique is likely to be used widely in a near future since simple executable packers have become pretty useless.

Despite the heavy protection of the binary, the reverse-engineering of the code and breaking of the password has merely taken a few hours during a week-end.

References

[1] Portable Executable and Object File Format Specification, http://www.microsoft.com/hwdev/download/hardware/PECOFF.doc
[2] LordPE Deluxe, http://y0da.cjb.net/ (freeware)
[3] PSPad, http://www.pspad.com/ (freeware)
[4] PEiD, http://peid.has.it/ (freeware)
[5] OllyDbg, http://home.t-online.de/home/Ollydbg/ (freeware)
[6] OllyScript, http://ollyscript.apsvans.com/ (freeware)
[7] OllyDbg Run trace tutorial, http://home.t-online.de/home/Ollydbg/Tut_rtr.htm

Answers

  1. Identify and explain any techniques in the binary that protect it from being analyzed or reverse engineered.

    Several protection techniques have been uncovered and detailed in the analysis, I will summerize them here :
  2. Something uncommon has been used to protect the code from beeing reverse engineered, can you identificate what it is and how it works?

    From my point of view (which may not be shared by everybody), the uncommon feature used to protect the code from being reverse-engineered is the p-code engine. This engine offers a limited set of medium level tasks (37 according to what I found), and reads its opcodes from a byte sequence partially crypted, that will decrypt itself. Each opcode is composed of at least 2 bytes that allow to lookup the function in a bidimensionnal table, followed optionnaly by parameters (one or more) on 1 to 4 bytes. Each task will set the pointer on the next opcode itself, allowing loops and jumps. This kind of design makes the analysis quite painful - very much like its Visual Basic equivalent. The tasks themselves look a lot like the code previously studied, with obfuscation and exception triggering.

  3. Provide a means to "quickly" analyse this uncommon feature.

    Using OllyDbg, its tracing capability, and a small perl script to strip off garbage portions of code, this feature can be reduced to a linear set of "core" instructions that only takes a few dozens minutes to analyze. Another possibility, put I think a lot more demanding would be to program a short emulator by going through each task exhaustively, dumping the opcode sequence and running it through the emulator. It is to be underlined that many parameters depend on the address space of the process, thus turning the task even harder. Personnaly I dropped the idea quickly, and didn't even bother to go through all the unused tasks in the bidimensionnal table.

  4. Which tools are the most suited for analysing such binaries, and why?

    Because of the various tricks used by the developper, I have decided to proscribe disassembler and other various deadlisting analysis tools to focus only on live study thanks to a debugger, OllyDbg. The use of a scripting plugin, OllyScript, is essential to automate the tracing of the process in memory, due to the various repetitions of code sequence and the layer layout. Some PE information gathering and programming tools might also be needed. The method chosen and the tools used have proven quite effective in that case, and in my opinion "live" study tools should be prefered, except for gathering basic information on the file.

  5. Identify the purpose (fictitious or not) of the binary.

    The binary "emulates" a password-protected fictitious malware : protected from disassembly since the actual code of the exploit is ecnrypted and cannot be retrieved "immediately", protected from debugging thanks to various live checks, protected from execution because of a password needed to be launched. Thanks to the analysis, we can affirm that it is in fact a vulnerability exploitation tool : that kind of files is usually worth being heavily armored since it may contain information on an undisclosed vulnerability that the hacker doesn't want to be turned public if the file is recovered.

  6. What is the binary waiting from the user? Please detail how you found it.

    As guessed after a simple execution of the program, the binary is waiting for a password from the user to run. It is retrieved by the process thanks to the API GetCommandLineA and undergoes a series of reversible checks in 2 parts. The first part deals with the first 4 characters of the password treated as a 32-bit number (dword), the second part deals with the next 4 characters treated as 4 8-bit numbers (byte). The first half of the password can be determined uniquely, whereas some p-code opcodes are generated from the second half as explained in the analysis, introducing several possibilities. Finally, a working password (I insist on the fact that it is not unique) can be calculated and used to launch the exploit :

    1D3NEKcD

Bonus question : What techniques or methods can you think of that would make the binary harder to reverse engineer?

Among the things that come to my mind regarding the possibilities to harden the protection are the following :