Scan 33: Analyze an Unknown Binary

Michael T. Ford
mike@rndsoftware.com
December 3rd, 2004

Analysis

In analyzing an unknown binary, I use a six-step process. The steps are:
  1. Verify
  2. Scan
  3. Trace
  4. Poke
  5. Debug
  6. Interact

1. Verify

In the first step, I acquire the binary and verify that it is the binary I want to analyze. In this case, I downloaded the binary from the Honeynet Project web site and checked its MD5 hash against the one published on the web page.

gizmo% md5sum 0x90.exe 
7daba3c46a14107fc59e865d654fefe9  0x90.exe
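
The same check can be scripted so that the comparison isn't done by eye. This is a minimal sketch using Python's standard hashlib module; the function names are my own, and the published hash is the one shown in the md5sum output above.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return md5_of_file(path) == published_hex.strip().lower()
```

Calling matches_published("0x90.exe", "7daba3c46a14107fc59e865d654fefe9") should return True for an intact download.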

The ".exe" file extension suggests that this is a Windows executable. We'll find out for sure in the next step.

2. Scan

In step two, I scan the file without running it to see what information is available about the binary. I start with the Unix "file" command to confirm that the file is executable and to find out what platform it runs on. It appears to be a Windows program.

gizmo% file 0x90.exe 
0x90.exe: MS-DOS executable (EXE), OS/2 or MS Windows
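
A rough idea of what such a check looks at can be sketched in a few lines. The example below is not how "file" is implemented; it only tests the two markers defined by the executable format itself: the "MZ" magic at the start of the file and the "PE\0\0" signature at the offset stored at 0x3C. The function name is my own, and a binary with deliberately damaged header fields may still pass or fail this check in surprising ways.

```python
import struct

def looks_like_pe(data):
    """Return True if `data` starts with an MZ stub that points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 32-bit little-endian file offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields an empty slice, which simply compares unequal.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```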

After that, I use the strings command to look for text strings throughout the binary. In this case, I found only a few:

DarthPE
NicolasB
msvcrt.dll
KERNEL32.dll
printf
GetTickCount
GetCommandLineA
ExitProcess
You really thought you would find strings eh? ;-)
Scan of the month coded by Nicolas Brulez / Digital River

Normally, one would find more strings in a file. When an executable has been compressed with a tool like UPX, its strings aren't immediately visible in the file, so I used the PEiD utility to find out whether the file had been packed in any way. PEiD showed that the binary was packed but didn't say with what tool.
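
One common heuristic for spotting compressed or encrypted data is byte entropy. The sketch below is that heuristic only; it is my own illustration and not a description of how PEiD reaches its verdict. Values near 8 bits per byte suggest compression or strong encryption. A low value does not prove the file is unpacked, because xor with a fixed single byte leaves the entropy unchanged.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over every byte value that occurs in the data.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())
```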

The first string, "DarthPE", was found in the header of the executable. Because this binary is a PE executable, this string could indicate that "DarthPE" is the name of the tool used to pack the file. The second string, "NicolasB", appears to be the name of the author of the file.

The next two strings refer to DLLs that are used by the program. Following those are the names of four functions that the program imports. From these, we could assume that the program checks the time, outputs some data, and looks for specific command line options.

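The next string is just a taunt pointing out how few strings are visible. The last string credits the author, Nicolas Brulez, which matches the "NicolasB" string found earlier.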
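
For reference, the kind of search the strings command performs can be approximated in a few lines. This is a rough sketch, not a reimplementation: it looks only for runs of printable ASCII, and the function name and default minimum length of 4 are my own choices.

```python
import re

def ascii_strings(data, min_length=4):
    """Yield runs of at least `min_length` printable ASCII bytes found in `data`."""
    # 0x20 through 0x7e covers space through tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")
```

Reading 0x90.exe in binary mode and passing its contents to ascii_strings() should produce a list much like the one above, possibly with a few extra short runs.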

3. Trace

Next is the "trace" step. In this step, the program is run in a safe environment, such as an un-networked VMware installation. I then run tools such as TDIMon, FileMon, RegMon, RegShot, and Process Explorer to watch for certain behavior from the program as it runs. TDIMon will show any network access made by the program. FileMon will show any file accesses made by the program. RegMon will show any changes or accesses to the Windows Registry by the program, RegShot will compare registry snapshots taken before and after the run, and Process Explorer will show if new tasks are started by this program. None of these tools showed anything of interest.

4. Poke

The fourth step is the "poke" step. Here, I run the program again and try providing different options to it on the command line and look for changes in behavior.

C:\temp\>0x90

C:\temp\>0x90 -h
Please Authenticate!

When I ran the program with no options, it didn't produce any output. When I ran it with an option, it always seemed to ask for authentication. I wasn't able to find a way to authenticate.
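
Trying options by hand gets tedious, so the poking can be automated. The sketch below runs a command once per candidate option and records the exit code and output. The function name and the timeout are my own assumptions, and any list of options passed to it is guesswork rather than something the binary documents. It should only be run inside the same isolated environment used for the trace step.

```python
import subprocess

def poke(command, options, timeout=10):
    """Run `command` once per option and return {option: (exit_code, stdout)}.

    `command` is a list such as [r"C:\\temp\\0x90.exe"].
    An option of None means "no extra argument".
    """
    results = {}
    for option in options:
        argv = list(command) + ([] if option is None else [option])
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        results[option] = (done.returncode, done.stdout.strip())
    return results
```

Grouping the results by output makes it easy to see whether any option produces something other than "Please Authenticate!".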

5. Debug

Next is the "debug" step. In this step, I run the program in a debugger and examine the instruction flow. I also watch for the use of data that was found in any prior step. One problem I ran into was that two of the debuggers wouldn't work because fields in the header of the executable had been modified to hold invalid values. OllyDbg wouldn't load the file, and gdb crashed.

Using windbg, I was able to step through a lot of the code. The file is almost 300KB in length, which makes it hard to step through one instruction at a time. At regular intervals, the program checks whether it's being debugged. One of these checks works by generating an exception whose handler checks the time. In the debugger, one can step over this test and its branch and continue. Next, the program uses an "xor" instruction in a loop to decode a large chunk of data in memory, and execution then continues into the decoded code. This decode-and-check-time cycle repeats many times. I was able to step more than 180 layers down before the code seemed to go into a loop and I wasn't able to find the next function.

6. Interact

Finally, in the "interact" step, if I had been able to figure out all of the program's behavior, I would prove it by interacting with the program. In the case of a program with networking, we could develop a client or server for the program to talk to. In the case of this program, I would expect to be able to provide command line options to receive whatever data the program is hiding.

Answers to questions

1. Identify and explain any techniques in the binary that protect it from being analyzed or reverse engineered.

This binary does several things to protect itself from being reverse engineered. I'll list three of them here. Two more are discussed in the next two answers.

a) Exceptions for checking time
b) xor'd code
c) checking for 'cc'

a) Exceptions for checking time

In a great number of locations in the binary, the code executes instructions that deliberately raise an exception. For example:

  xor %ebx,%ebx
  popl (%ebx)

These two instructions set the register %ebx to zero and then pop a value from the stack into the memory at that address. Because the process doesn't have a page of memory mapped at address 0, this generates an exception. The exception handler is inline in the code and checks the number of CPU cycles elapsed since the last time the check was done. If this time is too great, the program takes an alternate path.
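
The logic of such a timing check is simple, even though the binary buries it inside an exception handler. The sketch below shows only the idea, in Python rather than x86: the class name, the threshold, and the use of time.perf_counter instead of a CPU cycle counter are all my own assumptions and are not taken from the binary.

```python
import time

class TimingCheck:
    """Flags a run as "watched" when too much time passes between checkpoints."""

    def __init__(self, limit_seconds, clock=time.perf_counter):
        self.limit = limit_seconds
        self.clock = clock
        self.last = clock()  # time of the previous checkpoint

    def too_slow(self):
        """Return True if more than `limit` elapsed since the previous checkpoint."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        return elapsed > self.limit
```

Someone single-stepping in a debugger takes seconds between checkpoints where the free-running program takes far less, which is why stepping over the test and its branch sidesteps the check.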

b) xor'd code

This binary has a large number of layers, and each one is encoded with xor. The code executes a loop that walks through a chunk of memory and decodes it. Execution then continues into the decoded memory.
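
To illustrate the layering, here is a toy model. The layer format is entirely hypothetical: each layer is one key byte followed by the next layer xor'd with that key. The real binary's keys, layout, and decoder loops are not reproduced here.

```python
def xor_bytes(data, key):
    """XOR every byte of `data` with the single-byte `key`."""
    return bytes(b ^ key for b in data)

def wrap(payload, key):
    """Build one toy layer: a key byte followed by the XOR-encoded payload."""
    return bytes([key]) + xor_bytes(payload, key)

def unwrap(layer):
    """Decode one toy layer and return what it was hiding."""
    key, body = layer[0], layer[1:]
    return xor_bytes(body, key)

def peel(blob, depth):
    """Remove `depth` layers, outermost first."""
    for _ in range(depth):
        blob = unwrap(blob)
    return blob
```

Wrapping a message three times and calling peel(blob, 3) returns the original message, while the wrapped bytes contain no readable trace of it. That is the same reason the strings command finds so little in this file.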

c) checking for 'cc'

At several places throughout the code, it checks whether certain bytes are equal to the hex value 'CC'. This value is significant because it is the opcode (int 3) that debuggers insert for a breakpoint. Debuggers do this in order to pause execution at specified points in the code or to single-step through the code.
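
A check of this kind amounts to scanning a range of code bytes for 0xCC. The sketch below shows the idea; which bytes the binary actually inspects is not something I determined, so the function and its parameters are illustrative only. Note that 0xCC can also occur legitimately as part of an operand, so a hit is not proof of a breakpoint.

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode

def find_breakpoints(code, start=0, end=None):
    """Return the offsets in code[start:end] that hold the 0xCC byte."""
    end = len(code) if end is None else end
    return [offset for offset in range(start, end) if code[offset] == INT3]
```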

2. Something uncommon has been used to protect the code from being reverse engineered, can you identificate what it is and how it works?

The program has been coded in such a way that a disassembly may appear to be correct but not actually match the instructions that are being executed. 'jmp' instructions are used to branch forward into the "middle" of another instruction and continue from there. Because instruction decoding now starts from that middle address, the instruction that is executed is different from the one the disassembly showed. This appears to be used as a wrapper around the useful code and to some extent hides that code by executing a few useful instructions here and there amidst the "garbage" code.
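
The effect can be reproduced with a toy decoder. The sketch below understands only a handful of 32-bit x86 opcodes, and the eight sample bytes are my own construction, not bytes taken from the binary. A short jmp skips over a stray E8 byte. A sequential listing treats that byte as the start of a five-byte call and swallows the real instructions, while following the jump reveals them.

```python
# Tiny subset of 32-bit x86, just enough for the demonstration.
ONE_BYTE = {0x90: "nop", 0x40: "inc eax", 0xC3: "ret"}

def decode(code, offset):
    """Return (text, length, jump_target) for the instruction at `offset`."""
    opcode = code[offset]
    if opcode in ONE_BYTE:
        return ONE_BYTE[opcode], 1, None
    if opcode == 0xEB:  # jmp rel8: target is relative to the next instruction
        rel = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
        return "jmp short", 2, offset + 2 + rel
    if opcode == 0xE8:  # call rel32: opcode plus a four-byte displacement
        return "call", 5, None
    if code[offset:offset + 2] == b"\x31\xc0":
        return "xor eax, eax", 2, None
    return f"db {opcode:#04x}", 1, None  # anything else: show the raw byte

def linear_sweep(code):
    """Disassemble by walking the bytes in order, as a static listing does."""
    offset, listing = 0, []
    while offset < len(code):
        text, length, _ = decode(code, offset)
        listing.append((offset, text))
        offset += length
    return listing

def follow_flow(code, start=0):
    """Disassemble along the executed path: follow jumps, stop at ret."""
    offset, listing, seen = start, [], set()
    while 0 <= offset < len(code) and offset not in seen:
        seen.add(offset)
        text, length, target = decode(code, offset)
        listing.append((offset, text))
        if text == "ret":
            break
        offset = target if target is not None else offset + length
    return listing

# jmp +1 / stray E8 byte / xor eax,eax / inc eax / ret / nop
SAMPLE = bytes.fromhex("eb01e831c040c390")
```

linear_sweep(SAMPLE) reports a jmp, a call, and a nop, while follow_flow(SAMPLE) reports the jmp followed by xor eax, eax, inc eax, and ret at offsets 3, 5, and 6.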

3. Provide a means to "quickly" analyse this uncommon feature.

The "garbage" code which hides the useful code performs many operations on registers which would cause needed values in those registers to be lost. In order not to lose that data, the values in the registers are stored on the stack until they are needed. When they are needed, a 'popa' is done to pop all the registers off the stack. After the few useful instructions are performed, a 'pusha' is executed in order to save the values back on the stack. This makes the program somewhat easier to analyze: search for occurrences of 'popa' and 'pusha' and examine the instructions between them.
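
That search can be sketched as a scan for the one-byte opcodes 0x61 (popa) and 0x60 (pusha). This is a heuristic of my own, under two assumptions: it is run on decoded code, for example a memory dump taken in the debugger, since the code on disk is still xor-encoded, and it accepts false hits, because either byte can also appear inside another instruction's operands.

```python
POPA, PUSHA = 0x61, 0x60  # one-byte opcodes (popad / pushad in 32-bit code)

def useful_regions(code):
    """Return (start, end, bytes) for each span between a popa and the next pusha."""
    regions = []
    offset = 0
    while True:
        start = code.find(bytes([POPA]), offset)
        if start == -1:
            break
        end = code.find(bytes([PUSHA]), start + 1)
        if end == -1:
            break
        # The span excludes the popa and pusha bytes themselves.
        regions.append((start + 1, end, code[start + 1:end]))
        offset = end + 1
    return regions
```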

4. Which tools are the most suited for analysing such binaries, and why?

When examining a binary of this type, a disassembler isn't as useful because it decodes instructions sequentially, using the size of each instruction to find the next, rather than following the execution flow. For this reason, the tools that are most suited to analyzing such binaries are debuggers. With a debugger, one can step through the code and see each instruction disassembled as it is executed.

5. Identify the purpose (fictitious or not) of the binary.

From what I could retrieve from this binary, I believe that it would be ideal for Digital Rights Management. One could store a document or even another program inside of this one. Because the binary is very difficult to reverse engineer, the risk of the contents reaching unintended recipients would be reduced considerably.

6. What is the binary waiting from the user? Please detail how you found it.

The binary is waiting for the user to authenticate to it. I found this by running the program and providing different command line options to it. Unfortunately, I wasn't able to provide the required authentication.

References