In the first step, I acquire the binary and verify that it is the one I want to analyze. In this case, I downloaded the binary from the Honeynet Project web site and checked its MD5 hash against the one published on the web page.
gizmo% md5sum 0x90.exe
7daba3c46a14107fc59e865d654fefe9  0x90.exe
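The same check can be scripted when md5sum is not available. The following is a minimal sketch, not the method used above; the function names `md5_of_file` and `matches_published` are my own and purely illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large samples do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return md5_of_file(path) == published_hex.strip().lower()
```

For this sample, the call would be `matches_published("0x90.exe", "7daba3c46a14107fc59e865d654fefe9")`.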
The ".exe" file extension suggests that this is a Windows executable. We'll find out for sure in the next step.
In step two, I scan the file without running it to see what information is available about the binary. I start out with the Unix "file" command to determine that the file is executable and find out what platform it runs on. We see that it appears to be a Windows program.
gizmo% file 0x90.exe
0x90.exe: MS-DOS executable (EXE), OS/2 or MS Windows
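To show roughly what this kind of identification relies on, here is a small sketch that looks for the "MZ" signature at the start of the file and the "PE\0\0" signature at the offset stored at 0x3C. It is a simplification of what the file command does, the function name `pe_kind` is hypothetical, and a binary with a deliberately damaged header may not be classified correctly.

```python
import struct


def pe_kind(data):
    """Classify a byte string as 'PE', 'MZ only', or 'not MZ' from its signatures."""
    # A DOS header is at least 0x40 bytes long and starts with "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "not MZ"
    # The 32-bit little-endian value at 0x3C is the offset of the PE header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] == b"PE\0\0":
        return "PE"
    return "MZ only"
```

A typical use would be `pe_kind(open("0x90.exe", "rb").read())`.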
After that, I use the strings command to look for text strings throughout the binary. In this case, I only found a few strings.
DarthPE
NicolasB
msvcrt.dll
KERNEL32.dll
printf
GetTickCount
GetCommandLineA
ExitProcess
You really thought you would find strings eh? ;-)
Scan of the month coded by Nicolas Brulez / Digital River
Normally, one would find more strings in a file. Compressing an executable with a tool like UPX hides most of its strings, so I used the PEiD utility to find out whether the file had been packed in any way. PEiD reported that the binary was packed but did not identify the packer.
The first string, "DarthPE", was found in the header of the executable. Because this binary is a PE executable, this string could indicate that "DarthPE" is the name of the tool used to pack the file. The second string, "NicolasB", appears to be the name of the author of the file.
The next two strings refer to DLLs that are used by the program. Following those are four names of functions that the program imports. From these we could assume that the program checks the time, outputs some data, and looks for specific command line options.
The next string is just a taunt about how few strings there are to find.
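For readers without the strings command, the scan in this step can be approximated in a few lines. This is a sketch only: it finds runs of printable ASCII characters and ignores the UTF-16 strings that Windows binaries often contain. The name `ascii_strings` and the default minimum length of 4 are my own choices.

```python
import re


def ascii_strings(data, min_len=4):
    """Return every run of at least `min_len` printable ASCII bytes in `data`."""
    # 0x20 through 0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

On a packed binary like this one, the result is expected to be short, because most of the real strings only exist after the code has decoded itself in memory.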
Next is the "trace" step. In this step, the program is run in a safe environment, such as an un-networked VMware installation. I then run tools such as TDIMon, FileMon, RegMon, RegShot, and Process Explorer to watch for certain behavior from the program as it runs. TDIMon will show any network access made by the program. FileMon will show any file accesses made by the program. RegMon will show any changes or accesses to the Windows Registry by the program, and Process Explorer will show if new tasks are started by this program. None of these tools showed anything of interest.
The fourth step is the "poke" step. Here, I run the program again and try providing different options to it on the command line and look for changes in behavior.
C:\temp\>0x90
C:\temp\>0x90 -h
Please Authenticate!
When I ran the program with no options, it didn't produce any output. When I ran it with an option, it always seemed to ask for authentication. I wasn't able to find a way to authenticate.
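Trying many option combinations by hand gets tedious, so the poke step can be partly automated. The sketch below is one hypothetical way to do it, not what I ran above: `poke` is an illustrative name, and it should only ever be pointed at the sample inside the same isolated environment used for the trace step.

```python
import subprocess


def poke(command, option_sets, timeout=10):
    """Run `command` once per option set and record (exit code, stdout) for each."""
    results = {}
    for opts in option_sets:
        proc = subprocess.run(
            list(command) + list(opts),
            capture_output=True,
            text=True,
            timeout=timeout,  # raises TimeoutExpired if the program hangs
        )
        results[tuple(opts)] = (proc.returncode, proc.stdout.strip())
    return results
```

For example, `poke(["0x90.exe"], [[], ["-h"], ["-v"], ["password"]])` would make it easy to spot any option whose output differs from "Please Authenticate!".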
Next is the "debug" step. In this step, I run the program in a debugger and examine the instruction flow. I also watch for the use of data that was found in any prior step. One problem that I ran into was that two of the debuggers wouldn't work because the data in the header of the executable had been modified to bad values. OllyDbg wouldn't load the file, and gdb crashed.
Using windbg, I was able to step through a lot of the code. The file is almost 300KB in length, which makes it hard to step through one instruction at a time. At regular intervals in the code, the program checks whether it is being debugged. One of these checks works by generating an exception, and the exception handler it uses checks the time. When stepping through this with the debugger, one can step over this test and branch and be on the way. Next, the program would use an "xor" instruction in a loop to decode a large chunk of the data in memory. Execution would then continue into this decoded code. This decode-and-check-time cycle was very long. I was able to get more than 180 layers down before the code seemed to go into a loop and I wasn't able to find the next function.
Finally, in the interact step, had I been able to figure out all the behavior of the program, I would prove this behavior by interacting with it. In the case of a program with networking, we could develop a client or server for the program to interact with. In the case of this program, I would expect to be able to provide command line options to receive whatever data the program is hiding.
a) Exceptions for checking time
b) xor'd code
c) checking for 'cc'
a) Exceptions for checking time
In a great number of locations in the binary, the code contains instructions that deliberately generate an exception. For example:
xor %ebx,%ebx
pop [%ebx]
These two instructions set the register %ebx to zero and then dereference memory at that location. Because the task doesn't have a page of memory at address 0, this generates an exception. The exception handler is inline in the code and checks how many CPU cycles have elapsed since the last time the check was done. If too many have elapsed, as happens when someone is single-stepping in a debugger, the program takes an alternate path.
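The logic of the timing check can be illustrated at a high level. This sketch is not a reconstruction of the binary's handler: it uses a nanosecond clock as a stand-in for the CPU cycle count, and the class name `TimingCheck` and the threshold are hypothetical.

```python
import time


class TimingCheck:
    """Flag a suspected debugger when too much time passes between checks."""

    def __init__(self, threshold_ns, clock=time.perf_counter_ns):
        self.threshold_ns = threshold_ns  # assumed limit, not taken from the binary
        self.clock = clock
        self.last = clock()

    def debugger_suspected(self):
        """Return True if the time since the previous check exceeds the threshold."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        return elapsed > self.threshold_ns
```

Code running at full speed reaches the next check almost immediately, while a person single-stepping takes seconds, so even a generous threshold separates the two cases.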
b) xor'd code
This binary has a large number of layers, and each one is encoded with xor. The code executes a loop that goes through a chunk of memory and decodes it. The code path then continues into that decoded memory.
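A decoding loop of this kind can be sketched as follows. The single-byte key, the function names, and the key values in the example are all assumptions made for illustration; I did not record the actual key scheme used by each layer.

```python
def xor_decode_in_place(memory, start, length, key):
    """XOR `length` bytes of `memory` (a bytearray) with `key`, starting at `start`."""
    for i in range(start, start + length):
        memory[i] ^= key


def peel_layers(memory, start, length, keys):
    """Apply one XOR pass per key, the way nested layers are decoded in sequence."""
    for key in keys:
        xor_decode_in_place(memory, start, length, key)
    return memory
```

Because xor is its own inverse, running the same pass twice with the same key restores the original bytes, which is why the encoder and the decoder can share one loop.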
c) checking for 'cc'
At several places throughout the code, the program checks whether certain bytes are equal to the hex value 'CC'. This value is significant because it is the opcode (int 3) that debuggers insert to set a breakpoint. Debuggers do this in order to pause execution at specified points in the code or to single-step through the code.
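The check amounts to scanning a range of the program's own code for that byte. Here is a minimal sketch, assuming the code is available as a byte string; the name `find_breakpoints` is hypothetical, and a real scan has to allow for 0xCC bytes that legitimately appear as data or padding.

```python
INT3 = 0xCC  # opcode of the one-byte x86 breakpoint instruction


def find_breakpoints(code, start=0, length=None):
    """Return the offsets in `code` (a bytes-like object) that hold the 0xCC byte."""
    end = len(code) if length is None else start + length
    return [i for i in range(start, end) if code[i] == INT3]
```

A program using this defense would typically take its alternate path as soon as the list is non-empty for a region it knows should contain no such byte.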