Reverse Engineering of the Honeynet’s SOTM33 Binary

Vinay A. Mahadik

1 Introduction:

This report analyzes the “0x90.exe” binary as part of the Honeynet’s “Scan of the Month #33” challenge.

2 Tools Used:

IDA Pro version 4.7
Compuware SoftIce version 3.1
Windows XP Embedded with Enhanced Write Filter (EWF) enabled (not really essential).
APIS32 version 2.5
Hiew Hex Editor version 6.86
PE Tools version 1.5

3 Analysis:

3.1 Preliminary Inspection (Initial Hurdles):

md5sum 0x90.exe gives 7daba3c46a14107fc59e865d654fefe9.

The Unix file utility identifies 0x90.exe as an MS-DOS/Windows executable. Hiew shows the “MZ” and “PE” signatures expected in Portable Executable (PE) headers.

Running Sysinternals’ Unicode/ASCII strings utility on it finds, among a few other strings, the following leads:

msvcrt.dll

KERNEL32.dll

printf

GetTickCount

GetCommandLineA

ExitProcess

You really thought you would find strings eh? ;-)

Scan of the month coded by Nicolas Brulez / Digital River

All of these suggest a Win32 executable that uses printf from msvcrt.dll and the GetTickCount, GetCommandLineA, and ExitProcess calls from the Win32 API (kernel32.dll). These strings come from the idata (Imports) section towards the end of the binary; the rest of the binary shows no meaningful strings, implying it is likely encoded in some way.

Next, we use the IDA Pro disassembler to analyze the binary. IDA confirms the binary as a PE executable. However, it reports a file structure error while loading the 3rd section:

and hangs for a long time consuming large amounts (100MBytes+) of hard disk space before finally allowing any analysis of the binary:

From the log, and iteratively with PE Tools’ PE Editor, it is clear that the 3rd section has a ridiculous size. Let’s fix the binary offline before attempting IDA on it. We use PE Tools’ PE Editor to check the sections:

We spot the bad raw (disk image) size of the 3rd section (“NicolasB”), set to 0xEFEFADFF. The Win32 loader must be either ignoring this field or processing it differently than IDA does while loading the executable. We set the raw size of that section to match its virtual size, as is usual for sections (from 0xEFEFADFF to 0x00001000):
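The manual fix above can be expressed as a small sanity check. The following is only a sketch: the `section_sizes` struct is a hypothetical, trimmed-down stand-in for a real PE section header, and the `file_size` parameter is an assumption (the report does not state the file's size).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a PE section header: only the two
   size fields discussed in the text are modelled. */
typedef struct {
    uint32_t virtual_size;  /* size of the section in memory */
    uint32_t raw_size;      /* size of the section on disk   */
} section_sizes;

/* If the raw size cannot possibly fit inside the file, fall back to the
   virtual size (the same substitution made by hand with PE Editor).
   Returns 1 if the header was changed, 0 otherwise. */
static int fix_raw_size(section_sizes *sec, uint32_t file_size)
{
    assert(sec != NULL);
    if (sec->raw_size > file_size) {
        sec->raw_size = sec->virtual_size;
        return 1;
    }
    return 0;
}
```

Applied to a section with virtual size 0x1000 and raw size 0xEFEFADFF, the function rewrites the raw size to 0x1000 for any plausible file size.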

Let’s load the binary again in IDA now. IDA has no trouble loading it this time. Let’s take a brief look at some of its important PE header characteristics. It’s a good idea to draw out the memory (and disk) images of the exe, say on a piece of paper, to be able to visualize the memory layout of the process and any unpacking it might be involved in. We will not attempt any fancy artwork here in the interest of time. From PE Editor or IDA’s segments view we find:

# sections = 4 (CODE, DATA, NicolasB, .idata)
Image base is 0xDE0000, Image (in memory) size is 0x49000 bytes
CODE is the ‘code’ section, Characteristics: XRW, size of 0x1000 bytes, 0xDE1000-0xDE1FFF
DATA is an initialized data section, RW, size of 0x45000, 0xDE2000-0xE26FFF
NicolasB is also an initialized data section, RW, size of 0x1000, 0xE27000-0xE27FFF
.idata is the Imports (initialized data) section, RW. The raw offset matches the previous section’s, which is suspect. IDA reports that it starts and ends at 0xE28054 and 0xE2806C respectively.

The PE headers also use crafted/unconventional values for several different header fields/flags (not mentioned here for brevity and since they do not matter from this point onwards in the analysis). These are probably ignored by the Win32 loader, but tend to mislead analysis tools such as dumpbin, IDA (in this case, the raw section size field explained above) etc. This was the author’s first attempt at a binary guarded against reversing (not plainly encoded with UPX-like routines say) and so a good portion of the initial analysis went in choosing the right tools (that didn’t crash!). We finally settle on IDA Pro 4.7 for disassembling and user-mode debugging, SoftIce 3.1 to help out IDA’s debugger occasionally, PE Editor and Hiew to edit the binary’s headers/data, & APIS32 for spying on the Win32 API/dll calls being made.

(You can safely skip this paragraph without loss of continuity.) For a quick runtime analysis, we choose Windows XP Embedded (90-day evaluation version downloadable from Microsoft’s website). In the previous Honeynet SOTM32 challenge, the binary checked for the presence of VMWare. Since at this point we do not know much about 0x90.exe, we can only assume that it attempts to detect VM-like environments. Windows XP Embedded with EWF (Enhanced Write Filter) is our weapon of choice here. A quick Google search reveals that no one yet (?) has suggested using it as an environment for malware analysis. So we assume/hope that 0x90.exe doesn’t check for it either, and perform our run-time analysis with it. Again, a big chunk of the author’s analysis time went into learning XPE compilation and EWF management. Educational but, in hindsight, not essential for this particular binary, as it doesn’t seem to behave differently on different runs (it doesn’t save state on the system across reboots). Anyway, EWF essentially redirects hard disk writes to a temporary memory/disk overlay. These overlays (yes, there can be multiple!) can then be committed or discarded. For example, with a memory EWF overlay enabled, the protected disk images are restored to their original states on every reboot. EWF is just one device driver; we could use custom-written ones if malware started detecting and evading EWF (though why would malware such as a worm avoid embedded devices anyway?). Anti-reversing would then require a generic way of detecting device drivers that provide disk-overlay functionality.

For API spying, there are several tools out there based on various techniques. Some of the most common techniques are described in this Phrack 62 article. APIS32 finds the following calls when 0x90.exe is run with “AAAAAAAAA” as command line argument:

00DEF94D:GetCommandLineA()

00DEF952:GetCommandLineA = 141EE0 (0x90.exe)

00DEF94D:printf()

00DEF952:printf = 16 (0x90.exe)

00DE8731:ExitProcess(DWORD:00000000)

On the cmd console, it prints “Please Authenticate!” and exits. So it appears to be expecting a “password” as a command line argument. It’s likely a password protected binary, probably requiring crypto analysis at some point.

3.2 Analyzing the Binary:

Let’s step through the binary inside the IDA’s debugger.

3.2.1 Delta Offset Calculation:

We find that the binary begins with what’s called a “delta-offset” calculation used quite frequently by viruses:

The call instruction places the address of the next instruction (0xDE2006) on the stack. “$+5” leads to loc_DE2006, where the stored address is popped into EBP. It is copied into EAX, and 6 (the number of instruction bytes from the entry point (0xDE2000) to the “pop ebp” instruction) is subtracted from EAX. This places the entry-point address in EAX (in this case 0xDE2000) even if the binary is not loaded at the preferred image base of 0xDE0000. This will likely help the executable use an encoding algorithm that works on the memory image of the binary and is still independent of the load address. As we’ll see later, this delta-offset is then saved in the dword at 0xE26441.
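The arithmetic behind the delta-offset trick is tiny; a minimal sketch follows. The function names are hypothetical, and `load_delta` is an extra illustration (not something the binary is claimed to compute) of how the recovered entry point relates to the preferred one.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes from the entry point to the "pop ebp" instruction, per the text. */
#define CALL_TO_ENTRY_DISTANCE 6u

/* Given the return address pushed by "call $+5", recover the run-time
   entry point, wherever the image happens to be loaded. */
static uint32_t entry_from_return_address(uint32_t ret_addr)
{
    assert(ret_addr >= CALL_TO_ENTRY_DISTANCE);
    return ret_addr - CALL_TO_ENTRY_DISTANCE;
}

/* Illustration only: displacement between the actual and the preferred
   entry point (0 when the image is loaded at its preferred base). */
static uint32_t load_delta(uint32_t actual_entry, uint32_t preferred_entry)
{
    return actual_entry - preferred_entry;
}
```

With the return address 0xDE2006 from the text, `entry_from_return_address` yields 0xDE2000, and the load delta against the preferred entry point is 0.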

3.2.2 ”Pusha-Popa” Code Redundancy/Obfuscation:

A “pusha-popa” code redundancy/obfuscation trick begins immediately after this:

After copying the delta offset into EAX, all the registers are saved with pusha. The code beginning from 0xDE2013 is then redundant. It just moves data around, does some arithmetic, etc., without any impact on the binary’s execution path. These instructions are eventually followed by a popa, where the CPU state is restored. Then the executable does some useful work, and again enters a pusha-popa sequence. This way, a few bytes of useful code are embedded between long runs of random redundant code bytes. Clearly this is designed to frustrate someone stepping through the code (non-intelligently) in a debugger, or someone statically reversing even the redundant code.

Anti-anti-reversing: We use this simple IDC script to hide the code between two consecutive pusha-popa instructions that do not have a call instruction between them. Fortunately, this simple heuristic works throughout the binary. Here’s what we see with the redundant code removed (code from 0xDE2012 to 0xDE2288 is hidden):

We avoid looping this operation since it leads to a few instances where a jump lands right onto a hidden pusha instruction that tends to hang IDA version 4.7 for several minutes at a time. For those cases, we hide from the instruction following the pusha.
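The hiding heuristic itself can be sketched outside IDA. The code below is not the IDC script; it is a hypothetical C model that assumes the instructions have already been decoded into a simple `ins_kind` sequence, and it marks every pusha ... popa range that contains no call (and no second pusha).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of already-decoded instructions. */
typedef enum { INS_OTHER, INS_PUSHA, INS_POPA, INS_CALL } ins_kind;

/* Sets hide[i] = 1 for every instruction inside a pusha ... popa pair
   that has no call in between. Returns the number of ranges hidden. */
static size_t mark_junk_ranges(const ins_kind *ins, size_t n,
                               unsigned char *hide)
{
    size_t ranges = 0;
    assert(ins != NULL && hide != NULL);

    for (size_t i = 0; i < n; i++)
        hide[i] = 0;

    for (size_t i = 0; i < n; i++) {
        if (ins[i] != INS_PUSHA)
            continue;
        /* Walk forward until something ends the candidate range. */
        size_t j = i + 1;
        while (j < n && ins[j] == INS_OTHER)
            j++;
        /* Only a popa closes a junk range; a call or another pusha
           means this candidate is left visible. */
        if (j < n && ins[j] == INS_POPA) {
            for (size_t k = i; k <= j; k++)
                hide[k] = 1;
            ranges++;
            i = j;
        }
    }
    return ranges;
}
```

A range that contains a call is deliberately left alone, mirroring the condition described above.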

3.2.3 REP/REPE/REPNE Obfuscation:

Scattered across the binary are unconventional uses of rep/repe/repne opcodes.

The Intel IA-32 manuals claim that these can only be used for string instructions (such as movs, ins, outs etc). Their behavior when used with non-string instructions such as mov, jmp etc is not defined. It turns out that these undocumented usages do not generate an Invalid Opcode exception from the hardware (no INT 6). These operate just as if the REPx prefix were not present.

Anti-anti-reversing: We simply ignore the REPx prefixes when used for non-string operations.
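As a rough illustration of "ignore REPx unless a string instruction follows", here is a sketch that looks only at one-byte opcodes. It is a simplification: other prefixes, two-byte (0F) opcodes, and special encodings such as F3 90 are not handled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One-byte opcodes of the IA-32 string instructions. */
static int is_string_opcode(uint8_t op)
{
    switch (op) {
    case 0x6C: case 0x6D:   /* ins  */
    case 0x6E: case 0x6F:   /* outs */
    case 0xA4: case 0xA5:   /* movs */
    case 0xA6: case 0xA7:   /* cmps */
    case 0xAA: case 0xAB:   /* stos */
    case 0xAC: case 0xAD:   /* lods */
    case 0xAE: case 0xAF:   /* scas */
        return 1;
    default:
        return 0;
    }
}

/* Returns how many leading REPx prefix bytes (F2/F3) can be skipped
   because the opcode that follows is not a string instruction. */
static size_t ignorable_rep_prefixes(const uint8_t *code, size_t n)
{
    size_t i = 0;
    assert(code != NULL);
    while (i < n && (code[i] == 0xF2 || code[i] == 0xF3))
        i++;
    if (i == n)
        return 0;           /* nothing follows the prefixes */
    return is_string_opcode(code[i]) ? 0 : i;
}
```

For example, F3 A4 (rep movsb) keeps its prefix, while F3 8B C3 (a rep in front of a mov) reports one ignorable prefix byte.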

3.2.4 Unrecognized/Invalid Opcode Sequences Obfuscation:

As seen above, just like the REPx opcodes, many other sequences also do not decode to any valid instruction. These still do not generate an INT 6 Invalid Opcode exception and are simply skipped by the CPU like NOPs.

Anti-anti-reversing: We simply ignore these invalid opcodes; most of these get hidden when we hide the pusha-popa sections.

3.2.5 SEH-based Anti-Debugging & Code Obfuscation:

We then run into a series of SEH based anti-debugging and code obfuscation tricks that are used periodically throughout the binary. For some background on these concepts, please study the following articles: “Under the Hood” from MSJ and this useful Usenet post on SEH/exception handling. It’s a pity that most good references we can find on these are about breaking software protection written by crackers.

3.2.5.1 Execution Redirection:

The first trick is about redirecting the code’s execution path using a forced exception. Study the following two screenshots:

& the calling code:

call seh_1 and the first three instructions of seh_1 install 0xDE228F as an exception handler for any exception in seh_1. The code then forces a write to address 0x0. This exception therefore redirects execution to 0xDE228F. Once the handler returns (normally), execution resumes from 0xDE22E7.

Anti-anti-reversing: By knowing this trick in advance, and placing breakpoints at the right places (e.g. as shown), it is easy to handle this SEH trick during a debugging session. Both IDA and SoftIce have no problems handling these exception detours.

3.2.5.2 Erase Hardware Breakpoints/Manipulate Context:

This trick is again based on (officially) undocumented information about how Win32 thread context information is stored on the stack when it encounters an exception. Briefly, when the kernel redirects the execution of the thread to the handler, the dword at ESP+0x0C is the address of the start of the context information about the thread. This context information is structured and defined in WinNT.h as “struct CONTEXT” . Here’s the definition of fields and their offsets as defined in IDA:

00000000 CONTEXT struc

00000000 ContextFlags dd

00000004 Dr0 dd

00000008 Dr1 dd

0000000C Dr2 dd

00000010 Dr3 dd

00000014 Dr6 dd

00000018 Dr7 dd

0000001C FloatSave FLOATING_SAVE_AREA

0000008C SegGs dd

00000090 SegFs dd

00000094 SegEs dd

00000098 SegDs dd

0000009C Edi dd

000000A0 Esi dd

000000A4 Ebx dd

000000A8 Edx dd

000000AC Ecx dd

000000B0 Eax dd

000000B4 Ebp dd

000000B8 Eip dd

000000BC SegCs dd

000000C0 EFlags dd

000000C4 Esp dd

000000C8 SegSs dd

000000CC ExtendedRegisters db

000002CC CONTEXT ends

Study the following screenshot:

This explains how the crafted SEH handler for seh_1 clears the debug registers DR0-DR3, and DR6. DR7 is set to 0x155 which enables local breakpoints for the task, and disables global ones (ones set for all tasks). This way the thread will not break on any local/global DRx based breakpoints. The neatness here is that the debug registers DRx were intended by Intel/Win32 to be accessible only from Ring 0. Using this trick then, even an unprivileged user-space (Ring 3) malware can reset all of the 4 debug breakpoints made available by the hardware. Please refer to the Intel IA-32 Manual’s “System Programming Guide” for information on the use of debug registers.
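To make the DR7 value concrete, the sketch below models what the handler is described as doing and decodes the result. The `debug_regs` struct is a hypothetical subset of CONTEXT (debug registers only). The decoding relies on the documented DR7 layout: L0..L3 are bits 0, 2, 4, 6; G0..G3 are bits 1, 3, 5, 7; bit 8 is LE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the CONTEXT record: only the debug registers. */
typedef struct {
    uint32_t dr0, dr1, dr2, dr3, dr6, dr7;
} debug_regs;

/* Model of the handler's effect on the saved context, per the text. */
static void wipe_debug_regs(debug_regs *d)
{
    assert(d != NULL);
    d->dr0 = d->dr1 = d->dr2 = d->dr3 = 0;  /* breakpoint addresses gone */
    d->dr6 = 0;                             /* debug status cleared      */
    d->dr7 = 0x155;                         /* L0-L3 and LE set, G0-G3 0 */
}

typedef struct {
    unsigned local_mask;    /* bit i set => Li enabled */
    unsigned global_mask;   /* bit i set => Gi enabled */
} dr7_enables;

/* Split DR7 into its local and global enable bits. */
static dr7_enables decode_dr7(uint32_t dr7)
{
    dr7_enables e = { 0, 0 };
    for (unsigned i = 0; i < 4; i++) {
        if (dr7 & (1u << (2 * i)))
            e.local_mask |= 1u << i;
        if (dr7 & (1u << (2 * i + 1)))
            e.global_mask |= 1u << i;
    }
    return e;
}
```

Decoding 0x155 gives a local mask of 0xF and a global mask of 0, which matches the description: local enables set, global enables clear, and all four breakpoint addresses zeroed.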

Anti-anti-reversing: There appears to be no way of escaping this anti-debugging trick. We simply can’t trust SoftIce’s BPMs while running over unexplored parts of this binary’s code. This limits us to 0xCC debug trap/single-stepping based debugging only, or via static analysis.

3.2.5.3 Jump/Return Into Middle of Instructions-Sequence:

Extending the above method, the rest of the context information can also be modified for code obfuscation. The binary frequently changes the pre-exception saved EIP value so that the handler would eventually return to a specified address (say return into the middle of an instruction sequence previously statically (mis) recognized by IDA).

Here are the static and run-time disassemblies of the same sequences of instruction bytes at 0xE26372:

Anti-anti-reversing: Since we are doing debugging and disassembling on the fly, and not relying solely on static disassembling, this is only a minor nuisance. We need to be careful about the position of breakpoints based on the code. In the above case, for example, the sub_E263D3 increments the EIP stored on the stack by 1. Thus we land at 0xE26378 instead of at 0xE26377.

3.2.6 Timing Checkpoints (Anti-Tracing/Emulation):

Consider these screenshots again:

Here the seh_1 function uses the rdtsc instruction to fetch the current TSC (Time Stamp Counter) value into the EDX:EAX registers. (low-dword in EAX). The exception then saves this in the thread context and we reach xhandler:

Here, the pre-exception EAX (TSC) value is retrieved from the saved context record and compared with the current TSC’s EAX low dword. If more than 0xE0000 CPU cycles have elapsed, then the binary assumes that it is being traced or run under a (slow) emulator, and jumps to “quit”. At quit, it causes another exception which exits the process (if not caught in a debugger).
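The comparison performed by the handler can be modelled as follows. This is a sketch under assumptions: `saved_context` is a hypothetical one-field stand-in for the CONTEXT record, and the unsigned wrap-around subtraction is an assumption about how the low dwords are compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TSC_THRESHOLD 0xE0000u  /* cycle budget quoted in the text */

/* Hypothetical stand-in for CONTEXT: only the saved EAX is modelled. */
typedef struct {
    uint32_t eax;   /* low dword of the TSC read before the exception */
} saved_context;

/* Returns 1 if the handler would decide it is being traced/emulated. */
static int timing_check_trips(const saved_context *ctx, uint32_t tsc_lo_now)
{
    assert(ctx != NULL);
    /* Unsigned subtraction wraps modulo 2^32, so a counter roll-over
       between the two reads does not produce a huge bogus value. */
    uint32_t elapsed = tsc_lo_now - ctx->eax;
    return elapsed > TSC_THRESHOLD;
}
```

Single-stepping between the two rdtsc reads easily exceeds the budget, which is why the check has to be jumped over rather than traced through.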

Anti-anti-reversing: Again, during active reversing, we only need to jump over the rdtsc based checkpoints. This trick only affects slow automated tracers/emulators that can’t handle rdtsc checks.

Instead of exiting at the rdtsc checkpoint, the binary could very well take us to decoy code, which we would then end up reversing. (It doesn’t look like the binary does this, but it seems like an interesting idea worth a look sometime: a sort of honeypot technique against reversing.) We avoid any emulators or tracing (of the entire binary) for this reason.

Tracing through these SEH-based anti-reversing tricks then we find in seh_2, at 0xDE2603, it saves the delta-offset it calculated at the entry point into dword 0xE26441 which we label as entry_point. This dword will likely be used by any decryption routine(s) ahead.

3.2.7 Detection of Debug Breaks on API Calls:

The executable searches for debug breaks in the first 4 bytes of kernel32.dll’s GetCommandLineA function, and likewise in its ExitProcess function and msvcrt.dll’s printf function. If we attempt to intercept these calls using debug breakpoints (in the first 4 bytes), the code detects that and exits.

Anti-anti-reversing: If the instructions in the first 4 bytes of these functions are not too disruptive, we simply place a debug trap at the first instruction after the 4th byte.
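A check of this kind can be sketched in a few lines. The function below is a hypothetical model (a plain byte scan over the first 4 bytes), not the binary's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROLOGUE_BYTES 4u   /* window inspected by the binary, per the text */

/* Returns 1 if an INT 3 opcode (0xCC) appears within the first
   PROLOGUE_BYTES bytes of the function body. */
static int has_int3_in_prologue(const uint8_t *func, size_t func_len)
{
    assert(func != NULL);
    size_t n = func_len < PROLOGUE_BYTES ? func_len : PROLOGUE_BYTES;
    return memchr(func, 0xCC, n) != NULL;
}
```

A trap placed at the fifth byte or later is invisible to such a scan, which is exactly the workaround described above.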

3.2.8 Decryption Loops:

We set a breakpoint at 0xDE4C00 where seh_8’s exception handler returns. We execute the binary as “0x90.exe AAAAAAAA” and let the debugger hit the breakpoint. Then we start stepping through the code. We quickly reach the following xor decryption loop:

We find that this decrypts the bytes 0xDE5423 to 0xE2635A using a simple XOR operation on each byte (pointed to by eax). Then, after the decryption, it checks whether there is a debug trap (0xCC) at 0xDE541E. If not, it proceeds to location 0xE26372. What follows there is a variant of the anti-debugging trick explained above (using thread context information manipulation). Tracing further we reach 0xDE541E:

It is a jump to 0xDE5423 which has just been decoded!
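One layer of this scheme can be modelled as below. This is a sketch under a loud assumption: a single constant key byte per layer is assumed purely for illustration, since the text only says "a simple XOR operation on each byte". The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR every byte of a layer in place. ASSUMPTION: one constant key byte;
   the real loop may derive its key differently. */
static void xor_layer(uint8_t *buf, size_t len, uint8_t key)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Model of the post-decryption check: is a debug trap sitting on the
   instruction that jumps into the freshly decoded layer? */
static int exit_is_trapped(const uint8_t *exit_insn)
{
    assert(exit_insn != NULL);
    return *exit_insn == 0xCC;
}
```

Because XOR is its own inverse, applying `xor_layer` twice with the same key restores the original bytes; this is also why a breakpoint byte written into a still-encrypted layer turns into garbage once that layer is decoded.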

We keep tracing, now into the decrypted inner level. We reach another identical xor decryption routine that decrypts bytes 0xDE59A3-0xE26274. Tracing into this level gets us to another identical loop, and this keeps repeating for several levels it seems. The binary’s trick is effective. Manual tracing like this quickly gets frustrating and tedious. Especially with the obfuscations and anti-reversing traps, it’s like walking in a mine-field, and in this case, without knowing when it is going to end. We give up this approach after tracing through the first few levels.

Anti-anti reversing: For now, we look for another approach to this. The identical repetitive loops strongly point towards some type of recursive debugging/unwrapping technique.

3.2.9 Breaking At GetCommandLineA():

Frustrated by the xor loops, and noting that we haven’t yet called any external functions from the loaded DLLs, we target the GetCommandLineA function (printf and ExitProcess calls would obviously occur only after this). We had previously noted that the binary only checks the first 4 bytes of the GetCommandLineA() function for 0xCC traps. We break the binary at the next instruction (after the 4th byte in the function) at 0x77E7E2B0 (retn instruction).

As expected, running the entire code under IDA’s debugger triggers the fairly aggressive rdtsc 0xE0000 timing checks. Let’s use SoftIce to speed things up (sitting underneath the OS, that’s the fastest debugging can get for a given machine).

We use Hiew to ‘patch’ the exe at the entry point 0xDE2000 with the hex bytes “EB FE”. These bytes encode a backward short jump to the instruction currently pointed to by EIP. In our case, it is interpreted as “jmp 0xDE2000”. The executable will thus enter an infinite loop once it reaches this instruction sequence. The idea is to execute the exe and effectively pause the process at the entry point. Then, in SoftIce, we switch to the executable’s context and set a breakpoint on 0x77E7E2B0 (the retn instruction inside GetCommandLineA). This trap is beyond the 4-byte search zone of the executable. We release the binary by replacing the bytes at 0xDE2000 with the original bytes “60 E8” and exiting SoftIce (letting execution resume). The executable breaks at 0x77E7E2B0 and SoftIce pops up again. Nice.
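Why “EB FE” spins in place follows directly from how a short jump is encoded: the target is the address of the next instruction plus a signed 8-bit displacement. A minimal sketch (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Target of a 2-byte short jump (opcode EB, then a signed 8-bit
   displacement) located at address addr. */
static uint32_t short_jmp_target(uint32_t addr, const uint8_t insn[2])
{
    assert(insn[0] == 0xEB);
    /* The displacement is relative to the next instruction (addr + 2);
       0xFE is -2, so EB FE lands back on itself. */
    return addr + 2u + (uint32_t)(int32_t)(int8_t)insn[1];
}
```

With the bytes EB FE placed at 0xDE2000, the computed target is 0xDE2000 itself, so the thread loops there until the original bytes are restored.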

Stepping further, we reach 0xDEF952 (“pushad”) where GetCommandLineA returns. EAX now has the pointer to the command line string used to launch the binary. We patch the next 2 bytes to EB-FE (from 60-8D) again and let the binary loop (paused at that point). Then we proceed to do the following:

3.2.10 Dumping Memory Image:

To study what happened before the GetCommandLineA() call, we dump the memory image of the binary at the current break point to another file, img_dump.exe say. There are quite a few utilities out there that allow us to dump memory images to disk (procdump, icedump etc). However, in order to show how simple that is under the hood, I have provided a small Win32 utility (ReadProcMem.cpp) along with this analysis. At the heart of this tool are the Win32 API calls:

HANDLE hProcess = OpenProcess(PROCESS_VM_READ, FALSE, (DWORD)process_pid);

ReadProcessMemory(hProcess, (LPCVOID) base, buf, size, &bytes_r);

where process_pid is the PID of the process to be read (the data is later dumped to disk with WriteFile etc.), base is the address to start reading from (the image base 0xDE0000 in our case), size is the number of bytes to read (the image size 0x49000 in our case), and bytes_r receives the number of bytes actually read.

We execute the ReadProcMem tool as “ReadProcMem <pid> DE0000 49000” to get the “img_dump.exe” dropped into the current directory. This is the uncompressed version of the 0x90.exe as found when it breaks at GetCommandLineA() call.

3.2.11 Quick Memory Image Forensics:

We load the image (img_dump.exe) into another IDA session. Since we are looking for repeating xor-decryption loops, we look for patterns that identify the decryption routines in the outermost few layers, and then search for that pattern throughout the binary. Each instance found then indicates a layer unwrapped by the layer just outside it. We find one such pattern:

This is the outermost layer. The pattern we’ll use is the “xor edi, edi” “inc edi” one. From our first few runs during manual tracing, we know that this pattern repeats with only the register changing (eax, ebx, ebp etc). We use this xor_inc.idc IDC script to run through the binary, doing binary pattern searches, and mark all locations found in the binary that match this “xor reg, reg; inc reg” pattern as code. Let’s see how that goes.

The script reports 175 matches. The results seem satisfactory; we can see the decryption loops repeating going towards the core/center of the binary. So it likely had 175 such xor decryption loops!
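The pattern search can also be sketched in C over a raw dump. This is not the xor_inc.idc script; it is a hypothetical byte-level equivalent that assumes the register-to-register encodings of xor (opcode 31 or 33 with a register-direct ModRM byte) followed by the one-byte inc r32 (40+reg).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts occurrences of "xor reg, reg ; inc reg" (same 32-bit register
   in all three places) in a byte buffer. */
static size_t count_xor_inc(const uint8_t *buf, size_t len)
{
    size_t hits = 0;
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i + 3 <= len; i++) {
        uint8_t op = buf[i];
        uint8_t modrm = buf[i + 1];

        if (op != 0x31 && op != 0x33)   /* the two xor r32 forms   */
            continue;
        if ((modrm & 0xC0) != 0xC0)     /* register-direct operand */
            continue;

        uint8_t reg = (modrm >> 3) & 7;
        uint8_t rm = modrm & 7;
        if (reg != rm)                  /* must zero one register  */
            continue;

        if (buf[i + 2] == (uint8_t)(0x40 + reg))    /* inc same reg */
            hits++;
    }
    return hits;
}
```

A byte-level scan like this can produce false positives inside data or operands, so the count is best treated as a lead to be confirmed in the disassembly.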

The before and after views of the navigation bar are:

This last image captures the nature of the encryption loops nicely. We can see how the binary’s core is wrapped inside several layers (seen as instruction/function bars) of encryption. The unexplored area shown is thus polymorphic and can’t be analyzed until the layers are removed.

3.2.12 Decrypting The Loops Via SoftIce Scripting:

Let’s say we hadn’t found the way to get right to GetCommandLineA, and unwrapping the layers at run-time while tracing through the code was the only option. How would we do that? We know that:

Each layer destroys the debug registers. So hardware breakpoints are useless.
Each layer wraps a polymorphic code. Using a debug break on the polymorphic code is useless; the 0xCC (INT 3 opcode) will be converted to something meaningless by the time it is reached.
175 layers are simply too much for any reasonable person to be expected to trace through.

We need a debugger that can be scripted. We need debug-actions that will be taken once a breakpoint is reached. The debugger needs to be fast enough not to trigger the rdtsc timing checks.

SoftIce provides these capabilities; it’s reasonably fast and freezes the OS while it’s popped up, its macros provide scripting functionalities, and it allows debug-actions that could include recursive macros. Perfect.

From the section “Decryption Loops”, we know that the “jmp nxt_level” location and its inner-layer counterparts are the most intuitive locations at which to place the breakpoints. These are really the exit locations from each outer layer into the just-decoded layer. However, we know that the layer also checks for a 0xCC debug trap placed at this point. This is done by the “cmp byte ptr [reg], 0CCh” instructions, as at 0xE26369. We look for a better place for our breakpoints. It has to be after the xor-decryption loop, and before the jump into the next decoded layer. Let’s try the location of the 0xCC-comparing instruction itself. This IDC script, cmp_CC.idc, creates a file cmp_CC.txt listing all locations at which this pattern is found in the binary.

Sorting this cmp_CC.txt file, and taking the difference between every two adjacent locations, we find that these locations are offset by a fixed distance of 0xE6 bytes. So, all we need to do is set a break point at the outermost instruction, and set a recursive breakpoint action that sets the next breakpoint at a distance of 0xE6 into the binary.
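The sort-and-difference step is easy to automate. The sketch below assumes the addresses from cmp_CC.txt have already been parsed into an array; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 32-bit addresses. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the addresses in place and returns the distance between
   neighbours if it is the same everywhere, or 0 if it is not uniform
   (or if fewer than two addresses are given). */
static uint32_t common_stride(uint32_t *addrs, size_t n)
{
    assert(addrs != NULL);
    if (n < 2)
        return 0;

    qsort(addrs, n, sizeof addrs[0], cmp_u32);

    uint32_t stride = addrs[1] - addrs[0];
    for (size_t i = 2; i < n; i++) {
        if (addrs[i] - addrs[i - 1] != stride)
            return 0;
    }
    return stride;
}
```

A return value of 0xE6 corresponds to the fixed layer spacing reported above; a return value of 0 would flag the locations where the spacing changes.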

Here are the SoftIce macro definitions that trace through the binary unwrapping each layer as the breakpoints are hit:

macro mac1 = "bc * ; bpx 0xE1C8DB ; bpx (eip - 0xE6) do \"mac2;\" ;"

macro mac2 = "mac1 ; x ;"

macro mac3 = "bpx 0xE26369 do \"mac2 ;\" ;"

Then we run macro mac3, and we reach 0xE1C8DB, which is just 2 levels away from the 175th level (the last two seem to have a different offset than 0xE6). Tracing into the last 2 levels should be easy, especially since we know the exact jmp nxt_level locations too. Tracing further, we run into the “call GetCommandLineA” instruction as well. This confirms that this call occurs after the various decryption loops have completed.

bpx 0xE1C8DB was necessary (?) because SoftIce doesn’t really provide conditional debug actions, just conditional debug breakpoints. If we set a conditional breakpoint with a debug action, then when the breakpoint address is reached and the condition evaluates to false, the breakpoint is not triggered but the debug action is still performed! This should be considered an irritating bug, because it makes it hard to exit the recursive loop based on, say, eip. With bpx 0xE1C8DB, however, execution breaks just before the last (eip - 0xE6) point, and we exit the recursive loop of macros. Another point to note is that the 0xCC laid by the bpx corrupts the inner layers as they are decoded. However, exactly one byte is corrupted over the entire run, the one at 0xE1C8DB, because the xor loops do not propagate the 0xCC change across the binary. So at the end, we only need to guess what that byte is (SoftIce will show the wrong one it found there before placing the last 0xCC). That’s easy, given that we know the instruction should be something like “lea reg1, loc_Dexxxx[reg2]”. It should be ‘8D’ instead of ‘12’ in this case.

There, we could unwrap the 173 layers of the binary within a few seconds using this trick. It’ll be interesting to know if the 0xE6-like fixed offset is necessary by design of the binary or was provided so that the binary could indeed be cracked.

3.2.13 Attaching IDA Debugger At GetCommandLineA() Point:

SoftIce is not too good as a disassembler, and we would really like to trace under IDA beyond the GetCommandLineA break point. We first pause the executable as explained in the “Breaking at GetCommandLineA()” section. At this point, we take the same 0x90.idb workspace from before, and instead of debugging from the entry point, use “attach to process” to attach to the running 0x90.exe executable (paused with EB-FE at 0xDEF952). In terms of the debugging API, IDA calls the DebugActiveProcess() API call. More information about what happens then can be found on MSDN. We end up inside ntdll.dll’s DbgBreakPoint() function. Step through that to reach our 0xDEF952 EB-FE location.

Now pop open SoftIce and restore the 60-8D bytes at that location. Since the thread is now a debuggee, it’s suspended and under IDA’s control (so it won’t run to ExitProcess()).

Using a couple of macros then, this way, we can always begin reversing from this point on – side-stepping all the previous anti-reversing tricks.

3.2.14 Cracking The Password:

3.2.14.1 First Step (First 4 Characters):

Now we have reached the point where the binary will likely start checking whether the password entered is correct. Assuming that our command line was “0x90.exe AAAAAAAAAA”, let’s trace the code starting from the return from the GetCommandLineA call.

Eax has the pointer to the command line string currently. This is then moved to the dword at 0xDE8736 which we rename as gcla_ptr. Tracing further, a dword at 0xDE874A is set to 1. We label this as status. Further, due to the frequency and manner in which dwords after gcla_ptr are accessed, we make it a DWORD array of size 5. This will help us track all accesses to the gcla_ptr information by searching for accesses to this array.

We locate all instances of “gcla_ptr” in IDA and set trace-points at each of them. Then we let the binary run to completion. The trace window then gives a good idea of the execution sequence as seen via accesses to the gcla_ptr array. The following summarizes this information.

The binary repne scasb’s for the space character and gcla_ptr[0] is set to point to “AAAA…” (argv[1]) instead of “0x90.exe AAAA…”.

At 0xE0F030, it moves the first 4 bytes of argv[1] (0x41414141) into gcla_ptr[2].

Then begins a sequence where dword constants are added to gcla_ptr[1] and gcla_ptr[2]. gcla_ptr[1] begins with the value 0x5CC80E31, which is present from the point GetCommandLineA() returned. gcla_ptr[2] begins with 0x41414141 as mentioned. These constants do not depend on the command line string used. We label 0xE181A6 as add, 0xE1226D as "and" and 0xE1A05B as sub, since constants are added, and-ed and subtracted at those locations respectively. 0xE173B7 (stack) pushes gcla_ptr[1,2] onto the stack. 0xDFEFDC (Decision1) compares the values of gcla_ptr[1] and gcla_ptr[2] and proceeds only if they are equal. Now, gcla_ptr[1] is a sum of constants, and will always add up to 0xDF807499. gcla_ptr[2] is the first 4 bytes of the password added to a constant 0x914D3068 (the sum of the rest of them), and anded with 0xDFFFFFFF. Without the and operation, the first 4 bytes would need to be 0xDF807499 - 0x914D3068 = 0x4E334431; 0x4E334431 is the only number which, when added to the constant 0x914D3068, gives 0xDF807499. Anding with 0xDFFFFFFF masks the 3rd most significant bit of the sum, so the gcla_ptr[2] total before masking could also be 0xFF807499, which would mean the first 4 bytes would need to be 0xFF807499 - 0x914D3068 = 0x6E334431.

Taking the host’s (little-endian) byte order into account, the first 4 bytes as they appear in memory would have to be either 0x3144334E or 0x3144336E, which in ASCII is:

“1D3N” or “1D3n”
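The first check can be replayed in a few lines of C. This sketch follows the simplified model in the text (one combined constant, then one and with 0xDFFFFFFF, the mask value that is consistent with the arithmetic); the real code interleaves many add/and/sub steps. All function names are hypothetical, and `skip_program_name` assumes the program name contains no spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUM_OF_CONSTANTS 0x914D3068u    /* added to the first 4 bytes  */
#define EXPECTED_TOTAL   0xDF807499u    /* final value of gcla_ptr[1]  */
#define AND_MASK         0xDFFFFFFFu    /* clears bit 29 of the sum    */

/* Model of the repne scasb step: return the text after the first space. */
static const char *skip_program_name(const char *cmdline)
{
    assert(cmdline != NULL);
    const char *sp = strchr(cmdline, ' ');
    return sp ? sp + 1 : cmdline + strlen(cmdline);
}

/* Load 4 characters as a little-endian dword, as an x86 mov would. */
static uint32_t load_le32(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;
    return (uint32_t)u[0] | ((uint32_t)u[1] << 8) |
           ((uint32_t)u[2] << 16) | ((uint32_t)u[3] << 24);
}

/* Returns 1 if the first 4 password characters satisfy the model. */
static int first_check_passes(const char *cmdline)
{
    const char *arg = skip_program_name(cmdline);
    if (strlen(arg) < 4)
        return 0;
    return ((load_le32(arg) + SUM_OF_CONSTANTS) & AND_MASK) == EXPECTED_TOTAL;
}

/* Solve the model backwards: the masked bit may be 0 or 1 before the
   and, giving two candidate prefixes. Returns the number written. */
static size_t solve_prefixes(char out[2][5])
{
    const uint32_t masked_bit = ~AND_MASK;  /* 0x20000000 */
    const uint32_t sums[2] = { EXPECTED_TOTAL, EXPECTED_TOTAL | masked_bit };
    assert(out != NULL);

    for (size_t i = 0; i < 2; i++) {
        uint32_t dword = sums[i] - SUM_OF_CONSTANTS;
        for (size_t b = 0; b < 4; b++)
            out[i][b] = (char)((dword >> (8 * b)) & 0xFF);
        out[i][4] = '\0';
    }
    return 2;
}
```

`solve_prefixes` produces the two strings “1D3N” and “1D3n”, and `first_check_passes` accepts any command line whose argument starts with either of them.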

This gives us the first 4 characters of the password (the 4th might be either n, or N, or both, depending on whether the binary limits it to one later). Until we get the entire password, we can’t be sure that these really are correct. For now, the only confirmation seems to be the fact that these are indeed printable ASCII characters that a user can type on the command line, which is likely not a coincidence. Further, these seem to match the usual hacker-speak. Perhaps it’s “1D3n71fy” or “1D3nt14y” ?? (Identify). Let’s trace further.

3.2.14.2 Second Step (2 More Characters):

We label 0xDFF80F as got_4, and trace further from there.

The same trick as in the previous section again proves pretty handy. We track accesses to gcla_ptr[] using trace debug points in IDA. (I really wish their instructions-trace feature wouldn’t be limited to the first return instruction that comes by – since the rdtsc checks are gone now, the binary is an ideal subject for emulation now. The instructions trace is like API spying at the assembly instructions level. Would be great for a quick analysis.)

The condition to be satisfied turns out to be that the sum of the 5th and 7th characters should be 0xA8. Let’s choose the 5th character as ‘t’, which makes the 7th character ‘4’ (that’s pretty close to what we guessed above!). Indeed, using “1D3[n,N]t.4.” does take us out of the loop for the second step.

Let’s mark 0xDFD18C as got_5_7 for now. We have reached “1D3[n,N]t.4.” so far.
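The second condition is simple enough to model directly. The sketch below is hypothetical helper code, not the binary's logic; it assumes the two character values are added as plain unsigned bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAIR_SUM 0xA8   /* required sum of the 5th and 7th characters */

/* Given a choice for the 5th character, return the matching 7th. */
static int seventh_for(int fifth)
{
    assert(fifth >= 0 && fifth <= PAIR_SUM);
    return PAIR_SUM - fifth;
}

/* Returns 1 if a candidate password satisfies the second condition. */
static int second_check_passes(const char *password)
{
    assert(password != NULL);
    if (strlen(password) < 7)
        return 0;
    return (unsigned char)password[4] + (unsigned char)password[6] == PAIR_SUM;
}
```

Choosing ‘t’ (0x74) for the 5th character gives ‘4’ (0x34) for the 7th. Many other printable pairs also sum to 0xA8, so this step alone does not pin down either character.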

3.2.14.3 Third Step (2 More):

Hmm, I am really running short of time. It’s 9 AM on the last day of the challenge and I haven’t slept at all :-( Let’s hope there are not many more steps left.

We find that the 6th and 8th characters are added together and placed in gcla_ptr[2]. Then some constant bytes are added to and subtracted from it. Then it’s xored byte-by-byte with the bytes stored at 0xE1BC27. We mark this location as an array of bytes, and label it secret.

My guess is that xoring with a certain byte makes the secret array reveal some ASCII secret. Let’s use this program to check:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* Encrypted bytes copied from the array labelled "secret" at 0xE1BC27. */
        unsigned char secret[] =
        {
            0x0A, 0x0D, 0x0, 0x14, 0x26, 0x2F, 0x20, 0x2C, 0x2E, 0x26, 0x6D, 0x6d, 0x6d,
            0x49, 0x4E, 0x6, 0x3B, 0x33, 0x2F, 0x2C, 0x2A, 0x37, 0x63, 0x25,
            0x2C, 0x31, 0x63, 0x2A, 0x37, 0x63, 0x27, 0x2C, 0x26, 0x30, 0x2D,
            0x64, 0x37, 0x63, 0x2E, 0x22, 0x2, 0x37, 0x37, 0x26, 0x31, 0x63,
            0x72, 0x6D, 0x3B, 0x63, 0x0, 0x2C, 0x36, 0x31, 0x37, 0x26, 0x30,
            0x3A, 0x63, 0x2C, 0x25, 0x63, 0x0D, 0x2A, 0x20, 0x2C, 0x2F, 0x22,
            0x30, 0x63, 0x1, 0x31, 0x36, 0x2F, 0x26, 0x39, 0x43
        };

        /* Try every printable ASCII character as a single-byte XOR key. */
        for (unsigned char c = 0x20; c < 127; c++)
        {
            /* One extra zeroed byte keeps the copy NUL-terminated. */
            unsigned char *sec_copy = (unsigned char *) calloc(sizeof(secret) + 1, 1);
            if (sec_copy == NULL)
                return 1;
            memcpy(sec_copy, secret, sizeof(secret));
            for (size_t i = 0; i < sizeof(secret); i++)
            {
                sec_copy[i] = sec_copy[i] ^ c;
            }
            printf("for %c, secret = %s\n", c, sec_copy);
            free(sec_copy);     /* the original leaked one buffer per key */
        }
        return 0;
    }

Sure enough, for a byte value of character ‘C’ we get:

“INCWelcome...

Exploit for it doesn't maAtter 1.x Courtesy of Nicolas Brulez”

By tracing back our steps then, it should be easy to find the condition on the 6th and 8th characters as well. Perhaps also on more characters, or on the ones we have already found. Each step will take us closer to the correct password. As Nicolas mentions, though, the correct password doesn’t matter at this point: (1) it should follow, perhaps trivially, from this point on; (2) we have a location where we could jump over (or patch) the test (as we did for rdtsc, for example) and proceed further into the binary even without the exact password; and (3) the secret string (which could hypothetically contain sensitive information) has already been revealed.

With that, we end our analysis here.

4 Conclusion:

We have seen a systematic reverse engineering of the given binary. The binary contained a myriad of code obfuscation, anti-debugging and anti-disassembling protections. With the right choice of tools and techniques, the protection mechanisms have been overcome. Nicolas has probably made his point pretty clear by now: such protection can be extended much further by using more advanced crypto, using checksums across various parts of the code to detect debug breaks, intermixing various types of obfuscation tricks that do not follow any pattern (unlike the pusha-popa sequences across this binary), using layers that do not share the same size offset (0xE6 in this case), using more advanced ways of detecting API spying or breakpoints, etc. The goal should be to make reversing practically impossible in a reasonable amount of time. Malware could do this to delay AV signature research; IP-protection software could use it to deter software crackers until the next version is released. There is value in researching both reversing and anti-reversing techniques.

5 References:

1. Microsoft Portable Executable and Common Object File Format Specification

2. DebugActiveProcess() API Call

3. Phrack 62 on Win32 API Hooking

4. “Under the Hood” from MSJ on Thread Environment Block Etc

5. Excellent post on Usenet about SEH-based Exception Handling

6. Intel’s IA-32 Manuals