Honeynet Project Scan of the Month - Scan 33 (November 2004)
Submission by Eloy Paris <peloy at chapus dot net>
Fri Dec 3 18:34:47 EST 2004
"I've seen things... things you little people wouldn't believe... Attack ships on fire off the shoulder of Orion bright as magnesium... I rode on the back decks of a blinker and watched c-beams glitter in the dark near the Tanhauser Gate. All those moments... they'll be gone." Roy Batty (Nexus 6. Incept date 2016)
Abstract
This is a submission to the Honeynet Project's November 2004 Scan of the Month.
The analysis is in some parts very detailed. If you are a grader, have several submissions to read and can't go over all the details, or are just a casual reader, feel free to go directly to Section 1, “Questions”, which contains the answers to all the questions of this challenge. The questions (and answers) provide a good summary of the most important aspects of this challenge. However, just reading the answers is not the best way to learn and understand some of the details nor the process I followed to analyze 0x90.exe, so I encourage you to read the whole submission.
Table of Contents
A: The following are the anti-reverse engineering traps and techniques that I found while analyzing 0x90.exe:
Pushing the limits of the Windows Portable Executable (PE) format by tampering with some fields in the PE and section header so disassemblers like IDA Pro and objdump don't have a chance. This technique can be defeated by loading the file in the disassembler as a binary object as opposed to a PE executable.
Use of no-operation blocks: these blocks consisted of big numbers of instructions that served no purpose other than to hide the real instructions. These blocks were everywhere and slowed the analysis.
Use of the RDTSC (read timestamp counter) instruction to measure the number of cycles needed to execute sections of code, and abort the program if it looked like it took too long to execute, maybe due to the presence of a debugger.
Heavy use of Windows' Structure Exception Handling infrastructure. In particular, 1) attempting to kill debuggers from within an exception handler by writing zeroes to the microprocessor's debugging registers (DR0, DR1, DR2, DR3, DR6 and DR7), and 2) changing program flow from within an exception handler by directly manipulating the value of the EIP register saved in the stack as part of the _CONTEXT structure (see winnt.h)
Heavy use of encryption in different sections of the binary. These sections are decrypted on the fly and in different stages, most of them right before the code in an encrypted zone is executed. The huge number of decryption areas represents also a problem to the reverse engineer - in one instance there are 175 encrypted blocks that are being decrypted from different sections of the binary.
Use of a virtual machine to decrypt/unpack some parts of the binary, and to perform the function the program was created for. This might be the most difficult barrier to analysis and reverse engineering of this binary.
Very twisted way of calling library functions from the Windows API: all calls to the Windows API are made from the same place. The call instruction that invokes the API functions is modified right before making the call so the wanted function is called.
Attempts to detect software breakpoints (int 0x3 - opcode 0xcc) at the entry point of Windows API functions and kill the debugger if one is found.
Details about all these techniques, including their impact on the reverse engineering process and how they can be neutralized can be found in Section 2.2, “Anti-Reverse Engineering Techniques and How to Defeat Them”.
A: What sets this binary apart is the use of a virtual machine to do an important portion of the decryption/unpacking process. In fact, the same virtual machine that is used to unpack the binary at run time is also used to perform the actual work of the program. The use of this technique really raises the bar in terms of the resources (patience, skills, creativity and time) required to break the protection of the binary and to understand the purpose in life of the protected program.
More details about the "Brulez" virtual machine can be found in Section 2.3, “ x86 Machine Code Becomes Microcode (a.k.a. "the Machine Within") ”.
A: In my opinion, there is no "quick" way to analyze a program that uses virtual machines to prevent reverse engineering, especially if the virtual machine is unknown and designed specifically to prevent reverse engineering. I believe that to be able to break the protection of a program that uses virtual machines as a shield against reverse engineering, the virtual machines have to be reasonably well understood because the extra level of abstraction that the virtual machine provides requires added complexity and in particular a higher number of real machine instructions per virtual machine instruction.
Quickly analyzing a virtual machine (as opposed to a program that uses virtual machines) is also something that in my opinion is hard to accomplish mainly because it might not be evident that one is dealing with a virtual machine. Experience, as always, will help.
I discovered I was dealing with a virtual machine when I determined that the x86 machine code was following some kind of "script" that was being tracked by the ESI register. After statically (in the disassembler, as opposed to the debugger) following execution of this script for a few x86 instructions I noticed that the program was using a few jump tables that were indexed by bytes in this "script" pointed at by the ESI register. I also noticed that each time that a jump was made through this table, a certain region of memory was being modified. It did not take long to realize that the "script" I was following was just machine code for what I later called "the Brulez Virtual Machine", and that this region of memory being modified was actually reserved for registers of the virtual machine.
The Brulez virtual machine turned out to be very simple and to have a small instruction set. The virtual machine program that was embedded in the binary was also simple, which allowed me to reverse engineer it. However, there was no magic formula or silver bullet - it was a pretty manual process that I don't see an easy way to automate.
Something that would probably help is to have an IDA processor module for the virtual machine being reverse engineer, but this is actually a Catch-22 since you can't have a disassembler for a virtual machine you still do not understand.
A: The most valuable tool to analyze a binary armored with some of the protections against reverse engineering that I outlined in Section 1.3, “ Q3: Provide a means to "quickly" analyze this uncommon feature. ” is a good disassembler. The reason for this is that a heavily armored binary will try hard to kill a debugger and to prevent all attempts to get to the core program, so using a debugger is not the best option. Static code analysis with a disassembler, and especially an "interactive" disassembler like IDA, is, at least to me, the best option. A lot of tasks (like decryption of encrypted sections) could be performed from within the disassembler by using its C-like language.
A: The 0x90.exe executable does only one thing: to print a welcome message if the correct key has been provided by the user.
C:\tmp>0x90 Please Authenticate! C:\tmp>0x90 1D3NEAcN Welcome... Exploit for it doesn't matter 1.x Courtesy of Nicolas Brulez C:\tmp> |
The program does absolutely nothing malicious and can be run without fears.
A: The binary is expecting a key or serial number that authenticates the user. The main indication that the program is expecting some kind of authentication from the user is that when the program is run the message "Please Authenticate!" is displayed, as can be seen in the output shown in Section 1.5, “ Q5: Identify the purpose (fictitious or not) of the binary. ”.
The binary looks for this key in the command line, and there are two possible ways to realize this: the first one is by observing that the Imports Table of 0x90.exe includes an entry for the Windows API function GetCommandLine() (this does not mean much, though, since it could be a reverse engineering trick.) The second way, and the only one that is 100% guaranteed to be effective is to fully reverse engineer the binary and understand where it is looking for the authentication information. This is the approach that I took, and it has the added benefit that it allows to reverse engineer the key or serial format so the program can be run without patching it.
There are four possible keys that the program will accept. These keys are (not including the quotes):
"1D3NEAcN"
"1D3NEFcI"
"1D3NEGcH"
"1D3NEKcD"
To come up with these keys I had to fully understand the Brulez Virtual Machine program embedded in 0x90.exe. This VM program was the one actually checking the validity of the key. Basically, I had to understand what operations were being performed on the key, come up with a set of constraints a key has to satisfy, and then work backwards. The details about this procedure are too lengthy for this section of the report so please refer to Section 2.5, “Discovering Authentication Keys” for full details.
A: These are a few things that can be implemented to increase the time and skills required to break the protection of 0x90.exe:
Get rid of the pusha/popa instructions that enclose no-operation zones. This way it will not be trivial to identify no-operation zones so they can be hidden. Registers could be saved in the stack one by one, or in a special memory area, or only those registers modified by the no-operation zone instructions could be saved.
Get rid of the markers (the 0xdead and 0x31000) that are used to identify each decryption block. Having these markers in there makes it just to easy to automatically decrypt the section of the binary the markers deal with. Each decryption block in the chain should be different so finding a pattern is very hard to do. The idea would be to create a tool that makes creation of the decryption blocks very easy, but that makes identification of them very hard.
Monitor changes in the debugging registers: if during execution the program checks for the value of the debugging registers at two different point and it finds that some values have changed that means someone is messing around with a debugger and the program should bail out.
Implement a more complicated virtual machine: more instructions are needed. Also, the current implementation of the instruction decoder via tables is too easy to figure out. A more manual process should be implemented and decoding of virtual machine instructions should be obfuscated.
Use a virtual machine and write no-operation zones in the virtual machine program. This would keep the reverse engineer busy for a while.
The entire real program could be run in a virtual machine, as opposed to just parts of it. I dread a program written in a high level language like Visual Basic because one high-level language constructs translates into lots of low-level language instructions and unintelligible library calls. So, a high level language can be seen as a virtual machine.
Detect machine virtualization software like VMware since chances are that if someone is running the binary in a virtual machine that user is trying to reverse engineer it.
Use of strong encryption: this is one that would be impossible to break. If the binary were encrypted and only a way-one password could decrypt it then it would be impossible to reverse engineer the program unless it could be caught decrypted in memory while running.
Note that these are just some examples of anti-reverse engineering techniques and that none of them is a silver bullet (except maybe the last one above, encryption) - anyone with sufficient determination will always find a way to break the protection.
What follows is a comprehensive analysis of the 0x90.exe binary.
The 0x90.exe is a Win32 console application. The objdump from the binutils package helped me to get an initial idea of what I was dealing with.
During the initial and very superficial inspection there are two things that I noticed immediately: first, that the strings command is useless. Running this command only gives us the following "useful" information:
(You really thought you would find strings eh? ;-) Scan of the month coded by Nicolas Brulez / Digital River |
This is to be expected, though, since these days 99.9% of malware is packed with some kind of executable packer for two reasons: 1) to prevent the casual user from finding useful information (like the FQDN of an IRC server hosting a botnet), and 2) to minimize the size of the executable.
The second thing that I noticed was that objdump, IDA Pro, and other tools to explore PE files fail miserable to disassemble the binary or to obtain some information about it. As I will show, the reason for this is that the section headers, and the PE header in general, have been tampered with to cause troubles.
Finally, as part of a very superficial and initial analysis, I always like to run the malware to try to obtain as much information as possible about its purpose. Needless to say, running malware is dangerous, and must be done in a sandbox in an air-gapped network and in a machine that can be quickly restored to the state prior to running the malware. I usually use a VMware virtual machine, which does not always work due to the fact that today's malware attempts to detect whether it is running on a virtual machine. So, having said this, I ran the malware and got the following output:
C:\tmp>0x90 Please Authenticate! |
It is important to note that running the malware several times in short succession will cause it to apparently hang or not produce any output at all some times. I will speculate on the reason for this behavior later on.
After a short initial and superficial inspection I proceeded to use the real tools that could potentially help to get this job done: a disassembler and a debugger. Using these tools I encountered a good number of anti-reverse engineering traps of various level of complexity. In this section I will present the traps I found, what their impact was, and how I managed to evade or neutralize them.
What: As I mentioned in Section 2.1, “Initial Inspection”, the 0x90.exe program could not be loaded in a disassembler as a Windows PE executable. The reason for this is some non-sense values in the PE header and section header. These non-sense values obviously do not stop the Windows executable loader for some reason.
Impact on Reverse Engineering: Not being able to disassemble the program you are trying to reverse engineer is a big problem. You can always resort to the debugger, but for really complicated tasks, a debugger alone is not enough: for heavily armored and powerful binaries, powerful reverse engineering tools are called for, so being able to load the file in a disassembler is a must.
Solution: I was short on time so I did not even look at the PE and section headers to see what Nicolas Brulez had done to them. I just loaded the file in IDA Pro using the binary loader (as opposed to the PE loader that was not getting me anywhere.) When using the binary loader I did load the program at offset 0xde0000, which is the image base reported by objdump. objdump also reported the program's entry point (at 0xde2000), so I was set to start analyzing the program.
What: The first thing that I found as soon as I started to trace through 0x90.exe in a debugger (or while looking at the disassembly) was that there were lots of instructions. This by itself is not a problem because programs, after all, are made out of instructions, but when you realize that these instructions are doing nothing then this might represent a problem. These no-operation instructions appeared always in blocks enclosed by a pair of pusha/popa instructions. Here's a short example of one of them:
seg000:00DE2012 pusha seg000:00DE2013 mov al, bl seg000:00DE2015 db 65h, 36h seg000:00DE2015 rep mov ecx, edx seg000:00DE201A repne jmp short loc_DE201E seg000:00DE201D db 0ABh seg000:00DE201E seg000:00DE201E loc_DE201E: ; CODE XREF: seg000:00DE201A seg000:00DE201E mov esi, esi seg000:00DE2020 db 26h seg000:00DE2020 repne lea ebx, es:0D0E47C6Eh seg000:00DE2028 db 65h, 64h seg000:00DE2028 repne mov ah, 51h ; 'Q' seg000:00DE202E jmp short loc_DE2031 seg000:00DE2030 db 27h ; ' seg000:00DE2031 loc_DE2031: ; CODE XREF: seg000:00DE202E seg000:00DE2031 db 2Eh seg000:00DE2031 mov al, ch [...] seg000:00DE227F db 65h seg000:00DE227F mov dh, al seg000:00DE2282 mov ecx, 7D30653Fh seg000:00DE2288 popa |
Note how the code inside this no-operation block is meaningless, especially because 1) nothing is ever written to memory, and 2) only registers are affected, but the block begins with a pusha instruction and ends with a popa instruction, which destroys anything loaded into the registers within the block. Also, note how instruction prefixes like cs:, ds:, repe, repne, are being used for instructions that do not need them. For example, a "mov al, ch" instruction does not need a "repe" prefix, this does not make any sense since this prefix is used by string operations (scansb, movsb, etc.) I am surprised that the processor doesn't produce an illegal instruction trap or something, but it apparently can take all this crap.
Impact on Reverse Engineering: These no-operation blocks are everywhere and their impact is to tremendously slow down the reverse engineering process. In a disassembler is not that big of a problem because I can search for the closing popa of the block, but in a debugger it causes more problems.
Solution: Since this no-operation blocks are very easy to recognize it is possible to filter them out in the disassembler. One way of doing this is to identify most occurrences of these blocks and then tell the disassembler to hide these portions of code. To identify these portions of code I generated a listing of the disassembly and then used the Unix grep in the following way:
grep 'pusha\|popa' listing.lst > addresses.txt |
This gave me a file with the beginning and ending addresses of the no-operation blocks. With this file I then created a small IDC program that just read this input file and then told the debugger to hide all this blocks. The addresses.txt was not perfect but it was a good starting point to hide all these blocks that were just slowing me down.
Here's how the above code segment looks after hiding all the no-operation zones (note the "no-operation zone" comments and how many bytes of sequential memory they cover):
seg000:00DE2009 sub eax, 6 seg000:00DE200C sub ebp, 0DE2006h seg000:00DE2012 ; no-operation zone seg000:00DE2289 ; no-operation zone seg000:00DE22F1 ; no-operation zone seg000:00DE259A seg000:00DE259B ; no-operation zone seg000:00DE2603 mov [ebp+0E26441h], eax seg000:00DE2609 ; no-operation zone seg000:00DE2671 ; no-operation zone seg000:00DE2950 mov eax, 0DE100Dh seg000:00DE2955 ; no-operation zone seg000:00DE2C2C mov eax, [eax+2] seg000:00DE2C2F ; no-operation zone seg000:00DE2EF1 mov eax, [eax] seg000:00DE2EF3 ; no-operation zone seg000:00DE2F5B mov edi, eax seg000:00DE2F5D mov ecx, 4 |
What: It was common to see code like the following:
seg000:00DE22B9 cpuid seg000:00DE22BB rdtsc seg000:00DE22BD sub eax, [esp] seg000:00DE22C0 add esp, 4 seg000:00DE22C3 cmp eax, 0E0000h seg000:00DE22C8 ja short loc_DE22CD seg000:00DE22CA xor eax, eax seg000:00DE22CC retn seg000:00DE22CD loc_DE22CD: seg000:00DE22CD add dword ptr [ecx+0B8h], 63h ; 'c' seg000:00DE22D4 xor eax, eax seg000:00DE22D6 retn |
This code is basically using the rdtsc instruction to measure the number of cycles that it takes to execute a section of code. This information is then used to kill the program if it is detected that too many cycles have passed (0xe0000 in this case.) The program is being killed by adding some random junk to the saved EIP register (this particular code is being executed inside an exception handler.)
Impact on Reverse Engineering: This does not affect the reverse engineering process when analyzing code in the disassembler but can potentially cause problems in the debugger if one is not careful and steps through this code.
Solution: If stepping through code in the debugger just skip over this code by changing the value of the EIP register.
This rdtsc non-sense is what is causing the program to behave erratically (see last paragraph of Section 2.1, “Initial Inspection”) since it gives false positives ("positive" here meaning "you took too long to execute this, just must be stepping through code in a debugger") that then hang the program based on the decision it makes when a positive is detected.
What: The program is taking advantage of Windows' Structure Exception Handling infrastructure to wreck havoc. Basically, the program sets exception handlers at lots of different places in the program and then generates an exception (trying to dereference a NULL pointer.) The exception handler then takes control and does some nasty things. In particular, different types of exception handlers were : 1) trying to kill the debugger from within an exception handler by writing zeroes to the microprocessor's debugging registers (DR0, DR1, DR2, DR3, DR6 and DR7), and 2) changing program flow from within the exception handler by directly manipulating the value of the EIP register saved in the stack as part of the context structure.
Impact on Reverse Engineering: The user of Windows SEH infrastructure is a major problem - it basically make it impossible to debug the program.
Solution: Solving this problem is tricky. The solution that I implemented was to find all the places with exception handlers and patch them so they are not executed. For those exception handlers that were changing program flow I just changed the program flow from within the main program by putting in place jumps to the final destination. The actual patching was done by a small C program (decrypt-0x90.c) that I wrote, although it could have been done easily in IDA's IDC language.
The way to use decrypt-0x90.c is as follow: you run it a give as command-line argument the name of the 0x90.exe you want it to patch for you. This file will be patched and then you can use that to run in the debugger without fearing the exception handlers. You can also load this file in the disassembler and analyze that one instead of the original 0x90.exe. I used both techniques.
What: Hundreds of sections of the binary were "encrypted". They were "encrypted" with single XOR operations, but encrypted nonetheless. The problem was not the encryption method itself, because both the keys used to encrypt and the sections within the file that were encrypted were easy to find. The problem was the big number of these encrypted sections.
It is interesting to delve into the details of how all the encrypted sections of the binary were organized:
The first encryption section was what I call a "chain" of 173 encrypted blocks. Each one of this blocks had the following structure (taking from the C program I wrote to decrypt them):
struct decryption_block { char decrypt_code[155]; char nop_code[8]; char get_size_code[13]; char get_addr_code[14]; unsigned marker1; /* 0xdead */ unsigned code_offsets[4]; unsigned marker2; /* 0x31000 */ unsigned return_offsets[4]; } __attribute__((packed)); |
All blocks had the same structure. The first four arrays are used to account for x86 code that performs four functions: do the actual decryption, do nothing, get the size of the code to decrypt and get the starting address of the block to decrypt. marker1 is a constant marker that is always the 32-bit word 0xdead (yeah, no comment here). Following that are the offsets, from the image base, to the four functions described above. marker2 is another marker that is always 0x31000. Following that marker there are four 32-bit words that are used to calculate the return address from the four functions described above. Functions are called in the following order: get address, do nothing, get size of block to decrypt, and decrypt. The decryption code is very simple and always the same:
void decrypt(char *buffer, unsigned size) { do { *buffer++ ^= size; } while (--size); } |
This first chain had a total of 173 of these decryption blocks. They were located at the very top of the binary, i.e. higher memory addresses, with the first block in the chain at higher memory addresses with each subsequent block at lower memory addresses. Each decryption blocked was responsible for decrypting the code that was going to be executed next.
After the first chain, there was a second chain, and a different address (that's why I call this a different chain). This one only had two decryption blocks, but they had the same structure I presented above.
Impact on Reverse Engineering: These hundreds of encrypted sections made it impractical to use a debugger for several reasons. First, I could not just set a breakpoint at the end of all decryption blocks because I did not know where the encrypted blocks were going to end. Second, because the exception handlers I presented above would have killed the debugger. Third, because, as I later found out, the decryption blocks were so many that stepping through all of them would have been impractical. In addition to the inability to efficiently use a debugger to go through the decryption process, another impact of the heavy use of encryption in this binary on the reverse engineer is that the file is very hard to analyze, even statically in a disassembler. The reason for this is that there isn't a single subroutine that decrypts the entire file, all sections. Instead, decryption was done in very different places, and each time something got decrypted it was to execute it next.
Solution: Solving this problem is also tricky. The solution that I implemented was to find all the places with exception handlers and patch them so they are not executed. For those exception handlers that were changing program flow (the ones inside the virtual machine microcode) put in place jumps to the final destination (the same thing that the actual exception handler had done if an exception had occurred.) The actual patching was done by the same (decrypt-0x90.c) that I described above in Section 2.2.4, “ Heavy Use of Windows' Structured Exception Handling Infrastructure ”. Again, this could have been accomplished without leaving IDA through IDA's IDC language.
decrypt-0x90.c decrypted the first two chains by traversing the decryption block structures in high memory. It did this by starting with the first one and keep going while the 0xdead and 0x31000 markers were in place; something like (important thing to see here is the for() loop):
#define MARKER1 0xdead #define MARKER2 0x31000 void decrypt_chain(struct decryption_block *d, char *buffer) { unsigned decrypt_len, decrypt_addr; for (;d->marker1 == MARKER1 && d->marker2 == MARKER2; d--) { decrypt_len = *(unsigned *) &d->get_size_code[1]; decrypt_addr = *(unsigned *) &d->get_addr_code[2]; decrypt(decrypt_addr, decrypt_len); } } |
Once the two encryption chains have been decrypted I can patch the program to just jump over them, skipping all this mess. I can also use the resulting new binary as the new starting point of our disassembly, which is what I did.
As I will show next, after the two encryption chains were defeated I found a bigger challenge: what I call the "Brulez Virtual Machine". I will go into detail about this virtual machine in the next section but here I just want to point out that the virtual machine program performed decryption of two additional encrypted sections (one section was unconditionally decrypted, the last one only if the user authenticated correctly.) The first decryption performed by the virtual machine was an XOR with 0x53. The second decryption was an XOR with 0x43.
It is important to note that the final approach I used to neutralized most of the encryption was not the first one I tried. Just to give an idea of the wrong approach I originally was following, I was going through the first encrypted sections and learned quickly how to neutralize some of the reverse engineering techniques I have mentioned and that were being employed during the decryption process (no-operation blocks, exception handlers). I managed to get the program under a debugger to decrypt a section. Then I kept going and found another encrypted section. I decrypted that one, continued, and found another encrypted section. I decrypted that one too and continued hoping that it was the last one. I kept going for a few more sections until I realized that depending on the real number of encrypted sections I could be spending an unreasonable amount of time going through the encrypted sections manually so I decided to try a different approach.
What: After defeating the two decryption chains I presented in detail in the previous section I found a worthier challenge in the code that followed the last decryption block of the last chain: a virtual machine that was responsible, through a program written in this virtual machine's machine code, for the decryption of additional sections of the binary, for user authentication, and for performing the actual task the program was created for, i.e. executing printf("Welcome").
Impact on Reverse Engineering: The impact of a virtual machine and the corresponding virtual machine program embedded in a program has a huge impact on a reverse engineering task. I have seen challenges like this when trying to reverse engineer programs written in very high level languages, especially interpreted languages, which can be seen as "virtual machines". The problem is that it is very difficult to follow what the program is doing, neither statically in a disassembler, nor dynamically in a debugger, because the program one is trying to understand becomes microcode for the higher level instructions of the virtual machine. So, one virtual machine instruction takes several host machine (in this case the x86 machine) instructions to complete. In addition, the virtual machine has to deal with decoding of the instructions of the virtual machine, which causes additional complexity and pain to the reverse engineer.
Solution: Once I realized that it was a virtual machine I was dealing with, and that there was virtual machine code embedded in the binary it was just a matter of reverse engineering the virtual machine, understanding this machine's instruction set, and finally understanding the embedded virtual machine program. This understanding of the virtual machine program is what finally allowed to break the binary and understand its purpose.
What: The 0x90.exe binary has a very peculiar way of calling functions from the Windows API, at least in the most sensitive places of the program (inside the virtual machine program): instead of using regular call that goes through the Import Address Table (IAT) to reach Windows API functions, the binary goes to the IAT, fetches the address of the Windows API function that is going to be called, modifies the operand of a call instruction based on what was fetched from the IAT, and then executes the call. This is a very twisted way of calling functions in the Windows API.
Impact on Reverse Engineering: This trick has a low impact on the reverse engineering process. The impact is higher is the reverser is using a debugger and setting breakpoints at the entry points of the Windows API functions: in this case the reverser will see that all calls to the Windows API will come from the same place in the program, which is very confusing. The impact on the reverse engineering process is very low if analysis of the program is being done statically in the disassembler.
Solution: I was not affected by this trick since by the time I encountered this technique I had already understood the instruction set of the virtual machine, which is where the calls to the Windows API are being made (GetCmdLine() and printf().
What: In lots of different places 0x90.exe tries to kill the debugger, or more accurately, the program if it is determined that it is running under a debugger. Some ways the program or debugger is killed if it is determined that it is running under a debugger are: 1) by writing zeroes to the processor's breakpoint-address registers (DR0 through DR3), to the debug status register (DR6) and to the debug control register (DR7) from within all the exception handlers and 2) by jumping to a random address (the output from the RDTSC instruction, actually) in those instances when a debugger has been detected.
If I remember correctly there is only one instance where a debugger is explicitly searched for: at the very beginning of the program the first four bytes at the entry point of three Windows API function calls (GetCmdLine(), printf(), and ExitProcess() ) is searched for the presence of the byte 0xcc, which is corresponds to the instruction "int 3", and that is used to enter a debugger from a program. The code that does this search gets to the entry point of the API function by going through the IAT. Here's an example of this code. See how if a 0xcc byte is found the program jumps to a random address:
seg000:00DE3274 mov eax, 0DE1001h seg000:00DE3279 ; no-operation zone seg000:00DE3555 seg000:00DE3558 mov eax, [eax+2] seg000:00DE355B mov eax, [eax] seg000:00DE355D ; no-operation zone seg000:00DE37F1 mov edi, eax seg000:00DE37F3 ; no-operation zone seg000:00DE3A8F seg000:00DE3A90 mov ecx, 4 seg000:00DE3A95 mov eax, 660h seg000:00DE3A9A shr eax, 3 ; eax == 0xcc here seg000:00DE3A9D ; no-operation zone seg000:00DE3D2E seg000:00DE3D2F repne scasb seg000:00DE3D31 test ecx, ecx seg000:00DE3D33 jz short loc_DE3D39 seg000:00DE3D35 rdtsc seg000:00DE3D37 push eax seg000:00DE3D38 retn |
Impact on Reverse Engineering: This anti-reverse engineering trick can be a real problem if one is not careful. One should prevent the debugging registers from being written to by anything that is not the debugger or control of the debugger will be lost and bad things will happen. This trick is not an issue, however, if static analysis in a disassembler is being performed.
Solution: This anti-reverse engineering measure is easy to prevent: just skip the section of code that is trying to mess up with the debugging registers or that is trying to find a debugger by looking at software breakpoints. One can also use hardware breakpoints, which don't require inserting a 0xcc into the program to cause a debugger trap.
In the previous section I mentioned that one of the biggest anti-reverse engineering mechanisms used in 0x90.exe is the use of what I call "the Brulez virtual machine". In this section I will go in more detail about how I discovered the virtual machine, what it is, how it works, and what it does.
At some point during the execution of 0x90.exe the register ESI was loaded with a certain memory address. The way the ESI register started to be used after that point caught my attention. It caught my attention because execution of the program started to be dependant on the value of the first two bytes at the address ESI was pointing at. These two bytes were always used as indexes into two of several jump tables, and the value of ESI would always change after performing a certain task. It seemed like the data pointed at by ESI was some kind of script that was controlling execution of the program.
So, I noticed the strange behavior but I did not make the connection immediately. Instead, I manually followed execution of 0x90.exe on paper/disassembler (not debugger), taking notes of the values of key registers and memory locations modified by the segments of code reached through the jump tables. The code reached through the jump tables seemed to be doing very specific and short tasks, like push a number to the stack, XOR a byte in memory with the contents of a register, etc. Then, the code would increment ESI by a short number (3, 5, 6, 7 - always depending on the value of the first two bytes pointed at by ESI) and another jump through the same tables would be performed.
If I remember correctly, it was after the ESI register went back to a value I had already seen that I said "oh my Gosh, there is a virtual machine here!". I then realized what I was dealing with and that the memory area pointed at by ESI was actually machine code for a virtual machine that I immediately called "Brulez" for obvious reasons. The next steps for were then to understand the instruction set of this new "microprocessor" so I could understand what the virtual machine program was doing. This is what allowed me to finally defeat the program.
The Brulez virtual machine has a very simple yet powerful enough (for the purposes it was designed) architecture. The processor has five 32-bit general purpose registers, named r0 through r4, and one flag register called nz (non-zero), which only holds two values: 1 if the result of a previous operation was non-zero, or zero if the result of a previous operation was zero. There is not a separate stack pointer (SP) register since the stack of the virtual machine is shared with the stack of the host processor. The host machine's ESI register acts as the Brulez virtual machine program counter (PC) register. Words are stored in memory in little endian byte order.
Brulez virtual machine instructions act on zero or more operands. Some operands are specified explicitly and other are implicit. The data for a source operand can be located in the instruction itself (an immediate operand), a register or a memory location.
Immediate Operands: some instructions use data that is encoded in the instruction itself as a source operand.
Register Operands: source and destination operands can be any of the general purpose registers r0 through r4.
Memory Operands: source and destination operands in memory are always referenced by an address stored in a general purpose register.
The instruction set of the Brulez virtual machine is pretty basic, but enough to accomplish the two tasks the virtual machine needs to: authenticate the user, decrypt two more sections of the binary, and print to standard output a message to indicate success or failure. Here I will list the instructions part of the instruction set classified by categories. In Appendix A, The Brulez Virtual Machine Instruction Set I present the entire instruction set listing opcodes and a short description for each instruction.
Data Transfer Instructions: ld (load), ldi (load immediate), ldbze (load byte zero extend), ldhze (load half-word zero extend), push, pushi (push immediate), pushi2 (push immediate 2), pop, clr (clear)
Arithmetic Instructions: add, addi (add immediate), subi (substract immediate), inc (increment), dec (decrement), ldc (load and compare)
Logical Instructions: xor, xori (xor immediate), ori (or immediate), andi (and immediate)
Control Transfer Instructions: call, ret (return), jmp, jnz (jump if not zero), jz (jump if zero)
String Instructions: strnchr (string character)
Bit and Byte Instructions: bswp (byte swap)
Miscellaneous Instructions: pcbe (pop, compare, and branch if equal), sys (system call), hlt (halt), trap (debugger trap), adjsp (adjust stack pointer)
The microcode (x86 machine code) of the Brulez virtual machine decodes instructions via data and jump tables. There is one data table for the first opcode of an instruction that contains the addresses of different data/jump tables. The second byte of the opcode is the used as an index into this second data/jump table. In one particular case (the xor instruction), there is a third table that is used to index based on the size of the operand (byte, half-word, and word.) Here is how these tables look like:
seg000:00E1B991 l1_jump_table dd offset off_E1B9A9 seg000:00E1B995 dd offset off_E1B9BD seg000:00E1B999 dd offset off_E1B9F5 seg000:00E1B99D dd offset off_E1BA25 seg000:00E1B9A1 dd offset off_E1BA29 seg000:00E1B9A5 dd offset off_E1BA35 seg000:00E1B9A9 ; seg000:00E1B9A9 ; level 2 jump table seg000:00E1B9A9 ; seg000:00E1B9A9 off_E1B9A9 dd offset pushi2 seg000:00E1B9AD dd offset pushi seg000:00E1B9B1 dd offset pushr seg000:00E1B9B5 dd offset pop seg000:00E1B9B9 dd offset adjsp seg000:00E1B9BD ; seg000:00E1B9BD off_E1B9BD dd offset loc_DF1A7B seg000:00E1B9C1 dd offset sub_DF442E+1 seg000:00E1B9C5 dd offset loc_DF59D2 seg000:00E1B9C9 dd offset xor seg000:00E1B9CD dd offset addi seg000:00E1B9D1 dd offset subi seg000:00E1B9D5 dd offset andi seg000:00E1B9D9 dd offset ori seg000:00E1B9DD dd offset xori seg000:00E1B9E1 dd offset add seg000:00E1B9E5 dd offset ldc seg000:00E1B9E9 ; seg000:00E1B9E9 ; this little table is used by the XOR instruction seg000:00E1B9E9 ; seg000:00E1B9E9 off_E1B9E9 dd offset xorb seg000:00E1B9ED dd offset xorh seg000:00E1B9F1 dd offset xorw seg000:00E1B9F5 ; seg000:00E1B9F5 off_E1B9F5 dd offset hlt seg000:00E1B9F9 dd offset sys seg000:00E1B9FD dd offset ldi seg000:00E1BA01 dd offset dec seg000:00E1BA05 dd offset inc seg000:00E1BA09 dd offset clr seg000:00E1BA0D dd offset strnchr seg000:00E1BA11 dd offset trap seg000:00E1BA15 dd offset ldmem seg000:00E1BA19 dd offset bswap seg000:00E1BA1D dd offset ldbze seg000:00E1BA21 dd offset ldhze seg000:00E1BA25 ; seg000:00E1BA25 off_E1BA25 dd offset pcbe seg000:00E1BA29 ; seg000:00E1BA29 off_E1BA29 dd offset jmp seg000:00E1BA2D dd offset jnz seg000:00E1BA31 dd offset jz seg000:00E1BA35 ; seg000:00E1BA35 off_E1BA35 dd offset call seg000:00E1BA39 dd offset ret |
There is no provision for handling invalid instructions - an invalid opcode will just hang the virtual machine.
Since 0x90.exe has an embedded virtual machine it is obvious that there is also an embedded virtual machine program. I have mentioned in previous sections what this virtual machine program does, but in this section I go into the details of how the program works and what it does. Let's start by showing the program, and then I'll explain the purpose of some blocks of code:
You have seen that the program authenticates the user by looking at the command line. In this section I will explain how to find the keys that the program will accept as valid.
As I mentioned in Section 2.3, “ x86 Machine Code Becomes Microcode (a.k.a. "the Machine Within") ”, in lines 21 to 40 the virtual machine program is checking the first four characters of the first argument to the program. In line 40 there is a comparison. Let's see what it is comparing: in one hand I have (0x5cc80e31 + 0x74519745 + 0xdeadbeef - 0x17854165 + 0x73686f77 - 0x206e6f20 + 0x6d657263 - 0x79212121) = 0xdf807499.
In the other hand the program is performing the following operation: (<first four characters of first arg to program> + 0x1d9bdc45 - 0xad45dfe2 + 0x68656c6c - 0x41776169 + 0x69747320 + 0x64726976 - 0x6e757473 - 0x65683f21) & 0xdfffffff. The result of this operation will be compared with 0xdf807499 (see previous paragraph) and if the two numbers are not equal the program bails out. So, we have one equation with one unknown which leads us to finding out the first 4 characters of the key:
(0xdf807499 - (0x1d9bdc45 - 0xad45dfe2 + 0x68656c6c - 0x41776169 + 0x69747320 + 0x64726976 - 0x6e757473 - 0x65683f21) ) & 0xdfffffff = 0x4e334431 = '1D3N' if the number is stored in little endian byte order (which it is since the Brulez processor is little endian.)
The next checks and conditions that the key must satisfy are a bit more challenging, but following a similar procedure to the one I presented above I can find some equations (in the equations that follow the variable arg[] is used to represent an array of characters that contains the first argument to the program. arg[0] is the first character of the first argument to the program, arg[1] is the second character and so on):
arg[4] ^ 0x47 must be equal to 0x2 to satisfy encrypted instruction at 0xe1bb54. This is because I know that the instruction at this address must be 3 bytes long for the rest of the program to be valid (i.e. no illegal instructions, etc.) and it must be an instruction that does not modify the stack pointer or anything else, and there is no other instruction whose first byte satisfy these conditions. This condition leads to arg[4] = 0x45 = 'E'.
To satisfy the load and compare (ldc) instruction inside the subroutine, the following equation must be satisfied: arg[4] + arg[6] = 0xa8. Since I now know arg[4] I can find arg[6]: arg[6] = 0xa8 - 0x45 -> arg[6] = 0x63 = 'c'.
Now, the second byte of the instruction at line 67 must be 3, 4, or 5 so the instruction ends up being "inc r3", "dec r3", "clr r3", or "bswp r3". So, arg[5] ^ 0x42 = {3, 4, 5, 9} -> arg[5] = 3 ^ 0x42 = 'A' or arg[5] = 4 ^ 0x42 = 'F' or arg[5] = 5 ^ 0x42 = 'G' or arg[5] = 9 ^ 0x42 = 'K'.
And finally, to satisfy the condition that the instruction at 0xe1bc11 must decode to a ret instruction (since it is the last instruction in a subroutine), I need: arg[5] + arg[7] + 1 - 0x4e = 0x42 -> arg[5] + arg[7] = 0x8f -> if arg[5] = 'A' -> arg[7] = 'N', or if arg[5] = 'F' -> arg[7] = 'I', or if arg[5] = 'G' -> arg[7] = 'H', or if arg[5] = 'K' -> arg[7] = 'D'.
Putting all the characters I have found in the correct order I get that the four possible keys are:
"1D3NEAcN"
"1D3NEFcI"
"1D3NEGcH"
"1D3NEKcD"
I realize that all this sounds confusing, but it is just additions, subtractions, XORs and a few conditions that have to be satisfied. Just follow the virtual machine program and do the math and you will understand.
After my analysis is done I can have a very accurate view of how the program the program is structured:
Here are a few interesting details and comments about 0x90.exe that did not fit anywhere else:
Q: How was 0x90.exe created? Who writes code that crazy? What language was used?
A: Well, 0x90.exe was obviously created by Nico Brulez. Because of all the low level details that I saw in the program, and all the anti-reverse engineering traps that I had to evade, I believe that the program was written entirely in assembler, i.e. no compiler intervention. However, Nico must have used a tool he wrote to automate some things such as the 175 decryption blocks and the no-operation blocks that were everywhere in the program.
"Evil Has No Boundaries !": right before starting the decryption of the 175 decryption blocks, 0x90.exe initializes the area of memory that will later be used to hold the Brulez virtual machine registers with the string "Evil Has No Boundaries !". This string will not show up when the strings command is run on 0x90.exe because it is created by doing 4-byte writes to the correct memory address. I think the string is there as a testament to the difficulty of breaking this binary due to all the evil anti-reverse engineering traps present in the program ;-)
"Exploit for it doesn't matter" - if you recall this is the message that is printed out after the "Welcome" message once a user authenticates successfully. This message is confusing and I don't have a full explanation for it. One possible explanation is that 0x90.exe has a vulnerability that in theory allows for code execution. One place where this is a possibility is in the authentication code: if you follow carefully Section 2.5, “Discovering Authentication Keys”, you will notice that two letters of the key are used to decrypt (via a simple XOR) two different virtual machine instructions. This means that by carefully choosing the key one could (theoretically) control execution of the program.
Why the name 0x90?: 0x90 is the opcode for the NOP instruction (no-operation) in the x86 instruction set. Other than printing one message if the user has not authenticated, or a different one if the user has authenticated, the program does nothing.
0x90.exe has been defeated. Nothing is unbreakable and anyone with enough time, patience and motivation will always be able to get in; it is only a matter of time. Things can be made complicated and highly involved so most people give up somewhere along the way (or maybe even before starting) but there will always be someone, somewhere, that will take things until their last consequences.
And to finish, one final comment: in my opinion there are two ways to break a program like 0x90.exe: a brute force approach, and the elegant approach. The brute force approach does not require coming up with valid keys to run the program - all that is necessary is to patch the program at a specific location and force it, depending on the situation, to make or not make a jump. The elegant approach consists of understanding how the keys are generated and generating your own keys.
The thing is, both approaches require a decent amount of reverse engineering because you need to decrypt the program so you can patch it (in the case of the brute force method), and for this to happen it is necessary to evade the anti-reverse engineering traps and to understand how the program is encrypted. Once the encryption has been beaten the extra step of understanding the key generation algorithm is the fun part.
This challenge has been a great educational experience. I personally had never seen the use of a virtual machine in a piece of malware. Some of the other anti-reverse engineering tricks were also new to me.
If you have followed this far I hope you have learned something.
This appendix contains a description of every instruction in the instruction set of the Brulez virtual machine. For each instruction the length in bytes is given and if any operands are encoded instructions for decoding them are provided.
0, 0 = pushi2 n. n is encoded. decode by XORing with 0x37195411 (size = 6)
+----+----+----+----+----+----+ | 0 | 0 | # | # | # | # | +----+----+----+----+----+----+ |
0, 1 = pushi n. n is encoded. Add 0xadd01337 to get value (size = 6)
+----+----+----+----+----+----+ | 0 | 1 | # | # | # | # | +----+----+----+----+----+----+ |
0, 2 = push rx. x is encoded. XOR with 0x47 to get x (size = 3)
+----+----+----+ | 0 | 2 | x | +----+----+----+ |
0, 3 = pop rx. x is encoded. XOR with 0x66 to get x (size = 3)
+----+----+----+ | 0 | 3 | x | +----+----+----+ |
0, 4 = adjsp # (size = 3)
Adds # to the stack pointer. # is encoded. XOR with 0x45 to get value.
+----+----+----+ | 0 | 4 | # | +----+----+----+ |
1, 3 = xor (rx), ry (size = 5)
+----+----+----+----+----+ | 1 | 3 | y | Sz | x | +----+----+----+----+----+ |
Sz = size: 0 byte, 1 half word, 2 word
1, 4 = addi rx, #. x is encoded. substract 3 to get x. set zero flag based on result (0 if result 0, 1 if not) (size = 7)
+----+----+----+----+----+----+----+ | 1 | 4 | x | # | # | # | # | +----+----+----+----+----+----+----+ |
1, 5 = subi rx, #. x is encoded. substract 2 to get x. set zero flag based on result (0 if result 0, 1 if not) (size = 7)
+----+----+----+----+----+----+----+ | 1 | 5 | x | # | # | # | # | +----+----+----+----+----+----+----+ |
1, 6 = andi rx, #. x is encoded. substract 5 to get x. set zero flag based on result (0 if result 0, 1 if not) (size = 7)
+----+----+----+----+----+----+----+ | 1 | 6 | x | # | # | # | # | +----+----+----+----+----+----+----+ |
1, 7 = ori rx, #. x is encoded. substract 4 to get x. set zero flag based on result (0 if result 0, 1 if not) (size = 7)
+----+----+----+----+----+----+----+ | 1 | 7 | x | # | # | # | # | +----+----+----+----+----+----+----+ |
1, 8 = xori rx, #. x is *not* encoded. set zero flag based on result (0 if result 0, 1 if not) (size = 7)
+----+----+----+----+----+----+----+ | 1 | 8 | x | # | # | # | # | +----+----+----+----+----+----+----+ |
1, 9 = add rx, ry. x is encoded; substract 3 to get x. y is encoded; substract 1 to get y. set zero flag based on result (0 if result 0, 1 if not) (size = 4)
+----+----+----+----+ | 1 | 9 | y | x | +----+----+----+----+ |
1, 10 = ldc rx, ry. Load and compare. x is encoded; substract 1 to get x. y is encoded; substract 2 to get y. Load rx with contents of ry. reset zero flag if rx is equal to ry. set zero flag otherwise (size = 4)
+----+----+----+----+ | 1 | 10 | y | x | +----+----+----+----+ |
2, 0 = hlt - exit Virtual Machine (size = 2)
+----+----+ | 2 | 0 | +----+----+ |
2, 1 = sys - do a syscall, store 32-bit result in r0. Set flag based on result (0 if result 0, 1 otherwise.) The address of the IAT entry for the system call to make is encoded. Add 0xFEA731DE to obtain real address (size = 6)
+----+----+----+----+----+----+ | 2 | 1 | # | # | # | # | +----+----+----+----+----+----+ |
2, 2 = ldi rx, #. # is encoded. add 0xaefded04 to decode (size = 7)
+----+----+----+----+----+----+----+ | 2 | 2 | # | # | # | # | x | +----+----+----+----+----+----+----+ |
2, 3 = dec rx. set flag (0 if result 0, 1 if 1) (size = 3)
+----+----+----+ | 2 | 3 | x | +----+----+----+ |
2, 4 = inc rx. set flag (0 if result 0, 1 if 1) (size = 3)
+----+----+----+ | 2 | 4 | x | +----+----+----+ |
2, 5 = clr rx. set rx to 0. sets zero flag to zero (size = 3)
+----+----+----+ | 2 | 5 | x | +----+----+----+ |
2, 6 = strnchr(): searches for a specific char in the string pointed by r0. if found leaves in r0 a ptr to the found char. if not found resets the zero flag. only a maximum number of characters is searched (size = 5)
2, 7 = trap. Produce a debugger trap to debug microcode (x86 assembly) (size = 2)
+----+----+ | 2 | 7 | +----+----+ |
2, 8 = ld rx, (ry). load rx with contents of memory address pointed at by the contents of ry (size = 4)
+----+----+----+----+ | 2 | 8 | y | x | +----+----+----+----+ |
2, 9 = bswap rx (size = 3)
+----+----+----+ | 2 | 9 | x | +----+----+----+ |
(buggy microcode since bswap output not being written to register)
2, 10 = ldbze rx, (ry) - load byte zero extend (size = 4)
+----+----+----+----+ | 2 | 10 | y | x | +----+----+----+----+ |
2, 11 = ldhze rx, (ry) - load half word zero extend (size = 4)
+----+----+----+----+ | 2 | 11 | y | x | +----+----+----+----+ |
3, 0 = pcbe (pop, compare and branch if equal). compares the two words in the stack and branch if they are equal. the words are removed from the stack. destination address is encoded. substract 0x31337 to get real address (size = 6)
4, 0 = jmp to encoded dest. Destination is decoded by adding 0xdeadead (size = 6)
+----+----+----+----+----+----+ | 4 | 0 | # | # | # | # | +----+----+----+----+----+----+ |
4, 1 = jnz to encoded dest. Destination is decoded by adding 0xe1ba3e (size = 6)
+----+----+----+----+----+----+ | 4 | 0 | # | # | # | # | +----+----+----+----+----+----+ |
4, 2 = jz to encoded dest. Destination is decoded by adding 0xe1ba41 (size = 6)
+----+----+----+----+----+----+ | 4 | 0 | # | # | # | # | +----+----+----+----+----+----+ |
5, 0 = call addr (size = 6)
+----+----+----+----+----+----+ | 5 | 0 | A | B | C | D | +----+----+----+----+----+----+ |
addr = D << 24 + C << 16 + B <<8 + A
5, 1 = ret (size = 2)
+----+----+ | 5 | 1 | +----+----+ |
The following files where generated during this Scan of the Month:
hide.idc: IDA IDC program to hide sections of code in a program. The addresses of the sections to hide are taken from an input file. Hiding sections of a program that only contain no-operation garbage makes the program a lot easier to follow and analyze.
xor.idc: IDA IDC program to XOR a section of a program with a byte specified by the user. I used this to decrypt from within IDA the two sections of the program that get decrypted by the Brulez virtual machine. One section was decrypted by XORing with 0x53 and the other one by XORing with 0x43. See Section 2.2.5, “Heavy Use of Encryption” for details.
0x90-n2.idb: IDA Pro database that I used until the end when reverse engineering 0x90.exe.
decrypt-0x90.c: C program used to automatically decrypt the two chains that contain a combined total of 175 decryption blocks, patch the program so these chains are not decrypted, and neutralize all the exception handlers wrecking havoc when the program is run under a debugger.
XML sources for this document: this HTML document was generated using DocBook XML. This directory contains all the files used in the generation of this document.
Performance monitoring, a x86 assembly language gem from the the Assembly Gems page.
A Crash Course on the Depths of Win32 Structured Exception Handling: excellent Microsoft Systems Journal article by Matt Pietrek.
An In-Depth Look into the Win32 Portable Executable File Format : another Microsoft Systems Journal article by Matt Pietrek.
The submission was prepared using DocBook XML, and then converted to HTML using Tim Waugh's xmlto XML converter. Editing was done on a Linux desktop using GNU Emacs and the PSGML major mode.
All the anti-reverse engineering work was done on a Linux desktop using a Linux native version of the IDA disassembler. Machine virtualization on the same desktop was used to run the program in a debugger.
Very special thanks to ...
Nick DeBaggis for his insightful comments and suggestions while I was working on this. A couple of findings presented in this document were pointed out by him (sorry for not being more specific dude!)
The fine folks at DataRescue for creating and maintaining IDA Pro, and specially for producing a native Linux version.
Nico Brulez, architect of the Brulez Virtual Machine and of this challenge, for designing such a great educational experience and for providing some fine hours of entertainment.
As always, the Honeynet Project for coming up with these nice and entertaining challenges.