Honeynet Project Scan of the Month - Scan 33 (November 2004)

Honeynet Project Scan of the Month - Scan 33 (November 2004)

Submission by Eloy Paris <peloy at chapus dot net>
Fri Dec 3 18:34:47 EST 2004

"I've seen things... things you little people wouldn't believe... Attack ships on fire off the shoulder of Orion bright as magnesium... I rode on the back decks of a blinker and watched c-beams glitter in the dark near the Tanhauser Gate. All those moments... they'll be gone." Roy Batty (Nexus 6. Incept date 2016)

Abstract

This is a submission to the Honeynet Project's November 2004 Scan of the Month.

The analysis is in some parts very detailed. If you are a grader, have several submissions to read and can't go over all the details, or are just a casual reader, feel free to go directly to Section 1, “Questions”, which contains the answers to all the questions of this challenge. The questions (and answers) provide a good summary of the most important aspects of this challenge. However, just reading the answers is not the best way to learn and understand some of the details nor the process I followed to analyze 0x90.exe, so I encourage you to read the whole submission.

Table of Contents

1. Questions

1.1. Q1: Identify and explain any techniques in the binary that protect it from being analyzed or reverse engineered.
1.2. Q2: Something uncommon has been used to protect the code from being reverse engineered, can you identify what it is and how it works?
1.3. Q3: Provide a means to "quickly" analyze this uncommon feature.
1.4. Q4: Which tools are the most suited for analyzing such binaries, and why?
1.5. Q5: Identify the purpose (fictitious or not) of the binary.
1.6. Q6: What is the binary waiting from the user? Please detail how you found it.
1.7. Bonus: What techniques or methods can you think of that would make the binary harder to reverse engineer?

2. Analysis

2.1. Initial Inspection

2.2. Anti-Reverse Engineering Techniques and How to Defeat Them

2.2.1. Pushing the Limit of PE Headers
2.2.2. No-Operation Blocks
2.2.3. Use of RDTSC to Measure Cycles
2.2.4. Heavy Use of Windows' Structured Exception Handling Infrastructure
2.2.5. Heavy Use of Encryption
2.2.6. Use of Virtual Machines
2.2.7. Windows API Fun
2.2.8. Killing the Debugger

2.3. x86 Machine Code Becomes Microcode (a.k.a. "the Machine Within")

2.3.1. Oh My! There's a Virtual Machine Here!
2.3.2. Brulez Virtual Machine Architecture
2.3.3. Operand Addressing
2.3.4. Instruction Set
2.3.5. Instruction Decoding

2.4. The Brulez Virtual Machine Program Inside 0x90.exe

2.5. Discovering Authentication Keys

2.6. Program Structure

2.7. Other Interesting Details

3. Conclusion

A. The Brulez Virtual Machine Instruction Set

B. Files

C. References

D. Work Methodology

E. Acknowledgements

1. Questions

1.1. Q1: Identify and explain any techniques in the binary that protect it from being analyzed or reverse engineered.

A: The following are the anti-reverse engineering traps and techniques that I found while analyzing 0x90.exe:

Pushing the limits of the Windows Portable Executable (PE) format by tampering with some fields in the PE and section header so disassemblers like IDA Pro and objdump don't have a chance. This technique can be defeated by loading the file in the disassembler as a binary object as opposed to a PE executable.
Use of no-operation blocks: these blocks consisted of big numbers of instructions that served no purpose other than to hide the real instructions. These blocks were everywhere and slowed the analysis.
Use of the RDTSC (read timestamp counter) instruction to measure the number of cycles needed to execute sections of code, and abort the program if it looked like it took too long to execute, maybe due to the presence of a debugger.
Heavy use of Windows' Structure Exception Handling infrastructure. In particular, 1) attempting to kill debuggers from within an exception handler by writing zeroes to the microprocessor's debugging registers (DR0, DR1, DR2, DR3, DR6 and DR7), and 2) changing program flow from within an exception handler by directly manipulating the value of the EIP register saved in the stack as part of the _CONTEXT structure (see winnt.h)
Heavy use of encryption in different sections of the binary. These sections are decrypted on the fly and in different stages, most of them right before the code in an encrypted zone is executed. The huge number of decryption areas represents also a problem to the reverse engineer - in one instance there are 175 encrypted blocks that are being decrypted from different sections of the binary.
Use of a virtual machine to decrypt/unpack some parts of the binary, and to perform the function the program was created for. This might be the most difficult barrier to analysis and reverse engineering of this binary.
Very twisted way of calling library functions from the Windows API: all calls to the Windows API are made from the same place. The call instruction that invokes the API functions is modified right before making the call so the wanted function is called.
Attempts to detect software breakpoints (int 0x3 - opcode 0xcc) at the entry point of Windows API functions and kill the debugger if one is found.

Details about all these techniques, including their impact on the reverse engineering process and how they can be neutralized can be found in Section 2.2, “Anti-Reverse Engineering Techniques and How to Defeat Them”.

1.2. Q2: Something uncommon has been used to protect the code from being reverse engineered, can you identify what it is and how it works?

A: What sets this binary apart is the use of a virtual machine to do an important portion of the decryption/unpacking process. In fact, the same virtual machine that is used to unpack the binary at run time is also used to perform the actual work of the program. The use of this technique really raises the bar in terms of the resources (patience, skills, creativity and time) required to break the protection of the binary and to understand the purpose in life of the protected program.

More details about the "Brulez" virtual machine can be found in Section 2.3, “ x86 Machine Code Becomes Microcode (a.k.a. "the Machine Within") ”.

1.3. Q3: Provide a means to "quickly" analyze this uncommon feature.

A: In my opinion, there is no "quick" way to analyze a program that uses virtual machines to prevent reverse engineering, especially if the virtual machine is unknown and designed specifically to prevent reverse engineering. I believe that to be able to break the protection of a program that uses virtual machines as a shield against reverse engineering, the virtual machines have to be reasonably well understood because the extra level of abstraction that the virtual machine provides requires added complexity and in particular a higher number of real machine instructions per virtual machine instruction.

Quickly analyzing a virtual machine (as opposed to a program that uses virtual machines) is also something that in my opinion is hard to accomplish mainly because it might not be evident that one is dealing with a virtual machine. Experience, as always, will help.

I discovered I was dealing with a virtual machine when I determined that the x86 machine code was following some kind of "script" that was being tracked by the ESI register. After statically (in the disassembler, as opposed to the debugger) following execution of this script for a few x86 instructions I noticed that the program was using a few jump tables that were indexed by bytes in this "script" pointed at by the ESI register. I also noticed that each time that a jump was made through this table, a certain region of memory was being modified. It did not take long to realize that the "script" I was following was just machine code for what I later called "the Brulez Virtual Machine", and that this region of memory being modified was actually reserved for registers of the virtual machine.

The Brulez virtual machine turned out to be very simple and to have a small instruction set. The virtual machine program that was embedded in the binary was also simple, which allowed me to reverse engineer it. However, there was no magic formula or silver bullet - it was a pretty manual process that I don't see an easy way to automate.

Something that would probably help is to have an IDA processor module for the virtual machine being reverse engineer, but this is actually a Catch-22 since you can't have a disassembler for a virtual machine you still do not understand.

1.4. Q4: Which tools are the most suited for analyzing such binaries, and why?

A: The most valuable tool to analyze a binary armored with some of the protections against reverse engineering that I outlined in Section 1.3, “ Q3: Provide a means to "quickly" analyze this uncommon feature. ” is a good disassembler. The reason for this is that a heavily armored binary will try hard to kill a debugger and to prevent all attempts to get to the core program, so using a debugger is not the best option. Static code analysis with a disassembler, and especially an "interactive" disassembler like IDA, is, at least to me, the best option. A lot of tasks (like decryption of encrypted sections) could be performed from within the disassembler by using its C-like language.

1.5. Q5: Identify the purpose (fictitious or not) of the binary.

A: The 0x90.exe executable does only one thing: to print a welcome message if the correct key has been provided by the user.

C:\tmp>0x90
Please Authenticate!
C:\tmp>0x90 1D3NEAcN
Welcome...
Exploit for it doesn't matter 1.x Courtesy of Nicolas Brulez
C:\tmp>

The program does absolutely nothing malicious and can be run without fears.

1.6. Q6: What is the binary waiting from the user? Please detail how you found it.

A: The binary is expecting a key or serial number that authenticates the user. The main indication that the program is expecting some kind of authentication from the user is that when the program is run the message "Please Authenticate!" is displayed, as can be seen in the output shown in Section 1.5, “ Q5: Identify the purpose (fictitious or not) of the binary. ”.

The binary looks for this key in the command line, and there are two possible ways to realize this: the first one is by observing that the Imports Table of 0x90.exe includes an entry for the Windows API function GetCommandLine() (this does not mean much, though, since it could be a reverse engineering trick.) The second way, and the only one that is 100% guaranteed to be effective is to fully reverse engineer the binary and understand where it is looking for the authentication information. This is the approach that I took, and it has the added benefit that it allows to reverse engineer the key or serial format so the program can be run without patching it.

There are four possible keys that the program will accept. These keys are (not including the quotes):

"1D3NEAcN"
"1D3NEFcI"
"1D3NEGcH"
"1D3NEKcD"

To come up with these keys I had to fully understand the Brulez Virtual Machine program embedded in 0x90.exe. This VM program was the one actually checking the validity of the key. Basically, I had to understand what operations were being performed on the key, come up with a set of constraints a key has to satisfy, and then work backwards. The details about this procedure are too lengthy for this section of the report so please refer to Section 2.5, “Discovering Authentication Keys” for full details.

1.7. Bonus: What techniques or methods can you think of that would make the binary harder to reverse engineer?

A: These are a few things that can be implemented to increase the time and skills required to break the protection of 0x90.exe:

Get rid of the pusha/popa instructions that enclose no-operation zones. This way it will not be trivial to identify no-operation zones so they can be hidden. Registers could be saved in the stack one by one, or in a special memory area, or only those registers modified by the no-operation zone instructions could be saved.
Get rid of the markers (the 0xdead and 0x31000) that are used to identify each decryption block. Having these markers in there makes it just to easy to automatically decrypt the section of the binary the markers deal with. Each decryption block in the chain should be different so finding a pattern is very hard to do. The idea would be to create a tool that makes creation of the decryption blocks very easy, but that makes identification of them very hard.
Monitor changes in the debugging registers: if during execution the program checks for the value of the debugging registers at two different point and it finds that some values have changed that means someone is messing around with a debugger and the program should bail out.
Implement a more complicated virtual machine: more instructions are needed. Also, the current implementation of the instruction decoder via tables is too easy to figure out. A more manual process should be implemented and decoding of virtual machine instructions should be obfuscated.
Use a virtual machine and write no-operation zones in the virtual machine program. This would keep the reverse engineer busy for a while.
The entire real program could be run in a virtual machine, as opposed to just parts of it. I dread a program written in a high level language like Visual Basic because one high-level language constructs translates into lots of low-level language instructions and unintelligible library calls. So, a high level language can be seen as a virtual machine.
Detect machine virtualization software like VMware since chances are that if someone is running the binary in a virtual machine that user is trying to reverse engineer it.
Use of strong encryption: this is one that would be impossible to break. If the binary were encrypted and only a way-one password could decrypt it then it would be impossible to reverse engineer the program unless it could be caught decrypted in memory while running.

Note that these are just some examples of anti-reverse engineering techniques and that none of them is a silver bullet (except maybe the last one above, encryption) - anyone with sufficient determination will always find a way to break the protection.

2. Analysis

What follows is a comprehensive analysis of the 0x90.exe binary.

2.1. Initial Inspection

The 0x90.exe is a Win32 console application. The objdump from the binutils package helped me to get an initial idea of what I was dealing with.

During the initial and very superficial inspection there are two things that I noticed immediately: first, that the strings command is useless. Running this command only gives us the following "useful" information:

(You really thought you would find strings eh? ;-)
Scan of the month coded by Nicolas Brulez / Digital River

This is to be expected, though, since these days 99.9% of malware is packed with some kind of executable packer for two reasons: 1) to prevent the casual user from finding useful information (like the FQDN of an IRC server hosting a botnet), and 2) to minimize the size of the executable.

The second thing that I noticed was that objdump, IDA Pro, and other tools to explore PE files fail miserable to disassemble the binary or to obtain some information about it. As I will show, the reason for this is that the section headers, and the PE header in general, have been tampered with to cause troubles.

Finally, as part of a very superficial and initial analysis, I always like to run the malware to try to obtain as much information as possible about its purpose. Needless to say, running malware is dangerous, and must be done in a sandbox in an air-gapped network and in a machine that can be quickly restored to the state prior to running the malware. I usually use a VMware virtual machine, which does not always work due to the fact that today's malware attempts to detect whether it is running on a virtual machine. So, having said this, I ran the malware and got the following output:

C:\tmp>0x90
Please Authenticate!

It is important to note that running the malware several times in short succession will cause it to apparently hang or not produce any output at all some times. I will speculate on the reason for this behavior later on.

2.2. Anti-Reverse Engineering Techniques and How to Defeat Them

After a short initial and superficial inspection I proceeded to use the real tools that could potentially help to get this job done: a disassembler and a debugger. Using these tools I encountered a good number of anti-reverse engineering traps of various level of complexity. In this section I will present the traps I found, what their impact was, and how I managed to evade or neutralize them.

2.2.1. Pushing the Limit of PE Headers

What: As I mentioned in Section 2.1, “Initial Inspection”, the 0x90.exe program could not be loaded in a disassembler as a Windows PE executable. The reason for this is some non-sense values in the PE header and section header. These non-sense values obviously do not stop the Windows executable loader for some reason.

Impact on Reverse Engineering: Not being able to disassemble the program you are trying to reverse engineer is a big problem. You can always resort to the debugger, but for really complicated tasks, a debugger alone is not enough: for heavily armored and powerful binaries, powerful reverse engineering tools are called for, so being able to load the file in a disassembler is a must.

Solution: I was short on time so I did not even look at the PE and section headers to see what Nicolas Brulez had done to them. I just loaded the file in IDA Pro using the binary loader (as opposed to the PE loader that was not getting me anywhere.) When using the binary loader I did load the program at offset 0xde0000, which is the image base reported by objdump. objdump also reported the program's entry point (at 0xde2000), so I was set to start analyzing the program.

2.2.2. No-Operation Blocks

What: The first thing that I found as soon as I started to trace through 0x90.exe in a debugger (or while looking at the disassembly) was that there were lots of instructions. This by itself is not a problem because programs, after all, are made out of instructions, but when you realize that these instructions are doing nothing then this might represent a problem. These no-operation instructions appeared always in blocks enclosed by a pair of pusha/popa instructions. Here's a short example of one of them:

seg000:00DE2012			pusha
seg000:00DE2013			mov	al, bl
seg000:00DE2015			db	65h, 36h
seg000:00DE2015			rep mov	ecx, edx
seg000:00DE201A			repne jmp short	loc_DE201E
seg000:00DE201D			db 0ABh
seg000:00DE201E
seg000:00DE201E	loc_DE201E:				; CODE XREF: seg000:00DE201A
seg000:00DE201E			mov	esi, esi
seg000:00DE2020			db	26h
seg000:00DE2020			repne lea ebx, es:0D0E47C6Eh
seg000:00DE2028			db	65h, 64h
seg000:00DE2028			repne mov ah, 51h ; 'Q'
seg000:00DE202E			jmp	short loc_DE2031
seg000:00DE2030			db  27h	; '
seg000:00DE2031	loc_DE2031:				; CODE XREF: seg000:00DE202E
seg000:00DE2031			db	2Eh
seg000:00DE2031			mov	al, ch
[...]
seg000:00DE227F			db	65h
seg000:00DE227F			mov	dh, al
seg000:00DE2282			mov	ecx, 7D30653Fh
seg000:00DE2288			popa

Note how the code inside this no-operation block is meaningless, especially because 1) nothing is ever written to memory, and 2) only registers are affected, but the block begins with a pusha instruction and ends with a popa instruction, which destroys anything loaded into the registers within the block. Also, note how instruction prefixes like cs:, ds:, repe, repne, are being used for instructions that do not need them. For example, a "mov al, ch" instruction does not need a "repe" prefix, this does not make any sense since this prefix is used by string operations (scansb, movsb, etc.) I am surprised that the processor doesn't produce an illegal instruction trap or something, but it apparently can take all this crap.

Impact on Reverse Engineering: These no-operation blocks are everywhere and their impact is to tremendously slow down the reverse engineering process. In a disassembler is not that big of a problem because I can search for the closing popa of the block, but in a debugger it causes more problems.

Solution: Since this no-operation blocks are very easy to recognize it is possible to filter them out in the disassembler. One way of doing this is to identify most occurrences of these blocks and then tell the disassembler to hide these portions of code. To identify these portions of code I generated a listing of the disassembly and then used the Unix grep in the following way:

grep 'pusha\|popa' listing.lst  > addresses.txt

This gave me a file with the beginning and ending addresses of the no-operation blocks. With this file I then created a small IDC program that just read this input file and then told the debugger to hide all this blocks. The addresses.txt was not perfect but it was a good starting point to hide all these blocks that were just slowing me down.

Here's how the above code segment looks after hiding all the no-operation zones (note the "no-operation zone" comments and how many bytes of sequential memory they cover):

seg000:00DE2009			sub	eax, 6
seg000:00DE200C			sub	ebp, 0DE2006h
seg000:00DE2012	; no-operation zone
seg000:00DE2289	; no-operation zone
seg000:00DE22F1	; no-operation zone
seg000:00DE259A
seg000:00DE259B	; no-operation zone
seg000:00DE2603			mov	[ebp+0E26441h],	eax
seg000:00DE2609	; no-operation zone
seg000:00DE2671	; no-operation zone
seg000:00DE2950			mov	eax, 0DE100Dh
seg000:00DE2955	; no-operation zone
seg000:00DE2C2C			mov	eax, [eax+2]
seg000:00DE2C2F	; no-operation zone
seg000:00DE2EF1			mov	eax, [eax]
seg000:00DE2EF3	; no-operation zone
seg000:00DE2F5B			mov	edi, eax
seg000:00DE2F5D			mov	ecx, 4

2.2.3. Use of RDTSC to Measure Cycles

What: It was common to see code like the following:

seg000:00DE22B9			cpuid
seg000:00DE22BB			rdtsc
seg000:00DE22BD			sub	eax, [esp]
seg000:00DE22C0			add	esp, 4
seg000:00DE22C3			cmp	eax, 0E0000h
seg000:00DE22C8			ja	short loc_DE22CD
seg000:00DE22CA			xor	eax, eax
seg000:00DE22CC			retn

seg000:00DE22CD	loc_DE22CD:
seg000:00DE22CD			add	dword ptr [ecx+0B8h], 63h ; 'c'
seg000:00DE22D4			xor	eax, eax
seg000:00DE22D6			retn

This code is basically using the rdtsc instruction to measure the number of cycles that it takes to execute a section of code. This information is then used to kill the program if it is detected that too many cycles have passed (0xe0000 in this case.) The program is being killed by adding some random junk to the saved EIP register (this particular code is being executed inside an exception handler.)

Impact on Reverse Engineering: This does not affect the reverse engineering process when analyzing code in the disassembler but can potentially cause problems in the debugger if one is not careful and steps through this code.

Solution: If stepping through code in the debugger just skip over this code by changing the value of the EIP register.

Note

This rdtsc non-sense is what is causing the program to behave erratically (see last paragraph of Section 2.1, “Initial Inspection”) since it gives false positives ("positive" here meaning "you took too long to execute this, just must be stepping through code in a debugger") that then hang the program based on the decision it makes when a positive is detected.

2.2.4. Heavy Use of Windows' Structured Exception Handling Infrastructure

What: The program is taking advantage of Windows' Structure Exception Handling infrastructure to wreck havoc. Basically, the program sets exception handlers at lots of different places in the program and then generates an exception (trying to dereference a NULL pointer.) The exception handler then takes control and does some nasty things. In particular, different types of exception handlers were : 1) trying to kill the debugger from within an exception handler by writing zeroes to the microprocessor's debugging registers (DR0, DR1, DR2, DR3, DR6 and DR7), and 2) changing program flow from within the exception handler by directly manipulating the value of the EIP register saved in the stack as part of the context structure.

Impact on Reverse Engineering: The user of Windows SEH infrastructure is a major problem - it basically make it impossible to debug the program.

Solution: Solving this problem is tricky. The solution that I implemented was to find all the places with exception handlers and patch them so they are not executed. For those exception handlers that were changing program flow I just changed the program flow from within the main program by putting in place jumps to the final destination. The actual patching was done by a small C program (decrypt-0x90.c) that I wrote, although it could have been done easily in IDA's IDC language.

The way to use decrypt-0x90.c is as follow: you run it a give as command-line argument the name of the 0x90.exe you want it to patch for you. This file will be patched and then you can use that to run in the debugger without fearing the exception handlers. You can also load this file in the disassembler and analyze that one instead of the original 0x90.exe. I used both techniques.

2.2.5. Heavy Use of Encryption

What: Hundreds of sections of the binary were "encrypted". They were "encrypted" with single XOR operations, but encrypted nonetheless. The problem was not the encryption method itself, because both the keys used to encrypt and the sections within the file that were encrypted were easy to find. The problem was the big number of these encrypted sections.

It is interesting to delve into the details of how all the encrypted sections of the binary were organized:

The first encryption section was what I call a "chain" of 173 encrypted blocks. Each one of this blocks had the following structure (taking from the C program I wrote to decrypt them):

struct decryption_block {
    char decrypt_code[155];
    char nop_code[8];
    char get_size_code[13];
    char get_addr_code[14];
    unsigned marker1; /* 0xdead */
    unsigned code_offsets[4];
    unsigned marker2; /* 0x31000 */
    unsigned return_offsets[4];
} __attribute__((packed));

All blocks had the same structure. The first four arrays are used to account for x86 code that performs four functions: do the actual decryption, do nothing, get the size of the code to decrypt and get the starting address of the block to decrypt. marker1 is a constant marker that is always the 32-bit word 0xdead (yeah, no comment here). Following that are the offsets, from the image base, to the four functions described above. marker2 is another marker that is always 0x31000. Following that marker there are four 32-bit words that are used to calculate the return address from the four functions described above. Functions are called in the following order: get address, do nothing, get size of block to decrypt, and decrypt. The decryption code is very simple and always the same:

void
decrypt(char *buffer, unsigned size)
{
    do {
        *buffer++ ^= size;
    } while (--size);
}

This first chain had a total of 173 of these decryption blocks. They were located at the very top of the binary, i.e. higher memory addresses, with the first block in the chain at higher memory addresses with each subsequent block at lower memory addresses. Each decryption blocked was responsible for decrypting the code that was going to be executed next.

After the first chain, there was a second chain, and a different address (that's why I call this a different chain). This one only had two decryption blocks, but they had the same structure I presented above.

Impact on Reverse Engineering: These hundreds of encrypted sections made it impractical to use a debugger for several reasons. First, I could not just set a breakpoint at the end of all decryption blocks because I did not know where the encrypted blocks were going to end. Second, because the exception handlers I presented above would have killed the debugger. Third, because, as I later found out, the decryption blocks were so many that stepping through all of them would have been impractical. In addition to the inability to efficiently use a debugger to go through the decryption process, another impact of the heavy use of encryption in this binary on the reverse engineer is that the file is very hard to analyze, even statically in a disassembler. The reason for this is that there isn't a single subroutine that decrypts the entire file, all sections. Instead, decryption was done in very different places, and each time something got decrypted it was to execute it next.

Solution: Solving this problem is also tricky. The solution that I implemented was to find all the places with exception handlers and patch them so they are not executed. For those exception handlers that were changing program flow (the ones inside the virtual machine microcode) put in place jumps to the final destination (the same thing that the actual exception handler had done if an exception had occurred.) The actual patching was done by the same (decrypt-0x90.c) that I described above in Section 2.2.4, “ Heavy Use of Windows' Structured Exception Handling Infrastructure ”. Again, this could have been accomplished without leaving IDA through IDA's IDC language.

decrypt-0x90.c decrypted the first two chains by traversing the decryption block structures in high memory. It did this by starting with the first one and keep going while the 0xdead and 0x31000 markers were in place; something like (important thing to see here is the for() loop):

#define MARKER1 0xdead
#define MARKER2 0x31000

void
decrypt_chain(struct decryption_block *d, char *buffer)
{
    unsigned decrypt_len, decrypt_addr;

    for (;d->marker1 == MARKER1 && d->marker2 == MARKER2; d--) {

        decrypt_len = *(unsigned *) &d->get_size_code[1];
        decrypt_addr = *(unsigned *) &d->get_addr_code[2];

        decrypt(decrypt_addr, decrypt_len);
    }
}

Once the two encryption chains have been decrypted I can patch the program to just jump over them, skipping all this mess. I can also use the resulting new binary as the new starting point of our disassembly, which is what I did.

As I will show next, after the two encryption chains were defeated I found a bigger challenge: what I call the "Brulez Virtual Machine". I will go into detail about this virtual machine in the next section but here I just want to point out that the virtual machine program performed decryption of two additional encrypted sections (one section was unconditionally decrypted, the last one only if the user authenticated correctly.) The first decryption performed by the virtual machine was an XOR with 0x53. The second decryption was an XOR with 0x43.

It is important to note that the final approach I used to neutralized most of the encryption was not the first one I tried. Just to give an idea of the wrong approach I originally was following, I was going through the first encrypted sections and learned quickly how to neutralize some of the reverse engineering techniques I have mentioned and that were being employed during the decryption process (no-operation blocks, exception handlers). I managed to get the program under a debugger to decrypt a section. Then I kept going and found another encrypted section. I decrypted that one, continued, and found another encrypted section. I decrypted that one too and continued hoping that it was the last one. I kept going for a few more sections until I realized that depending on the real number of encrypted sections I could be spending an unreasonable amount of time going through the encrypted sections manually so I decided to try a different approach.

2.2.6. Use of Virtual Machines

What: After defeating the two decryption chains I presented in detail in the previous section I found a worthier challenge in the code that followed the last decryption block of the last chain: a virtual machine that was responsible, through a program written in this virtual machine's machine code, for the decryption of additional sections of the binary, for user authentication, and for performing the actual task the program was created for, i.e. executing printf("Welcome").

Impact on Reverse Engineering: The impact of a virtual machine and the corresponding virtual machine program embedded in a program has a huge impact on a reverse engineering task. I have seen challenges like this when trying to reverse engineer programs written in very high level languages, especially interpreted languages, which can be seen as "virtual machines". The problem is that it is very difficult to follow what the program is doing, neither statically in a disassembler, nor dynamically in a debugger, because the program one is trying to understand becomes microcode for the higher level instructions of the virtual machine. So, one virtual machine instruction takes several host machine (in this case the x86 machine) instructions to complete. In addition, the virtual machine has to deal with decoding of the instructions of the virtual machine, which causes additional complexity and pain to the reverse engineer.

Solution: Once I realized that it was a virtual machine I was dealing with, and that there was virtual machine code embedded in the binary it was just a matter of reverse engineering the virtual machine, understanding this machine's instruction set, and finally understanding the embedded virtual machine program. This understanding of the virtual machine program is what finally allowed to break the binary and understand its purpose.

2.2.7. Windows API Fun

What: The 0x90.exe binary has a very peculiar way of calling functions from the Windows API, at least in the most sensitive places of the program (inside the virtual machine program): instead of using regular call that goes through the Import Address Table (IAT) to reach Windows API functions, the binary goes to the IAT, fetches the address of the Windows API function that is going to be called, modifies the operand of a call instruction based on what was fetched from the IAT, and then executes the call. This is a very twisted way of calling functions in the Windows API.

Impact on Reverse Engineering: This trick has a low impact on the reverse engineering process. The impact is higher is the reverser is using a debugger and setting breakpoints at the entry points of the Windows API functions: in this case the reverser will see that all calls to the Windows API will come from the same place in the program, which is very confusing. The impact on the reverse engineering process is very low if analysis of the program is being done statically in the disassembler.

Solution: I was not affected by this trick since by the time I encountered this technique I had already understood the instruction set of the virtual machine, which is where the calls to the Windows API are being made (GetCmdLine() and printf().

2.2.8. Killing the Debugger

What: In lots of different places 0x90.exe tries to kill the debugger, or more accurately, the program if it is determined that it is running under a debugger. Some ways the program or debugger is killed if it is determined that it is running under a debugger are: 1) by writing zeroes to the processor's breakpoint-address registers (DR0 through DR3), to the debug status register (DR6) and to the debug control register (DR7) from within all the exception handlers and 2) by jumping to a random address (the output from the RDTSC instruction, actually) in those instances when a debugger has been detected.

If I remember correctly there is only one instance where a debugger is explicitly searched for: at the very beginning of the program the first four bytes at the entry point of three Windows API function calls (GetCmdLine(), printf(), and ExitProcess() ) is searched for the presence of the byte 0xcc, which is corresponds to the instruction "int 3", and that is used to enter a debugger from a program. The code that does this search gets to the entry point of the API function by going through the IAT. Here's an example of this code. See how if a 0xcc byte is found the program jumps to a random address:

seg000:00DE3274			mov	eax, 0DE1001h
seg000:00DE3279	; no-operation zone
seg000:00DE3555
seg000:00DE3558			mov	eax, [eax+2]
seg000:00DE355B			mov	eax, [eax]
seg000:00DE355D	; no-operation zone
seg000:00DE37F1			mov	edi, eax
seg000:00DE37F3	; no-operation zone
seg000:00DE3A8F
seg000:00DE3A90			mov	ecx, 4
seg000:00DE3A95			mov	eax, 660h
seg000:00DE3A9A			shr	eax, 3 ; eax == 0xcc here
seg000:00DE3A9D	; no-operation zone
seg000:00DE3D2E
seg000:00DE3D2F			repne scasb
seg000:00DE3D31			test	ecx, ecx
seg000:00DE3D33			jz	short loc_DE3D39
seg000:00DE3D35			rdtsc
seg000:00DE3D37			push	eax
seg000:00DE3D38			retn

Impact on Reverse Engineering: This anti-reverse engineering trick can be a real problem if one is not careful. One should prevent the debugging registers from being written to by anything that is not the debugger or control of the debugger will be lost and bad things will happen. This trick is not an issue, however, if static analysis in a disassembler is being performed.

Solution: This anti-reverse engineering measure is easy to prevent: just skip the section of code that is trying to mess up with the debugging registers or that is trying to find a debugger by looking at software breakpoints. One can also use hardware breakpoints, which don't require inserting a 0xcc into the program to cause a debugger trap.

2.3. x86 Machine Code Becomes Microcode (a.k.a. "the Machine Within")

In the previous section I mentioned that one of the biggest anti-reverse engineering mechanisms used in 0x90.exe is the use of what I call "the Brulez virtual machine". In this section I will go in more detail about how I discovered the virtual machine, what it is, how it works, and what it does.

2.3.1. Oh My! There's a Virtual Machine Here!

At some point during the execution of 0x90.exe the register ESI was loaded with a certain memory address. The way the ESI register started to be used after that point caught my attention. It caught my attention because execution of the program started to be dependant on the value of the first two bytes at the address ESI was pointing at. These two bytes were always used as indexes into two of several jump tables, and the value of ESI would always change after performing a certain task. It seemed like the data pointed at by ESI was some kind of script that was controlling execution of the program.

So, I noticed the strange behavior but I did not make the connection immediately. Instead, I manually followed execution of 0x90.exe on paper/disassembler (not debugger), taking notes of the values of key registers and memory locations modified by the segments of code reached through the jump tables. The code reached through the jump tables seemed to be doing very specific and short tasks, like push a number to the stack, XOR a byte in memory with the contents of a register, etc. Then, the code would increment ESI by a short number (3, 5, 6, 7 - always depending on the value of the first two bytes pointed at by ESI) and another jump through the same tables would be performed.

If I remember correctly, it was after the ESI register went back to a value I had already seen that I said "oh my Gosh, there is a virtual machine here!". I then realized what I was dealing with and that the memory area pointed at by ESI was actually machine code for a virtual machine that I immediately called "Brulez" for obvious reasons. The next steps for were then to understand the instruction set of this new "microprocessor" so I could understand what the virtual machine program was doing. This is what allowed me to finally defeat the program.

2.3.2. Brulez Virtual Machine Architecture

The Brulez virtual machine has a very simple yet powerful enough (for the purposes it was designed) architecture. The processor has five 32-bit general purpose registers, named r0 through r4, and one flag register called nz (non-zero), which only holds two values: 1 if the result of a previous operation was non-zero, or zero if the result of a previous operation was zero. There is not a separate stack pointer (SP) register since the stack of the virtual machine is shared with the stack of the host processor. The host machine's ESI register acts as the Brulez virtual machine program counter (PC) register. Words are stored in memory in little endian byte order.

2.3.3. Operand Addressing

Brulez virtual machine instructions act on zero or more operands. Some operands are specified explicitly and other are implicit. The data for a source operand can be located in the instruction itself (an immediate operand), a register or a memory location.

Immediate Operands: some instructions use data that is encoded in the instruction itself as a source operand.
Register Operands: source and destination operands can be any of the general purpose registers r0 through r4.
Memory Operands: source and destination operands in memory are always referenced by an address stored in a general purpose register.

2.3.4. Instruction Set

The instruction set of the Brulez virtual machine is pretty basic, but enough to accomplish the two tasks the virtual machine needs to: authenticate the user, decrypt two more sections of the binary, and print to standard output a message to indicate success or failure. Here I will list the instructions part of the instruction set classified by categories. In Appendix A, The Brulez Virtual Machine Instruction Set I present the entire instruction set listing opcodes and a short description for each instruction.

Data Transfer Instructions: ld (load), ldi (load immediate), ldbze (load byte zero extend), ldhze (load half-word zero extend), push, pushi (push immediate), pushi2 (push immediate 2), pop, clr (clear)
Arithmetic Instructions: add, addi (add immediate), subi (substract immediate), inc (increment), dec (decrement), ldc (load and compare)
Logical Instructions: xor, xori (xor immediate), ori (or immediate), andi (and immediate)
Control Transfer Instructions: call, ret (return), jmp, jnz (jump if not zero), jz (jump if zero)
String Instructions: strnchr (string character)
Bit and Byte Instructions: bswp (byte swap)
Miscellaneous Instructions: pcbe (pop, compare, and branch if equal), sys (system call), hlt (halt), trap (debugger trap), adjsp (adjust stack pointer)

2.3.5. Instruction Decoding

The microcode (x86 machine code) of the Brulez virtual machine decodes instructions via data and jump tables. There is one data table for the first opcode of an instruction that contains the addresses of different data/jump tables. The second byte of the opcode is the used as an index into this second data/jump table. In one particular case (the xor instruction), there is a third table that is used to index based on the size of the operand (byte, half-word, and word.) Here is how these tables look like:

seg000:00E1B991	l1_jump_table	dd offset off_E1B9A9
seg000:00E1B995			dd offset off_E1B9BD
seg000:00E1B999			dd offset off_E1B9F5
seg000:00E1B99D			dd offset off_E1BA25
seg000:00E1B9A1			dd offset off_E1BA29
seg000:00E1B9A5			dd offset off_E1BA35
seg000:00E1B9A9	;
seg000:00E1B9A9	; level	2 jump table
seg000:00E1B9A9	;
seg000:00E1B9A9	off_E1B9A9	dd offset pushi2
seg000:00E1B9AD			dd offset pushi
seg000:00E1B9B1			dd offset pushr
seg000:00E1B9B5			dd offset pop
seg000:00E1B9B9			dd offset adjsp
seg000:00E1B9BD	;
seg000:00E1B9BD	off_E1B9BD	dd offset loc_DF1A7B
seg000:00E1B9C1			dd offset sub_DF442E+1
seg000:00E1B9C5			dd offset loc_DF59D2
seg000:00E1B9C9			dd offset xor
seg000:00E1B9CD			dd offset addi
seg000:00E1B9D1			dd offset subi
seg000:00E1B9D5			dd offset andi
seg000:00E1B9D9			dd offset ori
seg000:00E1B9DD			dd offset xori
seg000:00E1B9E1			dd offset add
seg000:00E1B9E5			dd offset ldc
seg000:00E1B9E9	;
seg000:00E1B9E9	; this little table is used by the XOR instruction
seg000:00E1B9E9	;
seg000:00E1B9E9	off_E1B9E9	dd offset xorb
seg000:00E1B9ED			dd offset xorh
seg000:00E1B9F1			dd offset xorw
seg000:00E1B9F5	;
seg000:00E1B9F5	off_E1B9F5	dd offset hlt
seg000:00E1B9F9			dd offset sys
seg000:00E1B9FD			dd offset ldi
seg000:00E1BA01			dd offset dec
seg000:00E1BA05			dd offset inc
seg000:00E1BA09			dd offset clr
seg000:00E1BA0D			dd offset strnchr
seg000:00E1BA11			dd offset trap
seg000:00E1BA15			dd offset ldmem
seg000:00E1BA19			dd offset bswap
seg000:00E1BA1D			dd offset ldbze
seg000:00E1BA21			dd offset ldhze
seg000:00E1BA25	;
seg000:00E1BA25	off_E1BA25	dd offset pcbe
seg000:00E1BA29	;
seg000:00E1BA29	off_E1BA29	dd offset jmp
seg000:00E1BA2D			dd offset jnz
seg000:00E1BA31			dd offset jz
seg000:00E1BA35	;
seg000:00E1BA35	off_E1BA35	dd offset call
seg000:00E1BA39			dd offset ret

There is no provision for handling invalid instructions - an invalid opcode will just hang the virtual machine.

2.4. The Brulez Virtual Machine Program Inside 0x90.exe

Since 0x90.exe has an embedded virtual machine it is obvious that there is also an embedded virtual machine program. I have mentioned in previous sections what this virtual machine program does, but in this section I go into the details of how the program works and what it does. Let's start by showing the program, and then I'll explain the purpose of some blocks of code:

     1			.org	    0xe1ba3d
     2	
     3			ldi	    r3, 0x20e
     4			pushi	    0xe1ba65
     5			pop	    r0
     6			ldi	    r2, 0x53
     7			; 0xe1ba54
     8	loop1:
     9			xor.b	    (r0), r2
    10			inc	    r0
    11			dec	    r3
    12			jnz	    loop1 ; 0xe1ba54
    13	
    14			;
    15			; Beginning of encrypted data
    16			;
    17			ldi	    r1, 0x5cc80e31
    18			sys	    GetCmdLine
    19			strnchr	    ' ', 0x55
    20			jz	    bad_key ; 0xe1bb6f
    21			ld	    r2, (r0)
    22			addi	    r2, 0x1d9bdc45
    23			addi	    r1, 0x74519745
    24			subi	    r2, 0xad45dfe2
    25			addi	    r1, 0xdeadbeef
    26			addi	    r2, 0x68656c6c
    27			subi	    r1, 0x17854165
    28			subi	    r2, 0x41776169
    29			addi	    r1, 0x73686f77
    30			addi	    r2, 0x69747320
    31			subi	    r1, 0x206e6f20
    32			addi	    r2, 0x64726976
    33			addi	    r1, 0x6d657263
    34			subi	    r2, 0x6e757473
    35			subi	    r1, 0x79212121
    36			subi	    r2, 0x65683f21
    37			andi	    r2, 0xdfffffff
    38			push	    r2
    39			push	    r1
    40			pcbe	    key_okay_1
    41			; tough luck, wrong key
    42			clr	    r3
    43			jz	    bad_key ; 0xe1bb6f
    44	
    45			; 0xe1bb06
    46	key_okay_1:
    47			inc	    r0
    48			addi	    r0, 2
    49			inc	    r0
    50			ldhze	    r1, (r0)
    51			push	    r1
    52			pop	    r2
    53			push	    r0
    54			pushi	    0xd8360d
    55			pop	    r0
    56			addi	    r0, 0x98548
    57			dec	    r0
    58			xor.h	    (r0), r2
    59			pop	    r0
    60			ldbze	    r2, (r0)
    61			addi	    r0, 2
    62			ldbze	    r1, (r0)
    63			add	    r2, r1
    64			call	    0xe1bb86
    65	
    66			; The following instruction had the first byte "encrypted"
    67			inc	    r3 ; r3 is 0 before execution of this instr.
    68			push	    r3
    69			pushi	    welcome ; 0xe1bc2a
    70			sys	    printf
    71			adjsp	    8
    72			jmp	    exit_vm ; 0xe1bb84
    73	
    74			; 0xe1bb6f
    75	bad_key:
    76			pushi2	    0
    77			pushi	    authenticate ; 0xe1bc13
    78			sys	    printf
    79			adjsp	    8
    80	
    81			; 0xe1bb84
    82	exit_vm:
    83			hlt
    84	
    85			; subroutine 0xe1bb86
    86	
    87			ldi	    r1, 0x4c
    88			inc	    r1
    89			inc	    r1
    90			addi	    r1, 5
    91			dec	    r1
    92			subi	    r1, 4
    93			subi	    r2, 0x5a
    94			ldc	    r1, r2
    95			; tough luck - first part of key is correct but
    96			; second part is not
    97			jnz	    bad_key ; 0xe1bb6f
    98	
    99			dec	    r0
   100			ldbze	    r2, (r0)
   101			addi	    r0, 2
   102			ldbze	    r1, (r0)
   103			add	    r2, r1
   104			inc	    r2
   105			subi	    r2, 0x4e
   106			pushi	    0xe0dd64
   107			pop	    r0
   108			addi	    r0, 0xdeac
   109			inc	    r0
   110			xor.b	    (r0), r2
   111			ldi	    r3, 0x49
   112			pushi	    0xe1bc2a
   113			pop	    r0
   114			inc	    r2
   115	
   116			; 0xe1bc00
   117	loop2:		xor.b	    (r0), r2
   118			inc	    r0
   119			dec	    r3
   120			jnz	    loop2	; to 0xe1bc00
   121	
   122			ret
   123	
   124	authenticate	db 'Please Authenticate!', 0Ah, 0Dh, 0
   125	
   126	welcome		db 'Welcome...',0Ah, 0Dh
   127			db 'Exploit for it doesn',27h,'t matter 1.x Courtesy of'
   128			db ' Nicolas Brulez',0

From lines 3 to 12, the virtual machine program is decrypting 0x20e bytes starting at address 0xe1ba65. The decryption is a simple XOR of each byte with 0x53. Please note that the program above has already been decrypted to show what the program does.

From lines 18 to 20 the program call the Windows API function GetCmdLine() and searches for the first space (' ') in it. If no space if found then the program jumps to the label bad_key in line 75 where the message "Please authenticate!" is printed out and the virtual machine exits by executing a hlt instruction.

Note that the program is looking for the first space in the command line, which depending on what directory the program is being run from, and the name of that directory, the program might mistakenly look in part of the directory name for the key. For example, if the program is stored as C:\Documents and Settings\username\Desktop\0x90.exe and it is being run using the full path then the virtual machine program will think that the arguments to the program will start at the word "and" because that is the word that follows the first space. This is because GetCmdLine() returns the program name (including directory it is run from) plus the program's arguments. So, this opens some interesting possibilities, like:

C:\>"blahblah 1D3NEAcN\0x90.exe"
Welcome...
Exploit for it doesn't matter 1.x Courtesy of Nicolas Brulez
C:\>

From lines 21 to 40 the program performs the first check of the key. This checks just deals with the first four bytes of the first argument to the program. If the first part of the key is valid then executing continues in line 46 at the label key_okay_1. Otherwise the program jumps to bad_key.

From lines 50 to 58 the program reads bytes 4 and 5 of the first argument to the program and XORs byte 5 with the byte at line 67. This is to decrypt the instruction that lives there, which was encrypted.

Lines 60 to 64 load bytes 6 and 7 of the first argument to the program, add them leaving them in register r2 and call the subroutine at 0xe1bb86.

The subroutine performs the second and last check of the key and jumps to the label bad_key is the check is not passed. Otherwise it will decrypt the ret instruction at the end of the subroutine by XORing the byte at line 122 with 0x42, decrypt the data at line 126 (the string "Welcome..."), and return.

Finally, at line 67 the program prints the string "Welcome... Exploit for it doesn't matter..." by calling the library function printf() and the execution of the virtual program ends by calling the hlt instruction.

2.5. Discovering Authentication Keys

You have seen that the program authenticates the user by looking at the command line. In this section I will explain how to find the keys that the program will accept as valid.

As I mentioned in Section 2.3, “ x86 Machine Code Becomes Microcode (a.k.a. "the Machine Within") ”, in lines 21 to 40 the virtual machine program is checking the first four characters of the first argument to the program. In line 40 there is a comparison. Let's see what it is comparing: in one hand I have (0x5cc80e31 + 0x74519745 + 0xdeadbeef - 0x17854165 + 0x73686f77 - 0x206e6f20 + 0x6d657263 - 0x79212121) = 0xdf807499.

In the other hand the program is performing the following operation: (<first four characters of first arg to program> + 0x1d9bdc45 - 0xad45dfe2 + 0x68656c6c - 0x41776169 + 0x69747320 + 0x64726976 - 0x6e757473 - 0x65683f21) & 0xdfffffff. The result of this operation will be compared with 0xdf807499 (see previous paragraph) and if the two numbers are not equal the program bails out. So, we have one equation with one unknown which leads us to finding out the first 4 characters of the key:

(0xdf807499 - (0x1d9bdc45 - 0xad45dfe2 + 0x68656c6c - 0x41776169 + 0x69747320 + 0x64726976 - 0x6e757473 - 0x65683f21) ) & 0xdfffffff = 0x4e334431 = '1D3N' if the number is stored in little endian byte order (which it is since the Brulez processor is little endian.)

The next checks and conditions that the key must satisfy are a bit more challenging, but following a similar procedure to the one I presented above I can find some equations (in the equations that follow the variable arg[] is used to represent an array of characters that contains the first argument to the program. arg[0] is the first character of the first argument to the program, arg[1] is the second character and so on):

arg[4] ^ 0x47 must be equal to 0x2 to satisfy encrypted instruction at 0xe1bb54. This is because I know that the instruction at this address must be 3 bytes long for the rest of the program to be valid (i.e. no illegal instructions, etc.) and it must be an instruction that does not modify the stack pointer or anything else, and there is no other instruction whose first byte satisfy these conditions. This condition leads to arg[4] = 0x45 = 'E'.

To satisfy the load and compare (ldc) instruction inside the subroutine, the following equation must be satisfied: arg[4] + arg[6] = 0xa8. Since I now know arg[4] I can find arg[6]: arg[6] = 0xa8 - 0x45 -> arg[6] = 0x63 = 'c'.

Now, the second byte of the instruction at line 67 must be 3, 4, or 5 so the instruction ends up being "inc r3", "dec r3", "clr r3", or "bswp r3". So, arg[5] ^ 0x42 = {3, 4, 5, 9} -> arg[5] = 3 ^ 0x42 = 'A' or arg[5] = 4 ^ 0x42 = 'F' or arg[5] = 5 ^ 0x42 = 'G' or arg[5] = 9 ^ 0x42 = 'K'.

And finally, to satisfy the condition that the instruction at 0xe1bc11 must decode to a ret instruction (since it is the last instruction in a subroutine), I need: arg[5] + arg[7] + 1 - 0x4e = 0x42 -> arg[5] + arg[7] = 0x8f -> if arg[5] = 'A' -> arg[7] = 'N', or if arg[5] = 'F' -> arg[7] = 'I', or if arg[5] = 'G' -> arg[7] = 'H', or if arg[5] = 'K' -> arg[7] = 'D'.

Putting all the characters I have found in the correct order I get that the four possible keys are:

"1D3NEAcN"
"1D3NEFcI"
"1D3NEGcH"
"1D3NEKcD"

I realize that all this sounds confusing, but it is just additions, subtractions, XORs and a few conditions that have to be satisfied. Just follow the virtual machine program and do the math and you will understand.

2.6. Program Structure

After my analysis is done I can have a very accurate view of how the program the program is structured:

2.7. Other Interesting Details

Here are a few interesting details and comments about 0x90.exe that did not fit anywhere else:

Q: How was 0x90.exe created? Who writes code that crazy? What language was used?
A: Well, 0x90.exe was obviously created by Nico Brulez. Because of all the low level details that I saw in the program, and all the anti-reverse engineering traps that I had to evade, I believe that the program was written entirely in assembler, i.e. no compiler intervention. However, Nico must have used a tool he wrote to automate some things such as the 175 decryption blocks and the no-operation blocks that were everywhere in the program.
"Evil Has No Boundaries !": right before starting the decryption of the 175 decryption blocks, 0x90.exe initializes the area of memory that will later be used to hold the Brulez virtual machine registers with the string "Evil Has No Boundaries !". This string will not show up when the strings command is run on 0x90.exe because it is created by doing 4-byte writes to the correct memory address. I think the string is there as a testament to the difficulty of breaking this binary due to all the evil anti-reverse engineering traps present in the program ;-)
"Exploit for it doesn't matter" - if you recall this is the message that is printed out after the "Welcome" message once a user authenticates successfully. This message is confusing and I don't have a full explanation for it. One possible explanation is that 0x90.exe has a vulnerability that in theory allows for code execution. One place where this is a possibility is in the authentication code: if you follow carefully Section 2.5, “Discovering Authentication Keys”, you will notice that two letters of the key are used to decrypt (via a simple XOR) two different virtual machine instructions. This means that by carefully choosing the key one could (theoretically) control execution of the program.
Why the name 0x90?: 0x90 is the opcode for the NOP instruction (no-operation) in the x86 instruction set. Other than printing one message if the user has not authenticated, or a different one if the user has authenticated, the program does nothing.

3. Conclusion

0x90.exe has been defeated. Nothing is unbreakable and anyone with enough time, patience and motivation will always be able to get in; it is only a matter of time. Things can be made complicated and highly involved so most people give up somewhere along the way (or maybe even before starting) but there will always be someone, somewhere, that will take things until their last consequences.

And to finish, one final comment: in my opinion there are two ways to break a program like 0x90.exe: a brute force approach, and the elegant approach. The brute force approach does not require coming up with valid keys to run the program - all that is necessary is to patch the program at a specific location and force it, depending on the situation, to make or not make a jump. The elegant approach consists of understanding how the keys are generated and generating your own keys.

The thing is, both approaches require a decent amount of reverse engineering because you need to decrypt the program so you can patch it (in the case of the brute force method), and for this to happen it is necessary to evade the anti-reverse engineering traps and to understand how the program is encrypted. Once the encryption has been beaten the extra step of understanding the key generation algorithm is the fun part.

This challenge has been a great educational experience. I personally had never seen the use of a virtual machine in a piece of malware. Some of the other anti-reverse engineering tricks were also new to me.

If you have followed this far I hope you have learned something.

A. The Brulez Virtual Machine Instruction Set

This appendix contains a description of every instruction in the instruction set of the Brulez virtual machine. For each instruction the length in bytes is given and if any operands are encoded instructions for decoding them are provided.

0, 0 = pushi2 n. n is encoded. decode by XORing with 0x37195411 (size = 6)

+----+----+----+----+----+----+
|  0 |  0 |  # |  # |  # |  # |
+----+----+----+----+----+----+

0, 1 = pushi n. n is encoded. Add 0xadd01337 to get value (size = 6)

+----+----+----+----+----+----+
|  0 |  1 |  # |  # |  # |  # |
+----+----+----+----+----+----+

0, 2 = push rx. x is encoded. XOR with 0x47 to get x (size = 3)
+----+----+----+ | 0 | 2 | x | +----+----+----+
0, 3 = pop rx. x is encoded. XOR with 0x66 to get x (size = 3)
+----+----+----+ | 0 | 3 | x | +----+----+----+
0, 4 = adjsp # (size = 3)
Adds # to the stack pointer. # is encoded. XOR with 0x45 to get value.
+----+----+----+ | 0 | 4 | # | +----+----+----+

1, 3 = xor (rx), ry (size = 5)

+----+----+----+----+----+
|  1 |  3 |  y | Sz |  x |
+----+----+----+----+----+

Sz = size: 0 byte, 1 half word, 2 word

1, 4 = addi rx, #. x is encoded. substract 3 to get x. set zero flag based on result (0 if result 0, 1 if not) (size = 7)
+----+----+----+----+----+----+----+ | 1 | 4 | x | # | # | # | # | +----+----+----+----+----+----+----+
1, 5 = subi rx, #. x is encoded. substract 2 to get x. set zero flag based on result (0 if result 0, 1 if not) (size = 7)
+----+----+----+----+----+----+----+ | 1 | 5 | x | # | # | # | # | +----+----+----+----+----+----+----+
1, 6 = andi rx, #. x is encoded. substract 5 to get x. set zero flag based on result (0 if result 0, 1 if not) (size = 7)
+----+----+----+----+----+----+----+ | 1 | 6 | x | # | # | # | # | +----+----+----+----+----+----+----+
1, 7 = ori rx, #. x is encoded. substract 4 to get x. set zero flag based on result (0 if result 0, 1 if not) (size = 7)
+----+----+----+----+----+----+----+ | 1 | 7 | x | # | # | # | # | +----+----+----+----+----+----+----+

1, 8 = xori rx, #. x is *not* encoded. set zero flag based on result (0 if result 0, 1 if not) (size = 7)

+----+----+----+----+----+----+----+
|  1 |  8 |  x |  # |  # |  # |  # |
+----+----+----+----+----+----+----+

1, 9 = add rx, ry. x is encoded; substract 3 to get x. y is encoded; substract 1 to get y. set zero flag based on result (0 if result 0, 1 if not) (size = 4)
+----+----+----+----+ | 1 | 9 | y | x | +----+----+----+----+
1, 10 = ldc rx, ry. Load and compare. x is encoded; substract 1 to get x. y is encoded; substract 2 to get y. Load rx with contents of ry. reset zero flag if rx is equal to ry. set zero flag otherwise (size = 4)
+----+----+----+----+ | 1 | 10 | y | x | +----+----+----+----+
2, 0 = hlt - exit Virtual Machine (size = 2)
+----+----+ | 2 | 0 | +----+----+
2, 1 = sys - do a syscall, store 32-bit result in r0. Set flag based on result (0 if result 0, 1 otherwise.) The address of the IAT entry for the system call to make is encoded. Add 0xFEA731DE to obtain real address (size = 6)
+----+----+----+----+----+----+ | 2 | 1 | # | # | # | # | +----+----+----+----+----+----+

2, 2 = ldi rx, #. # is encoded. add 0xaefded04 to decode (size = 7)

+----+----+----+----+----+----+----+
|  2 |  2 |  # |  # |  # |  # |  x |
+----+----+----+----+----+----+----+

2, 3 = dec rx. set flag (0 if result 0, 1 if 1) (size = 3)
+----+----+----+ | 2 | 3 | x | +----+----+----+
2, 4 = inc rx. set flag (0 if result 0, 1 if 1) (size = 3)
+----+----+----+ | 2 | 4 | x | +----+----+----+
2, 5 = clr rx. set rx to 0. sets zero flag to zero (size = 3)
+----+----+----+ | 2 | 5 | x | +----+----+----+
2, 6 = strnchr(): searches for a specific char in the string pointed by r0. if found leaves in r0 a ptr to the found char. if not found resets the zero flag. only a maximum number of characters is searched (size = 5)
2, 7 = trap. Produce a debugger trap to debug microcode (x86 assembly) (size = 2)
+----+----+ | 2 | 7 | +----+----+
2, 8 = ld rx, (ry). load rx with contents of memory address pointed at by the contents of ry (size = 4)
+----+----+----+----+ | 2 | 8 | y | x | +----+----+----+----+
2, 9 = bswap rx (size = 3)
+----+----+----+ | 2 | 9 | x | +----+----+----+
(buggy microcode since bswap output not being written to register)

2, 10 = ldbze rx, (ry) - load byte zero extend (size = 4)

+----+----+----+----+
|  2 | 10 |  y |  x |
+----+----+----+----+

2, 11 = ldhze rx, (ry) - load half word zero extend (size = 4)

+----+----+----+----+
|  2 | 11 |  y |  x |
+----+----+----+----+

3, 0 = pcbe (pop, compare and branch if equal). compares the two words in the stack and branch if they are equal. the words are removed from the stack. destination address is encoded. substract 0x31337 to get real address (size = 6)

4, 0 = jmp to encoded dest. Destination is decoded by adding 0xdeadead (size = 6)

+----+----+----+----+----+----+
|  4 |  0 |  # |  # |  # |  # |
+----+----+----+----+----+----+

4, 1 = jnz to encoded dest. Destination is decoded by adding 0xe1ba3e (size = 6)

+----+----+----+----+----+----+
|  4 |  0 |  # |  # |  # |  # |
+----+----+----+----+----+----+

4, 2 = jz to encoded dest. Destination is decoded by adding 0xe1ba41 (size = 6)

+----+----+----+----+----+----+
|  4 |  0 |  # |  # |  # |  # |
+----+----+----+----+----+----+

5, 0 = call addr (size = 6)

+----+----+----+----+----+----+
|  5 |  0 |  A |  B |  C |  D |
+----+----+----+----+----+----+

addr = D << 24 + C << 16 + B <<8 + A

5, 1 = ret (size = 2)
+----+----+ | 5 | 1 | +----+----+

B. Files

The following files where generated during this Scan of the Month:

hide.idc: IDA IDC program to hide sections of code in a program. The addresses of the sections to hide are taken from an input file. Hiding sections of a program that only contain no-operation garbage makes the program a lot easier to follow and analyze.
xor.idc: IDA IDC program to XOR a section of a program with a byte specified by the user. I used this to decrypt from within IDA the two sections of the program that get decrypted by the Brulez virtual machine. One section was decrypted by XORing with 0x53 and the other one by XORing with 0x43. See Section 2.2.5, “Heavy Use of Encryption” for details.
0x90-n2.idb: IDA Pro database that I used until the end when reverse engineering 0x90.exe.
decrypt-0x90.c: C program used to automatically decrypt the two chains that contain a combined total of 175 decryption blocks, patch the program so these chains are not decrypted, and neutralize all the exception handlers wrecking havoc when the program is run under a debugger.
XML sources for this document: this HTML document was generated using DocBook XML. This directory contains all the files used in the generation of this document.

C. References

Performance monitoring, a x86 assembly language gem from the the Assembly Gems page.
A Crash Course on the Depths of Win32 Structured Exception Handling: excellent Microsoft Systems Journal article by Matt Pietrek.
An In-Depth Look into the Win32 Portable Executable File Format : another Microsoft Systems Journal article by Matt Pietrek.

D. Work Methodology

The submission was prepared using DocBook XML, and then converted to HTML using Tim Waugh's xmlto XML converter. Editing was done on a Linux desktop using GNU Emacs and the PSGML major mode.

All the anti-reverse engineering work was done on a Linux desktop using a Linux native version of the IDA disassembler. Machine virtualization on the same desktop was used to run the program in a debugger.

E. Acknowledgements

Very special thanks to ...

Nick DeBaggis for his insightful comments and suggestions while I was working on this. A couple of findings presented in this document were pointed out by him (sorry for not being more specific dude!)
The fine folks at DataRescue for creating and maintaining IDA Pro, and specially for producing a native Linux version.
Nico Brulez, architect of the Brulez Virtual Machine and of this challenge, for designing such a great educational experience and for providing some fine hours of entertainment.
As always, the Honeynet Project for coming up with these nice and entertaining challenges.