Honeynet Project Reverse Challenge

Matt Messier, Bob Fleck, John Viega

Secure Software, Inc.



We began our analysis by loading the-binary into IDA Pro version 4.20, which gave us the disassembly we needed to start determining how the binary works.  This initial analysis revealed that the binary was statically linked with libc 5.3.12.  We determined the libc version from the identification string "@(#) The Linux C library 5.3.12" in the _rodata segment.  Using this information, we obtained a copy of the libc 5.3.12 source to help identify standard C functions in IDA's disassembly.
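The "@(#)" prefix is the conventional marker that the what(1) utility searches for, so the version string can also be located by scanning the raw bytes of the file.  The following is a minimal sketch of such a scan, not the procedure we used (we read the string in IDA); the function name find_ident_string() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first "@(#)" marker in buf, or NULL if there
 * is none.  buf need not be NUL-terminated; len is its size in bytes.
 * The name find_ident_string is hypothetical, chosen for this sketch. */
const char *find_ident_string(const unsigned char *buf, size_t len)
{
    static const char marker[] = "@(#)";
    const size_t mlen = sizeof(marker) - 1;   /* exclude the trailing NUL */
    size_t i;

    assert(buf != NULL);
    for (i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0)
            return (const char *)(buf + i);
    }
    return NULL;
}
```

In a real binary the identification string is NUL-terminated, so the returned pointer can be printed directly once a match is found.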

Because IDA identifies Linux system calls, we began by searching for system calls and matching them up to libc functions.  This gave us some reference points from which to begin disassembling the program itself.  At this stage, we did not identify any libc functions other than the simple wrappers around system calls.  Later we identified more libc functions, although we never identified every one.  In the course of identifying the libc functions, it became clear that the-binary was compiled without optimization, because a great deal of dead code was retained in the resulting binary.

Because IDA did not automatically identify the main() function, we matched up the entry code with the code in libc to identify the location of main().  From here, we began to analyze the disassembly line by line and manually translate the code to C (see the-binary.c).  Initially there were still several function calls made from main() that did not have meaningful names assigned to them, but based on the addresses and libc sources, we were able to easily identify the corresponding libc function where appropriate.

Although main() is a large function, its conversion from assembler to C was straightforward.  The first part simply performs some setup work to hide itself, become a daemon, and prepare to receive commands from the network.  The second part handles the commands that are received.  IDA recognized a switch jump table containing 12 entries, which turned out to be one case for each supported command.
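A dense switch statement is typically compiled to a bounds check followed by an indirect jump through a table, which is the pattern IDA recognized.  The sketch below writes that dispatch out explicitly in C.  It is an illustration only: the handler type, the handle_echo() stand-in, and the assumption that command numbers run from 0 to 11 are ours for this example and are not taken from the-binary, where the case blocks are inline in main().

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COMMANDS 12          /* size of the jump table IDA recognized */

/* Hypothetical handler signature; in the-binary the case blocks are
 * inline in main() rather than separate functions. */
typedef int (*command_fn)(unsigned int cmd);

/* Hypothetical stand-in for a case block: echoes the command number. */
int handle_echo(unsigned int cmd)
{
    return (int)cmd;
}

/* Explicit form of a jump-table dispatch: bounds check, then index.
 * Returns -1 for out-of-range commands (the switch's default path).
 * Assumes commands are numbered 0..NUM_COMMANDS-1. */
int dispatch_command(unsigned int cmd, const command_fn table[NUM_COMMANDS])
{
    assert(table != NULL);
    if (cmd >= NUM_COMMANDS)
        return -1;
    return table[cmd](cmd);
}
```

Recognizing the bounds check and the table together is what allows the twelve targets to be read back as the case labels of a single switch.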

As we investigated the case blocks and the functions that they called, we noticed that four of these functions crafted packets and wrote them to raw sockets. Examination of the packets being created led us to determine that this was a DoS tool.

With the exception of the four large functions that actually perform the various DoS attacks, we analyzed and converted to C each unknown function as we encountered it while converting main().  These functions included an encoder and decoder for "hiding" the program's network traffic, a convenience function for calling gethostbyname(), and a pair of functions for sending data packets either back to the client program commanding the daemon or to other components of this DDoS tool.

Once all of main() and its smaller worker functions were converted to C, we began to work on the four larger DoS functions.  We started with the function located at offset 0x08049174, which we eventually renamed do_dnsflood_reflect().  This function was somewhat more difficult to analyze than main() because of its complex looping structure.  It contains an outer loop with another loop nested within it, and a third loop nested within the second.  This was not immediately obvious from the assembly.  We began to work this out by carrying the labels from the assembly into the C code and using goto statements.  We were then able to rename the labels to something more descriptive.  Finally, we were able to build for loops based on the structure of the labels and goto jumps.
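The label-and-goto technique can be illustrated with a generic triple-nested loop.  This is a sketch of the transformation only, not code from do_dnsflood_reflect(); the function names, label names, and loop bounds are hypothetical.  The first function is the kind of literal transcription produced from the disassembly, and the second is the structured form recovered from it.

```c
#include <assert.h>

/* Literal transcription: labels and gotos mirror the jump targets. */
unsigned long visit_goto(int outer, int middle, int inner)
{
    unsigned long visits = 0;
    int i, j, k;

    i = 0;
outer_test:
    if (i >= outer) goto done;
    j = 0;
middle_test:
    if (j >= middle) goto outer_next;
    k = 0;
inner_test:
    if (k >= inner) goto middle_next;
    visits++;                    /* innermost loop body */
    k++;
    goto inner_test;
middle_next:
    j++;
    goto middle_test;
outer_next:
    i++;
    goto outer_test;
done:
    return visits;
}

/* Structured form: each test/next label pair becomes one for loop. */
unsigned long visit_for(int outer, int middle, int inner)
{
    unsigned long visits = 0;
    int i, j, k;

    for (i = 0; i < outer; i++)
        for (j = 0; j < middle; j++)
            for (k = 0; k < inner; k++)
                visits++;
    return visits;
}

/* Sanity check: both forms must agree for the same bounds. */
void check_equivalent(int outer, int middle, int inner)
{
    assert(visit_goto(outer, middle, inner) == visit_for(outer, middle, inner));
}
```

Each backward goto to a "test" label marks the bottom of a loop, and each forward goto to a "next" label marks a loop exit, which is what makes the nesting visible once the labels have descriptive names.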

Once the first DoS function was analyzed and converted to C, the other three functions were quickly analyzed and converted as well.  All four functions are similar in structure, though they perform different attacks and vary slightly in their looping.  After we had dealt with the loops in the first function, everything became much clearer.

While converting from assembler to C, we encountered some oddities, particularly in the second case block (CMD_RELAY_SETUP) in main(), that left us unsure whether we were looking at bad code generated by the compiler or bad original source code.  Based on what we have seen elsewhere in the code, we concluded that these oddities are the result of poor programming rather than bad code generation.  In general, it seems that the person who wrote this program did so mostly by piecing together code taken from other programs.  The program is full of programming errors, but it is not clear in every case whether an error is the program author's own or was inherited from the author (or authors) of the programs from which the code was taken.

In places, attempts were made to strip out unwanted functionality from the code that was taken.  Most of the unwanted code was removed, but some of it was left in.  The code that performs a TCP SYN flood is a good example.  In this case, a TCP pseudo header is constructed.  This header contains a stripped-down set of IP header fields (source and destination addresses, protocol, and TCP length) and is immediately followed by a TCP header.  The TCP header is constructed separately and copied into a buffer that holds the TCP pseudo header and the TCP header.  The copy of the TCP header is used to compute the checksum, but the checksum is never copied into that copy of the structure, and the buffer containing the TCP pseudo header and TCP header is never referenced again.
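For reference, a conventional TCP checksum computation can be sketched as follows.  This is not the code from the-binary: the function names, the byte-array layout, and the constants are our own choices for illustration, and the header is assumed to be 20 bytes with no options or payload.  The pseudo header exists only as input to the checksum, so in a conventional implementation the scratch buffer is discarded after the result has been stored in the header that is actually transmitted.  The passage above does not say whether the-binary stores the result there; this sketch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    PSEUDO_LEN   = 12,  /* src(4) + dst(4) + zero(1) + proto(1) + len(2) */
    TCP_HDR_LEN  = 20,  /* TCP header without options (assumption) */
    TCP_CSUM_OFF = 16,  /* offset of the checksum field in the TCP header */
    TCP_PROTO    = 6    /* IP protocol number for TCP */
};

/* Internet checksum: one's-complement sum of big-endian 16-bit words. */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;          /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)~sum;
}

/* Compute the checksum over pseudo header + TCP header and store it in
 * tcp[], the header that would be sent.  src and dst are IPv4 addresses
 * in network byte order. */
void tcp_fill_checksum(uint8_t tcp[TCP_HDR_LEN],
                       const uint8_t src[4], const uint8_t dst[4])
{
    uint8_t buf[PSEUDO_LEN + TCP_HDR_LEN];   /* scratch buffer only */
    uint16_t csum;

    assert(tcp != NULL && src != NULL && dst != NULL);

    memcpy(buf, src, 4);
    memcpy(buf + 4, dst, 4);
    buf[8]  = 0;
    buf[9]  = TCP_PROTO;
    buf[10] = 0;                             /* TCP length, high byte */
    buf[11] = TCP_HDR_LEN;                   /* TCP length, low byte */

    /* Copy of the TCP header with its checksum field zeroed. */
    memcpy(buf + PSEUDO_LEN, tcp, TCP_HDR_LEN);
    buf[PSEUDO_LEN + TCP_CSUM_OFF]     = 0;
    buf[PSEUDO_LEN + TCP_CSUM_OFF + 1] = 0;

    csum = inet_checksum(buf, sizeof(buf));

    /* Store the result, in network byte order, in the real header. */
    tcp[TCP_CSUM_OFF]     = (uint8_t)(csum >> 8);
    tcp[TCP_CSUM_OFF + 1] = (uint8_t)(csum & 0xff);
}
```

A receiver validates the segment by summing the same pseudo header together with the received TCP header, checksum included; a correct checksum makes inet_checksum() return 0 over that data.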