Reverse engineering undocumented, untrusted and possibly hostile code is a demanding task. The security community needs more knowledge of the tools and techniques for reverse engineering, and I hope that this challenge will be a good way to introduce more people to this skill set.
Before I start with my analysis, I would like to point out a great resource for reverse engineers. Ironically, most of the publicly available information about reverse engineering is found on sites dedicated to software piracy and cracking software protection schemes. For years the best resource for crackers was Fravia's website. While most of the articles on the site deal with cracking registration codes for shareware programs, its collection of tutorials and well-documented reverse engineering techniques makes it invaluable for the aspiring reverse engineer.
The two main approaches to program analysis are the active approach and the passive approach. The active approach involves running the program and monitoring its interaction with the environment. An excellent tool for doing this under Linux is Fenris by Michal Zalewski. The passive approach involves disassembling the program and figuring out the entire program logic before running it. This approach is slower, but the analysis is more thorough. Which approach you choose is a matter of personal preference and the nature of the program. Since the Honeynet binary is a complex program and we have absolutely no idea what it does, I decided to go with the passive approach.
The disassembler used for this analysis is IDA Pro, a commercial product. Unfortunately none of the open source tools come even close to its functionality (but the Bastard Disassembly Environment project is worth keeping an eye on). IDA Pro supports many different CPU architectures and file formats. Its analysis engine identifies subroutines, local and global variables, Linux system calls and, via fingerprinting, library routines. The last feature is very important for the analysis of the Honeynet binary.
An evaluation version of IDA Pro is available for download on the Datarescue website. It supports x86 and ELF files. If you are new to IDA, read the Getting Started Manual and Gij's IDA Tutorial. A lot of information about IDA is also available at Fravia's website. IDA Pro does not have a Linux version, but runs fine under Wine.
FLAIR is a collection of tools for generating library function signatures which are later used by IDA to identify library functions in statically compiled binaries. FLAIR38 used to be available for free download from Datarescue, but the latest version is not. You can still find it with Google.
The FLAIR signature generator processes .PAT files, which a preprocessor generates from the libraries. The original FLAIR package includes preprocessors only for DOS and Windows libraries. Rpat has written an ELF library preprocessor for FLAIR. His article is available in Russian and English, with source included.
We'll start the analysis by running objdump -t on the binary.

    $ objdump -t the-binary
    the-binary:     file format elf32-i386
    objdump: the-binary: no symbols

The binary has been stripped of its symbols, and it is statically linked. The output of strings contains the line

    @(#) The Linux C library 5.3.12

which indicates the libc version the program is linked with. A quick search on Google shows that Red Hat 4.x used this version of libc. Download all libc-devel packages from Red Hat 4.x. They will be useful later.
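One way to confirm static linking without relying on the symbol table is to check the ELF program headers: a statically linked executable normally has no PT_INTERP segment, because it does not need a dynamic loader. The following is a minimal sketch of that check, not part of the original toolset. It assumes a 32-bit little-endian ELF file such as the-binary, and the function name is_statically_linked is made up for illustration.

```python
import struct

ELF_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")  # Elf32_Ehdr, 52 bytes
PROGRAM_HEADER = struct.Struct("<IIIIIIII")      # Elf32_Phdr, 32 bytes
PT_INTERP = 3  # segment type that names the dynamic loader


def is_statically_linked(data):
    """Return True if a 32-bit little-endian ELF image has no PT_INTERP segment."""
    fields = ELF_HEADER.unpack_from(data, 0)
    ident = fields[0]
    # e_ident[4] is EI_CLASS (1 = 32-bit), e_ident[5] is EI_DATA (1 = little-endian)
    if ident[:4] != b"\x7fELF" or ident[4] != 1 or ident[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF file")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for index in range(e_phnum):
        p_type = PROGRAM_HEADER.unpack_from(data, e_phoff + index * e_phentsize)[0]
        if p_type == PT_INTERP:
            return False
    return True
```

Reading the file with open("the-binary", "rb").read() and passing the bytes to is_statically_linked() should return True for a static binary. The check says nothing about whether the symbols were stripped; that is what the objdump output shows.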
Because the binary is statically linked, the output of strings shows too many strings from libc code. libstrings.pl is a quick and dirty Perl script that runs strings on all .o files from libc.a and stores the strings in a hash. The script then prints out all strings found in the binary, prefixing the ones that occur in libc.a with the name of the .o file where they were found. We can filter the output of the script to see only the strings that are part of the program code, not the library functions. There are very few of them, and some look very promising:

    [mingetty]
    /tmp/.hj237349
    /bin/csh -f -c "%s" 1> %s 2>&1
    TfOjG
    /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:.
    HISTFILE
    linux
    TERM
    /bin/csh -f -c "%s"
    %c%s

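The filtering idea behind libstrings.pl can be sketched in a few lines. This is not the original Perl script, only an illustration of the same approach. It assumes the .o files have already been extracted from libc.a into a directory (for example with ar x), and the function names are made up for this example.

```python
import os
import re

# Runs of at least four printable ASCII characters, the default length used by strings(1)
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data):
    """Return the printable strings embedded in a blob of bytes."""
    return [run.decode("ascii") for run in PRINTABLE_RUN.findall(data)]


def index_library_strings(object_dir):
    """Map every string found in the .o files to the first file containing it."""
    index = {}
    for name in sorted(os.listdir(object_dir)):
        if name.endswith(".o"):
            with open(os.path.join(object_dir, name), "rb") as handle:
                for text in extract_strings(handle.read()):
                    index.setdefault(text, name)
    return index


def label_strings(binary_path, index):
    """Return (origin, string) pairs; origin is None for strings not seen in libc."""
    with open(binary_path, "rb") as handle:
        return [(index.get(text), text) for text in extract_strings(handle.read())]
```

Keeping only the pairs whose origin is None leaves the strings that most likely belong to the program itself. The match is exact, so a libc string that the linker merged or truncated would be reported as a program string.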
If you open the binary with IDA Pro, it will look almost impossible to analyze. There are hundreds of functions and none of them have any meaningful names. Most of them are library functions and we can identify them with a Libc5 signature file, which we are going to generate with Rpat and FLAIR.
The first step is to use rpm2cpio to extract the libc.a files from Red Hat's libc-devel RPM packages. Then download and compile Rpat's UNIX library preprocessor. His article will walk you through the complicated process of building the tool. Then run it on all the libc.a files you have, resolve the collisions and generate the SIG file with FLAIR. Or save yourself the trouble and download the libc5.sig signature file I've generated. If your copy of IDA is in /usr/local/ida, put libc5.sig in /usr/local/ida/sig. Restart IDA and you should be able to apply the new signature to your disassembly.
A detailed step-by-step explanation of disassembling the binary would be too boring to read and tedious to write. Allow me to fast-forward two weeks and just make a few observations that I've made while disassembling.
Sometimes the compiler generates really crappy code. An example is the packet_decode() function, which was very hard to read. Fortunately, the packet_encode() function was very readable and helped me figure out the decoding process.
gcc puts the functions in the binary in the same order as they are in the source file. Related functions are often next to each other. Example: packet_encode() and packet_decode().
Not all libc functions were identified by their signatures. When analyzing an unknown function, keep in mind that it might be a library function. Check the cross-references for calls from other libc functions. Look at the source code of these functions and this will help you identify the name of the unknown function.
The libc5 signature could not identify the calls to socket(), bind(), send() and recv(). On Linux, all networking functions will contain a call to the sys_socketcall syscall:

    mov     edx, 1
    lea     ecx, [ebp+var_C]
    mov     eax, 66h
    mov     ebx, edx
    int     80h             ; LINUX - sys_socketcall

The Linux system call convention puts the syscall number (0x66) in eax and the first syscall parameter in ebx. This parameter specifies the network function we are calling and is defined in libc-5.3.12/include/sys/socketcall.h. Number 1 is SYS_SOCKET, so we can identify the function above as socket(). Search the disassembled code for all calls to sys_socketcall and identify the socket functions.
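The lookup can be partly automated on a text listing exported from the disassembler. The sketch below is only an illustration: it follows plain mov instructions into eax, ebx, ecx and edx, ignores every other instruction, and assumes IDA-style operands such as 66h. The sub-call numbers are the standard Linux values for sys_socketcall; the function names are made up for this example.

```python
import re

# Sub-call numbers passed in ebx to sys_socketcall (eax = 66h) on i386 Linux
SOCKETCALLS = {
    1: "socket", 2: "bind", 3: "connect", 4: "listen", 5: "accept",
    6: "getsockname", 7: "getpeername", 8: "socketpair", 9: "send",
    10: "recv", 11: "sendto", 12: "recvfrom", 13: "shutdown",
    14: "setsockopt", 15: "getsockopt", 16: "sendmsg", 17: "recvmsg",
}
REGISTERS = ("eax", "ebx", "ecx", "edx")
MOV = re.compile(r"^\s*mov\s+(e[a-d]x),\s*(.+?)\s*(?:;.*)?$", re.IGNORECASE)
INT80 = re.compile(r"^\s*int\s+80h", re.IGNORECASE)


def parse_immediate(token):
    """Parse an IDA-style immediate such as '1' or '66h'; return None otherwise."""
    try:
        return int(token[:-1], 16) if token.endswith("h") else int(token, 10)
    except ValueError:
        return None


def identify_socketcalls(listing):
    """Return the socket function name for each sys_socketcall found in the listing."""
    regs, found = {}, []
    for line in listing.splitlines():
        match = MOV.match(line)
        if match:
            dest, src = match.group(1).lower(), match.group(2).lower()
            # Copy a known register value, or record an immediate (None if unknown)
            regs[dest] = regs.get(src) if src in REGISTERS else parse_immediate(src)
        elif INT80.match(line):
            if regs.get("eax") == 0x66:
                found.append(SOCKETCALLS.get(regs.get("ebx"), "unknown"))
            regs.clear()
    return found
```

Run on the five instructions shown above, identify_socketcalls() returns ["socket"]. Anything more involved than register moves, such as values loaded from memory or set with xor, is reported as "unknown" and still needs to be checked by hand.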
sub_8049174 is a function called by main(). It opens a raw socket and sends some data, so we can be certain that it is not a libc function. At address 08049278 we have a call to sub_80556CC. This function sits between other libc functions in the binary and calls lots of internal libc functions like __libc_sigprocmask, __sigaction, __libc_alarm, sigsuspend, etc. These two observations lead us to believe that it is also a libc function. We need to identify it. We open the cross-references window and look at each of the functions calling sub_80556CC. Unfortunately none of them are libc functions. If they were, we could figure out what sub_80556CC is just by looking at the source of the function calling it.
We'll try to find a function call with specific arguments in this function and then grep the libc source. A good example is at .text:08055728:

    push    0Eh
    call    __sigaction

0Eh is 14, the value of SIGALRM, so a quick consultation with the sigaction manpage and /usr/include/linux turns this into a call to sigaction(SIGALRM, ...).

    $ grep -r sigaction * | grep SIGALRM
    libc/posix/sleep.c:  if (__sigaction (SIGALRM, &action, &oldaction) < 0)
    libc/posix/sleep.c:  (void) __sigaction (SIGALRM, &oldaction, (struct sigaction *) NULL);
    libc/posix/sleep.c:  (void) __sigaction (SIGALRM, &oldaction, (struct sigaction *) NULL);
    libc/pwd/lckpwdf.c:  if (sigaction(SIGALRM, &act, &oldact) == -1)
    libc/pwd/lckpwdf.c:  sigaction(SIGALRM, &oldact, NULL);
    libc/pwd/lckpwdf.c:  sigaction(SIGALRM, &oldact, NULL);

We have to look at these two source files and try to identify the function by the sequence of subroutine calls. The first function I checked was sleep() and the source matched the disassembled code perfectly. We can rename sub_80556CC to 'sleep' and go back to sub_8049174 - the function that called sleep().
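The same search can be scripted when there are many call sites to check. The following is a rough equivalent of the grep pipeline above, offered as a sketch only: it assumes the libc source tree is unpacked locally, looks only at .c files, and expects the function name and its first argument to be on the same line. The name find_calls is made up for this example.

```python
import os
import re


def find_calls(source_root, function, argument):
    """Return {relative_path: [matching lines]} for calls to function(argument, ...)."""
    # Optional leading underscores cover internal aliases such as __sigaction
    pattern = re.compile(
        r"\b_*" + re.escape(function) + r"\s*\(\s*" + re.escape(argument) + r"\b"
    )
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in sorted(filenames):
            if not name.endswith(".c"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                for line in handle:
                    if pattern.search(line):
                        relative = os.path.relpath(path, source_root)
                        hits.setdefault(relative, []).append(line.strip())
    return hits
```

Calling find_calls("libc", "sigaction", "SIGALRM") on the libc 5.3.12 sources should list the same two candidate files as the grep output. Grouping the hits by file makes it easier to see which source files are worth comparing against the disassembly.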
During the analysis of one of the subroutines in the binary we came across the global variable dword_8078B14. The cross-reference window reveals that there are a lot of references to this variable. Go to the fourth reference, at .text:0804E647. It is in an unidentified function.

        mov     ebx, dword_8078B14
        test    byte ptr dword_807854C, 2
        jz      short loc_804E682
        push    eax
        call    sub_80566A4
        push    eax
        mov     ax, [ebp+arg_E]
        xchg    al, ah
        and     eax, 0FFFFh
        push    eax
        mov     eax, [ebp+arg_10]
        push    eax
        call    inet_ntoa
        add     esp, 4
        push    eax
        push    esi
        push    offset aRes_sendSS_US ; "res_send: %s ([%s].%u): %s\n"
        push    edi
        call    _IO_fprintf
    loc_804E682:                    ; CODE XREF: Aerror+1C
        mov     dword_8078B14, ebx

The string "res_send:" is found in libc/inet/res_send.c, line 134. The function is Aerror():

    static void
    Aerror(file, string, error, address)
        FILE *file;
        char *string;
        int error;
        struct sockaddr_in address;
    {
        int save = errno;

        if (_res.options & RES_DEBUG) {
            fprintf(file, "res_send: %s ([%s].%u): %s\n",
                    string,
                    inet_ntoa(address.sin_addr),
                    ntohs(address.sin_port),
                    strerror(error));
        }
        errno = save;
    }

We can identify dword_8078B14 as errno: the disassembly saves it in ebx on entry and writes it back on exit, just as the source saves and restores errno.
The following files and tools were produced during the analysis of the binary:
This is the decompiled source of the binary. It compiles but hasn't been tested. It is a good reference for the program architecture and features.
The traffic decoder can be run on a live network interface or on a tcpdump file.

    $ ./decoder -h
    ./decoder: invalid option -- h
    the-binary Traffic Decoder
    Syntax: the-binary [options]
      -i <iface>      Listens on a interface
      -r <dumpfile>   Reads in a tcpdump file

The client can send commands to the backdoor. It supports all functions of the backdoor and has been tested with the real binary.

    $ ./client
    the-binary Client
    Syntax: the-binary-client <command> [options]
    To change the IP addresses of the client and the backdoor, edit the source
    Commands:
      init: initializes the client address list
        --type <type>            type of address list
        --ip <a.b.c.d>           client ip (if type=2, spiecify 10 addresses)
      status: returns status information
      kill: kills the DoS or shell process
        no parameters
      exec: execute a command and discard the output
        --cmd <string>           command line
      exec_output: execute a command and return the output
        --cmd <string>           command line
      bind_shell: bind a shell on port 23281
        no parameters
      udp_flood: launch udp flood attack
        --src <a.b.c.d>          source ip address
        --dst <a.b.c.d>          destination ip address
        --hostname <hostname>    destination hostname
        --d_port <port>          destination port for the packet
      icmp_flood: launch icmp ping flood/smurf attack
        --src <a.b.c.d>          source ip address
        --dst <a.b.c.d>          destination ip address
        --hostname <hostname>    destination hostname
      syn_flood: launch syn flood attack
        --src <a.b.c.d>          source ip address (if not supplied, use a random ip)
        --dst <a.b.c.d>          destination ip address
        --hostname <hostname>    destination hostname
        --d_port <port>          destination port for the SYN packet
        --sleep_after <number>   sleep after number packets have been sent (optional)
      dns_flood: launch a dns query flood attack
        --src <a.b.c.d>          source ip address (if not supplied, use a random ip)
        --dst <a.b.c.d>          destination ip address
        --hostname <hostname>    destination hostname
        --s_port <port>          source port for the queries (optional)
        --sleep_after <number>   sleep after number packets have been sent (optional)
      dns_smurf: launch dns smurf attack
        --ip <a.b.c.d>           victim ip address
        --hostname <hostname>    victim hostname
        --s_port <port>          source port for the queries (optional)
        --sleep_after <number>   sleep after number packets have been sent (optional)

The client does not display the responses from the backdoor. Use the decoder in sniffing mode to see them.