Analysis

Download and Verification

To begin the analysis, I downloaded the gzipped tar file and verified that its MD5 checksum matched the one listed on the download page:

csh% wget http://project.honeynet.org/reverse/the-binary.tar.gz
--11:43:38--  http://project.honeynet.org/reverse/the-binary.tar.gz
           => `the-binary.tar.gz'
Connecting to project.honeynet.org:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 118,131 [application/x-tar]

    0K -> .......... .......... .......... .......... .......... [ 43%]
   50K -> .......... .......... .......... .......... .......... [ 86%]
  100K -> .......... .....                                       [100%]

11:43:39 (79.84 KB/s) - `the-binary.tar.gz' saved [118131/118131]
csh% md5sum the-binary.tar.gz
857f9f32cbe7a277710d4fa57670316a  the-binary.tar.gz
csh% tar -tzf the-binary.tar.gz
reverse/
reverse/README.html
reverse/the-binary
csh% tar -xzf the-binary.tar.gz
The mystery binary unpacks into the reverse directory.
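
The same check can be scripted. The following is a minimal Python sketch, assuming the archive has already been downloaded; the function names md5_of_file and matches_expected are hypothetical, and the expected digest is simply the one printed by md5sum above.

```python
import hashlib

# Digest printed by md5sum in the transcript above.
EXPECTED_MD5 = "857f9f32cbe7a277710d4fa57670316a"

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected=EXPECTED_MD5):
    """True when the file's digest equals the published one."""
    return md5_of_file(path) == expected.lower()
```

Calling matches_expected("the-binary.tar.gz") should return True for an intact download. MD5 only guards against accidental corruption here; it says nothing about whether the contents are safe.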

Identifying the Binary

Since little information was provided about the break-in, the first step in identifying the binary is determining which architecture it was compiled for:

csh% file the-binary
the-binary: ELF 32-bit LSB executable, Intel 80386, version 1, statically linked, stripped

Tip

A stripped executable has had its symbol table removed, so gdb and objdump can still disassemble it but cannot offer clues (via function names) about the purpose of the executable.

Given the architecture (Intel 80386) that the binary targets, I formed the hypothesis that the program was compiled for the fairly popular Linux OS. This could be tested by running the application, except that we are not sure what the proper functioning should be.
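
Part of what file reports can be read straight from the first bytes of the binary. The following is a rough Python sketch, assuming only the standard ELF header layout; the function name describe_elf and the small lookup tables are mine, and the tables list just a few of the defined values.

```python
import struct

# Values from the ELF specification; only the ones relevant here are listed.
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "LSB", 2: "MSB"}
E_TYPE = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
E_MACHINE = {3: "Intel 80386", 62: "x86-64"}

def describe_elf(header):
    """Decode the first 24 bytes of an ELF file into readable fields."""
    if len(header) < 24 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    # e_type, e_machine and e_version follow the 16-byte e_ident array.
    endian = "<" if ei_data == 1 else ">"
    e_type, e_machine, e_version = struct.unpack_from(endian + "HHI", header, 16)
    return {
        "class": EI_CLASS.get(ei_class, "unknown"),
        "data": EI_DATA.get(ei_data, "unknown"),
        "type": E_TYPE.get(e_type, "unknown"),
        "machine": E_MACHINE.get(e_machine, "unknown"),
        "version": e_version,
    }
```

It could be applied as describe_elf(open("reverse/the-binary", "rb").read(24)). Deciding whether a binary is statically linked or stripped requires walking the program and section headers, which this sketch does not attempt.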

Warning: Wrong Turns
 

During my initial review of the program, I thought it would be useful to find out what memory locations the program used. One way to generate this information is to use gdb:

(gdb) info files
Symbols from "/mnt/hdb1/ufs/thumper/challenge2002/reverse/the-binary".
Local exec file:
        `/mnt/hdb1/ufs/thumper/challenge2002/reverse/the-binary',
        file type elf32-i386.
        Entry point: 0x8048090
        0x08048080 - 0x08048088 is .init
        0x08048090 - 0x080675cc is .text
        0x080675cc - 0x080675d0 is __libc_subinit
        0x080675d0 - 0x080675d8 is .fini
        0x080675d8 - 0x0806c222 is .rodata
        0x0806d228 - 0x080792ac is .data
        0x080792ac - 0x080792b4 is .ctors
        0x080792b4 - 0x080792bc is .dtors
        0x080792bc - 0x0807eb98 is .bss
This information can also be obtained by running
csh% readelf -a reverse/the-binary
or by running
csh% objdump -x reverse/the-binary
which both provide very similar information. Eventually I realized that this was unnecessary because a disassembly via
csh% objdump -d reverse/the-binary
would provide all the address information as part of the disassembly. I discovered these commands by reviewing what programs were packaged in RedHat's binutils RPM:
csh% /bin/rpm -qil binutils

Checking for Strings

I next ran strings on the binary[1], to extract any human-readable strings from it. This technique can often reveal some of the behaviour of a program, because of error or help messages.

csh% strings -a reverse/the-binary > files/strings.txt
[...output stored to strings.txt...]
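
For readers without binutils at hand, what strings does can be approximated in a few lines. This is a minimal Python sketch, assuming that "printable" means ASCII 0x20 through 0x7e plus tab and using a minimum run length of 4 (the default of strings); the function name extract_strings is hypothetical.

```python
import re

def extract_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Like strings -a, this scans the whole file rather than only the data sections, so it will also report byte sequences in code that merely happen to be printable.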
By picking out certain unique strings, it is also possible to sometimes locate the source code on the web. The following strings, excerpted from strings.txt, struck me as being interesting:

Table 3. Interesting strings extracted from the-binary

QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ

This string of Q's doesn't tell us anything about the program, but it's interesting because it's probably not actually a string constant within the program. I believe that it's actually a repeated instruction (possibly a NOP) in the code part.

[mingetty]

Program names are always interesting (mingetty is found on Linux, and possibly other Unix implementations), but this doesn't appear to be part of an exec() call, because of the square brackets wrapping the name. After researching the later strings which date this program to around 1996, I remembered that on old installations of Linux, the mingetty program would make itself appear as "[mingetty]" in the output of ps. Since several of these processes were always running on a machine, this would be a good way for a program to hide itself. Therefore, my suspicion is that we'll find this string used for a similar purpose when we trace the operation of the-binary in more detail.

/tmp/.hj237349

This is most likely a filename, either for output (like password sniffers) or for configuration information.

/bin/csh -f -c "%s" 1> %s 2>&1
/bin/csh -f -c "%s"

These strings invoke a shell to execute commands (the -c argument specifies that the next argument is a command string to run, rather than the name of a script file). There appear to be no instances of launching a shell with the -i option, which is a hint that this isn't a regular backdoor leading to an interactive shell.

gethostby*.getanswer: asked for "%s", got "%s"
/etc/host.conf
RESOLV_SPOOF_CHECK
gethostbyaddr: %s != %u.%u.%u.%u, possible spoof attempt

These strings are all related to DNS queries through the resolver library. At first, this wasn't fully clear, because I hadn't heard of /etc/host.conf or RESOLV_SPOOF_CHECK before. I searched Google for the last term (which I thought suggested that the program might spoof a DNS server), and turned up an excerpt from Linux Network Administrator's Guide, 2nd Edition, which explains /etc/host.conf as being from an old incarnation of the Linux resolver library. It goes on to explain several other strings which appear in the-binary, and how they affect the behavior of this Linux resolver library.

socket(vc)
connect/vc
read(vc)
socket(dg)
connect(dg)

These strings appear to refer to networking system calls (and dg probably stands for datagram); most likely they are part of some error strings. Besides suggesting that the-binary is some kind of network service, they also don't follow the examples from TCP/IP texts and thus a web search on these strings might reveal source code that is related to this program. They might, however, just be part of the resolver library that was statically linked in.

syslog
[truncated]
/dev/console
/dev/log

Files and programs are interesting, but these strings might just be part of a statically linked in library that uses syslog. I noted their appearance for reference in later phases.

@(#) The Linux C library 5.3.12

This string was the most revealing of all, because it adds to the circumstantial evidence that this program was compiled for the Linux platform. (I say "circumstantial", because the attacker could have planted the string in an attempt at misdirection.) A Google search for "The Linux C library" 5.3.12 yields an archive of the announcement made to Usenet:

From: "H.J. Lu" <hjl@lucon.org>
Subject: The Linux C library 5.3.12 is released.
X-Original-Date: Sat, 27 Apr 1996 11:51:54 -0700
Organization: Lu Consulting
Approved: linux-announce@news.ornl.gov (Lars Wirzenius)
Newsgroups: comp.os.linux.announce
Followup-to: comp.os.linux.development.system
Message-ID: <cola-liw-830710327-4348-1@oravannahka.helsinki.fi>
Date: Sun, 28 Apr 96 16:52:07 GMT
[...rest of posting omitted...]
So we're dealing with an attacker using a very old installation of Linux, or (more likely) this program hasn't been recompiled in years.

GCC: (GNU) 2.7.2.l.2

This provides further evidence that the compilation of this binary dates back several years.

Note

At this point, there are several pieces of circumstantial evidence indicating that the-binary was compiled for the Linux platform, probably sometime in 1996 or 1997. From this point forward, we will be assuming that this is the case.

Running the Program

At this point, our team could think of only two ways to proceed. One was disassembly, and the other was using strace while running the program. Brad set off to try to disassemble (and convert to C), while Bo proceeded with running the program.

Warning: Don't try this at home (or work)!
 

It can't be stressed enough that you should never run an unknown application on a network that has access to other machines. It might be a worm or virus, and wreak havoc on your network or the general Internet. And you should be prepared to sacrifice the machine that you test on.

From the strings output (and the nature of the Honeynet questions), I already had a fair guess that there was a lot of network activity in this program. I loaded the-binary onto my laptop (which was equipped with a wireless card) and took it to my local library (which is sadly a digital wasteland for networking). The laptop was loaded with a recently updated copy of the redhat-7.2 distribution.

Once at the library, I booted up and checked my networking:

csh% /sbin/ifconfig
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:231 errors:0 dropped:0 overruns:0 frame:0
          TX packets:231 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 
          RX bytes:125059 (122.1 Kb)  TX bytes:125059 (122.1 Kb)
I expected that any hacked-together program probably wouldn't cope with the absence of a "normal" interface, so I set one up. There was still no default route configured, so I created that as well, thus providing what appeared to be a normally configured machine:
csh# /sbin/ifconfig eth0 15.15.15.15
csh# /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 00:07:85:92:1B:EC  
          inet addr:15.15.15.15  Bcast:15.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:46 dropped:0 overruns:0 carrier:0
          collisions:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:241 errors:0 dropped:0 overruns:0 frame:0
          TX packets:241 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 
          RX bytes:126009 (123.0 Kb)  TX bytes:126009 (123.0 Kb)
csh# /bin/netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
127.0.0.0       *               255.0.0.0       U        40 0          0 lo
15.0.0.0        *               255.0.0.0       U        40 0          0 eth0
csh# /sbin/route add -net default gw 15.15.15.1
csh# /bin/netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo
15.0.0.0        0.0.0.0         255.0.0.0       U        40 0          0 eth0
0.0.0.0         15.15.15.1      0.0.0.0         UG       40 0          0 eth0
And before running the program, I wanted to be sure to capture any packets that were going over the network:
csh# /usr/sbin/tcpdump -s 5000 -w files/packets-1.trc
tcpdump: listening on eth0

With the system now ready, I launched the program with the strace command to trace system calls.

csh# strace -o files/strace-1 -ff reverse/the-binary
Process 9741 attached
Process 9742 attached
Process 9741 detached
A few files of output are created:

strace-1

At first blush, this appears to just be strace doing its work, but it is actually the initial steps of a running instance of the-binary. Its primary purpose appears to be to fork() and exit, having the effect of placing the-binary into the background.

strace-1.9741

From strace-1, we know that this file is the backgrounded instance of the-binary. Oddly enough, it follows the standard daemon coding template of starting a new session, forking, and exiting; this sequence is used in most daemons as part of the backgrounding sequence. The odd thing about this is that the-binary has already forked once, which we discover later is a resistance measure against debugging processes. At any rate, the real work, then, must occur in the child of the fork(): process 9742.

strace-1.9742[.1]

The program takes more standard daemon steps: chdir'ing to the root directory, closing STDIN, STDOUT, and STDERR, and ignoring several signals that might be used to terminate the program. It also gets the system time, for some reason. And most interesting of all, it opens a raw network socket and then blocks while waiting for a packet to arrive. (We can tell that it's blocking because the recv() call never finishes printing. After 15 minutes, the output was still frozen in this manner.)

Note

The observant reader will notice that we did not include files/packets-1.trc in our submission. Only an empty file was generated, so we opted not to include it.

Note

The fact that the program runs confirms the earlier presumption that Linux was the target OS. We can further guess that the motive for statically linking the binary was to increase compatibility with future distributions which might include libraries with updated APIs.

A raw socket is used to communicate directly over IP, without going through one of the usual transport protocols. The familiar ICMP, UDP, and TCP protocols are layered on top of IP, as protocol numbers 0x1, 0x11, and 0x6. In this case, the-binary requests packets of protocol number 0xb (decimal 11), which is not one of the commonly used protocols: it is assigned to NVP-II (Network Voice Protocol), which is essentially never seen on a normal network. (On a Red Hat machine, the standard defines can be found in /usr/include/netinet/in.h. They can be found in a similar location on most modern Unix distributions.)
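
To make the protocol numbers concrete, here is a small Python sketch of how a listener for a single IP protocol can be opened. It is an illustration under assumptions, not a reconstruction of the-binary: the names MYSTERY_PROTO, open_raw_listener and wait_for_packet are hypothetical, and opening a raw socket requires root privileges on Linux.

```python
import socket

# Protocol numbers named in the text, as exposed by Python's socket module.
KNOWN = {
    socket.IPPROTO_ICMP: "ICMP",
    socket.IPPROTO_TCP: "TCP",
    socket.IPPROTO_UDP: "UDP",
}

MYSTERY_PROTO = 0x0B  # decimal 11, the protocol the-binary asks for

def open_raw_listener(proto=MYSTERY_PROTO):
    """Open a raw IPv4 socket for one protocol number (needs root on Linux)."""
    return socket.socket(socket.AF_INET, socket.SOCK_RAW, proto)

def wait_for_packet(sock, bufsize=2048):
    """Block until one packet arrives; on Linux the IP header is included."""
    packet, (source, _port) = sock.recvfrom(bufsize)
    return source, packet
```

With no timeout set on the socket, wait_for_packet blocks indefinitely, which matches the frozen recv() seen in the strace output.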

Note

We can also conclude that the-binary is a network daemon which waits for messages from the attacker. Since there is no timer set, there is no possibility for the daemon to break out of the blocking recv(), until a packet of protocol 0xb is received... which would never happen on a normal network.

Warning: Wrong Turn
 

Since I was already tracing the program, it seemed like a good idea to generate a packet of protocol 0xb and see how the-binary responds. The program nmap has this functionality (the -sO option), but it turns out not to work on my machine. I started creating sendraw.c to send such packets, but this was not a profitable avenue to pursue without some decompilation.

Disassembling the-binary

Looking for decompilers, I searched Google for decompilers linux. That produced a lot of Java-related hints, so I refined the search to decompilers linux -java. I found Giampiero Caprino's REC at the top of the list. REC is distributed without source, but it was very easy to try out.

   csh% mkdir rec
   csh% cd rec
   csh% wget http://www.backerstreet.com/rec/rec16lx.zip
--11:47:22--  http://www.backerstreet.com:80/rec/rec16lx.zip
           => `rec16lx.zip.1'
Connecting to www.backerstreet.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 311,993 [application/zip]

    0K -> .......... .......... .......... .......... .......... [ 16%]
   50K -> .......... .......... .......... .......... .......... [ 32%]
  100K -> .......... .......... .......... .......... .......... [ 49%]
  150K -> .......... .......... .......... .......... .......... [ 65%]
  200K -> .......... .......... .......... .......... .......... [ 82%]
  250K -> .......... .......... .......... .......... .......... [ 98%]
  300K -> ....                                                   [100%]

11:47:26 (83.66 KB/s) - `rec16lx.zip.1' saved [311993/311993]
   csh% unzip rec16lx.zip
Archive:  rec16lx.zip
   skipping: MAKEFILE                `shrink' method not supported
   skipping: PROTO.LST               `shrink' method not supported
  exploding: COPYRITE                
  exploding: FCNTL.O                 
  exploding: HD.C                    
  exploding: HD.REC                  
  exploding: HD.X                    
  exploding: README                  
  exploding: STDIO.O                 
  exploding: STDLIB.O                
  exploding: STRING.O                
  exploding: UNISTD.O                
  exploding: WINBASE.O               
  exploding: WINGDI.O                
  exploding: WINUSER.O               
  exploding: REC 
   csh% chmod +x REC
   csh% ./REC ~/reverse/the-binary
   /home/silly/reverse/the-binary is an ELF/i386 executable file
Section       Offset   Address   Size
.init         000080  08048080  00008
.text         000090  08048090  1f53c
__libc_subinit   01f5cc  080675cc  00004
.fini         01f5d0  080675d0  00008
.rodata       01f5d8  080675d8  04c4a
.data         024228  0806d228  0c084
.ctors        0302ac  080792ac  00008
.dtors        0302b4  080792b4  00008
...
Reading symbol table...
Validating strings...
Finding references...
Finding procedures...
Done.
Decompiling 080499f4 - 08049d3c (763/781)

When REC finishes decompiling the binary, it clears the screen, and the resulting code is stored in the-binary.rec. Later sections contain examples of the output from REC.

Improving the Decompilation

The output from REC was very helpful, but still left a lot to be desired in terms of figuring out the function of the program, since there were absolutely no human-meaningful names anywhere (the symbols had been stripped out). After looking at the decompilation, I decided that the best way to attack the translation was to find the system calls and start naming them. For this to work, it's helpful to know that system calls into the kernel are usually implemented through a software interrupt (or trap), wrapped by small functions within the C library.

Scanning through the decompilation, I quickly noticed the pattern of short functions such as:

L08057160(A8)
/* unknown */ void  A8;
{
    eax = 6;
    asm("int 0x80");
    edx = eax;
    if(edx < 0) {
        eax = L08056E64( ~edx);
        (restore)edx;
        *eax = edx;
        eax = -1;
    }
}
which is pretty clearly just making a system call and then fixing up the return value in the case of an error. The system call number is loaded into the eax register, just before the software interrupt. The mapping from numbers to system calls can generally be found in /usr/include/sys/syscall.h; on my redhat-7.2 system, I had to follow the includes and defines a little to discover that they are actually defined in /usr/include/asm/unistd.h (revealing that this particular function was implementing close()).

Identifying some library functions can be a little problematic because multiple functions can use the same system call. The first several library functions within the-binary are good examples of this. The first one was the SYSCALL_wait4 system call, but potentially any of the various forms of waitpid() and wait4() could be using the same SYSCALL. Then there were a series of functions using SYSCALL_socketcall (which I had to look up the manual page for) which implement various socket related functions (e.g., send(), recv(), bind(), etc.); I decided to skip over these initially because I wanted to focus on decompiling the important routines written by the hacker, not every single library routine that was referenced.
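
The naming step can be summarized as a lookup. The Python sketch below lists a small subset of the i386 Linux system call numbers and of the socketcall sub-operation numbers (on i386 the sub-operation is passed in ebx); the function names name_wrapper and fix_up_return are hypothetical, and the tables are deliberately incomplete.

```python
# A few i386 Linux system call numbers (see asm/unistd.h).
I386_SYSCALLS = {
    1: "exit", 2: "fork", 3: "read", 4: "write", 5: "open",
    6: "close", 102: "socketcall", 114: "wait4",
}

# The first argument of socketcall selects the real operation (see linux/net.h).
SOCKETCALL_OPS = {
    1: "socket", 2: "bind", 3: "connect", 4: "listen", 5: "accept",
    9: "send", 10: "recv", 11: "sendto", 12: "recvfrom",
}

def name_wrapper(eax, ebx=None):
    """Guess a name for a wrapper from the values it loads before int 0x80."""
    name = I386_SYSCALLS.get(eax, "syscall_%d" % eax)
    if name == "socketcall" and ebx is not None:
        return SOCKETCALL_OPS.get(ebx, "socketcall_%d" % ebx)
    return name

def fix_up_return(raw):
    """Mimic the wrapper's epilogue: a negative result becomes (-1, errno)."""
    if raw < 0:
        return -1, -raw
    return raw, 0
```

For the wrapper shown above, name_wrapper(6) gives "close". A wrapper that loads 102 into eax needs the sub-operation number as well before it can be told apart from its siblings.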

Fortunately, after the SYSCALL_socketcall library routines, I encountered the implementation for close() (which I used as an example, above). I renamed the function from L08057160() to close(), and then did a global search and replace within the rest of the file. This revealed a sequence (in another function) which decompiled into

close(0);
close(1);
close(2);
which matched the output I had seen from strace. It seemed a reasonable guess that function L08048134() (where this sequence was found) would be part of the attacker's code (as opposed to library functions), so I started tracing the code surrounding the series of close()'s. It very quickly happened that many other system calls I had seen during the strace were found, revealing that the-binary contains a central loop to receive packets of IP protocol 0xb which are then dispatched to other functions. After some judicious naming of variables and functions (as well as simplification of some obtuse bits of code), the initial part of the main loop roughly decompiled to:
    for(*(ebp + -17636) = ebp + -4536; 1; sleep?(10000)) {
        int len = recv( sock, buffer, 2048, 0);
        if(*protocol == 11 && *data1 == 2 && len > 200) {
            decode(len - 22, realdata, decbuf);
            eax = tmpbuf[1] - 1;
            if(eax <= 11) {
                goto *(eax * 4 + 0x804832c)[L0804835c, L080483f0, ...
The state of our decompilation at this point is listed in the-binary.rec-processed.

Note

The core operation of the program is to sit in an infinite loop, receiving packets of IP protocol 0xb and processing them.

Note

After receiving a packet, the program "authenticates" the packet by confirming three properties:

  1. it is protocol 0xb.

  2. the first two bytes of the packet data form the word 0x0002.

  3. the length of the packet is greater than 200.
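
The three checks can be written down as a short Python sketch. It assumes a 20-byte IP header with no options, assumes that the two-byte word is read in little-endian order as it would be on i386, and uses the hypothetical name looks_like_command; it is an illustration of the checks, not the attacker's code.

```python
import struct

def looks_like_command(packet):
    """Apply the three checks to a raw IPv4 packet (IP header included)."""
    if len(packet) <= 200:
        return False
    protocol = packet[9]  # protocol field of the IP header
    # Assumes a 20-byte IP header, so the packet data starts at offset 20.
    (magic,) = struct.unpack_from("<H", packet, 20)
    return protocol == 0x0B and magic == 0x0002
```

None of these properties is secret, so this is filtering rather than authentication in any strong sense, which is presumably why the word appears in quotes above.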

Note

After authenticating the packet, the packet data is decrypted using a simple algorithm. The second byte of the decrypted buffer is then used as an index into a table of addresses, to which execution is passed.

Attacking the Switch

At this point, we had enough information to answer many of the questions of the Reverse Challenge, but Question 2 remained an unknown. I decided to press forward with decompiling/rewriting the jump table in main() (located at 0x08048134 - 0x08048ECB). As indicated earlier, my initial impression was that this was a table of function pointers used to dispatch execution to separate routines for each command. After looking at the destination addresses, I came to realize that this couldn't be a table of function pointers since the target addresses were all located within main(). Eventually, I realized that C's switch statement would appear this way in assembly.
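
A compiler typically turns a dense switch into a range check followed by an indirect jump through a table of code addresses, all of which lie inside the same function. The Python sketch below imitates that shape with a list of callables; make_dispatcher and the handlers are hypothetical stand-ins, and the "subtract one, then index" step mirrors the eax = tmpbuf[1] - 1 line of the decompilation.

```python
def make_dispatcher(handlers, default):
    """Build a switch-like dispatcher over a table of handlers."""
    def dispatch(command, *args):
        index = command - 1  # case 1 lives in table slot 0
        if 0 <= index < len(handlers):
            return handlers[index](*args)
        return default(*args)  # out-of-range commands fall through
    return dispatch
```

In the compiled program the table entries are jump targets rather than functions, which is why every address in the table falls between the start and end of main().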

By carefully locating each address[2] within the decompilation produced by REC, I was able to reconstruct a more natural looking switch statement, as well as improve the readability of a few more functions. The result of this work can be viewed at the-binary.rec-processed.2.

Other Avenues in Decompilation

The download page for the challenge mentioned (for a short time, at least) fenris as a useful tool. Much of the challenge binary was obviously library code. We'd already identified many of the function calls, but I thought Fenris might help check our work. An additional guide provided help using Fenris with this particular challenge.

The first step for me was to locate the version of libc closest to the one we suspect was used to compile the program.

Searching Google for libc 5.3.12, I found an entry at rpmfind.net and a security advisory at attrition.org.

Since I was looking for static libraries, I went with the security advisory, and after some guesswork in an ftp client, I found that the old 5.3.12 libc had moved to ftp://updates.redhat.com/4.2/en/os/i386/libc-static-5.3.12-18.5.i386.rpm. This revision of libc may differ from the one compiled against, because of the security fixes, but I figured that most functions would remain unchanged.

To extract the files, I created a fake local RPM database, since I didn't want these ancient libraries installed on my system:

csh% mkdir -p var/lib/rpm
csh% rpm --root=`pwd` --initdb
csh% rpm --root=`pwd` --nodeps -i libc-static-5.3.12-18.5.i386.rpm
csh% cp usr/lib/lib*.a ~/reverse/fenris

Next, I modified fenris's "getfprints" script to use these libraries, by modifying the TRYLIBS variable, replacing references to /usr/lib/libc.a and /usr/lib/libm.a:

TRYLIBS="/home/silly/reverse/fenris/libc.a /home/silly/reverse/fenris/libm.a \
         /usr/lib/libdl.a \
         /usr/lib/libresolv.a /usr/lib/libreadline.a /usr/lib/libtermcap.a \
         /usr/lib/libssl.a /usr/lib/libBrokenLocale.a \
         /usr/lib/libcrypt.a"

            

Finally, I ran make to build fenris, and then:

csh% ./dress -F support/fn-libc5.dat ../the-binary ../the-binary.dress
dress - stripped static binary recovery tool by <lcamtuf@coredump.cx>
[+] Loaded 65021 fingerprints...
[*] Code section at 0x08048090 - 0x080675cc, offset 144 in the file.
[*] For your initial breakpoint, use *0x8048090
[+] Locating CALLs... 371 found.
[+] Matching fingerprints...
[*] Writing new ELF file:
[+] Cloning general ELF data...
[+] Setting up sections: .init .text .fini .rodata .data .ctors .dtors .bss .note .comment 
[+] Preparing new symbol tables...
[+] Copying all sections: .init .text .fini .rodata .data .ctors .dtors .bss .note .comment 
[+] All set. Detected fingerprints for 211 of 371 functions.

Running this new executable through REC gave us a decompilation with several symbols in it. This new file was used as a reference for our efforts on improving the readability of the original REC output.

A Giant Wrong Turn

Warning

This section documents the steps I took to track down an inconsistency I encountered, and can be safely skipped by readers. In the text below, mention of "command 0" refers to case 0 of the switch statement. While massaging the REC output, a distraction caused me to totally flub the line which extracts the command number from the decoded packet, causing me to believe that command numbers were zero-based. We felt that there might be some educational value in walking through the use of gdb to trace the execution, so we left this section alone.

With so many functions identified and more suitably named, it was easy to see that a rough translation of the implementation for command 0 would be:

build a message from some global variables (status report?)
encode(the message);
L08056058();
send_reply(the message, and other stuff);
but this left lots of questions. According to my understanding, any time a packet with command 0 was received, the-binary should always reply with a packet. And there didn't seem to be any dependence on further data within the control packet.

At this time, I began developing a program to create command packets to test this idea. The resulting program was sendcmd.c. My first tests proved unsuccessful at eliciting a packet from the-binary, although strace showed that more system calls were executed. At around this time, the Honeynet people released a Snort capture of the attacker communicating with the-binary, so I decided to review that log for instances of command 0 being used.

As luck would have it, the first several packets were of command 0. They can be identified by a 0x17 byte at position 22[3] in the packet:

% tcpdump -X -r snort.log | less
01:32:34.417321 172.16.196.132 > 172.16.183.2:  ip-proto-11 402
0x0000   4500 01a6 6b09 0000 ed0b 8d9b ac10 c484        E...k...........
0x0010   ac10 b702 0200 1730 482a ee95 cfe6 fd14        .......0H*......
0x0020   2b42 5970 879e b5cc e3fa 1128 3f56 6d84        +BYp.......(?Vm.
[...output deleted...]
I was also gratified to note a two byte 0x2 starting at position 20 in the packet, confirming the code I had decompiled in main(). Looking further at the snort capture file, I saw that it takes four packets before the-binary sends a reply packet, leading me to conclude that my program probably isn't broken.
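
The fields referred to above can be pulled out of the dump mechanically. This Python sketch decodes the first 24 bytes of the packet shown in the tcpdump output, using the standard IPv4 header layout; the names SAMPLE and parse_ipv4_header are mine.

```python
import socket
import struct

# First 24 bytes of the packet shown in the tcpdump output above.
SAMPLE = bytes.fromhex("450001a66b090000ed0b8d9bac10c484ac10b702" "02001730")

def parse_ipv4_header(packet):
    """Decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, _tos, total_len, _ident, _frag, _ttl, proto, _checksum,
     src, dst) = struct.unpack_from("!BBHHHBBH4s4s", packet, 0)
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,
        "total_len": total_len,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```

For SAMPLE this reports protocol 11 and a total length of 422 bytes, which is 402 bytes of payload after the 20-byte header and agrees with the "ip-proto-11 402" summary line. SAMPLE[20:22] is the two-byte value 02 00 and SAMPLE[22] is the 0x17 byte.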

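To find out what was going on, I figured that my best bet was to load up the program in gdb and trace it a little. Because the-binary forks at startup, I first patched the wrapper at 0x080571E8 (system call 2, fork() on i386) so that it loads 0 into eax and executes two NOPs in place of int 0x80; the wrapper then returns 0, as if the process under gdb were the child, and nothing actually forks. I then set a breakpoint inside main():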

# gdb ./the-binary
GNU gdb Red Hat Linux (5.1-0.71)
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(no debugging symbols found)...
(gdb) info file
Symbols from "/mnt/hdb1/ufs/thumper/challenge2002/reverse/./the-binary".
Local exec file:
	`/mnt/hdb1/ufs/thumper/challenge2002/reverse/./the-binary',
	file type elf32-i386.
	Entry point: 0x8048090
	0x08048080 - 0x08048088 is .init
	0x08048090 - 0x080675cc is .text
	0x080675cc - 0x080675d0 is __libc_subinit
	0x080675d0 - 0x080675d8 is .fini
	0x080675d8 - 0x0806c222 is .rodata
	0x0806d228 - 0x080792ac is .data
	0x080792ac - 0x080792b4 is .ctors
	0x080792b4 - 0x080792bc is .dtors
	0x080792bc - 0x0807eb98 is .bss
(gdb) break *0x08048080
Breakpoint 1 at 0x8048080
(gdb) run
Starting program: /mnt/hdb1/ufs/thumper/challenge2002/reverse/./the-binary
warning: shared library handler failed to enable breakpoint

Breakpoint 1, 0x08048080 in ?? ()
(gdb) disas 0x080571E8 0x08057200
Dump of assembler code from 0x80571e8 to 0x8057200:
0x80571e8:      push   %ebp
0x80571e9:      mov    %esp,%ebp
0x80571eb:      mov    $0x2,%eax
0x80571f0:      int    $0x80
0x80571f2:      mov    %eax,%edx
0x80571f4:      test   %edx,%edx
0x80571f6:      jge    0x8057208
0x80571f8:      neg    %edx
0x80571fa:      push   %edx
0x80571fb:      call   0x8056e64
End of assembler dump.
(gdb) set *0x80571ec = 0
(gdb) print /x *0x80571f0
$1 = 0xc28980cd
(gdb) set *0x80571f0 = 0xc2899090
(gdb) disas 0x080571E8 0x08057200
Dump of assembler code from 0x80571e8 to 0x8057200:
0x80571e8:      push   %ebp
0x80571e9:      mov    %esp,%ebp
0x80571eb:      mov    $0x0,%eax
0x80571f0:      nop
0x80571f1:      nop
0x80571f2:      mov    %eax,%edx
0x80571f4:      test   %edx,%edx
0x80571f6:      jge    0x8057208
0x80571f8:      neg    %edx
0x80571fa:      push   %edx
0x80571fb:      call   0x8056e64
End of assembler dump.
(gdb) break *0x80482CC
Breakpoint 2 at 0x80482cc
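
The two 32-bit values used in the set commands look odd until they are laid out in memory order. The short Python sketch below shows the byte sequences that gdb reads and writes on a little-endian i386 machine; the names ORIGINAL, PATCHED and le_bytes are mine.

```python
import struct

ORIGINAL = 0xC28980CD  # word printed by gdb at 0x80571f0
PATCHED = 0xC2899090   # word written back in its place

def le_bytes(word):
    """Return the four bytes of a 32-bit word in little-endian memory order."""
    return struct.pack("<I", word)
```

le_bytes(ORIGINAL) is cd 80 89 c2, that is int $0x80 followed by mov %eax,%edx. le_bytes(PATCHED) is 90 90 89 c2, which replaces the interrupt with two one-byte NOPs and leaves the mov untouched. Writing the word 0 at 0x80571ec overwrites the four-byte immediate of the mov $0x2,%eax instruction in the same way.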

After starting up sendcmd, I used stepi commands to verify that all the elements of the if statement passed. They did, so I set a breakpoint at 0x8048314, just after the call to decode().

Note

It hasn't been mentioned, but the decompilation so far doesn't explain how the-binary determines an address to send a reply packet to. I would have expected it to just grab the source address from the control packet, but the code doesn't do this. The fact that it takes four packets before a reply is sent immediately makes me suspect that perhaps the source address is being sent one octet at a time. Looking at the reply packet, the observation that the destination IP is different than the source IP of the previous packets reinforces this suspicion.

Tip

If you review the final massaged REC output, you'll see that the last suspicion is not true. The errors I introduced in the decompilation caused me to misunderstand which case was being executed. By tracing the execution after sending the-binary my own test packet, I was able to discover my error and rectify it.

Destination: Unknown

During my investigation in the last section, I began to wonder how the destination IP address of packets from the-binary was determined, so I decided to pursue that angle for a while. The first step was to improve the readability of function send_packet() (located at 0x08048F94 - 0x08049135) to the point where it was more clear what was going on. (Follow along in the-binary.rec-processed.3.) Its operation is pretty simple to follow:

  • Create a raw socket

  • malloc() a buffer to store the final packet into

  • Fill in the source and destination addresses of the IP header, into the buffer.

    It's immediately obvious from the REC output that the first argument becomes the source address of the new packet. The value of the destination address comes from register ebx, but right near the top of the function there is an assignment from the second argument into ebx.

  • After setting the source and destination IP addresses in the packet buffer, there is a temporary intermission where some very bizarre code appears. Several calculations are made, but the results are not used. The question is raised in my mind on whether this is intentional obfuscation, or the groping of an inexperienced programmer.

  • Several other fields of the packet IP header are then filled in, all conforming to normal values. The most interesting part is the setting of the id field to the result from the routine at 0x8056058, because that function is repeatedly called throughout the program. What has been vexing is that it involves a lot of math, making me suspicious that it is some kind of additional encryption. Its use here finally makes me realize that it is probably a call to rand(), rather than some kind of encryption. This new supposition is supported by the fact that the function's address is in the midst of other library functions (library functions are grouped together in an executable, because of the way that linkers work). I also learned from experimenting with sendraw.c that Linux overrides both the checksum and id fields of the IP header, even with a raw socket.

  • There is a bit of hairy code that calculates the IP header checksum.

  • Then finally, the third argument is used to copy data into the packet buffer payload area, just before it is sent out over the socket.
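
The steps above can be sketched in Python. This is only an illustration of the general technique (fill in a 20-byte IPv4 header, compute the header checksum, append the payload), not a reconstruction of the decompiled code. The names build_packet() and ip_checksum(), the TTL value, and the sample payload are assumptions of mine; the protocol number 11 and the two addresses are taken from the Snort capture. The sketch stops short of opening a raw socket, which would require root privileges.

```python
import random
import socket
import struct


def ip_checksum(header: bytes) -> int:
    """16-bit one's-complement Internet checksum over the given bytes."""
    if len(header) % 2:
        header += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_packet(src: str, dst: str, payload: bytes,
                 ident: int, proto: int = 11, ttl: int = 64) -> bytes:
    """Return an IPv4 header (no options) followed by the payload.

    ttl=64 is an assumed value; proto=11 matches the capture.
    """
    def pack(checksum: int) -> bytes:
        return struct.pack(
            "!BBHHHBBH4s4s",
            (4 << 4) | 5,            # version 4, header length 5 words
            0,                       # type of service
            20 + len(payload),       # total length
            ident,                   # id field
            0,                       # flags and fragment offset
            ttl,
            proto,
            checksum,
            socket.inet_aton(src),   # first argument: source address
            socket.inet_aton(dst),   # second argument: destination address
        )

    # Checksum is computed with the checksum field set to zero.
    header = pack(ip_checksum(pack(0)))
    return header + payload          # third argument: payload data


# The id field is filled with a random value, as rand() would do.
packet = build_packet("172.16.196.132", "172.16.183.2",
                      b"hypothetical payload", random.randrange(65536))
```

A receiver validates the header by running the same checksum over all 20 bytes, checksum field included; the result is zero for an intact header.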

Searching for calls to send_packet() (0x08048F94), I found that there are only two, both in the function I renamed send_reply() (located at 0x08048ECC - 0x08048F2E). What's interesting about this is that both calls use the same static variable for the first argument (source address) to send_packet(), so I renamed the static variable to SOURCE_ADDRESS. We can also see that the first argument to send_reply() gets used either as the destination address, or in a calculation to figure out the destination address.

Tracing backwards another level, it turns out that there are only two callers to send_reply(), and both use the same variable as the first argument. Clearly, this variable (ebp - 17636) is a pointer to some kind of table of addresses.

Proceeding in a similar manner, I didn't make much progress on understanding the use of this table until I got to decompiling case 1 of the switch statement. This cleared matters up for me, and made me realize that the code in send_reply() is a bit obscure because of a poor decompilation by REC. The variable at ebp - 17636 is just a simple table of four-byte IP addresses which are used as destinations for reply packets from the-binary.
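
To make the table layout concrete, here is a minimal Python sketch of reading a flat buffer of four-byte IPv4 addresses. The function name parse_address_table() and the sample addresses are hypothetical; I am assuming the entries are stored back to back in network byte order, which the decompiled code suggests but which I have not verified byte by byte.

```python
import socket
from typing import List


def parse_address_table(table: bytes) -> List[str]:
    """Split a buffer of packed 4-byte IPv4 addresses into dotted quads."""
    if len(table) % 4:
        raise ValueError("table length must be a multiple of 4")
    return [socket.inet_ntoa(table[i:i + 4])
            for i in range(0, len(table), 4)]


# Hypothetical table contents, using documentation-only addresses.
sample_table = socket.inet_aton("192.0.2.1") + socket.inet_aton("198.51.100.7")
destinations = parse_address_table(sample_table)
```

Each entry in destinations would then be a candidate second argument (destination address) for send_packet().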

Approximating the Time

The release of a Snort capture by the Honeynet Project allows us to place an approximate time on this breakin. A full timestamp is available in libpcap packet captures, so we can use tcpdump to extract it:

% tcpdump -tt -r snort.log | head -2
1014885154.417321 172.16.196.132 > 172.16.183.2:  ip-proto-11 402
1014885206.930071 172.16.196.132 > 172.16.183.2:  ip-proto-11 402
The first column is the time that the packet was captured, in seconds since the Unix epoch. I believe the timestamp information comes from the driver, so I don't think we can make a definitive statement about what timezone the timestamp is in reference to. However, it is probably safe to assume that the computer used to capture the log was reasonably synchronized to a valid time source. Therefore, we can use perl to convert this to a human-readable form:
% perl -e "print scalar(gmtime( 1014885154 ) );"
Thu Feb 28 08:32:34 2002
and learn that the initial intrusion happened on or before 28 February 2002 (UTC).
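
The same conversion can be done in Python, which also makes it easy to keep the fractional seconds and to measure the gap between the two packets shown above. This is just an equivalent sketch of the perl one-liner; the helper name capture_time_utc() is my own.

```python
from datetime import datetime, timezone


def capture_time_utc(epoch_seconds: float) -> datetime:
    """Convert a tcpdump -tt timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


first = capture_time_utc(1014885154.417321)
second = capture_time_utc(1014885206.930071)

print(first.strftime("%Y-%m-%d %H:%M:%S UTC"))
print("gap between packets: %.1f seconds" % (second - first).total_seconds())
```

Passing tz=timezone.utc explicitly mirrors the use of gmtime() rather than localtime() in the perl version, so the result does not depend on the timezone of the analysis machine.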

Tools Used

During the analysis of this scan, we used the following tools:

Tool | Description | Location
strings | This tool is used to find all printable strings (not containing control characters) within a binary file. | standard on Unix platforms
md5sum | Calculates a checksum of a file. Often used to verify downloads. | standard
objdump | Displays disassembly and other information for object files. | standard
fenris | Forensic package for tracing program execution and returning symbols to statically-linked programs. | http://lcamtuf.coredump.cx/fenris/devel.shtml
REC | Decompiles binaries on many architectures into higher-level pseudocode. | http://www.backerstreet.com/rec/rec.htm
tcpdump | Used to capture packets from the network, or to read previously saved packets. | http://www.tcpdump.org
gdb | An application debugger. | http://sources.redhat.com/gdb/

Notes

[1]

Initially, I ran strings without the -a option, but Brad pointed out my error. My memory from long ago (on Solaris) was that the -a flag was to also show printable strings of lengths 2 and 3, which I didn't feel was useful for this analysis. Now I wonder if this was ever true, or if the meaning of the flag has changed in the intervening years.

[2]

To find the addresses of statements from the REC output, I would find a nearby label that REC had inserted (label names are based on their address within memory). I would then examine the statements towards my target statement, and compare them with the disassembled output. I am not familiar with Intel x86 assembly, so I mostly identified statements by locating assignments (mov statements in the assembly) and matching up constants that were used.

[3]

This was a critical error introduced by me. The command actually comes from the second byte of the decoded data, which would be at position 23 in the original IP packet.