Findings

We state our findings briefly here; a detailed account of how we arrived at them follows.

The binary is a remotely controllable program which can be used to execute arbitrary commands on the compromised system, and provides capabilities for several types of denial of service (DoS) attacks. The binary uses a rarely-used IP protocol number (Network Voice Protocol, or NVP) for control packets, which thus slip by many firewalls and IDS systems which are oriented toward TCP, UDP, and ICMP attacks. The binary can generate several DoS attacks including DNS reply floods, DNS server floods, SYN floods, and IP fragment attacks.

Next, we present a high level timeline that includes the sequence in which we discovered what the binary did. The tools and methods used are described, and examples are given where appropriate. The purpose of this document is to show how things are discovered, not to give in-depth documentation of all the discovered functionality. For a thorough documentation of the workings of the binary, see the technical advisory.

Timeline of the events

May 10 [ 9:30-11:30 am]

The binary was downloaded on an internal machine (Eliza) in the CoPS lab. Eliza was connected directly (using a crossover network cable) to another lab machine, Darwin, which was used as a router and packet sniffer.

A sniffer was started on Darwin to inspect all packets sent to and from Eliza. The binary was run on Eliza under strace, logged in as a normal user, to see what system calls it made. The strace log showed that it checked the user ID and quit. There was no activity on the sniffer. The binary was then run on Eliza as root. The first try at strace failed because the process forked almost immediately. The run was repeated with the "-ff" flag to strace so that the forked child was traced as well. Now the strace log showed a socket call and then a recv call. The entire strace is shown below:
```
chdir("/")                              = 0
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
time(NULL)                              = 1021044128
socket(PF_INET, SOCK_RAW, 0xb /* IPPROTO_??? */) = 0
sigaction(SIGHUP, {SIG_IGN}, {SIG_DFL}, 0x400587a8) = 0
sigaction(SIGTERM, {SIG_IGN}, {SIG_DFL}, 0x400587a8) = 0
sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
recv(0, 0xbffff2d4, 2048, 0) = ? ERESTARTSYS (To be restarted)
+++ killed by SIGKILL +++
```
The recv was a blocking call, waiting for a packet with protocol number 11 (0xb in the socket call above).

We ran netstat on Eliza and used NMAP from Darwin to check for open ports. This didn't yield anything informative.

The next step was to examine the binary in more detail. The "file" command was used, and we discovered that the binary is stripped and statically linked. The binary was disassembled using "objdump -d". The last system call in the strace log was a recv(), so this served as a starting point for further analysis of the binary code. System calls are fairly easy to find, even in a stripped binary, because they always involve an "int 0x80", which can be searched for. We looked up the system call number (loaded into the al register) and, in this case, the socket sub-call number (loaded into the bl register), and fairly quickly found the recv() function.
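The search for "int 0x80" can be illustrated with a minimal Python sketch. It looks for the two-byte encoding CD 80 in a blob of machine code. The sample bytes are a hypothetical fragment written for this illustration (they are not taken from the challenge binary), and a raw byte search like this can also report false hits where CD 80 happens to occur inside data or in the middle of another instruction, so each hit still has to be checked against the disassembly.

```python
def find_int80_offsets(code: bytes) -> list:
    """Return every offset at which the byte sequence CD 80 occurs."""
    pattern = b"\xcd\x80"  # x86 encoding of "int $0x80"
    offsets = []
    pos = code.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = code.find(pattern, pos + 1)
    return offsets


# Hypothetical fragment (an assumption, not code from the binary):
#   mov $0x66,%eax   ; 0x66 = 102, the Linux i386 socketcall number
#   mov $0xa,%ebx    ; 0xa  = 10, the recv sub-call
#   int $0x80
sample = bytes.fromhex("b866000000" "bb0a000000" "cd80")

print(find_int80_offsets(sample))
```

Each reported offset marks a candidate system call site; the register loads just before it identify which call is being made.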

We looked through the code and verified what we saw in the strace: the protocol being used was protocol 11 (NVP). Combining this information with the strace log, we deduced that the binary was listening on a raw socket for protocol 11 packets. The binary opens a socket and waits for remote commands. Since protocol 11 is a rarely (if ever) used protocol number, this provides a non-standard way of communicating with the compromised machine. In fact, checking our standard RedHat 7.2 installations, we found that the firewall settings only filtered out unwanted TCP and UDP packets. Protocol 11 slips right past! This should definitely be fixed in the firewall -- our recommendation is that the firewall rules have a default policy of "REJECT", so that any unknown or unexpected packet formats are not allowed through. Furthermore, it would be easy to set up an IDS to alert us when a protocol is seen that is not ICMP, UDP, or TCP.
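The IDS check suggested above amounts to reading one byte of the IPv4 header. The following Python sketch shows the idea under stated assumptions: the function names are ours, the test header is built by hand with a zero checksum, and the addresses are simply the two lab addresses that appear in the packet dump later in this report. A real IDS rule would express the same test in that tool's own rule language.

```python
import socket
import struct

# IP protocol numbers we expect to see on the lab network (IANA values).
EXPECTED_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def ip_protocol(packet: bytes) -> int:
    """Return the protocol field (byte 9) of an IPv4 header."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return packet[9]


def is_unexpected(packet: bytes) -> bool:
    """True when the packet is neither ICMP, TCP, nor UDP."""
    return ip_protocol(packet) not in EXPECTED_PROTOCOLS


def make_header(proto: int, src: str, dst: str) -> bytes:
    """Build a bare 20-byte IPv4 header for testing (checksum left at 0)."""
    return struct.pack("!BBHHHBBH4s4s",
                       0x45, 0, 20, 0, 0,   # version/IHL, TOS, length, ID, flags
                       250, proto, 0,       # TTL, protocol, checksum
                       socket.inet_aton(src), socket.inet_aton(dst))


nvp_packet = make_header(11, "10.1.1.7", "10.1.1.5")
tcp_packet = make_header(6, "10.1.1.7", "10.1.1.5")
print(is_unexpected(nvp_packet), is_unexpected(tcp_packet))
```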

May 10 [ 2:30-4:30 pm]

The compiler signatures in the .comment section of the binary show compiler version 2.7.2.l.2. A search for this string on Google showed some hits on binaries compiled for Slackware 3.1, so we downloaded the libc packages for Slackware versions 3.0-3.4. There was a perfect match for version 3.1, so we concluded that the binary was compiled and linked with the Slackware 3.1 library. (The recv() function code also matched what we had identified earlier.)

A few perl scripts were written, one for making "signatures" of functions extracted from the libc package, and one for making signatures of functions in the challenge binary. These are extracted to files, one per function, and then a couple of "for" loops in the shell accompanied with "diff" allow us to match functions from the binary to functions from libc. This makes a very simple symbol table, and a third perl script is used to go through and add labels to the objdump output on the challenge binary. Now the sequence of events in the code is much easier to follow!
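The matching idea can be sketched in a few lines of Python. This is not the perl that was actually used; it is a minimal illustration that assumes one particular kind of signature: the sequence of instructions with branch targets dropped and hexadecimal constants masked, so that link-time addresses do not prevent a match. The disassembly lines, the address, and the name "example_func" are made up for the example.

```python
import re

HEX_NUMBER = re.compile(r"0x[0-9a-f]+")


def signature(disasm_lines):
    """Reduce 'objdump -d' lines to an address-independent signature."""
    sig = []
    for line in disasm_lines:
        fields = line.rstrip().split("\t")
        if len(fields) < 3:
            continue  # labels, blank lines, continuation lines of raw bytes
        parts = fields[2].split(None, 1)
        if not parts:
            continue
        mnemonic = parts[0]
        operands = parts[1] if len(parts) > 1 else ""
        if mnemonic == "call" or mnemonic.startswith("j"):
            operands = ""  # branch targets differ after linking
        operands = HEX_NUMBER.sub("N", operands).strip()
        sig.append((mnemonic, operands))
    return tuple(sig)


def match_functions(unknown, library):
    """Map each unknown function to a library name with the same signature."""
    by_sig = {}
    for name, lines in library.items():
        by_sig.setdefault(signature(lines), []).append(name)
    matches = {}
    for addr, lines in unknown.items():
        names = by_sig.get(signature(lines), [])
        if len(names) == 1:  # skip ambiguous signatures
            matches[addr] = names[0]
    return matches


# Made-up disassembly, for illustration only.
library = {"example_func": [
    "   0:\t55\tpush   %ebp",
    "   1:\t89 e5\tmov    %esp,%ebp",
    "   3:\tb8 66 00 00 00\tmov    $0x66,%eax",
    "   8:\te8 fc ff ff ff\tcall   9 <example_func+0x9>",
]}
unknown = {"0x8056f30": [
    " 8056f30:\t55\tpush   %ebp",
    " 8056f31:\t89 e5\tmov    %esp,%ebp",
    " 8056f33:\tb8 66 00 00 00\tmov    $0x66,%eax",
    " 8056f38:\te8 13 02 00 00\tcall   8057150",
]}
print(match_functions(unknown, library))
```

Masking every constant makes the signature less discriminating, so short functions may collide; the sketch leaves ambiguous matches unlabeled rather than guessing.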

May 12 [ 10:30-12:00 am]

IDA Pro was investigated; notes on the web site indicated that the preview version wouldn't work for the challenge, so the freeware version was downloaded (along with the supplied patch). The "FLIRT" library function recognition didn't work, as there were no Linux libc signatures. There didn't seem to be a way to create new signatures, and in fact there didn't even seem to be a way to import the symbol names that we had learned with the perl scripts. We experimented with IDA Pro for a little while to see what its capabilities were.

May 13 [ 9:30 am - 1:30 pm ]

We began looking at the code after the recv call. We looked at both the labeled objdump output and IDA Pro. In the end we stuck with just using the objdump output in an emacs window, where we were able to add comments directly. Emacs macros were also used for inserting new labels as functionality was discovered. IDA Pro did recognize a switch statement and jump table in the code following recv, which was very helpful.

Between the recv call and the switch statement is a call to a non-library function. We guessed that this was a data decoding routine, and that the switch statement selects one of 12 different commands that can be sent to the binary. The data decoding (and hence encoding) routines were figured out, then written and tested in C. We used these routines as the beginning of a control program with which we could remotely control the compromised machine. We knew from the switch statement that there were 12 commands, so we made a skeleton that would support 12 generic commands, to be filled in later with functions as they were discovered.
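The skeleton idea (12 generic commands, filled in as each is understood) can be sketched as a dispatch table. The actual control program was written in C; this Python version is only an illustration, and every name in it is hypothetical.

```python
NUM_COMMANDS = 12  # number of cases seen in the switch statement


def make_placeholder(number):
    """Return a stub handler for a command whose purpose is still unknown."""
    def handler(args):
        return "command %d: not yet understood (args=%r)" % (number, args)
    return handler


# Start with a stub for every command number.
HANDLERS = {n: make_placeholder(n) for n in range(1, NUM_COMMANDS + 1)}


def register(number, func):
    """Replace the stub for a command once its behavior has been worked out."""
    if number not in HANDLERS:
        raise ValueError("no such command: %r" % (number,))
    HANDLERS[number] = func


def dispatch(number, args=()):
    """Run the handler currently registered for a command number."""
    if number not in HANDLERS:
        raise ValueError("no such command: %r" % (number,))
    return HANDLERS[number](args)


print(dispatch(3))
```

A table like this lets experiments start immediately: unknown commands can still be sent with trial arguments while the real handlers are written one at a time.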

As of now our understanding of the first case of the switch statement is the following: It sets some values from global variables, calls the data encoding function, followed by getting a random value which is used as a random packet length (from 400 to 600). The following function call indicates that it tries to open a RAW socket. We guess that this is to send the encoded response back through the raw socket.

May 13 [ 2:30-4:30 pm]

"Command 3" looked interesting because of the call to the system() function, so this case was examined, and we learned that it allows an arbitrary system command (sent in the encoded command packet) to be executed. In discovering where the IP address to send the response to is kept, it became clear that this is set with "Command 2", and so that code was also examined. These two commands were completely figured out. Command 2 is used to set communication parameters for return messages.

May 14 [ 2:00-5:00 pm]

The control program was modified to support these two commands, and it was tested. After a couple of debugging runs, using sniffers on both Eliza and Darwin to watch traffic, the remote machine was successfully controlled. Results of "ps -ef" and "ls -l /etc" were returned (although the second command timed out since it took longer than 10 seconds, and since the end of the command was never reached the controller had to be modified to time out as well). Interestingly, we found that due to the poor way in which the data encoding procedure was written, prior results in the send buffer are pushed back in the buffer and are visible in the padded packet. In this case, this is the actual plaintext result of the command, which should not be visible in the packet! Here's a snort dump showing one of these packets (from a "ps -ef" command):

05/22-10:19:29.298257 0:60:97:D7:BD:A7 -> 0:60:8:93:2C:FA type:0x800 len:0x24D
10.1.1.7 -> 10.1.1.5 NVP TTL:250 TOS:0x0 ID:62211 IpLen:20 DgmLen:575
03 60 87 A1 0D 6D C8 FF 36 6D A4 DB 12 49 80 E7  .`...m..6m...I..

  ... 23 lines of random-looking data snipped ...

7D B4 EB 22 59 A0 E7 38 7F C6 17 5E A5 DC 4E D0  }.."Y..8...^..N.
59 D5 70 03 55 49 44 20 20 20 20 20 20 20 20 50  Y.p.UID        P
49 44 20 20 50 50 49 44 20 20 43 20 53 54 49 4D  ID  PPID  C STIM
45 20 54 54 59 20 20 20 20 20 20 20 20 20 20 54  E TTY          T
49 4D 45 20 43 4D 44 0A 72 6F 6F 74 20 20 20 20  IME CMD.root
20 20 20 20 20 31 20 20 20 20 20 30 20 20 30 20       1     0  0
4D 61 79 31 33 20 3F 20 20 20 20 20 20 20 20 30  May13 ?        0
30 3A 30 30 3A 30 34 20 69 6E 69 74 0A 72 6F 6F  0:00:04 init.roo
74 20 20 20 20 20 20 20 20 20 32 20 20 20 20 20  t         2
31 20 20 30 20 4D 61 79 31 33 20 3F 20 20 20 20  1  0 May13 ?
20 20 20 20 30 30 3A 30 30 3A 30                     00:00:0
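The leaked plaintext in the dump above can be spotted mechanically by scanning a payload for runs of printable characters, much as the "strings" command does. The sketch below is our own illustration (the function name and the minimum run length are assumptions); the sample bytes are copied from the start of the readable region in the dump.

```python
def printable_runs(payload: bytes, min_len: int = 8):
    """Return (offset, text) for each run of printable ASCII or newlines."""
    runs = []
    start = None
    for i, b in enumerate(payload):
        if 0x20 <= b <= 0x7E or b == 0x0A:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, payload[start:i].decode("ascii")))
            start = None
    # Flush a run that extends to the end of the payload.
    if start is not None and len(payload) - start >= min_len:
        runs.append((start, payload[start:].decode("ascii")))
    return runs


# Bytes taken from the dump, where the encoded data ends and plaintext begins.
sample = bytes.fromhex(
    "59 D5 70 03 55 49 44 20 20 20 20 20 20 20 20 50"
    "49 44 20 20 50 50 49 44"
)
print(printable_runs(sample))
```

Encoded data can contain short printable runs by chance, which is why a minimum length is used; a long run such as a "ps" header line is a strong sign that plaintext has leaked into the padding.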

Next we began examining the code for "Command 4". It involves a call to a rather long function, so the code was read and written out by hand in a C-like pseudo code for easier analysis. UDP packet assembly code is recognized, with destination addresses taken from a table of 8000 addresses (presumably DNS servers because the destination port is 53).

May 15 [ 9:30 am - 2:30 pm ]

With some knowledge of how command 4 works, and where parameters are extracted from the command packet, the control program was modified to create a sample control packet, and this was sent to the compromised machine to see what would happen.

Many guesses about the command parameters were verified by examining the sniffed packets. All parameters are modified slightly to see the effect, and between this knowledge and looking at the code the entire "command 4" was figured out as being a DNS reply flood attack. It was noticed that command 9 was almost identical, and looking at the parameters it was noticed that the only difference was a rate-controlling parameter.

A basic structure of recording the PID of a currently-running attack was discovered, and this allowed us to quickly figure out what commands 1 and 8 did. Command 1 was discovered to give the status of the running process. Command 8 turned out to be a kill command. These were quickly added to the control program.

Command 7 was identified by the overall structure (e.g., using the system() call) as being very similar to command 3, so was quickly decoded (it executes a system command, like command 3, but has a timeout of 20 minutes instead of 10 seconds, and the output is not kept or returned).

Commands 5, 10, 11, and 12 all looked similar in overall structure to commands 4 and 9 (the active attack commands), and the only other remaining command was command 6. Looking at the sequence of function calls, without paying too much attention to the details, led us to the guess that this starts a backdoor shell listening on a TCP port. We started the command, and netstat identified the port used as port 23281. Connecting didn't seem to do anything, so we looked back at the code and discovered that the first thing entered after connecting to the port had to be a password, "SeNiF" (the actual string stored in the binary was "TfOjG", to prevent this from being seen by the "strings" command). With this knowledge a shell was successfully launched on the compromised machine, and interacted with using netcat from a different machine. Here's a cut-and-paste of doing this, with our typing shown in bold (this was captured after the complete analysis, so you can see all of the command descriptions in the control program):

[root@darwin code]# ./control eliza

Your command, oh mighty one (enter 0 for menu): 0

 1. Attack status query
 2. Set communication parameters
 3. Execute command
 4. DNS reply flood (slow)
 5. Fragment flood
 6. Execute shell
 7. Silent execute command
 8. Stop current attack
 9. DNS reply flood (rate adjustable)
10. SYN flood (slow)
11. SYN flood (rate adjustable)
12. DNS request flood

13. Quit

Your command, oh mighty one (enter 0 for menu): 6

Instructions:  To access the shell, use netcat to connect to
port 23281 on the target machine.  The very first thing you send
must be a line with the backdoor password: SeNiF
After that, it's a standard netcat backdoor shell (with the usual
caveats of no prompts, etc.

Your command, oh mighty one (enter 0 for menu): 13
[root@darwin code]# nc eliza 23281
SeNiF
ls -l /
total 164
drwxr-xr-x    2 root     root         4096 May  8 05:46 bin
drwxr-xr-x    3 root     root         4096 May  8 05:32 boot
drwxr-xr-x   17 root     root        77824 May 13 22:02 dev
drwxr-xr-x   43 root     root         4096 May 14 14:46 etc
drwxr-xr-x   13 root     root         4096 May  1 12:55 home
drwxr-xr-x    2 root     root         4096 Jun 21  2001 initrd
drwxr-xr-x    3 root     root         4096 May 10 10:22 lhome
drwxr-xr-x    6 root     root         4096 May  8 05:48 lib
drwxr-xr-x    2 root     root        16384 May  8 05:25 lost+found
drwxr-xr-x    5 root     root         4096 May  1 12:31 mirror
drwxr-xr-x    2 root     root         4096 Aug 29  2001 misc
drwxr-xr-x    5 root     root         4096 May  8 06:01 mnt
drwxr-xr-x    2 root     root         4096 Aug 23  1999 opt
dr-xr-xr-x   57 root     root            0 May 13 22:00 proc
drwxr-x---    4 root     root         4096 May 22 10:07 root
drwxr-xr-x    2 root     root         4096 May  8 05:52 sbin
drwxrwxrwt   10 root     root         4096 May 15 19:29 scratch
drwxrwxrwt    6 root     root         4096 May 22 04:02 tmp
drwxr-xr-x   15 root     root         4096 May  8 05:31 usr
drwxr-xr-x   18 root     root         4096 May  8 05:53 var
exit

We did a web search for "senif" and "tfojg" (the way the password was hidden in the binary) to see if there was information on the web to be learned, but found nothing interesting.
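Comparing the stored string "TfOjG" with the password "SeNiF" shows that each stored character is one position later in the alphabet than the corresponding password character. A minimal Python sketch of that relationship follows; the function name is ours, and whether the binary decodes the stored string or encodes the typed input before comparing is not something this snippet tries to settle.

```python
def decode_hidden(stored: str) -> str:
    """Shift every character back by one to recover the hidden string."""
    return "".join(chr(ord(c) - 1) for c in stored)


def encode_hidden(plain: str) -> str:
    """Shift every character forward by one, as the stored form appears to be."""
    return "".join(chr(ord(c) + 1) for c in plain)


print(decode_hidden("TfOjG"))
```

Even a shift this simple is enough to keep the password out of "strings" output, which is presumably all it was meant to do.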

We returned to the remaining commands (numbers 5, 10, 11, and 12) -- looking at the code only to determine the number of parameters to send, we tried all these commands while varying the parameters, watching a sniffer to see what happened. This made it easy to quickly discover the purpose of all the remaining commands (well, quicker than reading all the disassembled code anyway!).

May 16 [ 5:00-6:00 pm and 10:00-11:30 pm]

With knowledge of what these commands were supposed to do, the code was examined for new insights. The only additional thing discovered was that in command 12 the source address (in addition to the source port) could be randomized by using an address of all zeroes.

The control program was modified for commands 4, 9, and 12 to get attack parameters from the user, rather than using the hard-coded experimental values we had used in exploring their function.

May 17 [9:45 to 11:45 am]

The objdump code was examined in detail to ensure that the function parameters matched the guesses.

The controller program was further modified to provide full functionality for all the commands.

At this point we have completely determined all the functionality of the unknown binary, and have a control program that can exercise all functionality of the binary. Focus now turned to writing up the results as required for the reverse engineering challenge.

Tools used

netstat
strace
netcat
nmap
objdump (disassembler)
emacs
Slackware 3.1 library
perl
Ethereal (sniffer)
Various references found through google and in Linux header files for protocol numbers, header formats, etc.

Methods summary

We had great success with a general technique of experimental execution interleaved with examining the disassembled code. We think this went much faster than would have been possible just looking at assembly code. Furthermore, experimenting alone cannot discover all functionality in an unknown binary, since you cannot be sure that your experiments have exercised all possible flows in the code. By setting parameters to known values (for example, integers starting at 1) and seeing what happens, it is possible to make almost certain guesses about the purpose of the parameters (for example, seeing a SYN flood to address 1.2.3.4 shows that the address comes from the first 4 parameters). Then with this knowledge, the functions in the disassembled code can be searched for uses of the parameters to see if the usage is consistent with our guess. A similar pattern of experimenting and examining code can be seen in the use of strace, where the output was used to zero in on particular function calls (particularly at the beginning).
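The probing trick described above can be written down as a small Python sketch. The function names are hypothetical, and the example assumes one-byte parameters set to 1, 2, 3, and so on; the observed address would come from the sniffer.

```python
def probe_parameters(count):
    """Distinct, recognizable values for each parameter slot: 1, 2, 3, ..."""
    return list(range(1, count + 1))


def locate_address_bytes(observed_ip, params):
    """Return the parameter positions (0-based) that produced each octet.

    A position is None when the octet does not match any probe value.
    """
    positions = []
    for octet in observed_ip.split("."):
        value = int(octet)
        positions.append(params.index(value) if value in params else None)
    return positions


params = probe_parameters(10)
# Suppose the sniffer shows the attack traffic going to 1.2.3.4.
print(locate_address_bytes("1.2.3.4", params))
```

Because every probe value is unique, any value that shows up in the sniffed traffic points back to exactly one parameter position, which can then be confirmed in the disassembly.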

On the experimental side, the tools we found most valuable were strace and a sniffer (we usually used Ethereal, although snort was used in some cases). When examining code, nothing we tried worked as well as simply having the code in an editor (emacs) and using the built-in searching and keyboard macro capabilities to label things as they were discovered. We had high hopes initially for IDA Pro, but the free version we used was more frustrating than it was worth. Perhaps the full commercial version would be easier to use.

Finally, while we discussed attaching to the running binary using gdb so that we could actively trace through the code, we never felt the need to do this. Perhaps in a more complex binary this would be something that would have a higher payoff.

References:

http://www.iana.org/ for official protocol numbers
http://www.ietf.org/rfc/rfc741.txt (RFC for NVP)
URLs for basic background
include files for IP/TCP/UDP header for format and values
man pages for library functions
Slackware 3.1 library