Analysis

1. Introduction

In this paper, we document our methodology and present our results in analyzing an unknown binary. Our task is to determine, among other things, its main functions and features.
A brief overview of the steps that we took is as follows:

First, we tried to learn as much as possible about the binary without executing it. This involved running GNU/Linux utilities like strings and objdump on the binary, which gave us an idea of what to expect when we did execute it.
Then, we executed the binary in a controlled environment to observe its behavioural characteristics. A controlled environment prevents the binary from doing unnecessary damage.
Next, we disassembled the binary to determine its full range of functions.
Finally, we provided external stimuli to the binary by injecting packets into the network to understand its network behaviour.

Details of our analysis in each of the above steps will be described in the subsequent sections. We will also be highlighting a bug that we uncovered in the code. Finally, we will conclude this paper with some general comments.

2. Tools of the trade

We used both public domain and commercial tools in our analysis. Besides native GNU/Linux utilities like strings, strace, gdb, file, objdump, lsof, tcpdump and ethereal, the following tools deserve special mention:

2.1 IDA Pro

IDA Pro [1] is by far the best code disassembly tool in our opinion. It supports a myriad of processors and platforms, and has several excellent features that drastically reduced our code disassembly effort. While it is not free (the standard edition costs US$299), it is well worth the money if you need a disassembler to do your job. Perhaps the only thing to wish for in this particular situation is a version that runs natively in Linux; a Windows environment is required. This requirement is a minor inconvenience, because the binary in question turned out to be a Linux ELF executable, which meant that we had to juggle between a Linux machine and a Windows machine.

For those of you who are thinking of trying out IDA Pro in Wine [2], we can report that we were unable to get either the latest 4.21 version or the freeware version to run in Wine. However, we did manage to get the older 4.18 version to work with Wine, with occasional non-fatal bugs. Other versions may work too. Unfortunately, it may be difficult to get hold of an older version of IDA Pro, because the IDA Pro website no longer provides them for download. So unless you are a registered user who has kept an older copy around (like us), you may be out of luck until the compatibility issues get sorted out.

2.2 LibNet

LibNet [3] is a network programming library that aims to provide a generic interface to network programming, without being tied down to any particular type of network. LibNet also gives the programmer greater control over the packet creation process. In the later part of our analysis, we needed to craft specific network packets that were neither TCP nor UDP as external stimuli to the binary. We did this easily with LibNet.

3. First Looks

One of the first things we did was to run the strings command on the binary, probably because it is one of the easiest things to do :-)

Looking through the entire strings output, we immediately made 2 interesting observations. Firstly, we spotted a section of strings that gave some indication as to the nature of this binary:

[mingetty]
/tmp/.hj237349
/bin/csh -f -c "%s" 1> %s 2>&1
TfOjG
/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:.
PATH
HISTFILE
linux
TERM
/bin/sh
/bin/csh -f -c "%s" 
%d.%d.%d.%d
%u.%u.%u.%u
%c%s

We can clearly see what looks like:

a process name
a temporary storage file name
some shell commands
some environment variables
some IP address format specifiers
a strange looking string "TfOjG"

At this point, it is difficult to ascertain the exact purpose of these strings, although one might be able to make intelligent enough guesses. As it turns out, the purpose of these strings was revealed to us at a later stage, during the disassembly process. Incidentally, this set of strings can also be useful as a signature for detecting the binary in the wild.
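
As a rough illustration of the signature idea, the following Python sketch checks whether a blob of data contains every string in a small signature set. The choice of three strings and the function names are our assumptions for illustration only; this is not a vetted detection rule.

```python
# Hypothetical signature: a subset of the strings listed above.
SIGNATURE = [b"[mingetty]", b"/tmp/.hj237349", b"TfOjG"]

def matches_signature(data: bytes, signature=SIGNATURE) -> bool:
    """Return True only if every signature string occurs in the data."""
    return all(s in data for s in signature)

def file_matches_signature(path: str) -> bool:
    """Read a file in binary mode and test it against the signature."""
    with open(path, "rb") as f:
        return matches_signature(f.read())
```

Requiring all strings to be present keeps false positives down, since "[mingetty]" on its own could appear in unrelated files.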

The second interesting find was a set of strings that may allude to the origin of the binary:

@(#) The Linux C library 5.3.12
GCC: (GNU) 2.7.2.l.2
GCC: (GNU) 2.7.2

This combination of gcc and C library versions was prevalent 4 or 5 years ago. Notice that this is also the period in which Distributed Denial of Service (DDoS) attacks started to rear their ugly head, led by tools like Tribe Flood Network [4]. It is possible that the binary originated from around the same era. Of course, it is also possible that the binary was compiled on a really old Linux distribution, e.g. RedHat 5.2.

Using the file command, we can see that this is a Linux ELF binary built for the i386 platform, and that it is statically linked with symbols and relocation bits stripped. This is bad news, because without symbol information, the disassembly process will be more difficult.

root:# file the-binary
the-binary: ELF 32-bit LSB executable, Intel 80386, version 1, statically linked, stripped

The objdump command can also be used to derive various information about an object file. But in this case, we did not learn any useful information from the `objdump -x` command.

4. Execution in a Controlled Environment

Now that we have a vague idea regarding the nature of the binary, our next step was to execute it in a controlled environment to observe its behavioural characteristics. Some of the things we were interested in finding out are:

What network connections does it open?
What files does it access?
What system calls does it make?
What are its process and memory footprint?

Our "controlled environment" is simply a default installation of a Red Hat Linux 7.2 Workstation, connected to a network hub. Also hooked up to this hub is another Red Hat Linux 7.2 Workstation. This second Linux box functions as a network sniffer: it will pick up any network packet that the binary emits. The sniffing software we used was the ever-reliable tcpdump. These two machines are not connected to any other networks.

Fig 1: Testbed setup (IP addresses are for examples only)

Instead of simply executing the binary at the command line, we decided to run it via the strace command. strace is a GNU/Linux system utility that runs a specified program, and at the same time intercepts and displays the system calls that are made by the program as well as the signals that are received by the program. It is an extremely useful debugging and instructional tool. When using strace, it is often useful to use the "-f" switch, to instruct strace to follow forks into child processes.

Thus, executing `strace -f` with the name of the binary as the argument, we obtained the following output:

86    execve("./the-binary", ["./the-binary"], [/* 28 vars */]) = 0
86    personality(PER_LINUX)            = 0
86    geteuid()                         = 0
86    sigaction(SIGCHLD, {SIG_IGN}, {SIG_DFL}, 0x4004e928) = 0
86    fork()                            = 87
86    _exit(0)                          = ?
87    setsid()                          = 87
87    sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
87    fork()                            = 88
87    _exit(0)                          = ?
88    chdir("/")                        = 0
88    close(0)                          = 0
88    close(1)                          = 0
88    close(2)                          = 0
88    time(NULL)                        = 1021725793
88    socket(PF_INET, SOCK_RAW, 0xb /* IPPROTO_??? */) = 0
88    sigaction(SIGHUP, {SIG_IGN}, {SIG_DFL}, 0x4004e928) = 0
88    sigaction(SIGTERM, {SIG_IGN}, {SIG_DFL}, 0x4004e928) = 0
88    sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
88    sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
88    recv(0,

Now things start to get interesting. Let's look at the above system trace in more detail.

The leftmost column lists the process IDs, or PIDs, of the current process. The PID of the binary goes through 3 incarnations, i.e. 86, 87 and 88, because of the multiple forks. PID 88 is its final process ID.

First of all, the binary calls geteuid() to determine whether it is being executed with root privileges. If the effective user ID is not 0 (root), the binary exits. This check is necessary because subsequent actions of the binary, like opening a raw network socket, require root privileges.

The next few system calls will look confusing to someone who has not programmed a daemon process in a Unix environment before. Readers unfamiliar with the technique are urged to refer to the Unix Programming FAQ [5]. In particular, section 1.7 of the FAQ is especially helpful for understanding the initial behaviour of the binary. Here is an excerpt:

Simply invoking a program in the background isn't really adequate for these long-running programs; that does not correctly detach the process from the terminal session that started it. Also, the conventional way of starting daemons is simply to issue the command manually or from an rc script; the daemon is expected to put itself into the background.

Here are the steps to become a daemon:

1. fork() so the parent can exit, this returns control to the command line or shell invoking your program. This step is required so that the new process is guaranteed not to be a process group leader. The next step, setsid(), fails if you're a process group leader.
2. setsid() to become a process group and session group leader. Since a controlling terminal is associated with a session, and this new session has not yet acquired a controlling terminal our process now has no controlling terminal, which is a Good Thing for daemons.
3. fork() again so the parent, (the session group leader), can exit. This means that we, as a non-session group leader, can never regain a controlling terminal.
4. chdir("/") to ensure that our process doesn't keep any directory in use. Failure to do this could make it so that an administrator couldn't unmount a filesystem, because it was our current directory. [Equivalently, we could change to any directory containing files important to the daemon's operation.]
5. umask(0) so that we have complete control over the permissions of anything we write. We don't know what umask we may have inherited. [This step is optional]
6. close() fds 0, 1, and 2. This releases the standard in, out, and error we inherited from our parent process. We have no way of knowing where these fds might have been redirected to. Note that many daemons use sysconf() to determine the limit _SC_OPEN_MAX. _SC_OPEN_MAX tells you the maximun open files/process. Then in a loop, the daemon can close all possible file descriptors. You have to decide if you need to do this or not. If you think that there might be file-descriptors open you should close them, since there's a limit on number of concurrent file descriptors.
7. Establish new open descriptors for stdin, stdout and stderr. Even if you don't plan to use them, it is still a good idea to have them open. The precise handling of these is a matter of taste; if you have a logfile, for example, you might wish to open it as stdout or stderr, and open `/dev/null' as stdin; alternatively, you could open `/dev/console' as stderr and/or stdout, and `/dev/null' as stdin, or any other combination that makes sense for your particular daemon.

As we can see now, the binary was merely observing the programming etiquette expected of any daemon process. Steps 5 and 7 were skipped because they are not applicable in this situation.
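
To make the sequence concrete, here is a minimal Python sketch of the same steps that the strace output shows (two forks, setsid, chdir, and closing descriptors 0 to 2). It is our own illustration of the FAQ recipe, not a translation of the binary's code; the function name daemonize is an assumption, and the function is only defined here, not called.

```python
import os

def daemonize() -> None:
    """Detach from the controlling terminal, skipping FAQ steps 5 and 7."""
    if os.fork() > 0:       # step 1: the parent exits, the child continues
        os._exit(0)
    os.setsid()             # step 2: new session, no controlling terminal
    if os.fork() > 0:       # step 3: the session leader exits
        os._exit(0)
    os.chdir("/")           # step 4: do not keep any directory in use
    for fd in (0, 1, 2):    # step 6: close stdin, stdout and stderr
        try:
            os.close(fd)
        except OSError:
            pass            # descriptor was already closed
```

Calling daemonize() at the start of a program would leave only the grandchild process running, which matches the three PIDs seen in the trace.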

After closing standard in, out and error, the binary obtains the current time and opens a raw socket of protocol 11 (0xb). According to the official list of assigned protocol numbers maintained at iana.org [6], protocol 11 is assigned to "Network Voice Protocol", defined in RFC 741 in 1977 [7]. We can safely say that this is now a defunct protocol, at least in the modern Internet space. So the binary is using a protocol number that is not used for any known purpose today.

Then, after setting itself up to ignore the SIGHUP and SIGTERM signals, it sits quietly on the opened raw socket connection listening for incoming data.

Using the netstat command, we verified that indeed a raw socket of protocol 11 (0xB) has been opened and is in a listening state:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
raw        0      0 *:11                    *:*                     7

We also verified with lsof that the binary did not access any files on the filesystem up to this stage. Neither did our network sniffer pick up any network packets emanating from the binary.

There is one additional action taken by the malware that is not apparent from the strace output. If one were to use the ps command to list the processes resident on the system at this moment, one would notice a process that goes by the name "[mingetty]" with a PID of 88. As we recall, PID 88 is the final PID of the binary. Apparently, the binary has changed its process name to "[mingetty]".

However, if the top command is executed instead, the process name shown is still the original name of the malware, i.e. "the-binary". Clearly, top and ps use different ways to read the process name. top is the more sophisticated of the two, as it relies on the /proc filesystem.

So much for the easy part. Now for the heady stuff...

5. Binary Analysis

We loaded the binary into IDA Pro, which promptly produced a disassembled code listing. The entire binary is over 200 Kbytes in size. However, after much study of the disassembled code, we determined that the main daemon logic occupies only a small portion of the binary, specifically from .text:0x08048134 to .text:0x08048EC5. Most of the remaining code consists of standard C library functions. So we concentrated our efforts on this portion of the binary.

The following commented listings are the fruits of our labour, and cover almost the entire functionality of the daemon:

Listing 1: Main daemon logic code in assembly language
Listing 2: Pseudo-C equivalent of main daemon logic code

All in all, the daemon is programmed with 12 functions, dispatched by a switch statement. Briefly, these functions are:

Function 0 is a status query. The information returned consists of the tuple: 1, 7, attacking child pid, attacking type (if child pid != 0). The numbers 1 and 7 may be some form of version information, as they are hardcoded into the binary.
Also, payloadbuf[1] is set to 3, which probably indicates a reply packet, as opposed to payloadbuf[1] == 2 for traffic from a master to a slave.
Function 1 sets the reply IP address of the master. There can be 10 possible reply addresses, and a reply may be sent to all 10 of them, but only 1 should be the real master's IP address. The rest are most likely decoys. There are 3 possible cases:

payloadbuf[2] == 2: all the reply addresses are copied from the payload, of which one should be the true IP address of the master.
payloadbuf[2] == 0: all the reply addresses are generated randomly except the first, which is copied from the payload. This also sets a global var (ctrl_parameter) which causes all future replies to be sent to only 1 host -- the master.
payloadbuf[2] == other values: An index is generated randomly. All reply addresses are generated randomly except for the address at the particular index generated earlier. This special address is copied from the payload and should be the master's IP address.

Function 2 executes a command specified in the payload and sends back the output.
Function 3 starts a single packet reflective DNS query attack
Function 4 starts a fragmented IP packets attack
Function 5 allows a remote shell to be spawned when a remote telnet client connects to port 23281 and supplies the password "SeNiF".
Function 6 executes a command specified in the payload, but does not send back the output
Function 7 terminates an active listening remote shell or an ongoing attack
Function 8 starts a multiple packet reflective DNS query attack
Function 9 starts a single packet TCP SYN attack
Function 10 starts a multiple packet TCP SYN attack
Finally, Function 11 floods a DNS server with DNS queries
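
For quick reference while reading the traces in the next section, the list above can be written as a small lookup table. This Python sketch is only a reading aid; the short labels are our own wording and the helper name describe is an assumption.

```python
# Function numbers as listed above; the labels are our own shorthand.
FUNCTIONS = {
    0: "status query",
    1: "set reply addresses",
    2: "execute command, return output",
    3: "reflective DNS query attack (single packet)",
    4: "fragmented IP packet attack",
    5: "remote shell on TCP port 23281",
    6: "execute command, no output returned",
    7: "terminate active shell or attack",
    8: "reflective DNS query attack (multiple packets)",
    9: "TCP SYN attack (single packet)",
    10: "TCP SYN attack (multiple packets)",
    11: "DNS query flood",
}

def describe(function_number: int) -> str:
    """Return the label for a function number, or 'unknown'."""
    return FUNCTIONS.get(function_number, "unknown")
```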

More details will be revealed in the next section when we look at the network behaviour.

We also noted that the daemon encodes its network data. All network packets exchanged between the master and the daemon are encoded in this way. Fortunately for us, the encoder and decoder routines were easily isolated. Furthermore, the algorithm used turned out to be quite simple and straightforward, so we had no problems translating the encoder and decoder assembly code into pseudo-C equivalents.

Listing 3: Network data encoder in assembly language
Listing 4: Pseudo-C equivalent of network data encoder

Listing 5: Network data decoder in assembly language
Listing 6: Pseudo-C equivalent of network data decoder

6. Network Analysis

6.1 Getting interactive with the binary

In order to study the network behaviour of the binary in closer detail (it is, after all, a network daemon), we needed to be able to trigger the binary from the network side by sending it network packets that it can recognize. From the previous section, we had already figured out the network encoding algorithm that the daemon and its master use. That enabled us to write the following set of programs using LibNet:

Listing 7: create_payload.c : This is a single program that can create the necessary payloads to trigger all the 12 functions of the binary. Once the program runs, the user will receive instructions on screen to create the necessary payload for a specific case. The payload will then be written to a file.

Listing 8: encode.c: This is the network data encoder that encodes the payload so that the binary can decode it.

Listing 9: inject.c: This program injects the payload into the network.

Listing 10: receive.c: For Functions 0 and 2, the binary will respond with data packets. This program can be used to listen for the reply. However, in these cases, the payload for Function 1 should be sent to the binary first to set the return address.

So the sequence of commands that we run for each case is:

./create_payload output_file
cat output_file | ./encode | ./inject -s bogus_source_ip -d the-binary_ip

If a reply is expected, then of course "./receive" must be executed before the packet injection command.

Please refer to the README file and the source code in the inject directory for more details.

Now that we are able to trigger the binary externally, let us examine its network behaviour in detail for each of the 12 functions.

6.2 Network behaviour of the binary

All the network traffic shown below was captured with tcpdump.

6.2.1 Function 1

We need to examine Function 1 first, because it sets the reply IP address of the master. To illustrate what an encoded packet looks like, the following is the packet that we send to the binary for this case:

10:14:08.640747 < 99.98.97.96 > 10.0.1.28: ip-proto-11 502 [tos 0x10]

             4510 020a 00f2 0000 300b b809 6362 6160
             0a00 011c 0201 1730 4869 8098 cfed 0c2c
             4d6f 92b6 db01 2850 79a3 cefa 2755 84b4
             e517 4a7e b3e9 2058 91cb 0642 7fbd fc3c
             7dbf 0246 8bd1 1860 a9f3 3e8a d725 74c4
             1567 ba0e 63b9 1068 c11b 76d2 2f8d ec4c
             ad0f 72d6 3ba1 0870 d943 ae1a 87f5 64d4

The first 20 bytes are the standard IP header. The next byte (02) is always 2 for packets sent from a master to the binary. The next byte is not used. The rest of the packet is the encoded payload.
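
The layout just described can be checked by unpacking the first 22 bytes of the dump above. The Python sketch below is our own illustration; the function name and the dictionary keys are assumptions, and only the standard IPv4 header fields are taken as given.

```python
import socket
import struct

# First 22 bytes of the Function 1 command packet shown above.
packet = bytes.fromhex("4510020a00f20000300bb809636261600a00011c0201")

def parse_command_header(packet: bytes) -> dict:
    """Unpack the IPv4 header and the direction byte that follows it."""
    (ver_ihl, _tos, total_len, _ident, _frag,
     _ttl, proto, _cksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = (ver_ihl & 0x0F) * 4          # header length in bytes
    return {
        "protocol": proto,
        "total_length": total_len,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "direction": packet[ihl],       # 2 = master to binary
    }

info = parse_command_header(packet)
```

Here info reports protocol 11, source 99.98.97.96, destination 10.0.1.28 and a direction byte of 2. The total length of 522 bytes is the 20-byte header plus the 502 bytes that tcpdump reports.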

If we look at the corresponding decoded payload :

0000000 00 02 01 0a 00 01 20 07 08 09 0a 0b 0c 0d 0e 0f
0000020 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
0000040 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
0000060 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
0000100 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f

The first byte is not used. The next byte is 02. We need to subtract 1 from this value to get the function number, which is function 1 in this instance. This is how the master specifies to the daemon what function to call. What follows are the parameters to the specified function. These of course will vary in content from function to function.

In this case, the next byte is 01. For Function 1, we name this byte ctrl_parameter. It can be 0, 2 or any other value, and each of these three cases is handled differently. The next 4 bytes contain the master IP address, which is 10.0.1.32 (0a 00 01 20). The next 36 bytes contain what we call the auxiliary IP addresses.

When ctrl_parameter is 0, the binary will only send reply to one IP, the master IP. When ctrl_parameter is 2, the binary will send replies to all 10 IPs in the payload, i.e. the master IP and the auxiliary IPs. When ctrl_parameter is any other value, which is exactly the case in the above payload, the binary will generate 9 random IPs and send its reply to the master IP and the 9 random IPs.
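
The three cases can be summarised in a short Python sketch that parses the decoded Function 1 payload shown above and picks the reply destinations. This is our reading of the behaviour, not the binary's code: the function names are assumptions, and Python's random module merely stands in for whatever generator the binary uses.

```python
import random
import socket

def parse_function1(payload: bytes) -> dict:
    """Split a decoded Function 1 payload into its fields."""
    return {
        "function": payload[1] - 1,
        "ctrl_parameter": payload[2],
        "master_ip": socket.inet_ntoa(payload[3:7]),
        # 36 bytes = 9 auxiliary addresses of 4 bytes each
        "aux_ips": [socket.inet_ntoa(payload[i:i + 4]) for i in range(7, 43, 4)],
    }

def random_ip(rng: random.Random) -> str:
    return ".".join(str(rng.randrange(256)) for _ in range(4))

def reply_destinations(parsed: dict, rng: random.Random) -> list:
    """Return the addresses a reply would be sent to."""
    ctrl = parsed["ctrl_parameter"]
    master = parsed["master_ip"]
    if ctrl == 0:                        # reply to the master only
        return [master]
    if ctrl == 2:                        # reply to all 10 payload addresses
        return [master] + parsed["aux_ips"]
    decoys = [random_ip(rng) for _ in range(9)]
    decoys.insert(rng.randrange(10), master)   # master at a random position
    return decoys

# Decoded payload from the dump above: 00 02 01 0a 00 01 20 07 08 ... 4f
payload = bytes([0x00, 0x02, 0x01, 0x0A, 0x00, 0x01, 0x20]) + bytes(range(0x07, 0x50))
parsed = parse_function1(payload)
destinations = reply_destinations(parsed, random.Random())
```

With the payload above, parsed gives function 1, ctrl_parameter 1 and master 10.0.1.32, so destinations holds the master plus 9 random addresses.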

6.2.2 Function 0

The following packet shows a status query packet to the binary :

10:14:12.300747 < 99.98.97.96 > 10.0.1.28: ip-proto-11 502 [tos 0x10] 
             4510 020a 00f2 0000 300b b809 6362 6160
             0a00 011c 0201 172f 4862 7d99 b6d4 f313
             3456 799d c2e8 0f37 608a b5e1 0e3c 6b9b
             ccfe 3165 9ad0 073f 78b2 ed29 66a4 e323
             64a6 e92d 72b8 ff47 90da 2571 be0c 5bab
             fc4e a1f5 4aa0 f74f a802 5db9 1674 d333
             94f6 59bd 2288 ef57 c02a 9501 6edc 4bbb

What is more interesting is the way the reply is sent back by the binary:

10:14:12.320747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 478
10:14:12.330747 > 10.0.1.28 > 123.183.40.136: ip-proto-11 478
10:14:12.350747 > 10.0.1.28 > 35.57.172.73: ip-proto-11 478
10:14:12.360747 > 10.0.1.28 > 45.254.84.85: ip-proto-11 478
10:14:12.370747 > 10.0.1.28 > 10.0.1.32: ip-proto-11 478
10:14:12.380747 > 10.0.1.28 > 223.211.200.132: ip-proto-11 478
10:14:12.390747 > 10.0.1.28 > 9.17.242.51: ip-proto-11 478
10:14:12.400747 > 10.0.1.28 > 158.88.202.25: ip-proto-11 478
10:14:12.410747 > 10.0.1.28 > 78.31.122.201: ip-proto-11 478

Notice that an identical payload is sent to 9 destinations. The original master IP 10.0.1.32 is one of them, and its position among the destinations was randomly determined when we called Function 1 earlier. Actually, 10 destinations should be expected, but because the auxiliary IPs are generated randomly, one of them happened to fall in an invalid IP range and hence was not sent out by the kernel.

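Let's look at one of these reply packets in detail. The copy shown here was sent to the decoy address 123.183.40.136; the copy sent to the master at 10.0.1.32 carries the same payload: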

10:14:12.330747 > 10.0.1.28 > 123.183.40.136: ip-proto-11 478
             4500 01f2 4251 0000 fa0b cd54 0a00 011c
             7bb7 2888 0300 172f 4d64 7b84 9bb2 f91b
             ea0a 84fd 75ec 0d24 3c6f 88a0 ce14 73ec
             8030 fde8 f21c 67d4 6418 f1f0 1664 db7c
             4840 65b8 3aec cfe4 2ca8 5940 5eb4 430c
             1050 cd88 82bc 37f4 f438 c190 a604 ab9c
             d860 3558 ca8c 9f04 bcc8 29e0 ee54 132c

Notice that the first byte after the IP header is now 03. The binary always sets this byte to 3 in its replies, as opposed to 2 for packets that it receives. We will examine this in more detail when we see a reply consisting of multiple packets in Function 2 below.

The first 5 bytes of the payload constitute the reply, and they decode to 00 01 07 00 00. The first byte is taken from a global variable in the binary, but appears to be always 0. The next 2 bytes are hardcoded and will always appear as 01 07 for all packets sent for this case. We suspect that this is some version information. The 4th byte will be 0 if there is no active attack (Functions 3, 4, 8, 9, 10, 11) or remote shell (Function 5). If this byte is 1, then the fifth byte contains the currently active function number + 1. This indicates to the master which function is currently active for this daemon.
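
As a worked example, the 5 decoded bytes can be interpreted as follows. This Python sketch follows our description above; the function name and the dictionary keys are assumptions.

```python
def parse_status_reply(decoded: bytes) -> dict:
    """Interpret the first 5 decoded bytes of a Function 0 reply."""
    active = decoded[3] == 1
    return {
        "version": (decoded[1], decoded[2]),    # hardcoded 1, 7
        "active": active,
        # byte 5 holds the function number + 1 when something is active
        "active_function": decoded[4] - 1 if active else None,
    }

status = parse_status_reply(bytes([0x00, 0x01, 0x07, 0x00, 0x00]))
```

For the reply captured above, status reports version (1, 7) and no active function. A reply ending in 01 06 would, by the same rule, indicate that Function 5 (the remote shell) is active.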

6.2.3 Function 2

In this function, a command is contained in the payload; the binary executes the command, then sends back its output. In this case, the command that we specified is "ls -la /\n".

10:14:22.480747 < 99.98.97.96 > 10.0.1.28: ip-proto-11 502 [tos 0x10] 
             4510 020a 00f2 0000 300b b809 6362 6160
             0a00 011c 0201 1731 b43e 75b9 3cb4 eb31
             5269 8cb0 d5fb 224a 739d c8f4 214f 7eae
             df11 4478 ade3 1a52 8bc5 003c 79b7 f636
             77b9 fc40 85cb 125a a3ed 3884 d11f 6ebe
             0f61 b408 5db3 0a62 bb15 70cc 2987 e646
             a709 6cd0 359b 026a d33d a814 81ef 5ece

The decoded payload is as follows :

0000000 00 03 6c 73 20 2d 6c 61 20 2f 0a 00 0c 0d 0e 0f
0000020 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f

The second byte (03) indicates Function 2. The rest of the payload up to the first null character contains the command to execute. In this case, it shows the string "ls -la /\n". The binary executes this command and redirects the output to a temporary file "/tmp/.hj237349" first. It then reads back the content of the file, before encoding the data and subsequently transmitting it.

As before, the reply is sent to multiple destinations. The difference is that now, we observed that there are 4 packets sent to each destination. That is because the binary sends at most 0x190 bytes in the encoded payload ( i.e. max pkt size = 0x190 + 2 + 20 ). If it needs to send more than this size, it splits up the content into multiple packets. So if we look at the replies sent to one particular address 123.246.85.96:

10:14:22.720747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 419
             4500 01b7 75a0 0000 fa0b 6d29 0a00 011c
             7bf6 5560 0354 ...

10:14:23.220747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 410
             4500 01ae 4d52 0000 fa0b 9580 0a00 011c
             7bf6 5560 0354 ...

10:14:23.720747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 456
             4500 01dc 007c 0000 fa0b e228 0a00 011c
             7bf6 5560 0354 ...

10:14:24.220747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 489
             4500 01fd 028b 0000 fa0b dff8 0a00 011c
             7bf6 5560 0354 ...

For all of them, the first byte after the IP header is 03, which, as we noted before, indicates a reply packet back to the master. The next byte is not in use. The subsequent payload is the encoded output of the command executed earlier; in our case, it is a listing of the root directory. For each of the above packets, the first 2 bytes of the payload decode as follows:

The first byte is not used. We observed that for multi-packet replies, the 2nd byte is always 3 for the first packet and 4 for subsequent packets. These bytes are probably additional control bytes used for multi-packet replies.
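
The splitting behaviour can be sketched in Python as below. This is a simplified model under stated assumptions: we treat 0x190 as the maximum data per packet and ignore exactly how the 2 control bytes count against that limit, which we have not pinned down. The names split_reply and MAX_CHUNK are ours.

```python
MAX_CHUNK = 0x190   # 400 bytes, the per-packet limit noted above

def split_reply(data: bytes, max_chunk: int = MAX_CHUNK) -> list:
    """Split command output into (marker, chunk) pairs.

    The marker is 3 for the first packet and 4 for every later one.
    """
    chunks = []
    for offset in range(0, len(data), max_chunk):
        marker = 3 if offset == 0 else 4
        chunks.append((marker, data[offset:offset + max_chunk]))
    return chunks
```

For example, 1000 bytes of output would be split into three pieces of 400, 400 and 200 bytes with markers 3, 4 and 4.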

For the rest of the functions, no reply is sent from the binary back to the master. The first 22 bytes of the command packet sent to the binary are similar in every case, so from now on we shall look only at the decoded payload.

6.2.4 Function 3

The decoded payload for Function 3 is:

0000000 00 04 01 02 03 04 10 e1 00 61 62 63 00 0d 0e 0f

The second byte indicates Function 3. The next 4 bytes contain the victim's IP address. The next 2 bytes contain the victim's port. The next byte indicates whether the binary is to use the IP address 01 02 03 04 or the hostname 61 62 63 00 ("abc") as the victim.
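
To illustrate the field layout, the following Python sketch unpacks the payload shown above. The function name and keys are assumptions, and the flag byte is returned as-is because we only show one value of it here.

```python
import socket
import struct

def parse_function3(payload: bytes) -> dict:
    """Split a decoded Function 3 payload into its fields."""
    (port,) = struct.unpack("!H", payload[6:8])     # network byte order
    return {
        "function": payload[1] - 1,
        "victim_ip": socket.inet_ntoa(payload[2:6]),
        "victim_port": port,
        "hostname_flag": payload[8],
        # hostname is a NUL-terminated string after the flag byte
        "hostname": payload[9:].split(b"\x00", 1)[0].decode("ascii"),
    }

fields = parse_function3(bytes.fromhex("00040102030410e100616263000d0e0f"))
```

The port decodes to 4321 (0x10e1), which tcpdump prints as "rwhois" in the capture that follows.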

The response from the binary is as follows :

18:20:52.777481 > 1.2.3.4.rwhois > 157.16.248.3.domain: 5620+ SOA? com. (21)
             4500 0031 a200 0000 d011 afa1 0102 0304
             9d10 f803 10e1 0035 001d 0000 15f4 0100
             0001 0000 0000 0000 0363 6f6d 0000 0600
             01

18:20:52.787481 > 1.2.3.4.rwhois > 157.16.251.2.domain: 43161+ SOA? com. (21)
             4500 0031 c200 0000 bd11 9fa2 0102 0304
             9d10 fb02 10e1 0035 001d 0000 a899 0100
             0001 0000 0000 0000 0363 6f6d 0000 0600
             01

18:20:52.797481 > 1.2.3.4.rwhois > 157.160.32.12.domain: 61119+ SOA? com. (21)
             4500 0031 2e00 0000 ba11 110a 0102 0304
             9da0 200c 10e1 0035 001d 0000 eebf 0100
             0001 0000 0000 0000 0363 6f6d 0000 0600
             01

18:20:52.807481 > 1.2.3.4.rwhois > 157.161.1.2.domain: 4076+ SOA? com. (21)
             4500 0031 0b00 0000 a211 6b13 0102 0304
             9da1 0102 10e1 0035 001d 0000 0fec 0100
             0001 0000 0000 0000 0363 6f6d 0000 0600
             01

18:20:52.817481 > 1.2.3.4.rwhois > 157.164.136.8.domain: 42875+ SOA? com. (21)
             4500 0031 9d00 0000 9411 6009 0102 0304
             9da4 8808 10e1 0035 001d 0000 a77b 0100
             0001 0000 0000 0000 0363 6f6d 0000 0600
             01

18:20:52.827481 > 1.2.3.4.rwhois > 157.164.136.9.domain: 42651+ SOA? com. (21)
             4500 0031 4900 0000 8111 c708 0102 0304
             9da4 8809 10e1 0035 001d 0000 a69b 0100
             0001 0000 0000 0000 0363 6f6d 0000 0600
             01

18:20:52.837481 > 1.2.3.4.rwhois > 157.169.10.1.domain: 20878+ SOA? com. (21)
             4500 0031 e700 0000 9511 930b 0102 0304
             9da9 0a01 10e1 0035 001d 0000 518e 0100
             0001 0000 0000 0000 0363 6f6d 0000 0600
             01

We see that the binary sends DNS queries to a preconfigured list of public DNS servers with the victim's IP address as the source IP. The result is a reflective DNS query attack on the victim. This form of attack is particularly potent, because the DNS responses can be significantly larger than the request, resulting in bandwidth amplification. Refer to CERT Incident Note IN-2000-04 [8] for more details of this attack.
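
The first captured packet can be decoded field by field to confirm what tcpdump printed. The Python sketch below is our own illustration using the standard IPv4, UDP and DNS header layouts; the function name and keys are assumptions.

```python
import struct

# First spoofed DNS query from the capture above (49 bytes).
packet = bytes.fromhex(
    "45000031a2000000d011afa101020304"
    "9d10f80310e10035001d000015f40100"
    "000100000000000003636f6d00000600"
    "01"
)

def parse_dns_query(packet: bytes) -> dict:
    """Decode the UDP ports and the single DNS question in the packet."""
    ihl = (packet[0] & 0x0F) * 4
    src_port, dst_port, _udp_len, _cksum = struct.unpack("!HHHH", packet[ihl:ihl + 8])
    dns = packet[ihl + 8:]
    txid, flags, _qdcount = struct.unpack("!HHH", dns[:6])
    labels, pos = [], 12                # the question follows the 12-byte header
    while dns[pos] != 0:
        length = dns[pos]
        labels.append(dns[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    qtype, _qclass = struct.unpack("!HH", dns[pos + 1:pos + 5])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "id": txid,
        "recursion_desired": bool(flags & 0x0100),
        "qname": ".".join(labels),
        "qtype": qtype,                 # 6 = SOA
        "dns_length": len(dns),
    }

query = parse_dns_query(packet)
```

The result matches the tcpdump line: query ID 5620 with recursion desired (the "+"), an SOA question for "com", and a 21-byte DNS message sent from port 4321 to port 53.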

6.2.5 Function 4

The decoded payload for this function is:

0000000 00 05 00 07 01 02 03 04 05 06 07 08 00 61 62 63
0000020 00

The 2nd byte indicates Function 4. The 3rd byte indicates whether to use ICMP or UDP, while the 4th byte holds the victim's port if it is UDP. Interestingly, only 1 byte is allocated for this purpose, so the port value cannot exceed 255. The next 4 bytes contain the victim's IP address. Then come 4 bytes for a possibly faked source IP address. The next byte indicates whether to use the IP address 01 02 03 04 or the hostname 61 62 63 00 ("abc") as the victim. The command packet and the resulting traffic from the binary are as follows:

18:22:55.593553 < 1.2.3.4 > 10.0.1.28: ip-proto-11 502 [tos 0x10] 
             4510 020a 00f2 0000 300b 78c6 0102 0304
             0a00 011c 0201 1733 4a68 8099 b3ce ea07
             2544 5bd3 4cc6 dd05 2e58 83af dc0a 3969
             9acc ff33 689e d50d 4680 bbf7 3472 b1f1
             3274 b7fb 4086 cd15 5ea8 f33f 8cda 2979
             ca1c 6fc3 186e c51d 76d0 2b87 e442 a101
             62c4 278b f056 bd25 8ef8 63cf 3caa 1989

18:22:55.613553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT] 
             45b2 001d 0455 1ffe c401 c1c7 0506 0708
             0102 0304 0800 9567 c9ca cbcc cd

18:22:55.613553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT] 
             45b2 001d 0455 1ffe c401 c1c7 0506 0708
             0102 0304 0800 9567 c9ca cbcc cd

18:22:55.623553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT] 
             45b2 001d 0455 1ffe c401 c1c7 0506 0708
             0102 0304 0800 9567 c9ca cbcc cd

18:22:55.623553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT] 
             45b2 001d 0455 1ffe c401 c1c7 0506 0708
             0102 0304 0800 9567 c9ca cbcc cd

18:22:55.633553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT] 
             45b2 001d 0455 1ffe c401 c1c7 0506 0708
             0102 0304 0800 9567 c9ca cbcc cd

It sends out fragmented packets with a 9-byte payload, but with a fragment offset of 65520, constituting a fragmented IP packets attack.
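
The offset can be verified directly from the IP header of one of the fragments. In the Python sketch below, the function name and keys are our own; the field layout is the standard IPv4 header, where the fragment offset is stored in units of 8 bytes.

```python
import struct

# One of the attack fragments from the capture above (29 bytes).
fragment = bytes.fromhex("45b2001d04551ffec401c1c7050607080102030408009567c9cacbcccd")

def parse_fragment(packet: bytes) -> dict:
    """Extract the fragmentation fields from an IPv4 header."""
    ver_ihl, _tos, total_len, ident, flags_frag, _ttl, proto = struct.unpack(
        "!BBHHHBB", packet[:10]
    )
    ihl = (ver_ihl & 0x0F) * 4
    return {
        "ident": ident,
        "more_fragments": bool(flags_frag & 0x2000),
        "offset": (flags_frag & 0x1FFF) * 8,    # stored in 8-byte units
        "protocol": proto,                      # 1 = ICMP
        "payload_length": total_len - ihl,
    }

frag = parse_fragment(fragment)
```

The field value 0x1ffe gives 8190 x 8 = 65520, and the total length of 29 bytes minus the 20-byte header leaves the 9-byte payload, in agreement with "frag 1109:9@65520" in the tcpdump output.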

6.2.6 Function 5

The following is the payload for Function 5 :

0000000 00 06

The binary will spawn a child to listen on TCP port 23281. When a remote client (for example, telnet) connects to this port and types "SeNiF", a remote shell is created. Note that if you add 1 to each byte in this password, you get the string "TfOjG", which is the string shown in the strings output in Section 3.
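The relationship between the two strings is easy to verify. The helper below is only an illustration; shift_bytes is a name we made up, and the sketch says nothing about how the binary itself performs the comparison.

```python
def shift_bytes(text: str, delta: int) -> str:
    """Add delta to every byte of the string, wrapping at 256."""
    raw = text.encode("latin-1")
    return bytes((b + delta) % 256 for b in raw).decode("latin-1")

print(shift_bytes("SeNiF", 1))   # password -> string seen in the strings output
print(shift_bytes("TfOjG", -1))  # string seen in the strings output -> password
```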

6.2.7 Function 6

The payload for Function 6 is identical to that of Function 2. The only difference is that the binary does not send any reply in this case.

6.2.8 Function 7

The payload for Function 7 is as follows :

0000000 00 08

Upon receiving this packet, the binary will kill the active attacking child or remote shell. The binary maintains a global variable that indicates which function is currently active. At any one time, only 1 attack function or remote shell function can be active.

6.2.9 Function 8

The payload for Function 8 is similar to Function 3, except that it has 1 more field which controls the number of attacking packets sent out.

0000000 00 09 01 02 03 04 08 00 7b 00 61 62 63 00

The 2nd byte indicates that it is Function 8. The next 4 bytes specify the victim's IP address. The next byte (08 in this example) is the field that specifies the number of packets. The next 2 bytes specify the victim's port. The next byte indicates whether to use the IP 01 02 03 04 or the hostname 61 62 63 00 ("abc") as the victim. The binary's response in this case is similar to that for Function 3.
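The field layout described above can be expressed as a small decoder. This is a sketch under our own assumptions: the function name and dictionary keys are hypothetical, the first byte of the payload is left uninterpreted, and the port is assumed to be stored in network byte order (00 7b = 123).

```python
import struct

def parse_function8(payload: bytes) -> dict:
    """Split a decoded Function 8 payload into its fields."""
    # Layout: lead byte, function code, 4-byte IP, packet count,
    # 2-byte port (assumed big-endian), IP/hostname selector.
    lead, code, ip, count, port, selector = struct.unpack("!BB4sBHB", payload[:10])
    # The hostname follows as a NUL-terminated string.
    hostname = payload[10:].split(b"\x00", 1)[0].decode("ascii")
    return {
        "code": code,
        "victim_ip": ".".join(str(b) for b in ip),
        "packet_count": count,
        "victim_port": port,
        "selector": selector,
        "hostname": hostname,
    }

# The example payload shown above.
PAYLOAD = bytes.fromhex("0009010203040800 7b00 61626300")
print(parse_function8(PAYLOAD))
```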

6.2.10 Function 9

This function starts a TCP SYN attack. The decoded payload is as follows:

0000000 00 0a 01 02 03 04 00 7b 00 05 06 07 08 00 61 62
0000020 63 00

The 4 bytes after the function code are the victim's IP. The next 2 bytes are the victim's port. The next byte (the 00 that follows the port) indicates whether to generate a random source IP or to use the one supplied in the payload. The next 4 bytes contain a source IP, probably faked. The next byte indicates whether to use the IP 01 02 03 04 or the hostname 61 62 63 00 ("abc") as the victim.

The response of the binary is as follows, illustrating a SYN attack. To learn more about the TCP SYN attack, please refer to CERT Advisory CA-1996-21 [9].

18:25:28.885667 > 94.207.87.95.29342 > 1.2.3.4.ntp: S 35354633:35354633(0) win 1033
             4500 0028 0385 0000 dc06 2117 5ecf 575f
             0102 0304 729e 007b 021b 7809 0000 0000
             5002 0409 0468 0000

18:25:28.895667 > 235.94.192.111.31802 > 1.2.3.4.ntp: S 30203676:30203676(0) win 1418
             4500 0028 032d 0000 da06 2dcf eb5e c06f
             0102 0304 7c3a 007b 01cc df1c 0000 0000
             5002 058a 9ce6 0000

18:25:28.905667 > 212.209.113.103.10768 > 1.2.3.4.ntp: S 12671260:12671260(0) win 849
             4500 0028 07b7 0000 b906 afda d4d1 7167
             0102 0304 2a10 007b 00c1 591c 0000 0000
             5002 0351 ddea 0000

18:25:28.915667 > 125.153.130.194.9037 > 1.2.3.4.ntp: S 17593335:17593335(0) win 449
             4500 0028 0dcf 0000 8606 22a0 7d99 82c2
             0102 0304 234d 007b 010c 73f7 0000 0000
             5002 01c1 10f5 0000
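To confirm that these segments carry only the SYN flag, the TCP header of the first captured packet can be decoded by hand. The sketch below is ours (parse_tcp_header is a hypothetical name) and assumes a 20-byte IP header in front of the TCP segment, which matches the 0x45 first byte of the dump.

```python
import struct

TCP_FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST",
             0x08: "PSH", 0x10: "ACK", 0x20: "URG"}

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header."""
    (sport, dport, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    flags = {name for bit, name in TCP_FLAGS.items() if off_flags & bit}
    return {"sport": sport, "dport": dport, "seq": seq, "ack": ack,
            "flags": flags, "window": window}

# First packet of the capture above (20-byte IP header + TCP header).
PACKET = bytes.fromhex(
    "4500002803850000dc0621175ecf575f"
    "01020304729e007b021b780900000000"
    "5002040904680000"
)

tcp = parse_tcp_header(PACKET[20:])  # skip the IP header
print(tcp)
```

The decoded source port, sequence number and window agree with the values tcpdump prints for that packet, and the flag set contains SYN alone.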

6.2.11 Function 10

The payload for Function 10 is similar to Function 9, except that it has 1 more field highlighted in red that controls the number of attacking packets sent out.

0000000 00 0b 01 02 03 04 00 7b 01 05 06 07 08 09 00 61
0000020 62 63 00

The response for Function 10 is similar to Function 9.

6.2.12 Function 11

The payload for Function 11 is as follows :

0000000 00 0c 01 02 03 04 05 06 07 08 06 00 7b 00 61 62
0000020 63 00

01 02 03 04 is the victim's IP. The next 4 bytes contain a source IP, probably faked. The next byte (06 in this example) indicates the number of packets to generate. The next 2 bytes are the victim's port. The next byte indicates whether to use the IP 01 02 03 04 or the hostname 61 62 63 00 ("abc") as the victim.

The response for Function 11 is as follows. It shows DNS SOA queries, sent from the supplied source IP, directed at a single DNS server.

18:25:59.030630 > 5.6.7.8.ntp > 1.2.3.4.domain: 62485+ SOA? com. (21)
             4500 0031 6700 0000 cc11 77a8 0506 0708
             0102 0304 007b 0035 001d 0000 f415 0100
             0001 0000 0000 0000 0363 6f6d 0000 0600
             01

18:25:59.030630 > 5.6.7.8.ntp > 1.2.3.4.domain: 41640+ SOA? net. (21)
             4500 0031 9f00 0000 f311 18a8 0506 0708
             0102 0304 007b 0035 001d 0000 a2a8 0100
             0001 0000 0000 0000 036e 6574 0000 0600
             01

18:25:59.030630 > 5.6.7.8.ntp > 1.2.3.4.domain: 49902+ (20)
             4500 0030 2900 0000 c411 bda9 0506 0708
             0102 0304 007b 0035 001c 0000 c2ee 0100
             0001 0000 0000 0000 0364 6500 0006 0001
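The DNS portion of the first query in the capture above can be decoded to check what tcpdump reports. This is a minimal sketch of our own (parse_dns_query is a hypothetical name); it handles a single question with uncompressed labels only, which is all that this packet needs.

```python
import struct

def parse_dns_query(message: bytes) -> dict:
    """Decode the header and first question of a DNS query."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!HHHHHH", message[:12])
    labels = []
    pos = 12
    # Each label is a length byte followed by that many characters;
    # a zero length byte ends the name.
    while message[pos] != 0:
        length = message[pos]
        labels.append(message[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    qtype, qclass = struct.unpack("!HH", message[pos + 1:pos + 5])
    return {"id": ident,
            "recursion_desired": bool(flags & 0x0100),
            "qdcount": qdcount,
            "qname": ".".join(labels) + ".",
            "qtype": qtype,
            "qclass": qclass}

# DNS message of the first packet: the bytes that follow the 20-byte
# IP header and the 8-byte UDP header in the dump above.
DNS_MESSAGE = bytes.fromhex("f41501000001000000000000" "03636f6d00" "00060001")
print(parse_dns_query(DNS_MESSAGE))
```

Query type 6 is SOA and class 1 is IN, so the decoded question agrees with the "62485+ SOA? com." line printed by tcpdump.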

7. A bug

While exploring Function 2, we stumbled upon the following interesting occurrence. The picture below shows a captured response packet from the binary after executing Function 2. In principle, all the data should be encoded, so no plain text should be seen. However, the picture clearly shows that some plain text has appeared in the "encoded" packet.

After much study of the code, we finally figured out the reason for the above anomaly. It has to do with the way the binary uses its buffers in the encoding process.

There are 3 buffers declared in the binary, contiguous in memory, related to this function:

char bufEncode[400];
char bufClear[512];
char bufTemp[2048];

When a shell command is executed, the output is first piped to a file in the /tmp directory. The contents of this file are then read and stored in bufTemp, in blocks of 0x18E = 398 bytes. Each 398-byte block is transferred into the payload buffer, bufClear, prepended with 2 bytes of control data to form a 400-byte payload. The payload is then encoded and stored in bufEncode, which is 400 bytes in size. The contents of bufEncode are then transmitted back to the master.

So far so good. However, the binary tries to be cute. Instead of transmitting a fixed-size payload of 400 bytes every time, it tries to make the payload size random by adding a random number between 0 and 200 to the size of bufEncode. As a result, the total number of bytes transmitted each time can vary between 400 and 600 bytes. Because bufClear immediately follows bufEncode in memory, extending the transmitted length beyond the 400 bytes of bufEncode causes additional data from bufClear, which is in clear text, to be appended to the payload and transmitted, as shown in the diagram above.
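The effect can be reproduced with a small simulation. The sketch below models the two adjacent buffers as one flat byte array; it is not the binary's code. In particular, placeholder_encode is a stand-in of our own and is NOT the binary's real encoder, the two control bytes are dummy zeros, and the function names are hypothetical.

```python
import random

ENCODE_SIZE = 400  # sizeof(bufEncode)
CLEAR_SIZE = 512   # sizeof(bufClear)

def placeholder_encode(data: bytes) -> bytes:
    # Stand-in only: NOT the binary's real encoding routine.
    return bytes(b ^ 0xFF for b in data)

def build_reply(chunk: bytes, extra: int) -> bytes:
    """Return the bytes sent when `extra` is added to the 400-byte length."""
    # bufEncode occupies memory[0:400]; bufClear starts right after it.
    memory = bytearray(ENCODE_SIZE + CLEAR_SIZE)
    # 2 dummy control bytes + up to 398 bytes of command output.
    clear = (b"\x00\x00" + chunk[:398]).ljust(ENCODE_SIZE, b"\x00")
    memory[ENCODE_SIZE:ENCODE_SIZE + len(clear)] = clear  # fill bufClear
    memory[:ENCODE_SIZE] = placeholder_encode(clear)      # fill bufEncode
    # The send starts at bufEncode but runs past its end into bufClear.
    return bytes(memory[:ENCODE_SIZE + extra])

output = b"uid=0(root) gid=0(root)\n"
reply = build_reply(output, random.randint(0, 200))
print(len(reply), reply[ENCODE_SIZE:])
```

Whenever the random extra length is non-zero, the tail of the reply is the start of bufClear, that is, the control bytes followed by the unencoded command output.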

8. Conclusion and comments

The binary in question is a DDoS daemon that shares many similarities with other DDoS tools like TFN [4], especially in terms of control architecture and features. It is controlled by a master that communicates with it via IP protocol 11 packets. It is capable of carrying out a variety of DDoS attacks, and has ample housekeeping functions.

The only interesting feature is the network encoder, but it turns out to be a simple procedure that can be easily reverse engineered. That is why we were able to craft encoded packets to trigger the binary and observe its network behaviour extensively.

One particularly puzzling "feature" is the choice of IP protocol 11 (0xb), which is a very uncommon IP protocol. We seriously suspect that the usefulness of the DDoS system will be severely limited, because whatever packets the master or the slave sends, firewalls will block them, routers will drop them, and IDSes will pick them out immediately.

Because of this, we are inclined to think that this is "lab" code, used for experimentation in a lab. The choice of protocol seems better suited to containment than to general use.

9. References

[1] IDA Pro Disassembler, http://www.datarescue.com/idabase
[2] Windows Emulator for Unix, http://www.winehq.com
[3] LibNet, http://libnet.sourceforge.net
[4] "The 'Tribe Flood Network' distributed denial of service attack tool", http://staff.washington.edu/dittrich/misc/tfn.analysis, Dave Dittrich, Oct 1999.
[5] Unix Programming FAQ, http://www.erlenstar.demon.co.uk/unix/faq_toc.html
[6] Internet Protocol Numbers, http://www.iana.org/assignments/protocol-numbers
[7] RFC741 - Specifications For The Network Voice Protocol, http://www.faqs.org/rfcs/rfc741.html
[8] CERT Incident Note IN-2000-04, "Denial of Service Attacks using Nameservers", http://www.cert.org/incident_notes/IN-2000-04.html
[9] CERT Advisory CA-1996-21, "TCP SYN Flooding and IP Spoofing Attacks", http://www.cert.org/advisories/CA-1996-21.html
