Analysis

Tools

Tools I used:
Tools I wrote:

Procedure

The first step was to try to determine which operating system the binary was built to run on. I dropped the binary onto a Linux box and ran the 'file' command on it for some hints.
[ds@localhost tmp]$ file the-binary
the-binary: ELF 32-bit LSB executable, Intel 80386, version 1, statically linked, stripped
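It's an ELF binary for some operating system that runs on i386. Unfortunately, the binary is also statically linked and stripped. The library code is compiled into the executable and the symbol table has been removed, so nothing distinguishes the program's own functions from libc's. That is going to make it rather difficult to disassemble. Let's look at the 'strings' output.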
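For reference, the first part of that 'file' output comes straight from the ELF header. The sketch below illustrates this and is not part of the original analysis. It assumes a Linux system that provides <elf.h> and a little-endian host, and is_elf32_lsb_i386_exec() is a hypothetical name. It checks the same fields 'file' reports: the magic number, the 32-bit class, the LSB data encoding, the executable type and the Intel 80386 machine type. "Statically linked" and "stripped" are not header fields. 'file' infers them from the program and section headers (for example, the absence of a .symtab symbol table), and this sketch does not check them.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf starts with the header of a
 * 32-bit, little-endian ("LSB"), Intel 80386 ELF executable, else 0.
 * Assumes a little-endian host, so e_type and e_machine can be read
 * directly without byte swapping. */
int is_elf32_lsb_i386_exec(const unsigned char *buf, size_t len)
{
    Elf32_Ehdr hdr;

    if (len < sizeof hdr)
        return 0;
    memcpy(&hdr, buf, sizeof hdr);   /* copy out to avoid unaligned access */

    return memcmp(hdr.e_ident, ELFMAG, SELFMAG) == 0   /* "\177ELF" */
        && hdr.e_ident[EI_CLASS] == ELFCLASS32         /* "32-bit" */
        && hdr.e_ident[EI_DATA] == ELFDATA2LSB         /* "LSB" */
        && hdr.e_type == ET_EXEC                       /* "executable" */
        && hdr.e_machine == EM_386;                    /* "Intel 80386" */
}
```

In practice you would read the first sizeof(Elf32_Ehdr) bytes of the file into a buffer and pass them in.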

Here are the interesting strings that don't come from the libraries:

[mingetty]
/tmp/.hj237349
/bin/csh -f -c "%s" 1> %s 2>&1
TfOjG
/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:.
PATH
HISTFILE
linux
TERM
/bin/sh
/bin/csh -f -c "%s"
%d.%d.%d.%d
%u.%u.%u.%u
%c%s

And here is an interesting string that does come from the libraries:

@(#) The Linux C library 5.3.12

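Ok, now we know that this binary is meant to run on Linux, and the version string tells us exactly which C library it was linked against. From the other strings we can guess that the binary opens a file in the /tmp directory at some point. It likely also runs shell commands through /bin/csh, perhaps redirecting their output ("1> %s 2>&1") to that file. The '[mingetty]' string looks like the kind of thing a malicious program might overwrite argv[0] with, so that it shows up in a process listing as an ordinary mingetty terminal process.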
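To make those guesses concrete, here is a minimal sketch of how the strings might be used. It is an assumption, not recovered code. Only the csh format string and the /tmp path come from the 'strings' output; build_command() and masquerade_argv0() are hypothetical names. The first function substitutes a command and the output file into the format string. The second overwrites argv[0] in place. On Linux this changes what ps shows, because ps reads the process's argument memory through /proc.

```c
#include <stdio.h>
#include <string.h>

/* Strings taken verbatim from the 'strings' output. */
#define CMD_WITH_OUTPUT_FMT "/bin/csh -f -c \"%s\" 1> %s 2>&1"
#define OUTPUT_FILE         "/tmp/.hj237349"

/* Hypothetical: build a csh command line whose stdout and stderr go to
 * OUTPUT_FILE. Returns 0 on success, -1 if out is too small. */
int build_command(char *out, size_t outlen, const char *cmd)
{
    int n = snprintf(out, outlen, CMD_WITH_OUTPUT_FMT, cmd, OUTPUT_FILE);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}

/* Hypothetical: overwrite argv[0] in place so process listings show
 * `fake`. The new name is truncated to the length of the old one;
 * strncpy pads any leftover bytes with NULs, and the original
 * terminating NUL is left untouched. */
void masquerade_argv0(char *argv0, const char *fake)
{
    size_t room = strlen(argv0);
    strncpy(argv0, fake, room);
}
```

The fake name has to fit in the storage of the original argv[0]. './the-binary' (12 characters) is long enough to hold '[mingetty]' (10 characters).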

Ok, let's run it and see what it does!

[ds@localhost tmp]$ strace -ff ./the-binary
execve("./the-binary", ["./the-binary"], [/* 31 vars */]) = 0
personality(0 /* PER_??? */)            = 0
geteuid()                               = 500
_exit(-1)                               = ?
[ds@localhost tmp]$
Wow, that's pretty boring. The binary calls geteuid(), sees that it isn't running as root (UID 500), and exits immediately. I guess it wants me to run it as root. Feeling reckless? The nice Honeynet folks wouldn't actually send me a binary to reverse engineer that destroys my computer, would they?

(Just kidding. I actually ran this in a VMware virtual machine running in nonpersistent mode. The binary can eat all my files and all I have to do is reboot.)

[ds@localhost tmp]$ su
Password:
[root@localhost tmp]# strace -ff ./the-binary
execve("./the-binary", ["./the-binary"], [/* 29 vars */]) = 0
personality(0 /* PER_??? */)            = 0
geteuid()                               = 0
sigaction(SIGCHLD, {SIG_IGN}, {SIG_DFL}, 0x40063848) = 0
fork()                                  = 13295
[pid 13295] setsid()                    = 13295
[pid 13295] sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
[pid 13295] fork()                      = 13296
[pid 13294] _exit(0)                    = ?
[pid 13295] _exit(0)                    = ?
chdir("/")                              = 0
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
time(NULL)                              = 1022881147
socket(PF_INET, SOCK_RAW, 0xb /* IPPROTO_??? */) = 0
sigaction(SIGHUP, {SIG_IGN}, {SIG_DFL}, 0x40063848) = 0
sigaction(SIGTERM, {SIG_IGN}, {SIG_DFL}, 0x40063848) = 0
sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
recv(0,
 
Ok, so now we know that the binary:

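  1. ensures that it is running as root (geteuid() must return 0)
  2. ignores SIGCHLD
  3. forks twice, calling setsid() after the first fork to start a new session with no controlling terminal
  4. changes to the root directory
  5. closes the standard file descriptors
  6. finds out what time it is
  7. opens a raw socket on an unusual protocol, IP protocol 11 (0xb, assigned to NVP-II), which gets descriptor 0 because stdin was just closed
  8. ignores SIGHUP and SIGTERM, and sets SIGCHLD to ignored again
  9. waits to receive a packet on the raw socket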
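The start-up sequence in that list can be sketched in C as follows. It is reconstructed from the strace output only, not from the recovered source. backdoor_startup() and BACKDOOR_PROTO are hypothetical names. The sketch assumes the sigaction() calls in the trace come from libc's signal(), and the purpose of the time() call isn't visible in the trace.

```c
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical name for IP protocol 11 (0xb in the strace output). */
#define BACKDOOR_PROTO 11

/* Sketch of the start-up sequence seen under strace.
 * Returns the raw socket descriptor, or -1 on failure. */
int backdoor_startup(void)
{
    pid_t pid;

    if (geteuid() != 0)              /* trace: geteuid() = 500, then _exit(-1) */
        _exit(-1);

    signal(SIGCHLD, SIG_IGN);

    pid = fork();                    /* first fork: the original process exits */
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);
    setsid();                        /* new session, no controlling terminal */
    signal(SIGCHLD, SIG_IGN);

    pid = fork();                    /* second fork: the session leader exits */
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);

    if (chdir("/") != 0)
        return -1;
    close(0);
    close(1);
    close(2);

    (void) time(NULL);               /* purpose not visible in the trace */

    /* With stdin closed, the socket gets the lowest free descriptor, 0. */
    int s = socket(PF_INET, SOCK_RAW, BACKDOOR_PROTO);
    if (s < 0)
        return -1;

    signal(SIGHUP, SIG_IGN);
    signal(SIGTERM, SIG_IGN);
    signal(SIGCHLD, SIG_IGN);        /* the trace shows this call twice */
    return s;
}
```

A caller would then block in recv(s, buf, sizeof buf, 0), which is where the trace stops. On a raw IPv4 socket, each received buffer begins with the IP header, followed by the protocol-11 payload.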

This is about as far as we're going to get without actually disassembling the binary to see what it does. Unless we can solve the problem of the binary not having any symbols, we're going to spend a year disassembling half of libc.

Why don't we try putting the symbols back into the binary?

How are we going to do that?

unstrip

The static version of libc (libc.a) is an ar archive of object files. My approach was to write a utility that would scan through a directory of object files extracted from a static libc and, for each object file, do the following:

For this to work well, we need to find the exact static libc that the binary was built with. Remember that there was a pretty big clue in the strings output:

@(#) The Linux C library 5.3.12
I went looking for Red Hat RPMs containing a static version of libc 5.3.12. I chose Red Hat for no other reason than that it's a fairly popular Linux distribution. I was surprised to find that the most recent Red Hat release that shipped an RPM with a static version of this library was 5.2. I grabbed the static libc from versions 5.2 and 5.0 and used them to develop the unstrip utility. Once it was matching object files, I noticed that the libc from the older release produced more matches than the one from the newer release. Curious, I decided to investigate the 4.x releases. I could only find updates on the internet, not the original RPMs, so I tried those and found that an update to 4.0 was the best match available. Actual Red Hat distributions older than 5.x were practically impossible to find online, so I hunted down a CD of Red Hat 4.0, which I was lucky enough to stumble across. The libc.a from the RPM that shipped with Red Hat 4.0 fit like a glove! I am fairly certain that the binary was built on this platform. Running unstrip against the binary with this libc resolved the symbols for all of the library functions the binary used. There were a couple of misidentifications, which were easily corrected.
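The core of any tool like unstrip is matching the code in each libc object file against the binary's text. Here is a minimal sketch of that idea. It is not the actual unstrip code. find_masked() is a hypothetical name, and the sketch assumes the caller has already built a mask from the object file's relocation entries. Bytes covered by a relocation (call targets, addresses of globals) are only filled in at link time, so they are treated as wildcards; every other byte must match exactly.

```c
#include <stddef.h>

/* Hypothetical matcher: return the first offset in `text` where `code`
 * matches, or -1 if none does. mask[i] == 0 marks a byte covered by a
 * relocation in the object file; such bytes are ignored. */
long find_masked(const unsigned char *text, size_t text_len,
                 const unsigned char *code, const unsigned char *mask,
                 size_t code_len)
{
    if (code_len == 0 || code_len > text_len)
        return -1;

    for (size_t off = 0; off + code_len <= text_len; off++) {
        size_t i;
        for (i = 0; i < code_len; i++) {
            if (mask[i] && text[off + i] != code[i])
                break;               /* fixed byte differs: not here */
        }
        if (i == code_len)
            return (long) off;       /* every unmasked byte matched */
    }
    return -1;
}
```

A short function with many wildcard bytes can match in more than one place, which is one plausible source of the misidentifications mentioned above. The sketch also suggests why the exact libc version matters: a different build of the library changes the unmasked bytes, and the match fails outright.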

Results

Once I had a binary with symbols, I fed it into IDA and spent a week or so translating it into pseudo-C, which I wrote in the comment field of the disassembly. From this I derived a client that supports most of the binary's functions, and I wrote server code that roughly approximates the original source. I didn't have enough time to finish either the server or the client, but the functionality that does exist in the client works with the original binary. I also didn't have time to comment the server code as thoroughly as I'd wanted to. Even so, the result of the analysis is recovered C code that resembles the original source.

A detailed description of the protocol for communication between the client and the server can be found in the advisory.