The Reverse Challenge

Analysis


There are a couple of quick things we can look at when we get an unknown binary. First, let's see what we can learn about the executable itself. We will do this with the file command.
$ file the-binary
the-binary: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
$ 
OK, so the program doesn't need any external libraries to run, and all symbol names have been stripped from the binary. We expected this, but it will make it harder to separate the library functions from the custom functions and to identify which library functions are used.

Next, let's see what else is in the binary. We will use the strings command to extract any readable strings of five or more characters from the executable.

$ strings -5 the-binary
.
.
.
Z,)J4
C,9C0t
 WVSj
u j@j
[mingetty]
/tmp/.hj237349
/bin/csh -f -c "%s" 1> %s 2>&1
TfOjG
/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:.
HISTFILE
linux
/bin/sh
/bin/csh -f -c "%s" 
%d.%d.%d.%d
%u.%u.%u.%u
gethostby*.getanswer: asked for "%s", got "%s"
RESOLV_HOST_CONF
/etc/host.conf
order
resolv+: %s: "%s" command incorrectly formatted.
hosts
resolv+: "%s" is an invalid keyword
resolv+: valid keywords are: %s, %s and %s
resolv+: search order not specified or unrecognized keyword, host resolution will fail.
.
.
.
@(#) The Linux C library 5.3.12
.
.
.
I have extracted certain sections for discussion. For reference, the entire output is included as strings_output.txt in the additional files. In the main part of the extracted strings, starting about halfway down, we see "gethostby*...", "/etc/host.conf" and several lines starting with "resolv+:". We are clearly getting into strings from the library here. The "[mingetty]" is interesting since it doesn't seem like something that should be in libc. Indeed, on my Red Hat 7.2 machine strings /usr/lib/libc.a | grep 'mingetty' returns no matches. It appears, then, that at least a few of the strings at the top of the list are from the program's author. We note that these include format strings for getting a shell to execute a command.

One additional string stands out from the list: "@(#) The Linux C library 5.3.12". This is an old version of the old libc library (as opposed to a version of the newer glibc.) This fact will come up again.
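
As a rough illustration of what the strings pass is doing, here is a minimal Python sketch that pulls runs of printable ASCII out of a byte buffer. It is not the implementation of the strings utility; the function name printable_runs and the sample buffer are made up for illustration, and the choice of "printable" (space through tilde, plus tab) is an assumption.

```python
import re

def printable_runs(data: bytes, min_len: int = 5) -> list:
    """Return runs of at least min_len printable ASCII bytes (space..~, plus tab)."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical sample: a few of the NUL-separated strings seen in the output above.
sample = b"\x00\x01[mingetty]\x00/\x00\x00/tmp/.hj237349\x00rb\x00TfOjG\x00"
print(printable_runs(sample))
```

Short strings such as "/" and "rb" fall below the five-character threshold, which is why they show up later in the .rodata dump but not in the strings output.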

What fun is a program without running it? To set up a safe environment for testing, a VMware virtual machine was set up and loaded with Red Hat 7.3. It was configured for host-only networking, which meant that the network was entirely simulated inside the real machine. As a further precaution, the real machine was configured with IP forwarding turned off.

To help monitor what the program does, we will run it using strace to record all of the system calls it makes. We will use one flag, -f, to follow forks.

$ strace -f ./the-binary
execve("./the-binary", ["./the-binary", "2"], [/* 19 vars */]) = 0
personality(0 /* PER_??? */)            = 0
geteuid()                               = 500
_exit(-1)                               = ?
Well, it doesn't do much except check our effective userid. It must not like the fact that we are not root so...
# strace -f ./the-binary
execve("./the-binary", ["./the-binary"], [/* 18 vars */]) = 0
personality(0 /* PER_??? */)            = 0
geteuid()                               = 0
sigaction(SIGCHLD, {SIG_IGN}, {SIG_DFL}, 0x42029098) = 0
fork()                                  = 1567
[pid  1566] _exit(0)                    = ?
[pid  1567] setsid()                    = 1567
sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
fork()                                  = 1568
[pid  1567] _exit(0)                    = ?
chdir("/")                              = 0
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
time(NULL)                              = 1022595528
socket(PF_INET, SOCK_RAW, 0xb /* IPPROTO_??? */) = 0
sigaction(SIGHUP, {SIG_IGN}, {SIG_DFL}, 0x42029098) = 0
sigaction(SIGTERM, {SIG_IGN}, {SIG_DFL}, 0x42029098) = 0
sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
recv(0
Ah, that's better. Let's see what is going on here. It starts off with two forks, a setsid, a chdir to "/" and a bunch of closes. That is pretty much standard daemonize functionality and so is just initialization. The socket call is the really interesting part. It opens a raw IP socket for reading IP protocol 11. According to the IANA that is NVP-II, the Network Voice Protocol (PUP is protocol 12), which is probably not what this program is looking for. After that it just sits in a recv waiting for packets. Note that the socket call returned descriptor 0, since descriptors 0 through 2 had just been closed, which is why the trace ends in recv(0.
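
The sequence in the trace can be sketched in Python as follows. This is only a reading of the strace output, not the program's actual source: the function names daemonize and open_raw_listener are hypothetical, and neither function is called here, since the first would detach the interpreter and the second needs root.

```python
import os
import signal
import socket

RAW_PROTOCOL = 0x0B  # the protocol number passed to socket() in the trace

def daemonize() -> None:
    """Double-fork daemonization in the order the trace shows. Not called here."""
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    if os.fork() != 0:        # first parent exits
        os._exit(0)
    os.setsid()               # new session, no controlling terminal
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    if os.fork() != 0:        # session leader exits, grandchild carries on
        os._exit(0)
    os.chdir("/")
    for fd in (0, 1, 2):      # drop stdin, stdout and stderr
        os.close(fd)

def open_raw_listener(protocol: int = RAW_PROTOCOL) -> socket.socket:
    """Counterpart of socket(PF_INET, SOCK_RAW, 0xb); needs root, so not called here."""
    return socket.socket(socket.AF_INET, socket.SOCK_RAW, protocol)
```

Because the three standard descriptors are closed just before the socket is created, the new socket gets the lowest free descriptor, which matches the 0 returned in the trace.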

To take a look at the processes, let's fire up another terminal. The forked processes all have pids of the form 156x, so we will just look for those.

# ps aux | grep 156
root      1565  0.0  0.3  1536  472 pts/0    S    09:18   0:00 strace -f ./the-b
root      1568  0.0  0.0   240   44 ?        S    09:18   0:00 [mingetty]  
# kill -9 1568
#
Here we see our original strace, with the name of the executable truncated. Process 1568 shows its process name as "[mingetty]". Referring back to the strace output, the last fork returned 1568, meaning that the surviving child process has that as its pid. Our binary has changed its name to "[mingetty]"! Now we know why that string appeared in the strings output.

Unless we want to try sending random packets, we need to look at the binary again. First we will see if there is anything to learn from the ELF headers.

$ objdump -x the-binary 

the-binary:     file format elf32-i386
the-binary
architecture: i386, flags 0x00000102:
EXEC_P, D_PAGED
start address 0x08048090

Program Header:
    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
         filesz 0x00024222 memsz 0x00024222 flags r-x
    LOAD off    0x00024228 vaddr 0x0806d228 paddr 0x0806d228 align 2**12
         filesz 0x0000c094 memsz 0x00011970 flags rw-

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .init         00000008  08048080  08048080  00000080  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text         0001f53c  08048090  08048090  00000090  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 __libc_subinit 00000004  080675cc  080675cc  0001f5cc  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .fini         00000008  080675d0  080675d0  0001f5d0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  4 .rodata       00004c4a  080675d8  080675d8  0001f5d8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .data         0000c084  0806d228  0806d228  00024228  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  6 .ctors        00000008  080792ac  080792ac  000302ac  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  7 .dtors        00000008  080792b4  080792b4  000302b4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  8 .bss          000058dc  080792bc  080792bc  000302bc  2**2
                  ALLOC
  9 .note         00000d5c  00000000  00000000  000302bc  2**0
                  CONTENTS, READONLY
 10 .comment      00000ea6  00000000  00000000  00031018  2**0
                  CONTENTS, READONLY
objdump: the-binary: no symbols
$
There are a number of things we learn from this. First the starting address of execution is at 0x08048090. That may come in handy when we start looking at the disassembled code. The initialisation and finalization sections are small and probably uninteresting. The sections we really care about are:
.text
mem: 0x8048090 - 0x80675cb executable code
.rodata
mem: 0x80675d8 - 0x806c221 constant strings (why not 0x806d227 ?)
.data
mem: 0x806d228 - 0x80792ab other initialized data
.note
not loaded into memory but may contain something interesting.
.comment
not loaded into memory but may contain something interesting.
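
The address ranges in this list follow from the VMA and Size columns of the section table. A small Python sketch of that arithmetic is given here; the helper name last_byte is made up for illustration. It also checks one possible answer to the "why not 0x806d227 ?" question: the second LOAD segment has 2**12 alignment, so its virtual address and file offset must agree modulo 0x1000, and the writable segment appears to start on a fresh page rather than directly after .rodata. That reading is an assumption drawn from the program header, not something the objdump output states.

```python
# (name, VMA, size) copied from the objdump -x section table
SECTIONS = [
    (".text",   0x08048090, 0x0001F53C),
    (".rodata", 0x080675D8, 0x00004C4A),
    (".data",   0x0806D228, 0x0000C084),
]

def last_byte(vma: int, size: int) -> int:
    """Address of the final byte occupied by a section."""
    return vma + size - 1

for name, vma, size in SECTIONS:
    print(f"{name:8s} {vma:#x} - {last_byte(vma, size):#x}")

# Second LOAD segment: file offset and virtual address from the program header.
PAGE = 0x1000
data_off, data_vaddr = 0x00024228, 0x0806D228
same_page_offset = (data_off % PAGE) == (data_vaddr % PAGE)
print("offset and vaddr congruent modulo page size:", same_page_offset)
```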
The text section is the meat of the program so we will save that one for last. If we look at the .rodata section we should find the strings we saw earlier.
$ objdump -j .rodata --full-contents the-binary  | head -60  

the-binary:     file format elf32-i386

Contents of section .rodata:
 80675d8 5b6d696e 67657474 795d002f 00002f74  [mingetty]./../t
 80675e8 6d702f2e 686a3233 37333439 002f6269  mp/.hj237349./bi
 80675f8 6e2f6373 68202d66 202d6320 22257322  n/csh -f -c "%s"
 8067608 20313e20 25732032 3e263100 72620054   1> %s 2>&1.rb.T
 8067618 664f6a47 00fffb01 002f7362 696e3a2f  fOjG...../sbin:/
 8067628 62696e3a 2f757372 2f736269 6e3a2f75  bin:/usr/sbin:/u
 8067638 73722f62 696e3a2f 7573722f 6c6f6361  sr/bin:/usr/loca
 8067648 6c2f6269 6e2f3a2e 00504154 48004849  l/bin/:..PATH.HI
 8067658 53544649 4c45006c 696e7578 00544552  STFILE.linux.TER
 8067668 4d007368 002f6269 6e2f7368 002f6269  M.sh./bin/sh./bi
 8067678 6e2f6373 68202d66 202d6320 22257322  n/csh -f -c "%s"
 8067688 20002564 2e25642e 25642e25 64008d36   .%d.%d.%d.%d..6
 8067698 15000000 15000000 14000000 15000000  ................
 80676a8 15000000 19000000 14000000 14000000  ................
 80676b8 14000000 476e0100 00010000 00000000  ....Gn..........
 80676c8 03636f6d 00000600 01000000 00000000  .com............
 80676d8 00000000 00000000 00000000 00000000  ................
 80676e8 00000000 0000476e 01000001 00000000  ......Gn........
 80676f8 0000036e 65740000 06000100 00000000  ...net..........
 8067708 00000000 00000000 00000000 00000000  ................
 8067718 00000000 00000000 476e0100 00010000  ........Gn......
 8067728 00000000 03646500 00060001 00000000  .....de.........
 8067738 00000000 00000000 00000000 00000000  ................
 8067748 00000000 00000000 0000476e 01000001  ..........Gn....
 8067758 00000000 00000365 64750000 06000100  .......edu......
 8067768 00000000 00000000 00000000 00000000  ................
 8067778 00000000 00000000 00000000 476e0100  ............Gn..
 8067788 00010000 00000000 036f7267 00000600  .........org....
 8067798 01000000 00000000 00000000 00000000  ................
 80677a8 00000000 00000000 00000000 0000476e  ..............Gn
 80677b8 01000001 00000000 00000375 73630365  ...........usc.e
 80677c8 64750000 06000100 00000000 00000000  du..............
 80677d8 00000000 00000000 00000000 00000000  ................
 80677e8 476e0100 00010000 00000000 03657300  Gn...........es.
 80677f8 00060001 00000000 00000000 00000000  ................
 8067808 00000000 00000000 00000000 00000000  ................
 8067818 0000476e 01000001 00000000 00000367  ..Gn...........g
 8067828 72000006 00010000 00000000 00000000  r...............
 8067838 00000000 00000000 00000000 00000000  ................
 8067848 00000000 476e0100 00010000 00000000  ....Gn..........
 8067858 03696500 00060001 00000000 00000000  .ie.............
 8067868 00000000 00000000 00000000 00000000  ................
 8067878 00000000 00000000 00000000 00000000  ................
 8067888 00000000 00000000 00000000 00000000  ................
 8067898 00000000 00000000 00000000 00000000  ................
 80678a8 00000000 00000000 25752e25 752e2575  ........%u.%u.%u
 80678b8 2e257500 25630025 63257300 67657468  .%u.%c.%c%s.geth
 80678c8 6f737462 792a2e67 6574616e 73776572  ostby*.getanswer
 80678d8 3a206173 6b656420 666f7220 22257322  : asked for "%s"
 80678e8 2c20676f 74202225 73220052 45534f4c  , got "%s".RESOL
 80678f8 565f484f 53545f43 4f4e4600 2f657463  V_HOST_CONF./etc
 8067908 2f686f73 742e636f 6e660072 006f7264  /host.conf.r.ord
 8067918 65720020 09007265 736f6c76 2b3a2025  er. ..resolv+: %
 8067928 733a2022 25732220 636f6d6d 616e6420  s: "%s" command 
 8067938 696e636f 72726563 746c7920 666f726d  incorrectly form
 8067948 61747465 642e0a00 202c3b3a 0062696e  atted... ,;:.bin
$
And, indeed, we do see those strings. In addition, there is a section in the middle that contains a number of top-level domain names. The program may be doing some DNS stuff. The rest of this section looks like strings from libc and is probably not very interesting, but if we see an address pointing into here we can check it out.
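
One way to test the DNS hunch is to try reading the bytes around each domain name as a DNS message. The Python sketch below does that for two of the entries, with the bytes copied from the dump. That these blobs are DNS queries is an assumption being tested, not an established fact, and the function name parse_query is made up for illustration.

```python
import struct

# 21 bytes copied from the .rodata dump, starting at address 0x80676bc
COM_BLOB = bytes.fromhex("476e01000001000000000000" "03636f6d00" "0006" "0001")
# The entry that contains "usc" and "edu", copied the same way
USC_BLOB = bytes.fromhex("476e01000001000000000000" "0375736303656475" "00" "0006" "0001")

def parse_query(blob: bytes) -> dict:
    """Parse a 12-byte DNS header followed by one question (length-prefixed labels)."""
    ident, flags, qdcount, _an, _ns, _ar = struct.unpack(">6H", blob[:12])
    labels, pos = [], 12
    while blob[pos] != 0:                 # a zero length byte ends the name
        length = blob[pos]
        labels.append(blob[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    qtype, qclass = struct.unpack(">2H", blob[pos + 1:pos + 5])
    return {"id": ident, "flags": flags, "qdcount": qdcount,
            "name": ".".join(labels), "qtype": qtype, "qclass": qclass}

print(parse_query(COM_BLOB))
print(parse_query(USC_BLOB))
```

Read this way, each blob is a well-formed query with one question, the recursion-desired flag set (0x0100), query type 6 (SOA) and class 1 (IN), which is at least consistent with the idea that the program builds DNS packets.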

Doing the same thing to the .data section does not give us much. It looks like there might be some regularity in the bytes, but there is not much to go on. The .note section is rather devoid of information, so we move on to the .comment section.

$ objdump -j .comment --full-contents the-binary   | head -20

the-binary:     file format elf32-i386

Contents of section .comment:
 0000 00474343 3a202847 4e552920 322e372e  .GCC: (GNU) 2.7.
 0010 322e6c2e 32000047 43433a20 28474e55  2.l.2..GCC: (GNU
 0020 2920322e 372e3200 00474343 3a202847  ) 2.7.2..GCC: (G
 0030 4e552920 322e372e 322e6c2e 32000047  NU) 2.7.2.l.2..G
 0040 43433a20 28474e55 2920322e 372e322e  CC: (GNU) 2.7.2.
 0050 6c2e3200 00474343 3a202847 4e552920  l.2..GCC: (GNU) 
 0060 322e372e 322e6c2e 32000047 43433a20  2.7.2.l.2..GCC: 
 0070 28474e55 2920322e 372e322e 6c2e3200  (GNU) 2.7.2.l.2.
 0080 00474343 3a202847 4e552920 322e372e  .GCC: (GNU) 2.7.
 0090 322e6c2e 32000047 43433a20 28474e55  2.l.2..GCC: (GNU
 00a0 2920322e 372e322e 6c2e3200 00474343  ) 2.7.2.l.2..GCC
 00b0 3a202847 4e552920 322e372e 322e6c2e  : (GNU) 2.7.2.l.
 00c0 32000047 43433a20 28474e55 2920322e  2..GCC: (GNU) 2.
 00d0 372e322e 6c2e3200 00474343 3a202847  7.2.l.2..GCC: (G
 00e0 4e552920 322e372e 322e6c2e 32000047  NU) 2.7.2.l.2..G
 00f0 43433a20 28474e55 2920322e 372e322e  CC: (GNU) 2.7.2.
$
Here we see information that the compiler has tucked away in the object files. Namely, all of the code was compiled with gcc version 2.7.2 or 2.7.2.l.2 (the hex dump shows 0x6c, a lowercase letter l, not the digit 1). It seems likely that the main program was compiled with gcc 2.7.2 and libc with 2.7.2.l.2. Note that in these listings I have abbreviated the output to save space. The contents of the entire section look like the first 20 lines that were printed.

So much for preliminaries; on to the main event. We can easily get a disassembly listing of the .text section with objdump -j .text -d the-binary. We will do that and save the output as tbo ("the-binary output" -- keep the typing to a minimum). This gives us a single file of 43641 lines. That is a lot, and we are going to need to find a way to deal with it. First of all, we know that it is composed of many functions. If we search through the disassembly listing for call instructions, we can make a list of the addresses of all of the subroutines called. functionize does this and then takes a second pass through the file to insert a line reading "function addr" before any line whose address is called as a subroutine. functionize can be found in the included files. We have now reduced our problem to figuring out roughly 460 functions.
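
The two-pass idea can be sketched in a few lines of Python. This is not the functionize script from the included files, only a hypothetical reimplementation of the approach described above. It assumes the listing prints direct call operands as hex, with or without a leading 0x, and it ignores indirect calls such as call *%eax, so functions reached only through pointers would be missed.

```python
import re

CALL_RE = re.compile(r"\bcall\s+(?:0x)?([0-9a-f]+)\b")   # direct calls only
ADDR_RE = re.compile(r"^\s*([0-9a-f]+):")                # address at start of a line

def functionize(lines):
    """Pass 1 collects direct call targets; pass 2 labels the lines at those addresses."""
    targets = set()
    for line in lines:
        m = CALL_RE.search(line)
        if m:
            targets.add(int(m.group(1), 16))
    out = []
    for line in lines:
        m = ADDR_RE.match(line)
        if m and int(m.group(1), 16) in targets:
            out.append(f"function {int(m.group(1), 16):x}")
        out.append(line)
    return out

listing = [
    " 80480e6:       e8 49 00 00 00          call   0x8048134",
    " 80480eb:       50                      push   %eax",
    " 8048134:       55                      push   %ebp",
    " 8048135:       89 e5                   mov    %esp,%ebp",
]
print("\n".join(functionize(listing)))
```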

Before we go any further, let's take care of one other tool. There are two things we want this tool to do. First, when given an address, it should extract the function at that address to standard out. That way we can easily grab a particular function. Second, as we identify functions we will be keeping track of which address corresponds to which function (name), so when we extract a function we want to replace all calls to a known address with calls to the corresponding function name. This will simplify our analysis of functions that call other functions we have already identified. The perl script extract (see included files) does this using a file called "addrs" to hold the mapping of addresses to subroutine names.
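
A minimal Python sketch of such a tool is shown here. It is not the perl script from the included files. It assumes the listing already carries "function addr" marker lines, that the addrs file holds one "address,name" pair per line with the address in hex, and that renamed calls are printed in the same call <name> (0xaddr) style as the excerpts in this write-up. The helper names load_addrs and extract_function are hypothetical.

```python
import re

CALL_RE = re.compile(r"(call\s+)(?:0x)?([0-9a-f]+)\b")

def load_addrs(text):
    """Parse 'address,name' lines (hex address) into a dict keyed by integer address."""
    names = {}
    for line in text.splitlines():
        if "," in line:
            addr, name = line.strip().split(",", 1)
            names[int(addr, 16)] = name
    return names

def _name_call(match, names):
    target = int(match.group(2), 16)
    if target in names:
        return f"{match.group(1)}<{names[target]}>  (0x{target:x})"
    return match.group(0)                 # unknown target: leave the call as it is

def extract_function(lines, addr, names):
    """Return 'function <addr>' and its body, up to the next function marker."""
    header = f"function {addr:x}"
    out, inside = [], False
    for line in lines:
        if line.startswith("function "):
            if inside:
                break
            inside = line.strip() == header
        if inside:
            out.append(CALL_RE.sub(lambda m: _name_call(m, names), line))
    return out

listing = [
    "function 8048090",
    " 80480e6:       e8 49 00 00 00          call   0x8048134",
    " 80480ec:       e8 cb de 00 00          call   0x8055fbc",
    "function 8048134",
    " 8048134:       55                      push   %ebp",
]
names = load_addrs("8055fbc,exit\n80569fc,waitpid\n")
print("\n".join(extract_function(listing, 0x8048090, names)))
```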

As we saw earlier, the binary uses an old version of the standard library, namely 5.3.12. To make it easier to identify the functions from the library, I went looking for old distributions. I found a Slackware 3.5.0 distribution from 1998 that I installed in another VMware virtual machine. It proved to contain libc version 5.4.44. This is newer than what was used in the-binary but might be close enough.

The simplest functions to identify should be the system calls. These are implemented by putting a function code into the eax register and generating interrupt 0x80. The syscall numbers are defined in the Linux source and available to the library in /usr/include/asm/unistd.h. Looking at our tbo, the first int instructions we see are in function 8048090. This is the entry point for the program and has two different syscalls, so let's come back to it later. The next int $0x80 shows up in function 80569fc.

function 80569fc
 80569fc:       55                      push   %ebp
 80569fd:       89 e5                   mov    %esp,%ebp
 80569ff:       56                      push   %esi
 8056a00:       53                      push   %ebx
 8056a01:       8b 5d 08                mov    0x8(%ebp),%ebx
 8056a04:       8b 4d 0c                mov    0xc(%ebp),%ecx
 8056a07:       8b 55 10                mov    0x10(%ebp),%edx
 8056a0a:       b8 72 00 00 00          mov    $0x72,%eax
 8056a0f:       31 f6                   xor    %esi,%esi
 8056a11:       cd 80                   int    $0x80
 8056a13:       85 c0                   test   %eax,%eax
 8056a15:       7d 0c                   jge    0x8056a23
 8056a17:       f7 d8                   neg    %eax
 8056a19:       a3 14 8b 07 08          mov    %eax,0x8078b14
 8056a1e:       b8 ff ff ff ff          mov    $0xffffffff,%eax
 8056a23:       8d 65 f8                lea    0xfffffff8(%ebp),%esp
 8056a26:       5b                      pop    %ebx
 8056a27:       5e                      pop    %esi
 8056a28:       89 ec                   mov    %ebp,%esp
 8056a2a:       5d                      pop    %ebp
 8056a2b:       c3                      ret    
At 8056a0a, the eax register is loaded with 0x72 = 114, and asm/unistd.h #defines __NR_wait4 as 114. Instructions 8056a01-07 load 3 parameters from the stack into ebx, ecx, and edx, but the xor at 8056a0f puts zero in esi, the fourth argument to the wait4 system call. What we are seeing here, then, is not wait4() but a very similar call. Checking the man pages, we see wait3() only has 3 parameters, but the one it is missing is the first one. There is a reference to waitpid(), whose prototype looks just like wait4 except it doesn't have the last pointer. We therefore conclude that function 80569fc is waitpid() and add "80569fc,waitpid" to our address file.

While the identification is complete, it is instructive to look at the rest of the function. If the return code is negative -- that's how the kernel indicates an error -- then its negation (a positive value) is stored in location 8078b14 and the function returns -1. That location will likely be errno.
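
The tail of the wrapper can be modelled in a few lines of Python. This is only a model of the instructions from 8056a13 onward, with a made-up function name libc_return; the small table lists the i386 syscall numbers that appear in the listings in this write-up.

```python
# i386 syscall numbers seen in the listings (values from asm/unistd.h)
SYSCALLS = {0x01: "exit", 0x66: "socketcall", 0x72: "wait4", 0x88: "personality"}

def libc_return(raw_eax: int):
    """Map a raw kernel result to (return value, errno), as the wrapper's tail does."""
    if raw_eax >= 0:           # test %eax,%eax ; jge -> result passed through
        return raw_eax, None   # errno is left untouched
    return -1, -raw_eax        # neg %eax ; store at 0x8078b14 ; mov $0xffffffff,%eax

print(SYSCALLS[0x72], libc_return(1567), libc_return(-10))
```

So a kernel result of -10, for example, would come back to the caller as -1 with 10 stored in the presumed errno location.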

We can thus go through all of tbo looking for other syscalls. Some of the functions are more complicated than waitpid, and sometimes multiple functions will use the same syscall. One important instance of this is syscall 0x66, __NR_socketcall. We find almost a dozen different functions all using that value. Apparently all functions involving sockets go through that one syscall. Looking at the disassembly, we do see that the edx register is loaded with a different small integer for each function. That should help us determine which function is which.
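
Those small integers are the socketcall sub-function codes. The Python table below lists them as defined in the Linux header linux/net.h (SYS_SOCKET through SYS_RECVMSG); it is worth checking against the headers of the system being analysed. The lookup function identify_socketcall is a hypothetical helper.

```python
# socketcall sub-function codes, as defined in linux/net.h
SOCKETCALL = {
    1: "socket", 2: "bind", 3: "connect", 4: "listen", 5: "accept",
    6: "getsockname", 7: "getpeername", 8: "socketpair", 9: "send",
    10: "recv", 11: "sendto", 12: "recvfrom", 13: "shutdown",
    14: "setsockopt", 15: "getsockopt", 16: "sendmsg", 17: "recvmsg",
}

def identify_socketcall(code: int) -> str:
    """Name the libc socket function for the code a wrapper passes in ebx."""
    return SOCKETCALL.get(code, "unknown")

for code in (1, 5, 10):
    print(code, identify_socketcall(code))
```

With this table a wrapper can often be named from its constant alone, although comparing against a known libc as well guards against misreading the disassembly.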

The first such function we find is 8056a2c, which loads 5 into the edx register. If we do an objdump -d /usr/lib/libc.a we can search the output for a function that 1. has int $0x80, 2. has mov $0x66,%eax and 3. has mov $0x5,%edx. What we find is the following function:

00000054 <accept>      
  54:   55              pushl  %ebp
  55:   89 e5           movl   %esp,%ebp
  57:   83 ec 0c        subl   $0xc,%esp
  5a:   56              pushl  %esi
  5b:   53              pushl  %ebx
  5c:   8b 55 0c        movl   0xc(%ebp),%edx
  5f:   8b 4d 10        movl   0x10(%ebp),%ecx
  62:   8b 45 08        movl   0x8(%ebp),%eax
  65:   89 45 f4        movl   %eax,0xfffffff4(%ebp)
  68:   89 55 f8        movl   %edx,0xfffffff8(%ebp)
  6b:   89 4d fc        movl   %ecx,0xfffffffc(%ebp)
  6e:   ba 05 00 00 00  movl   $0x5,%edx
  73:   8d 4d f4        leal   0xfffffff4(%ebp),%ecx
  76:   b8 66 00 00 00  movl   $0x66,%eax
  7b:   89 d3           movl   %edx,%ebx
  7d:   cd 80           int    $0x80
  7f:   89 c6           movl   %eax,%esi
  81:   85 f6           testl  %esi,%esi
  83:   7c 07           jl     8c <accept+0x38>
  85:   89 f2           movl   %esi,%edx
  87:   eb 11           jmp    9a <accept+0x46>
  89:   8d 76 00        leal   0x0(%esi),%esi
  8c:   e8 fc ff ff ff  call   8d <accept+0x39>
  91:   f7 de           negl   %esi
  93:   89 30           movl   %esi,(%eax)
  95:   ba ff ff ff ff  movl   $0xffffffff,%edx
  9a:   89 d0           movl   %edx,%eax
  9c:   8d 65 ec        leal   0xffffffec(%ebp),%esp
  9f:   5b              popl   %ebx
  a0:   5e              popl   %esi
  a1:   89 ec           movl   %ebp,%esp
  a3:   5d              popl   %ebp
  a4:   c3              ret    
This has the same structure and constants as our function 8056a2c, which we therefore conclude is accept, and we add that to our addrs list.

Now that we have a start on the library, let's tackle the main program. To help us determine where it is, we will compile a test program on our old Slackware system and look at it.

$ cat hello.c 
#include <stdio.h>

main()
{
   printf("Hello, world.\n");
}
$ gcc -o hello --static hello.c
$ objdump -d -j .text hello | head -51

hello:     file format elf32-i386

Disassembly of section .text:

08048090 <_start>
 8048090:       59              popl   %ecx
 8048091:       89 e3           movl   %esp,%ebx
 8048093:       89 e0           movl   %esp,%eax
 8048095:       83 e4 f8        andl   $0xfffffff8,%esp
 8048098:       89 ca           movl   %ecx,%edx
 804809a:       01 d2           addl   %edx,%edx
 804809c:       01 d2           addl   %edx,%edx
 804809e:       01 d0           addl   %edx,%eax
 80480a0:       83 c0 04        addl   $0x4,%eax
 80480a3:       31 ed           xorl   %ebp,%ebp
 80480a5:       55              pushl  %ebp
 80480a6:       55              pushl  %ebp
 80480a7:       55              pushl  %ebp
 80480a8:       89 e5           movl   %esp,%ebp
 80480aa:       50              pushl  %eax
 80480ab:       53              pushl  %ebx
 80480ac:       51              pushl  %ecx
 80480ad:       b8 88 00 00 00  movl   $0x88,%eax
 80480b2:       bb 00 00 00 00  movl   $0x0,%ebx
 80480b7:       cd 80           int    $0x80
 80480b9:       8b 44 24 08     movl   0x8(%esp,1),%eax
 80480bd:       a3 84 e3 05 08  movl   %eax,0x805e384
 80480c2:       0f b7 05 40 e6  movzwl 0x805e640,%eax
 80480c7:       05 08 
 80480c9:       50              pushl  %eax
 80480ca:       e8 79 62 00 00  call   804e348 <__setfpucw>
 80480cf:       83 c4 04        addl   $0x4,%esp
 80480d2:       e8 fd 60 00 00  call   804e1d4 <__libc_init>
 80480d7:       68 c0 93 05 08  pushl  $0x80593c0
 80480dc:       e8 bf 5e 00 00  call   804dfa0 <atexit>
 80480e1:       83 c4 04        addl   $0x4,%esp
 80480e4:       e8 97 ff ff ff  call   8048080 <_init>
 80480e9:       e8 8e 00 00 00  call   804817c <main>
 80480ee:       50              pushl  %eax
 80480ef:       e8 60 5f 00 00  call   804e054 <exit>
 80480f4:       5b              popl   %ebx
 80480f5:       8d 74 26 00     leal   0x0(%esi,1),%esi
 80480f9:       8d bc 27 00 00  leal   0x0(%edi,1),%edi
 80480fe:       00 00 

08048100 <done>
 8048100:       b8 01 00 00 00  movl   $0x1,%eax
 8048105:       cd 80           int    $0x80
 8048107:       eb f7           jmp    8048100 <done>
 8048109:       8d b4 26 00 00  leal   0x0(%esi,1),%esi
 804810e:       00 00 

$
Here, the function at the entry point does some initialization and then calls our main(). Therefore, if we look at the same segment of the-binary, it should locate main() for us.
$ ./extract 8048090
function 8048090
 8048090:       59                      pop    %ecx
 8048091:       89 e3                   mov    %esp,%ebx
 8048093:       89 e0                   mov    %esp,%eax
 8048095:       89 ca                   mov    %ecx,%edx
 8048097:       01 d2                   add    %edx,%edx
 8048099:       01 d2                   add    %edx,%edx
 804809b:       01 d0                   add    %edx,%eax
 804809d:       83 c0 04                add    $0x4,%eax
 80480a0:       31 ed                   xor    %ebp,%ebp
 80480a2:       55                      push   %ebp
 80480a3:       55                      push   %ebp
 80480a4:       55                      push   %ebp
 80480a5:       89 e5                   mov    %esp,%ebp
 80480a7:       50                      push   %eax
 80480a8:       53                      push   %ebx
 80480a9:       51                      push   %ecx
 80480aa:       b8 88 00 00 00          mov    $0x88,%eax
 80480af:       bb 00 00 00 00          mov    $0x0,%ebx
 80480b4:       cd 80                   int    $0x80
 80480b6:       8b 44 24 08             mov    0x8(%esp,1),%eax
 80480ba:       a3 28 d2 06 08          mov    %eax,0x806d228
 80480bf:       0f b7 05 18 8b 07 08    movzwl 0x8078b18,%eax
 80480c6:       50                      push   %eax
 80480c7:       e8 a0 f4 00 00          call   0x805756c
 80480cc:       83 c4 04                add    $0x4,%esp
 80480cf:       e8 70 ec 00 00          call   0x8056d44
 80480d4:       68 d0 75 06 08          push   $0x80675d0
 80480d9:       e8 2a de 00 00          call   0x8055f08
 80480de:       83 c4 04                add    $0x4,%esp
 80480e1:       e8 9a ff ff ff          call   0x8048080
 80480e6:       e8 49 00 00 00          call   0x8048134
 80480eb:       50                      push   %eax
 80480ec:       e8 cb de 00 00          call   <exit>  (0x8055fbc)
 80480f1:       5b                      pop    %ebx
 80480f2:       8d b4 26 00 00 00 00    lea    0x0(%esi,1),%esi
 80480f9:       8d b4 26 00 00 00 00    lea    0x0(%esi,1),%esi
 8048100:       b8 01 00 00 00          mov    $0x1,%eax
 8048105:       cd 80                   int    $0x80
 8048107:       eb f7                   jmp    0x8048100
 8048109:       8d b4 26 00 00 00 00    lea    0x0(%esi,1),%esi
 8048110:       53                      push   %ebx
 8048111:       bb b8 92 07 08          mov    $0x80792b8,%ebx
 8048116:       83 3d b8 92 07 08 00    cmpl   $0x0,0x80792b8
 804811d:       74 0d                   je     0x804812c
 804811f:       90                      nop    
 8048120:       8b 03                   mov    (%ebx),%eax
 8048122:       ff d0                   call   *%eax
 8048124:       83 c3 04                add    $0x4,%ebx
 8048127:       83 3b 00                cmpl   $0x0,(%ebx)
 804812a:       75 f4                   jne    0x8048120
 804812c:       5b                      pop    %ebx
 804812d:       c3                      ret    
 804812e:       8d 36                   lea    (%esi),%esi
 8048130:       c3                      ret    
 8048131:       90                      nop    
 8048132:       90                      nop    
 8048133:       90                      nop    
$
Matching these two up, we find that main() is called immediately before push %eax; call exit, which makes 0x8048134 our main function. Let's just start going through it. The next several (many) excerpts will be from ./extract 8048134.
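
As a cross-check on that address, the operand of a direct call can be recomputed from the raw instruction bytes in the listing: the five-byte e8 form holds a signed 32-bit displacement relative to the address of the next instruction. A short Python sketch, with the hypothetical helper name call_target:

```python
import struct

def call_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Decode an x86 'call rel32' (opcode e8): next instruction address plus displacement."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not a 5-byte call rel32")
    (disp,) = struct.unpack("<i", insn_bytes[1:])     # little-endian, signed
    return (insn_addr + 5 + disp) & 0xFFFFFFFF

# Bytes taken from the entry-point listing of the-binary
print(hex(call_target(0x80480E6, bytes.fromhex("e849000000"))))  # the call before push %eax
print(hex(call_target(0x80480EC, bytes.fromhex("e8cbde0000"))))  # the call identified as exit
print(hex(call_target(0x80480E1, bytes.fromhex("e89affffff"))))  # a backward call
```

The first call decodes to 0x8048134 and the second to 0x8055fbc, in agreement with the operands objdump printed.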
function 8048134
 8048134:       55                      push   %ebp
 8048135:       89 e5                   mov    %esp,%ebp
 8048137:       81 ec f0 44 00 00       sub    $0x44f0,%esp
 804813d:       57                      push   %edi
 804813e:       56                      push   %esi
 804813f:       53                      push   %ebx
 8048140:       8b 5d 0c                mov    0xc(%ebp),%ebx
 8048143:       c7 85 40 bb ff ff 01    movl   $0x1,0xffffbb40(%ebp)
 804814a:       00 00 00 
 804814d:       8d 95 00 f8 ff ff       lea    0xfffff800(%ebp),%edx
 8048153:       89 95 30 bb ff ff       mov    %edx,0xffffbb30(%ebp)
 8048159:       8d 8d 14 f8 ff ff       lea    0xfffff814(%ebp),%ecx
 804815f:       89 8d 2c bb ff ff       mov    %ecx,0xffffbb2c(%ebp)
 8048165:       8d 95 16 f8 ff ff       lea    0xfffff816(%ebp),%edx
 804816b:       89 95 28 bb ff ff       mov    %edx,0xffffbb28(%ebp)
 8048171:       c7 85 3c bb ff ff 10    movl   $0x10,0xffffbb3c(%ebp)
 8048178:       00 00 00 
 804817b:       e8 8c f0 00 00          call   <geteuid>  (0x805720c)
 8048180:       85 c0                   test   %eax,%eax
 8048182:       74 08                   je     804818c
 8048184:       6a ff                   push   $0xffffffff
 8048186:       e8 31 de 00 00          call   <exit>  (0x8055fbc)
 804818b:       90                      nop    
 804818c:       8b 13                   mov    (%ebx),%edx
After the standard prolog, several local (automatic) variables are initialized. For the sake of convenience we will refer to each variable by its offset from %ebp, dropping the leading "ffff". Thus at 8048143 variable bb40 is initialized to 1; moving down, bb30 gets the address of f800. The following lines reference f814 and f816, which leads us to think that f800 might be a buffer, with bb2c and bb28 initialized to point into that buffer. In C it might look something like this:
byte f800[###];
byte *bb30 = &f800[0];
byte *bb2c = &f800[0x14];
byte *bb28 = &f800[0x16];
Next up is the call to geteuid we saw in the strace output, followed by an exit(-1) if the result is not 0. Also note the instruction at 8048140, which saves the second argument to main (char **argv) in ebx. This is used in the next section of code.

 804818e:       30 c0                   xor    %al,%al
 8048190:       89 d7                   mov    %edx,%edi
 8048192:       fc                      cld    
 8048193:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
 8048198:       f2 ae                   repnz scas %es:(%edi),%al
 804819a:       89 c8                   mov    %ecx,%eax
 804819c:       f7 d0                   not    %eax
 804819e:       48                      dec    %eax
 804819f:       50                      push   %eax
 80481a0:       6a 00                   push   $0x0
 80481a2:       52                      push   %edx
 80481a3:       e8 bc f5 00 00          call   8057764
 80481a8:       8b 13                   mov    (%ebx),%edx
 80481aa:       a1 d8 75 06 08          mov    80675d8,%eax
 80481af:       89 02                   mov    %eax,(%edx)
 80481b1:       a1 dc 75 06 08          mov    80675dc,%eax
 80481b6:       89 42 04                mov    %eax,0x4(%edx)
 80481b9:       66 a1 e0 75 06 08       mov    80675e0,%ax
 80481bf:       66 89 42 08             mov    %ax,8(%edx)
 80481c3:       8a 05 e2 75 06 08       mov    80675e2,%al
 80481c9:       88 42 0a                mov    %al,0xa(%edx)
In this section the repnz scas scans the string pointed to by edi looking for 0 (al was set to 0 in the first line). What we end up with is the length of the string in eax. edi is set from edx, which was loaded from (%ebx), the first entry of argv, and therefore is argv[0]. The call that immediately follows then looks like function(argv[0], 0, strlen(argv[0])), which makes memset a good candidate for function 8057764. Extracting that function confirms that it behaves like memset.
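
The register arithmetic behind this inline strlen idiom is easy to get wrong by one, so here is a small Python emulation of it. It only models the instruction sequence with 32-bit wraparound; the function name scas_strlen is made up for illustration.

```python
MASK = 0xFFFFFFFF

def scas_strlen(buf: bytes) -> int:
    """Emulate: xor %al,%al ; mov $-1,%ecx ; repnz scas ; mov %ecx,%eax ; not ; dec."""
    ecx, i = MASK, 0
    while ecx != 0:
        ecx = (ecx - 1) & MASK    # rep decrements ecx for every byte compared
        byte = buf[i]
        i += 1
        if byte == 0:             # repnz stops once the compare finds al (zero)
            break
    eax = (~ecx) & MASK           # not %eax: number of bytes scanned, NUL included
    return (eax - 1) & MASK       # dec %eax: drop the NUL from the count

print(scas_strlen(b"./the-binary\x00"))
```

For a string of n characters the scan touches n+1 bytes, so ecx ends at -(n+2) in two's complement, not gives n+1, and dec gives n.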

The last part references a series of integers starting at 80675d8. Checking the objdump -x output, that is the beginning of the .rodata section. Our dump of that section reveals the string "[mingetty]". The mov instructions transfer a total of 4+4+2+1 = 11 bytes, which is the total length of the string including the terminating null. This, then, is where the process changes its name.
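
The clear-then-copy sequence can be mimicked on a byte buffer in Python. This is a model of the instructions, not the program's source; the function name overwrite_name is hypothetical, and the buffer stands in for the memory holding argv[0] as it appeared in the strace run.

```python
RODATA_NAME = b"[mingetty]\x00"    # the 11 bytes at 0x80675d8 in the .rodata dump

def overwrite_name(mem: bytearray) -> bytearray:
    """memset the old name to zero, then copy dword, dword, word, byte like the mov pairs."""
    length = mem.index(0)                  # strlen(argv[0])
    mem[0:length] = bytes(length)          # memset(argv[0], 0, strlen(argv[0]))
    for offset, size in ((0, 4), (4, 4), (8, 2), (10, 1)):
        mem[offset:offset + size] = RODATA_NAME[offset:offset + size]
    return mem

print(bytes(overwrite_name(bytearray(b"./the-binary\x00"))))
```

Note that the copy always writes 11 bytes regardless of how long argv[0] was. With "./the-binary" (12 characters) it fits, but a shorter invocation name would presumably have the copy run past the end of the original string into whatever follows it in memory.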

 80481cc:       6a 01                   push   $0x1
 80481ce:       6a 11                   push   $0x11
 80481d0:       e8 e7 e7 00 00          call   <signal>  (0x80569bc)
 80481d5:       e8 0e f0 00 00          call   <fork>  (0x80571e8)
 80481da:       83 c4 14                add    $0x14,%esp
 80481dd:       85 c0                   test   %eax,%eax
 80481df:       74 07                   je     80481e8
 80481e1:       6a 00                   push   $0x0
 80481e3:       e8 d4 dd 00 00          call   <exit>  (0x8055fbc)
 80481e8:       e8 4f f1 00 00          call   <setsid>  (0x805733c)
 80481ed:       6a 01                   push   $0x1
 80481ef:       6a 11                   push   $0x11
 80481f1:       e8 c6 e7 00 00          call   <signal>  (0x80569bc)
 80481f6:       e8 ed ef 00 00          call   <fork>  (0x80571e8)
 80481fb:       83 c4 08                add    $0x8,%esp
 80481fe:       85 c0                   test   %eax,%eax
 8048200:       74 0a                   je     804820c
 8048202:       6a 00                   push   $0x0
 8048204:       e8 b3 dd 00 00          call   <exit>  (0x8055fbc)
 8048209:       8d 76 00                lea    0x0(%esi),%esi
 804820c:       68 e3 75 06 08          push   $80675e3
 8048211:       e8 1e ef 00 00          call   <chdir>  (0x8057134)
 8048216:       6a 00                   push   $0x0
 8048218:       e8 43 ef 00 00          call   <close>  (0x8057160)
 804821d:       6a 01                   push   $0x1
 804821f:       e8 3c ef 00 00          call   <close>  (0x8057160)
 8048224:       6a 02                   push   $0x2
 8048226:       e8 35 ef 00 00          call   <close>  (0x8057160)
Looking at the system calls in this section, this looks like the daemonization we saw in the strace output. Referring to the .rodata section, we find that 80675e3 (pushed at 804820c) is the address of the string "/", which is what we expected for chdir. (Some tools, such as reap, find strings like this for you automatically.)
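
As a rough C rendering of the sequence above (a sketch, not the author's code): the name daemonize_sketch is made up, and the signal constants assume Linux/i386 numbering, where signal 0x11 is SIGCHLD and a handler value of 1 is SIG_IGN.

```c
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical reconstruction of the daemonization sequence.
 * Assumes Linux/i386: signal 0x11 == SIGCHLD, handler 1 == SIG_IGN. */
void daemonize_sketch(void)
{
    signal(SIGCHLD, SIG_IGN);   /* push $0x1 ; push $0x11 ; call signal */
    if (fork() != 0)
        exit(0);                /* only the child (eax == 0) continues */

    setsid();                   /* new session, no controlling terminal */

    signal(SIGCHLD, SIG_IGN);
    if (fork() != 0)
        exit(0);                /* second fork, again only the child continues */

    if (chdir("/") != 0) {
        /* the binary ignores the result, and so does this sketch */
    }
    close(0);                   /* stdin */
    close(1);                   /* stdout */
    close(2);                   /* stderr */
}
```

Note that the binary only tests for a zero return from fork, so a failed fork (-1) also ends in exit(0); the sketch keeps that behavior.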
 804822b:       c7 05 74 e7 07 08 00    movl   $0x0,807e774
 8048232:       00 00 00 
 8048235:       c7 05 70 e7 07 08 00    movl   $0x0,807e770
 804823c:       00 00 00 
 804823f:       c7 05 78 e7 07 08 00    movl   $0x0,807e778
 8048246:       00 00 00 
 8048249:       6a 00                   push   $0x0
 804824b:       e8 f4 f1 00 00          call   <time>  (0x8057444)
 8048250:       83 c4 14                add    $0x14,%esp
 8048253:       50                      push   %eax
 8048254:       e8 47 d7 00 00          call   80559a0
 8048259:       83 c4 04                add    $0x4,%esp
 804825c:       6a 0b                   push   $0xb
 804825e:       6a 03                   push   $0x3
 8048260:       6a 02                   push   $0x2
 8048262:       e8 8d ea 00 00          call   <socket>  (0x8056cf4)
 8048267:       89 85 38 bb ff ff       mov    %eax,0xffffbb38(%ebp)
 804826d:       6a 01                   push   $0x1
 804826f:       6a 01                   push   $0x1
 8048271:       e8 46 e7 00 00          call   <signal>  (0x80569bc)
 8048276:       6a 01                   push   $0x1
 8048278:       6a 0f                   push   $0xf
 804827a:       e8 3d e7 00 00          call   <signal>  (0x80569bc)
 804827f:       6a 01                   push   $0x1
 8048281:       6a 11                   push   $0x11
 8048283:       e8 34 e7 00 00          call   <signal>  (0x80569bc)
 8048288:       83 c4 24                add    $0x24,%esp
 804828b:       6a 01                   push   $0x1
 804828d:       6a 11                   push   $0x11
 804828f:       e8 28 e7 00 00          call   <signal>  (0x80569bc)
 8048294:       83 c4 08                add    $0x8,%esp
 8048297:       8d 8d 00 f0 ff ff       lea    0xfffff000(%ebp),%ecx
 804829d:       89 8d 20 bb ff ff       mov    %ecx,0xffffbb20(%ebp)
 80482a3:       8d 95 48 ee ff ff       lea    0xffffee48(%ebp),%edx
 80482a9:       89 95 1c bb ff ff       mov    %edx,0xffffbb1c(%ebp)
 80482af:       90                      nop    
 80482b0:       6a 00                   push   $0x0
 80482b2:       68 00 08 00 00          push   $0x800
 80482b7:       8d 85 00 f8 ff ff       lea    0xfffff800(%ebp),%eax
 80482bd:       50                      push   %eax
 80482be:       8b 8d 38 bb ff ff       mov    0xffffbb38(%ebp),%ecx
 80482c4:       51                      push   %ecx
 80482c5:       e8 7a e8 00 00          call   <recv>  (0x8056b44)
The last call here is a recv, so we know we are getting somewhere. At the top, after some more initializations, time(0) is called with the result passed to 80559a0. The return value of that function is not used. We may need to come back to that function (srandom is a likely candidate). Next we create a socket and store it in bb38. The socket is created as socket(2=AF_INET, 3=SOCK_RAW, 11), confirming that we are looking for IP protocol 11 (0xb) packets. After ignoring a few signals and initializing some pointers, we call recv(bb38, &f800, 0x800, 0). We just stored the socket in bb38, and our hypothesis above that f800 is a buffer turned out to be correct. It is big enough to receive 2048 (0x800) bytes. We are about to find out what happens when the program receives a packet.
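
A hedged C sketch of the socket setup and the receive call follows. The function and constant names are invented for illustration, and the signal names assume Linux/i386 numbering (1 is SIGHUP, 0xf is SIGTERM, 0x11 is SIGCHLD). Opening a raw socket normally requires root, so open_listener returns -1 for an unprivileged user.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Values read from the disassembly; the names are hypothetical. */
enum { LISTEN_PROTO = 0x0b, RECV_BUF_SIZE = 0x800 };

/* Creates the raw socket and ignores the same signals as the binary.
 * Returns the socket descriptor, or -1 on failure. */
int open_listener(void)
{
    int sock = socket(AF_INET, SOCK_RAW, LISTEN_PROTO);

    signal(SIGHUP, SIG_IGN);    /* push $0x1 ; push $0x1  */
    signal(SIGTERM, SIG_IGN);   /* push $0x1 ; push $0xf  */
    signal(SIGCHLD, SIG_IGN);   /* push $0x1 ; push $0x11 */
    return sock;
}

/* One receive, as at 80482c5: recv(sock, buf, 0x800, 0). */
ssize_t receive_packet(int sock, unsigned char buf[RECV_BUF_SIZE])
{
    return recv(sock, buf, RECV_BUF_SIZE, 0);
}
```

Because the socket is raw, each buffer returned by recv starts with the IP header rather than with the payload.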
 80482c5:       e8 7a e8 00 00          call   <recv>  (0x8056b44)
 80482ca:       89 c6                   mov    %eax,%esi
 80482cc:       83 c4 10                add    $0x10,%esp
 80482cf:       8b 95 30 bb ff ff       mov    0xffffbb30(%ebp),%edx
 80482d5:       80 7a 09 0b             cmpb   $0xb,0x9(%edx)
 80482d9:       0f 85 d9 0b 00 00       jne    8048eb8
 80482df:       8b 8d 2c bb ff ff       mov    0xffffbb2c(%ebp),%ecx
 80482e5:       80 39 02                cmpb   $0x2,(%ecx)
 80482e8:       0f 85 ca 0b 00 00       jne    8048eb8
 80482ee:       81 fe c8 00 00 00       cmp    $0xc8,%esi
 80482f4:       0f 8e be 0b 00 00       jle    8048eb8
 80482fa:       8b 95 20 bb ff ff       mov    0xffffbb20(%ebp),%edx
 8048300:       52                      push   %edx
 8048301:       8b 8d 28 bb ff ff       mov    0xffffbb28(%ebp),%ecx
 8048307:       51                      push   %ecx
 8048308:       8d 46 ea                lea    0xffffffea(%esi),%eax
 804830b:       50                      push   %eax
 804830c:       e8 d7 1e 00 00          call   804a1e8
 8048311:       83 c4 0c                add    $0xc,%esp
 8048314:       0f b6 85 01 f0 ff ff    movzbl 0xfffff001(%ebp),%eax
 804831b:       48                      dec    %eax
 804831c:       83 f8 0b                cmp    $0xb,%eax
 804831f:       0f 87 93 0b 00 00       ja     8048eb8
 8048325:       ff 24 85 2c 83 04 08    jmp    *804832c(,%eax,4)
 804832c:
After saving the number of bytes read to esi, we examine the byte at offset 9 from bb30, which was initialized to the start of the receive buffer f800. The byte at offset 9 of the IP header (it's a raw socket, so we get everything) is the protocol field (see RFC 791). If it is not 11 (0xb), we skip down to the bottom. Likewise, bb2c points to the byte at offset 0x14. This is the start of the IP payload, assuming a 20-byte header with no options. That byte must be equal to 2, and the number of bytes read must be greater than 200 (0xc8), or again the program skips down to the bottom.

Once we are sure we want to continue, we set up a call to 804a1e8. That address is close enough to the main code that it might be one of the author's routines rather than a library routine. The first argument is esi-0x16, computed with the common technique of doing additions with the lea instruction. bb28 points to the byte 0x16 in from the start of the buffer at f800, and the last argument, bb20, points to f000. Let's look at what is being passed. 0x16 bytes in from the start of the receive buffer is two bytes into the IP payload, just past the byte 2 we were looking for and the byte that follows it. Those 0x16 bytes are also subtracted from the number of bytes read to form the first argument. Immediately upon returning, we use a byte from the area pointed to by the last argument, and subsequently other bytes from that area. This looks like it may be the decode subroutine. We will check into that soon.
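
The three checks and the argument setup can be summarized in C. This is a sketch under stated assumptions: all names are invented, and the buffer is assumed to hold at least 0x15 bytes, since the binary reads the two header bytes before it looks at the length.

```c
#include <stddef.h>

/* Offsets and limits read from the disassembly; names are hypothetical. */
enum {
    IP_PROTO_OFFSET = 9,     /* protocol field in the IP header          */
    PAYLOAD_OFFSET  = 0x14,  /* first payload byte, 20-byte header       */
    DATA_OFFSET     = 0x16,  /* start of the data handed to 804a1e8      */
    MIN_LENGTH      = 0xc8   /* packets of 200 bytes or fewer are skipped */
};

/* Returns 1 if the packet passes the three checks, 0 otherwise. */
int packet_is_command(const unsigned char *buf, int len)
{
    if (buf[IP_PROTO_OFFSET] != 0x0b)   /* cmpb $0xb,0x9(%edx) */
        return 0;
    if (buf[PAYLOAD_OFFSET] != 2)       /* cmpb $0x2,(%ecx)    */
        return 0;
    if (len <= MIN_LENGTH)              /* cmp $0xc8,%esi ; jle */
        return 0;
    return 1;
}

/* First argument of the call to 804a1e8: lea 0xffffffea(%esi),%eax. */
int encoded_length(int len)
{
    return len - DATA_OFFSET;
}

/* Second argument of the call: the pointer stored in bb28. */
const unsigned char *encoded_data(const unsigned char *buf)
{
    return buf + DATA_OFFSET;
}
```

With the minimum accepted packet of 201 bytes, the routine at 804a1e8 would be handed 179 bytes of data.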

As noted above, the byte used right after the call is the second byte of the output buffer (f001). The next three instructions subtract one from that byte and make sure the result is not bigger than 11 (0xb). The jmp then uses the result as an index into the jump table that immediately follows. Therefore the next 48 bytes (12 addresses x 4 bytes/address) need to be interpreted as addresses, not as instructions. This is a standard way of implementing a switch statement. Taking the decrement into account, the values being looked for are 1 to 12. These will likely correspond to different commands. I have annotated the display below with the command numbers and actual addresses.

 804832c:          5c 83 04 08     ;  1  0804835c
 8048330:          f0 83 04 08     ;  2  080483f0
 8048334:          90 85 04 08     ;  3  08048590
 8048338:          1c 87 04 08     ;  4  0804871c
 804833c:          c8 87 04 08     ;  5  080487c8
 8048340:          94 88 04 08     ;  6  08048894
 8048344:          cc 8a 04 08     ;  7  08048acc
 8048348:          58 8b 04 08     ;  8  08048b58
 804834c:          80 8b 04 08     ;  9  08048b80
 8048350:          34 8c 04 08     ; 10  08048c34
 8048354:          08 8d 04 08     ; 11  08048d08
 8048358:          e4 8d 04 08     ; 12  08048de4
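
The dispatch logic can be modeled in C as a table lookup. This is an illustrative sketch: dispatch_target and DEFAULT_TARGET are invented names, the table entries are the addresses listed above, and 8048eb8 is the target of the ja taken for out-of-range values.

```c
#include <stdint.h>

/* Handler addresses from the jump table at 804832c. */
static const uint32_t jump_table[12] = {
    0x0804835c, 0x080483f0, 0x08048590, 0x0804871c,
    0x080487c8, 0x08048894, 0x08048acc, 0x08048b58,
    0x08048b80, 0x08048c34, 0x08048d08, 0x08048de4
};

/* Where the code goes when the command byte is out of range. */
#define DEFAULT_TARGET 0x08048eb8u

/* Returns the address the jmp would transfer control to for a command byte.
 * dispatch_target is a hypothetical name. */
uint32_t dispatch_target(uint8_t cmd)
{
    uint32_t index = (uint32_t)cmd - 1;   /* movzbl ; dec %eax */

    if (index > 0xb)                      /* cmp $0xb,%eax ; ja (unsigned) */
        return DEFAULT_TARGET;
    return jump_table[index];             /* jmp *804832c(,%eax,4) */
}
```

Because the comparison is unsigned, a command byte of 0 wraps to 0xffffffff after the decrement and is rejected along with everything above 12.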
Time prevents going into further detail on the rest of the disassembly analysis. One function that I have converted to C-like syntax is the one at 0x8049564, which implements the DNS2 function. It can be found in the files as dns2.c.

The full details of the commands and their formats are in command.html. I also implemented a tool to test the command formats by sending them to the program. Those tests are discussed in test.html.