Analysis
    $ file the-binary
    the-binary: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
    $
OK, so the program doesn't need any external libraries to run, and all symbol names have been stripped from the binary. We expected this, but it will make it harder to separate the library functions from the custom functions and to identify the library functions used.
Next, let's see what else is in the binary. We will use the strings command to extract any readable strings in the executable.
    $ strings -5 the-binary
    . . .
    Z,)J4
    C,9C0t
    WVSj u j@j
    [mingetty]
    /tmp/.hj237349
    /bin/csh -f -c "%s" 1> %s 2>&1
    TfOjG
    /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:.
    HISTFILE
    linux
    /bin/sh
    /bin/csh -f -c "%s"
    %d.%d.%d.%d
    %u.%u.%u.%u
    gethostby*.getanswer: asked for "%s", got "%s"
    RESOLV_HOST_CONF
    /etc/host.conf
    order
    resolv+: %s: "%s" command incorrectly formatted.
    hosts
    resolv+: "%s" is an invalid keyword
    resolv+: valid keywords are: %s, %s and %s
    resolv+: search order not specified or unrecognized keyword, host resolution will fail.
    . . .
    @(#) The Linux C library 5.3.12
    . . .
I have extracted certain sections for discussion; for reference, the entire output is included as strings_output.txt in the additional files. Starting halfway down the extract, we see "gethostby*...", "/etc/host.conf" and several lines starting with "resolv+:". We are clearly getting into strings from the library here. The "[mingetty]" string is interesting since it doesn't seem like something that should be in libc. Indeed, on my Red Hat 7.2 machine, strings /usr/lib/libc.a | grep 'mingetty' returns no matches. It appears, then, that at least a few of the strings at the top of the list come from the program's author. We note that these include format strings for getting a shell to execute a command.
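The strings tool is conceptually simple: it reports every run of at least N printable characters, with N = 5 here because of the -5 flag. A minimal C sketch of that idea follows. print_strings is a hypothetical name, and the real strings(1) has options (character encodings, section selection) that this sketch ignores.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Report runs of at least `min` printable characters in buf[0..n),
 * roughly what `strings -5` does with min = 5.  Returns the number of
 * runs found and prints each one to `out` unless out is NULL. */
static size_t print_strings(const unsigned char *buf, size_t n, size_t min,
                            FILE *out)
{
    size_t found = 0, start = 0, run = 0;

    for (size_t i = 0; i <= n; i++) {
        if (i < n && isprint(buf[i])) {
            if (run++ == 0)
                start = i;              /* a new run begins here */
            continue;
        }
        /* a non-printable byte (or the end of the buffer) ends the run */
        if (run >= min) {
            found++;
            if (out != NULL)
                fprintf(out, "%.*s\n", (int)run, (const char *)buf + start);
        }
        run = 0;
    }
    return found;
}
```

With min = 5, a 4-byte run such as "PATH" in the .rodata dump is skipped, which is why it does not appear in the listing above.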
One additional string stands out from the list: "@(#) The Linux C library 5.3.12". This is a version of the old Linux libc (libc5), as opposed to the newer glibc. This fact will come up again.
What fun is a program without running it? To set up a safe environment for testing, a VMware virtual machine was set up and loaded with Red Hat 7.3. It was configured for host-only networking, which meant that the network was entirely simulated in the real machine. As a further precaution, the real machine was configured with IP forwarding turned off.
To help monitor what the program does, we will run it under strace to record all of the system calls it makes. We will use one flag, -f, to follow forks.
    $ strace -f ./the-binary
    execve("./the-binary", ["./the-binary", "2"], [/* 19 vars */]) = 0
    personality(0 /* PER_??? */) = 0
    geteuid() = 500
    _exit(-1) = ?
Well, it doesn't do much except check our effective user ID and exit. It must not like the fact that we are not root, so...
    # strace -f ./the-binary
    execve("./the-binary", ["./the-binary"], [/* 18 vars */]) = 0
    personality(0 /* PER_??? */) = 0
    geteuid() = 0
    sigaction(SIGCHLD, {SIG_IGN}, {SIG_DFL}, 0x42029098) = 0
    fork() = 1567
    [pid 1566] _exit(0) = ?
    [pid 1567] setsid() = 1567
    sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
    fork() = 1568
    [pid 1567] _exit(0) = ?
    chdir("/") = 0
    close(0) = 0
    close(1) = 0
    close(2) = 0
    time(NULL) = 1022595528
    socket(PF_INET, SOCK_RAW, 0xb /* IPPROTO_??? */) = 0
    sigaction(SIGHUP, {SIG_IGN}, {SIG_DFL}, 0x42029098) = 0
    sigaction(SIGTERM, {SIG_IGN}, {SIG_DFL}, 0x42029098) = 0
    sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
    sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0
    recv(0
Ah, that's better. Let's see what is going on here. It starts off with two forks, a setsid, a chdir to "/" and a bunch of closes. That is pretty much standard daemonizing behavior and so is just initialization. The socket call is the really interesting part. It opens a raw IP socket for reading IP protocol 11 (0xb). According to IANA, protocol 11 is NVP-II (Network Voice Protocol), which this program is presumably just borrowing rather than really speaking. The socket gets descriptor 0 because stdin was just closed, which is why the recv is on descriptor 0. After that it just sits in recv waiting for packets.
To take a look at the processes, let's fire up another terminal. The forked processes all have PIDs of the form 156x, so we will just look for those.
    # ps aux | grep 156
    root 1565 0.0 0.3 1536 472 pts/0 S 09:18 0:00 strace -f ./the-b
    root 1568 0.0 0.0 240 44 ? S 09:18 0:00 [mingetty]
    # kill -9 1568
    #
Here we see our original strace, with the name of the executable truncated. Process 1568 shows its process name as "[mingetty]". Referring back to the strace output, the last fork returned 1568, meaning that the surviving child has that as its PID. Our binary has changed its name to "[mingetty]"! Now we know why that string appeared in the strings output.
Unless we want to try sending random packets, we need to look at the binary again. First we will see if there is anything to learn from the ELF headers.
    $ objdump -x the-binary
    the-binary:     file format elf32-i386
    the-binary
    architecture: i386, flags 0x00000102:
    EXEC_P, D_PAGED
    start address 0x08048090
    Program Header:
        LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
             filesz 0x00024222 memsz 0x00024222 flags r-x
        LOAD off    0x00024228 vaddr 0x0806d228 paddr 0x0806d228 align 2**12
             filesz 0x0000c094 memsz 0x00011970 flags rw-
    Sections:
    Idx Name           Size      VMA       LMA       File off  Algn
      0 .init          00000008  08048080  08048080  00000080  2**4
                       CONTENTS, ALLOC, LOAD, READONLY, CODE
      1 .text          0001f53c  08048090  08048090  00000090  2**4
                       CONTENTS, ALLOC, LOAD, READONLY, CODE
      2 __libc_subinit 00000004  080675cc  080675cc  0001f5cc  2**2
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
      3 .fini          00000008  080675d0  080675d0  0001f5d0  2**4
                       CONTENTS, ALLOC, LOAD, READONLY, CODE
      4 .rodata        00004c4a  080675d8  080675d8  0001f5d8  2**2
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
      5 .data          0000c084  0806d228  0806d228  00024228  2**2
                       CONTENTS, ALLOC, LOAD, DATA
      6 .ctors         00000008  080792ac  080792ac  000302ac  2**2
                       CONTENTS, ALLOC, LOAD, DATA
      7 .dtors         00000008  080792b4  080792b4  000302b4  2**2
                       CONTENTS, ALLOC, LOAD, DATA
      8 .bss           000058dc  080792bc  080792bc  000302bc  2**2
                       ALLOC
      9 .note          00000d5c  00000000  00000000  000302bc  2**0
                       CONTENTS, READONLY
     10 .comment       00000ea6  00000000  00000000  00031018  2**0
                       CONTENTS, READONLY
    objdump: the-binary: no symbols
    $
There are a number of things we learn from this. First, execution starts at address 0x08048090. That may come in handy when we start looking at the disassembled code. The initialization and finalization sections (.init and .fini) are small and probably uninteresting. The sections we really care about are .text, which holds the code, and the data-bearing sections .rodata, .data and .comment. Read-only data such as the strings we saw earlier should live in .rodata, so we look there first:
    $ objdump -j .rodata --full-contents the-binary | head -60
    the-binary:     file format elf32-i386
    Contents of section .rodata:
     80675d8 5b6d696e 67657474 795d002f 00002f74  [mingetty]./../t
     80675e8 6d702f2e 686a3233 37333439 002f6269  mp/.hj237349./bi
     80675f8 6e2f6373 68202d66 202d6320 22257322  n/csh -f -c "%s"
     8067608 20313e20 25732032 3e263100 72620054   1> %s 2>&1.rb.T
     8067618 664f6a47 00fffb01 002f7362 696e3a2f  fOjG...../sbin:/
     8067628 62696e3a 2f757372 2f736269 6e3a2f75  bin:/usr/sbin:/u
     8067638 73722f62 696e3a2f 7573722f 6c6f6361  sr/bin:/usr/loca
     8067648 6c2f6269 6e2f3a2e 00504154 48004849  l/bin/:..PATH.HI
     8067658 53544649 4c45006c 696e7578 00544552  STFILE.linux.TER
     8067668 4d007368 002f6269 6e2f7368 002f6269  M.sh./bin/sh./bi
     8067678 6e2f6373 68202d66 202d6320 22257322  n/csh -f -c "%s"
     8067688 20002564 2e25642e 25642e25 64008d36   .%d.%d.%d.%d..6
     8067698 15000000 15000000 14000000 15000000  ................
     80676a8 15000000 19000000 14000000 14000000  ................
     80676b8 14000000 476e0100 00010000 00000000  ....Gn..........
     80676c8 03636f6d 00000600 01000000 00000000  .com............
     80676d8 00000000 00000000 00000000 00000000  ................
     80676e8 00000000 0000476e 01000001 00000000  ......Gn........
     80676f8 0000036e 65740000 06000100 00000000  ...net..........
     8067708 00000000 00000000 00000000 00000000  ................
     8067718 00000000 00000000 476e0100 00010000  ........Gn......
     8067728 00000000 03646500 00060001 00000000  .....de.........
     8067738 00000000 00000000 00000000 00000000  ................
     8067748 00000000 00000000 0000476e 01000001  ..........Gn....
     8067758 00000000 00000365 64750000 06000100  .......edu......
     8067768 00000000 00000000 00000000 00000000  ................
     8067778 00000000 00000000 00000000 476e0100  ............Gn..
     8067788 00010000 00000000 036f7267 00000600  .........org....
     8067798 01000000 00000000 00000000 00000000  ................
     80677a8 00000000 00000000 00000000 0000476e  ..............Gn
     80677b8 01000001 00000000 00000375 73630365  ...........usc.e
     80677c8 64750000 06000100 00000000 00000000  du..............
     80677d8 00000000 00000000 00000000 00000000  ................
     80677e8 476e0100 00010000 00000000 03657300  Gn...........es.
     80677f8 00060001 00000000 00000000 00000000  ................
     8067808 00000000 00000000 00000000 00000000  ................
     8067818 0000476e 01000001 00000000 00000367  ..Gn...........g
     8067828 72000006 00010000 00000000 00000000  r...............
     8067838 00000000 00000000 00000000 00000000  ................
     8067848 00000000 476e0100 00010000 00000000  ....Gn..........
     8067858 03696500 00060001 00000000 00000000  .ie.............
     8067868 00000000 00000000 00000000 00000000  ................
     8067878 00000000 00000000 00000000 00000000  ................
     8067888 00000000 00000000 00000000 00000000  ................
     8067898 00000000 00000000 00000000 00000000  ................
     80678a8 00000000 00000000 25752e25 752e2575  ........%u.%u.%u
     80678b8 2e257500 25630025 63257300 67657468  .%u.%c.%c%s.geth
     80678c8 6f737462 792a2e67 6574616e 73776572  ostby*.getanswer
     80678d8 3a206173 6b656420 666f7220 22257322  : asked for "%s"
     80678e8 2c20676f 74202225 73220052 45534f4c  , got "%s".RESOL
     80678f8 565f484f 53545f43 4f4e4600 2f657463  V_HOST_CONF./etc
     8067908 2f686f73 742e636f 6e660072 006f7264  /host.conf.r.ord
     8067918 65720020 09007265 736f6c76 2b3a2025  er. ..resolv+: %
     8067928 733a2022 25732220 636f6d6d 616e6420  s: "%s" command
     8067938 696e636f 72726563 746c7920 666f726d  incorrectly form
     8067948 61747465 642e0a00 202c3b3a 0062696e  atted... ,;:.bin
    $
And, indeed, we do see those strings. In addition, there is a region in the middle that contains a number of domain names: com, net, de, edu, org, usc.edu, es, gr and ie. They are stored in DNS wire format, with each label preceded by its length (03 63 6f 6d is 3 followed by "com"), so the program may be doing some DNS work. The rest of this section looks like strings from libc and is probably not very interesting, but if we see an address pointing into here we can check it out.
Doing the same thing to the .data section doesn't give us much. There might be some regularity in the bytes, but nothing we can go on. The .note section is rather devoid of information, so we move on to the .comment section.
    $ objdump -j .comment --full-contents the-binary | head -20
    the-binary:     file format elf32-i386
    Contents of section .comment:
     0000 00474343 3a202847 4e552920 322e372e  .GCC: (GNU) 2.7.
     0010 322e6c2e 32000047 43433a20 28474e55  2.l.2..GCC: (GNU
     0020 2920322e 372e3200 00474343 3a202847  ) 2.7.2..GCC: (G
     0030 4e552920 322e372e 322e6c2e 32000047  NU) 2.7.2.l.2..G
     0040 43433a20 28474e55 2920322e 372e322e  CC: (GNU) 2.7.2.
     0050 6c2e3200 00474343 3a202847 4e552920  l.2..GCC: (GNU)
     0060 322e372e 322e6c2e 32000047 43433a20  2.7.2.l.2..GCC:
     0070 28474e55 2920322e 372e322e 6c2e3200  (GNU) 2.7.2.l.2.
     0080 00474343 3a202847 4e552920 322e372e  .GCC: (GNU) 2.7.
     0090 322e6c2e 32000047 43433a20 28474e55  2.l.2..GCC: (GNU
     00a0 2920322e 372e322e 6c2e3200 00474343  ) 2.7.2.l.2..GCC
     00b0 3a202847 4e552920 322e372e 322e6c2e  : (GNU) 2.7.2.l.
     00c0 32000047 43433a20 28474e55 2920322e  2..GCC: (GNU) 2.
     00d0 372e322e 6c2e3200 00474343 3a202847  7.2.l.2..GCC: (G
     00e0 4e552920 322e372e 322e6c2e 32000047  NU) 2.7.2.l.2..G
     00f0 43433a20 28474e55 2920322e 372e322e  CC: (GNU) 2.7.2.
    $
Here we see information that the compiler has tucked away in the object files: all of the code was compiled with gcc version 2.7.2 or 2.7.2.l.2 (that is a lowercase "l", byte 0x6c, not a "1"). It seems likely that the main program was compiled with gcc 2.7.2 and libc with 2.7.2.l.2. Note that in these listings I have abbreviated the output to save space; the contents of the entire section look like the first 20 lines that were printed.
So much for preliminaries; on to the main event. We can easily get a disassembly listing of the .text section with objdump -j .text -d the-binary. We will do that and save the output as tbo ("the-binary output", to keep the typing to a minimum). This gives us a single file of 43641 lines. That's a lot, and we are going to need a way to deal with it. First of all, we know that it is composed of many functions. If we search through the disassembly listing for call instructions, we can make a list of the addresses of all of the subroutines called. functionize does this and then takes a second pass through the file to insert the line "function addr" before the first instruction of every subroutine that is called. functionize can be found in the included files. We have now reduced our problem to figuring out roughly 460 functions.
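The two-pass idea behind functionize can be sketched in C as follows. This is not the tool from the included files, only an illustration that assumes objdump-style lines such as " 80480e6:\te8 49 00 00 00\tcall   0x8048134"; collect_targets and functionize are hypothetical names, and indirect calls through a register are simply skipped.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TARGETS 4096   /* plenty for the ~460 functions found here */

/* Pass 1: collect the distinct targets of direct "call <hex>" lines. */
static size_t collect_targets(char **lines, size_t n, unsigned long *targets)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        const char *p = strstr(lines[i], "call ");
        char *end;
        unsigned long t;
        size_t j = 0;

        if (p == NULL)
            continue;
        t = strtoul(p + 5, &end, 16);   /* base 16 also accepts "0x" */
        if (end == p + 5)
            continue;                   /* indirect, e.g. "call *%eax" */
        while (j < count && targets[j] != t)
            j++;
        if (j == count && count < MAX_TARGETS)
            targets[count++] = t;
    }
    return count;
}

/* Pass 2: copy the listing to out, inserting "function <addr>" before
 * every line whose address is a call target. */
static void functionize(char **lines, size_t n, FILE *out)
{
    static unsigned long targets[MAX_TARGETS];
    size_t count = collect_targets(lines, n, targets);

    for (size_t i = 0; i < n; i++) {
        char *end;
        unsigned long addr = strtoul(lines[i], &end, 16);

        if (end != lines[i] && *end == ':') {
            for (size_t j = 0; j < count; j++) {
                if (targets[j] == addr) {
                    fprintf(out, "function %lx\n", addr);
                    break;
                }
            }
        }
        fprintf(out, "%s\n", lines[i]);
    }
}
```

A linear search is fine at this scale; a sorted array or hash set would matter only for much larger listings.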
Before we go any further, let's take care of one other tool. There are two things we want this tool to do. First, when given an address, it should extract the function at that address to standard output. That way we can easily grab a particular function. Second, as we identify functions we will keep track of which address corresponds to which function name, so when we extract a function we want to replace all calls to a known address with calls to the corresponding function name. This will simplify our analysis of functions that call other functions we have already identified. The Perl script extract (see included files) does this, using a file called "addrs" to hold the mapping of addresses to subroutine names.
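The renaming half of extract might look like this in C. The real tool is a Perl script; this sketch only assumes the "addr,name" line format of the addrs file (for example "80569fc,waitpid") and produces the "call <name> (0xaddr)" form seen in the listings later on. struct sym, parse_addr_line and annotate_call are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One entry of the "addrs" file, e.g. "80569fc,waitpid". */
struct sym {
    unsigned long addr;
    char name[64];
};

/* Parse one "addr,name" line; returns 1 on success, 0 otherwise. */
static int parse_addr_line(const char *line, struct sym *s)
{
    char *end;
    size_t len;

    s->addr = strtoul(line, &end, 16);
    if (end == line || *end != ',')
        return 0;
    len = strcspn(end + 1, "\r\n");          /* name runs to end of line */
    if (len == 0 || len >= sizeof s->name)
        return 0;
    memcpy(s->name, end + 1, len);
    s->name[len] = '\0';
    return 1;
}

/* Rewrite "call 0xADDR" as "call <name> (0xADDR)" when ADDR is known;
 * otherwise copy the line unchanged. */
static void annotate_call(const char *line, const struct sym *syms, size_t n,
                          char *out, size_t outsz)
{
    const char *p = strstr(line, "call ");

    if (p != NULL) {
        char *end;
        unsigned long t = strtoul(p + 5, &end, 16);

        for (size_t i = 0; end != p + 5 && i < n; i++) {
            if (syms[i].addr == t) {
                snprintf(out, outsz, "%.*scall <%s> (0x%lx)%s",
                         (int)(p - line), line, syms[i].name, t, end);
                return;
            }
        }
    }
    snprintf(out, outsz, "%s", line);
}
```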
As we saw earlier, the binary uses an old version of the standard library, namely libc 5.3.12. To make it easier to identify the library functions, I went looking for old distributions. I found a Slackware 3.5.0 distribution from 1998 and installed it in another VMware virtual machine. It turned out to contain libc version 5.4.44. This is newer than what was used in the-binary but might be close enough.
The simplest functions to identify should be the system calls. These are implemented by putting a function code (the system call number) into the eax register and generating interrupt 0x80. The syscall numbers are defined in the Linux source and are available to the library in /usr/include/asm/unistd.h. Looking at tbo, the first int instructions we see are in function 8048090; this is the entry point for the program and contains two different syscalls, so let's come back to it later. The next int $0x80 shows up in function 80569fc.
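For reference while reading the listings, here are the i386 numbers from asm/unistd.h for the system calls that show up in the strace output and the disassembly, as a small C lookup table (syscall_name is a hypothetical helper). For example, 0x88 (136) is personality and 1 is exit, the two syscalls in the entry-point function 8048090.

```c
#include <stddef.h>

/* i386 system call numbers (asm/unistd.h) for the calls seen in the
 * strace output and listings; the number goes in %eax before int $0x80. */
static const struct {
    int nr;
    const char *name;
} i386_syscalls[] = {
    {1, "exit"},       {2, "fork"},        {6, "close"},
    {12, "chdir"},     {13, "time"},       {49, "geteuid"},
    {66, "setsid"},    {67, "sigaction"},  {102, "socketcall"},
    {114, "wait4"},    {136, "personality"},
};

/* Map the constant from "mov $N,%eax" to a name, or NULL if unknown. */
static const char *syscall_name(int nr)
{
    for (size_t i = 0; i < sizeof i386_syscalls / sizeof i386_syscalls[0]; i++)
        if (i386_syscalls[i].nr == nr)
            return i386_syscalls[i].name;
    return NULL;
}
```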
    function 80569fc
    80569fc: 55 push %ebp
    80569fd: 89 e5 mov %esp,%ebp
    80569ff: 56 push %esi
    8056a00: 53 push %ebx
    8056a01: 8b 5d 08 mov 0x8(%ebp),%ebx
    8056a04: 8b 4d 0c mov 0xc(%ebp),%ecx
    8056a07: 8b 55 10 mov 0x10(%ebp),%edx
    8056a0a: b8 72 00 00 00 mov $0x72,%eax
    8056a0f: 31 f6 xor %esi,%esi
    8056a11: cd 80 int $0x80
    8056a13: 85 c0 test %eax,%eax
    8056a15: 7d 0c jge 0x8056a23
    8056a17: f7 d8 neg %eax
    8056a19: a3 14 8b 07 08 mov %eax,0x8078b14
    8056a1e: b8 ff ff ff ff mov $0xffffffff,%eax
    8056a23: 8d 65 f8 lea 0xfffffff8(%ebp),%esp
    8056a26: 5b pop %ebx
    8056a27: 5e pop %esi
    8056a28: 89 ec mov %ebp,%esp
    8056a2a: 5d pop %ebp
    8056a2b: c3 ret
At 8056a0a, the eax register is loaded with 0x72 = 114, and asm/unistd.h #defines __NR_wait4 as 114. The instructions at 8056a01-8056a07 load three parameters from the stack into ebx, ecx and edx, but the xor at 8056a0f puts zero in esi, the fourth argument to the wait4 system call. What we are seeing here, then, is not a full wait4() but a very similar call. Checking the man pages, we see that wait3() has only three parameters, but the one it drops relative to wait4() is the first one (the pid). The man page also refers to waitpid(), whose prototype looks just like wait4() except that it lacks the last pointer (the struct rusage *). We therefore conclude that function 80569fc is waitpid() and add "80569fc,waitpid" to our addrs file.
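In C, the behavior inferred for function 80569fc amounts to waitpid() written in terms of wait4() with a NULL rusage pointer. A sketch under that assumption; waitpid_via_wait4 is a hypothetical name, and it calls the wait4() wrapper that modern glibc provides rather than issuing int $0x80 directly.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

/* What function 80569fc appears to be: waitpid() expressed as the wait4
 * system call with the fourth argument (struct rusage *) forced to NULL,
 * matching "xor %esi,%esi" before int $0x80. */
static pid_t waitpid_via_wait4(pid_t pid, int *status, int options)
{
    return wait4(pid, status, options, NULL);
}
```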
While the identification is complete, it is instructive to look at the rest of the function. If the return code is negative (that is how the kernel indicates an error), the function negates it, stores the now-positive error code in location 8078b14 and returns -1. That location is likely errno.
We can thus go through all of tbo looking for other syscalls. Some of the functions are more complicated than waitpid, and sometimes multiple functions use the same syscall. One important instance of this is syscall 0x66, __NR_socketcall. We find almost a dozen different functions all using that value. Apparently all functions involving sockets go through that one syscall. Looking at the disassembly, we see that the edx register is loaded with a different small integer for each function. That should help us determine which function is which.
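The small integers are the socketcall(2) sub-function numbers from linux/net.h (SYS_SOCKET = 1 through SYS_RECVMSG = 17). A C lookup table, with socketcall_name as a hypothetical helper, makes the mapping explicit; for example, 5 is accept and 10 is recv.

```c
#include <stddef.h>

/* socketcall(2) sub-function numbers (SYS_SOCKET, SYS_BIND, ... in
 * linux/net.h).  libc passes the number in %ebx, a pointer to the packed
 * arguments in %ecx, and 0x66 (__NR_socketcall) in %eax. */
static const char *const socketcall_names[] = {
    NULL, "socket", "bind", "connect", "listen", "accept",
    "getsockname", "getpeername", "socketpair", "send", "recv",
    "sendto", "recvfrom", "shutdown", "setsockopt", "getsockopt",
    "sendmsg", "recvmsg",
};

static const char *socketcall_name(unsigned call)
{
    if (call >= sizeof socketcall_names / sizeof socketcall_names[0])
        return NULL;
    return socketcall_names[call];
}
```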
The first such function we find is 8056a2c, which loads 5 into the edx register. If we run objdump -d /usr/lib/libc.a, we can search the output for a function that 1. has int $0x80, 2. has mov $0x66,%eax and 3. has mov $0x5,%edx. What we find is the following function:
    00000054 <accept>
    54: 55 pushl %ebp
    55: 89 e5 movl %esp,%ebp
    57: 83 ec 0c subl $0xc,%esp
    5a: 56 pushl %esi
    5b: 53 pushl %ebx
    5c: 8b 55 0c movl 0xc(%ebp),%edx
    5f: 8b 4d 10 movl 0x10(%ebp),%ecx
    62: 8b 45 08 movl 0x8(%ebp),%eax
    65: 89 45 f4 movl %eax,0xfffffff4(%ebp)
    68: 89 55 f8 movl %edx,0xfffffff8(%ebp)
    6b: 89 4d fc movl %ecx,0xfffffffc(%ebp)
    6e: ba 05 00 00 00 movl $0x5,%edx
    73: 8d 4d f4 leal 0xfffffff4(%ebp),%ecx
    76: b8 66 00 00 00 movl $0x66,%eax
    7b: 89 d3 movl %edx,%ebx
    7d: cd 80 int $0x80
    7f: 89 c6 movl %eax,%esi
    81: 85 f6 testl %esi,%esi
    83: 7c 07 jl 8c <accept+0x38>
    85: 89 f2 movl %esi,%edx
    87: eb 11 jmp 9a <accept+0x46>
    89: 8d 76 00 leal 0x0(%esi),%esi
    8c: e8 fc ff ff ff call 8d <accept+0x39>
    91: f7 de negl %esi
    93: 89 30 movl %esi,(%eax)
    95: ba ff ff ff ff movl $0xffffffff,%edx
    9a: 89 d0 movl %edx,%eax
    9c: 8d 65 ec leal 0xffffffec(%ebp),%esp
    9f: 5b popl %ebx
    a0: 5e popl %esi
    a1: 89 ec movl %ebp,%esp
    a3: 5d popl %ebp
    a4: c3 ret
This has the same structure and constants as our function 8056a2c, which we therefore conclude is accept(), and we add it to our addrs list.
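The three-part search described above can be automated over the objdump -d output of libc.a. A C sketch, assuming the old objdump format shown in the accept listing (headers like "00000054 <accept>", one instruction per line); function_header and find_accept_candidate are hypothetical names, and the string matching is deliberately crude.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Is this an objdump function header such as "00000054 <accept>"?
 * If so, copy the name between '<' and '>' into name and return 1. */
static int function_header(const char *line, char *name, size_t namesz)
{
    char *end;
    const char *gt;

    (void)strtoul(line, &end, 16);
    if (end == line || end[0] != ' ' || end[1] != '<')
        return 0;
    gt = strchr(end + 2, '>');
    if (gt == NULL)
        return 0;
    snprintf(name, namesz, "%.*s", (int)(gt - (end + 2)), end + 2);
    return 1;
}

/* Return 1 and the name of the first function whose body contains
 * int $0x80, mov $0x66,%eax (socketcall) and mov $0x5,%edx (accept). */
static int find_accept_candidate(char **lines, size_t n,
                                 char *name, size_t namesz)
{
    char cur[128] = "";
    int has_int = 0, has_66 = 0, has_5 = 0;

    for (size_t i = 0; i <= n; i++) {
        char next[128];
        int header = i < n && function_header(lines[i], next, sizeof next);

        if (i == n || header) {
            /* the previous function ends here: did it match? */
            if (cur[0] != '\0' && has_int && has_66 && has_5) {
                snprintf(name, namesz, "%s", cur);
                return 1;
            }
            if (header)
                snprintf(cur, sizeof cur, "%s", next);
            has_int = has_66 = has_5 = 0;
            continue;
        }
        if (strstr(lines[i], "int ") != NULL && strstr(lines[i], "$0x80") != NULL)
            has_int = 1;
        if (strstr(lines[i], "$0x66,%eax") != NULL)
            has_66 = 1;
        if (strstr(lines[i], "$0x5,%edx") != NULL)
            has_5 = 1;
    }
    return 0;
}
```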
Now that we have a start on the library, let's tackle the main program. To help us determine where it is, we will compile and look at a test program on our old Slackware system.
    $ cat hello.c
    #include <stdio.h>
    main()
    {
        printf("Hello, world.\n");
    }
    $ gcc -o hello --static hello.c
    $ objdump -d -j .text hello | head -51
    hello:     file format elf32-i386
    Disassembly of section .text:
    08048090 <_start>
    8048090: 59 popl %ecx
    8048091: 89 e3 movl %esp,%ebx
    8048093: 89 e0 movl %esp,%eax
    8048095: 83 e4 f8 andl $0xfffffff8,%esp
    8048098: 89 ca movl %ecx,%edx
    804809a: 01 d2 addl %edx,%edx
    804809c: 01 d2 addl %edx,%edx
    804809e: 01 d0 addl %edx,%eax
    80480a0: 83 c0 04 addl $0x4,%eax
    80480a3: 31 ed xorl %ebp,%ebp
    80480a5: 55 pushl %ebp
    80480a6: 55 pushl %ebp
    80480a7: 55 pushl %ebp
    80480a8: 89 e5 movl %esp,%ebp
    80480aa: 50 pushl %eax
    80480ab: 53 pushl %ebx
    80480ac: 51 pushl %ecx
    80480ad: b8 88 00 00 00 movl $0x88,%eax
    80480b2: bb 00 00 00 00 movl $0x0,%ebx
    80480b7: cd 80 int $0x80
    80480b9: 8b 44 24 08 movl 0x8(%esp,1),%eax
    80480bd: a3 84 e3 05 08 movl %eax,0x805e384
    80480c2: 0f b7 05 40 e6 movzwl 0x805e640,%eax
    80480c7: 05 08
    80480c9: 50 pushl %eax
    80480ca: e8 79 62 00 00 call 804e348 <__setfpucw>
    80480cf: 83 c4 04 addl $0x4,%esp
    80480d2: e8 fd 60 00 00 call 804e1d4 <__libc_init>
    80480d7: 68 c0 93 05 08 pushl $0x80593c0
    80480dc: e8 bf 5e 00 00 call 804dfa0 <atexit>
    80480e1: 83 c4 04 addl $0x4,%esp
    80480e4: e8 97 ff ff ff call 8048080 <_init>
    80480e9: e8 8e 00 00 00 call 804817c <main>
    80480ee: 50 pushl %eax
    80480ef: e8 60 5f 00 00 call 804e054 <exit>
    80480f4: 5b popl %ebx
    80480f5: 8d 74 26 00 leal 0x0(%esi,1),%esi
    80480f9: 8d bc 27 00 00 leal 0x0(%edi,1),%edi
    80480fe: 00 00
    08048100 <done>
    8048100: b8 01 00 00 00 movl $0x1,%eax
    8048105: cd 80 int $0x80
    8048107: eb f7 jmp 8048100 <done>
    8048109: 8d b4 26 00 00 leal 0x0(%esi,1),%esi
    804810e: 00 00
    $
Here, the function at the entry point does some initialization and then calls our main(). Therefore, if we look at the same part of the-binary, it should locate main() for us.
    $ ./extract 8048090
    function 8048090
    8048090: 59 pop %ecx
    8048091: 89 e3 mov %esp,%ebx
    8048093: 89 e0 mov %esp,%eax
    8048095: 89 ca mov %ecx,%edx
    8048097: 01 d2 add %edx,%edx
    8048099: 01 d2 add %edx,%edx
    804809b: 01 d0 add %edx,%eax
    804809d: 83 c0 04 add $0x4,%eax
    80480a0: 31 ed xor %ebp,%ebp
    80480a2: 55 push %ebp
    80480a3: 55 push %ebp
    80480a4: 55 push %ebp
    80480a5: 89 e5 mov %esp,%ebp
    80480a7: 50 push %eax
    80480a8: 53 push %ebx
    80480a9: 51 push %ecx
    80480aa: b8 88 00 00 00 mov $0x88,%eax
    80480af: bb 00 00 00 00 mov $0x0,%ebx
    80480b4: cd 80 int $0x80
    80480b6: 8b 44 24 08 mov 0x8(%esp,1),%eax
    80480ba: a3 28 d2 06 08 mov %eax,0x806d228
    80480bf: 0f b7 05 18 8b 07 08 movzwl 0x8078b18,%eax
    80480c6: 50 push %eax
    80480c7: e8 a0 f4 00 00 call 0x805756c
    80480cc: 83 c4 04 add $0x4,%esp
    80480cf: e8 70 ec 00 00 call 0x8056d44
    80480d4: 68 d0 75 06 08 push $0x80675d0
    80480d9: e8 2a de 00 00 call 0x8055f08
    80480de: 83 c4 04 add $0x4,%esp
    80480e1: e8 9a ff ff ff call 0x8048080
    80480e6: e8 49 00 00 00 call 0x8048134
    80480eb: 50 push %eax
    80480ec: e8 cb de 00 00 call <exit> (0x8055fbc)
    80480f1: 5b pop %ebx
    80480f2: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi
    80480f9: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi
    8048100: b8 01 00 00 00 mov $0x1,%eax
    8048105: cd 80 int $0x80
    8048107: eb f7 jmp 0x8048100
    8048109: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi
    8048110: 53 push %ebx
    8048111: bb b8 92 07 08 mov $0x80792b8,%ebx
    8048116: 83 3d b8 92 07 08 00 cmpl $0x0,0x80792b8
    804811d: 74 0d je 0x804812c
    804811f: 90 nop
    8048120: 8b 03 mov (%ebx),%eax
    8048122: ff d0 call *%eax
    8048124: 83 c3 04 add $0x4,%ebx
    8048127: 83 3b 00 cmpl $0x0,(%ebx)
    804812a: 75 f4 jne 0x8048120
    804812c: 5b pop %ebx
    804812d: c3 ret
    804812e: 8d 36 lea (%esi),%esi
    8048130: c3 ret
    8048131: 90 nop
    8048132: 90 nop
    8048133: 90 nop
    $
Matching the two listings up, we find that main() is the call immediately before push %eax; call exit, which makes 0x8048134 our main function. Let's just start going through it.
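The matching rule used here (main is the call just before push %eax and call exit, that is, exit(main(...))) is mechanical enough to script. A C sketch, assuming extract has already labeled the exit call as "call <exit>"; find_main is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* In the entry-point listing, main() is the call immediately before
 * "push %eax" and "call <exit>", i.e. exit(main(...)).  Returns the
 * address of that call's target, or 0 if the pattern is not found. */
static unsigned long find_main(char **lines, size_t n)
{
    for (size_t i = 2; i < n; i++) {
        const char *call;

        if (strstr(lines[i], "call <exit>") == NULL)
            continue;
        if (strstr(lines[i - 1], "push") == NULL ||
            strstr(lines[i - 1], "%eax") == NULL)
            continue;
        call = strstr(lines[i - 2], "call ");
        if (call != NULL)
            return strtoul(call + 5, NULL, 16);
    }
    return 0;
}
```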
The next several (many) excerpts will be from ./extract 8048134.
    function 8048134
    8048134: 55 push %ebp
    8048135: 89 e5 mov %esp,%ebp
    8048137: 81 ec f0 44 00 00 sub $0x44f0,%esp
    804813d: 57 push %edi
    804813e: 56 push %esi
    804813f: 53 push %ebx
    8048140: 8b 5d 0c mov 0xc(%ebp),%ebx
    8048143: c7 85 40 bb ff ff 01 movl $0x1,0xffffbb40(%ebp)
    804814a: 00 00 00
    804814d: 8d 95 00 f8 ff ff lea 0xfffff800(%ebp),%edx
    8048153: 89 95 30 bb ff ff mov %edx,0xffffbb30(%ebp)
    8048159: 8d 8d 14 f8 ff ff lea 0xfffff814(%ebp),%ecx
    804815f: 89 8d 2c bb ff ff mov %ecx,0xffffbb2c(%ebp)
    8048165: 8d 95 16 f8 ff ff lea 0xfffff816(%ebp),%edx
    804816b: 89 95 28 bb ff ff mov %edx,0xffffbb28(%ebp)
    8048171: c7 85 3c bb ff ff 10 movl $0x10,0xffffbb3c(%ebp)
    8048178: 00 00 00
    804817b: e8 8c f0 00 00 call <geteuid> (0x805720c)
    8048180: 85 c0 test %eax,%eax
    8048182: 74 08 je 804818c
    8048184: 6a ff push $0xffffffff
    8048186: e8 31 de 00 00 call <exit> (0x8055fbc)
    804818b: 90 nop
    804818c: 8b 13 mov (%ebx),%edx
After the standard prologue, several local (automatic) variables are initialized. For convenience we will refer to each variable by its offset from %ebp, dropping the leading "ffff". Thus at 8048143 variable bb40 is initialized to 1; moving down, bb30 gets the address of f800. The following lines take the addresses of f814 and f816, which leads us to think that f800 might be a buffer and that bb2c and bb28 are initialized to point into it. In C terms, with buf as the array at f800, bb30 = &buf[0], bb2c = &buf[0x14] and bb28 = &buf[0x16]; bb3c is set to 0x10. The code then calls geteuid() and exits with -1 unless we are root, which is the check we saw in the strace.
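The locals identified so far can be pictured as a C structure. This is only a sketch: in the binary these are separate stack variables rather than a struct, the field names are invented (and anticipate the roles these pointers turn out to have further on), and the comments give the %ebp offsets from the listing.

```c
#include <string.h>

/* Hypothetical C view of main()'s first locals.  Offsets in the comments
 * are relative to %ebp with the leading "ffff" dropped. */
struct main_locals {
    unsigned char buf[0x800];   /* f800: 2048-byte buffer */
    unsigned char *ip_hdr;      /* bb30 = &f800[0x00] */
    unsigned char *payload;     /* bb2c = &f800[0x14] */
    unsigned char *data;        /* bb28 = &f800[0x16] */
    int flag;                   /* bb40 = 1 (purpose not yet known) */
    int sixteen;                /* bb3c = 0x10 (purpose not yet known) */
};

static void init_main_locals(struct main_locals *l)
{
    memset(l, 0, sizeof *l);
    l->flag = 1;
    l->ip_hdr = l->buf;
    l->payload = l->buf + 0x14;
    l->data = l->buf + 0x16;
    l->sixteen = 0x10;
}
```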
    804818e: 30 c0 xor %al,%al
    8048190: 89 d7 mov %edx,%edi
    8048192: fc cld
    8048193: b9 ff ff ff ff mov $0xffffffff,%ecx
    8048198: f2 ae repnz scas %es:(%edi),%al
    804819a: 89 c8 mov %ecx,%eax
    804819c: f7 d0 not %eax
    804819e: 48 dec %eax
    804819f: 50 push %eax
    80481a0: 6a 00 push $0x0
    80481a2: 52 push %edx
    80481a3: e8 bc f5 00 00 call 8057764
    80481a8: 8b 13 mov (%ebx),%edx
    80481aa: a1 d8 75 06 08 mov 80675d8,%eax
    80481af: 89 02 mov %eax,(%edx)
    80481b1: a1 dc 75 06 08 mov 80675dc,%eax
    80481b6: 89 42 04 mov %eax,0x4(%edx)
    80481b9: 66 a1 e0 75 06 08 mov 80675e0,%ax
    80481bf: 66 89 42 08 mov %ax,8(%edx)
    80481c3: 8a 05 e2 75 06 08 mov 80675e2,%al
    80481c9: 88 42 0a mov %al,0xa(%edx)
In this section the repnz scas scans the string pointed to by edi looking for 0 (al was set to 0 in the first line). What we end up with is the length of the string in eax. edi is set from edx, which holds *argv (loaded at 804818c), so the string is argv[0]. The call that immediately follows then looks like function(argv[0], 0, strlen(argv[0])), which makes memset a good candidate for function 8057764. Extracting that function confirms that it behaves like memset.
The last part references a series of integers starting at 80675d8. Checking the objdump -x output, that is the beginning of the .rodata section, and our dump of that section shows the string "[mingetty]" there. The mov instructions copy a total of 4+4+2+1 = 11 bytes into argv[0], which is the length of the string including the terminating null. This, then, is where the process changes its name.
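The two steps just described translate to C roughly as follows. scas_strlen spells out the repnz scasb arithmetic, and disguise_argv0 is a hypothetical name for the inline code, assuming 8057764 is memset as identified above. Overwriting the argument string in place changes the command line that ps aux displays (on Linux it is read from /proc/<pid>/cmdline), which is how [mingetty] showed up earlier.

```c
#include <stdint.h>
#include <string.h>

/* Model of the inline strlen at 804818e-804819e:
 *   ecx = 0xffffffff; repnz scasb   -> ecx drops by one per byte scanned,
 *                                      including the terminating NUL
 *   eax = ~ecx - 1                  -> "not %eax; dec %eax"            */
static uint32_t scas_strlen(const char *s)
{
    uint32_t ecx = 0xffffffffu;

    do {
        ecx--;
    } while (*s++ != '\0');
    return ~ecx - 1;
}

/* memset(argv[0], 0, len) via 8057764, then an 11-byte copy of
 * "[mingetty]" including its NUL (the 4+4+2+1 byte moves).  Like the
 * original, this assumes argv[0] has room for at least 11 bytes. */
static void disguise_argv0(char *arg0)
{
    static const char fake[] = "[mingetty]";     /* sizeof fake == 11 */

    memset(arg0, 0, scas_strlen(arg0));
    memcpy(arg0, fake, sizeof fake);
}
```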
    80481cc: 6a 01 push $0x1
    80481ce: 6a 11 push $0x11
    80481d0: e8 e7 e7 00 00 call <signal> (0x80569bc)
    80481d5: e8 0e f0 00 00 call <fork> (0x80571e8)
    80481da: 83 c4 14 add $0x14,%esp
    80481dd: 85 c0 test %eax,%eax
    80481df: 74 07 je 80481e8
    80481e1: 6a 00 push $0x0
    80481e3: e8 d4 dd 00 00 call <exit> (0x8055fbc)
    80481e8: e8 4f f1 00 00 call <setsid> (0x805733c)
    80481ed: 6a 01 push $0x1
    80481ef: 6a 11 push $0x11
    80481f1: e8 c6 e7 00 00 call <signal> (0x80569bc)
    80481f6: e8 ed ef 00 00 call <fork> (0x80571e8)
    80481fb: 83 c4 08 add $0x8,%esp
    80481fe: 85 c0 test %eax,%eax
    8048200: 74 0a je 804820c
    8048202: 6a 00 push $0x0
    8048204: e8 b3 dd 00 00 call <exit> (0x8055fbc)
    8048209: 8d 76 00 lea 0x0(%esi),%esi
    804820c: 68 e3 75 06 08 push $80675e3
    8048211: e8 1e ef 00 00 call <chdir> (0x8057134)
    8048216: 6a 00 push $0x0
    8048218: e8 43 ef 00 00 call <close> (0x8057160)
    804821d: 6a 01 push $0x1
    804821f: e8 3c ef 00 00 call <close> (0x8057160)
    8048224: 6a 02 push $0x2
    8048226: e8 35 ef 00 00 call <close> (0x8057160)
Looking at the calls in this next section, this is the daemonization we saw in the strace; signal(0x11, 1) is signal(SIGCHLD, SIG_IGN). Referring to the .rodata section, we find that 80675e3 (pushed at 804820c) is the address of the string "/", which is what we expected for chdir. (There are tools, such as reap, that automatically find strings like this for you.)
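Put back into C, the daemonizing sequence looks roughly like this. It is a sketch rather than the author's source: daemonize is a hypothetical name, and it relies on signal 0x11 being SIGCHLD and handler value 1 being SIG_IGN on Linux/i386. The second fork means the surviving process is not a session leader, so it cannot reacquire a controlling terminal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch of 80481cc-8048226. */
static void daemonize(void)
{
    signal(SIGCHLD, SIG_IGN);       /* signal(0x11, 1) */
    if (fork() != 0)                /* parent (or failed fork) leaves */
        exit(0);
    setsid();                       /* new session, no controlling tty */
    signal(SIGCHLD, SIG_IGN);
    if (fork() != 0)                /* the session leader leaves too */
        exit(0);
    if (chdir("/") != 0) {
        /* the binary does not check chdir's result either */
    }
    close(0);                       /* stdin, stdout, stderr */
    close(1);
    close(2);
}
```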
    804822b: c7 05 74 e7 07 08 00 movl $0x0,807e774
    8048232: 00 00 00
    8048235: c7 05 70 e7 07 08 00 movl $0x0,807e770
    804823c: 00 00 00
    804823f: c7 05 78 e7 07 08 00 movl $0x0,807e778
    8048246: 00 00 00
    8048249: 6a 00 push $0x0
    804824b: e8 f4 f1 00 00 call <time> (0x8057444)
    8048250: 83 c4 14 add $0x14,%esp
    8048253: 50 push %eax
    8048254: e8 47 d7 00 00 call 80559a0
    8048259: 83 c4 04 add $0x4,%esp
    804825c: 6a 0b push $0xb
    804825e: 6a 03 push $0x3
    8048260: 6a 02 push $0x2
    8048262: e8 8d ea 00 00 call <socket> (0x8056cf4)
    8048267: 89 85 38 bb ff ff mov %eax,0xffffbb38(%ebp)
    804826d: 6a 01 push $0x1
    804826f: 6a 01 push $0x1
    8048271: e8 46 e7 00 00 call <signal> (0x80569bc)
    8048276: 6a 01 push $0x1
    8048278: 6a 0f push $0xf
    804827a: e8 3d e7 00 00 call <signal> (0x80569bc)
    804827f: 6a 01 push $0x1
    8048281: 6a 11 push $0x11
    8048283: e8 34 e7 00 00 call <signal> (0x80569bc)
    8048288: 83 c4 24 add $0x24,%esp
    804828b: 6a 01 push $0x1
    804828d: 6a 11 push $0x11
    804828f: e8 28 e7 00 00 call <signal> (0x80569bc)
    8048294: 83 c4 08 add $0x8,%esp
    8048297: 8d 8d 00 f0 ff ff lea 0xfffff000(%ebp),%ecx
    804829d: 89 8d 20 bb ff ff mov %ecx,0xffffbb20(%ebp)
    80482a3: 8d 95 48 ee ff ff lea 0xffffee48(%ebp),%edx
    80482a9: 89 95 1c bb ff ff mov %edx,0xffffbb1c(%ebp)
    80482af: 90 nop
    80482b0: 6a 00 push $0x0
    80482b2: 68 00 08 00 00 push $0x800
    80482b7: 8d 85 00 f8 ff ff lea 0xfffff800(%ebp),%eax
    80482bd: 50 push %eax
    80482be: 8b 8d 38 bb ff ff mov 0xffffbb38(%ebp),%ecx
    80482c4: 51 push %ecx
    80482c5: e8 7a e8 00 00 call <recv> (0x8056b44)
The last call here is a recv, so we know we are getting somewhere. At the top, after some more initializations (three globals at 807e770-807e778 are zeroed), time(0) is called and the result is passed to 80559a0, whose return value is not used. That suggests something like srandom(time(0)), seeding the random number generator, but we may need to come back to that function to confirm it. Next we create a socket and store it in bb38. Since arguments are pushed last to first, the call is socket(2 = AF_INET, 3 = SOCK_RAW, 0xb = 11), confirming that we are looking for IP protocol 11 packets. Signals 1 (SIGHUP), 0xf (SIGTERM) and 0x11 (SIGCHLD) are then set to SIG_IGN, matching the sigaction calls in the strace.
After initializing some pointers (bb20 = &f000 and bb1c = &ee48), we call recv(bb38, &f800, 0x800, 0). We just stored the socket in bb38, and our hypothesis above that f800 is a buffer turns out to be correct: it is big enough to receive 2048 (0x800) bytes. We are about to find out what happens when the program receives a packet.
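The socket setup and the blocking read can be summarized in C as below. LISTEN_PROTO, open_listener and next_packet are hypothetical names. The argument order follows from the cdecl convention, where arguments are pushed last to first, so push $0xb; push $0x3; push $0x2 is socket(2, 3, 11).

```c
#include <sys/types.h>
#include <sys/socket.h>

#define LISTEN_PROTO 11   /* pushed at 804825c; IANA assigns 11 to NVP-II */

/* socket(AF_INET = 2, SOCK_RAW = 3, 11), as at 8048262.  Needs root
 * (or CAP_NET_RAW), presumably why main() checks geteuid() first. */
static int open_listener(void)
{
    return socket(AF_INET, SOCK_RAW, LISTEN_PROTO);
}

/* recv(bb38, &f800, 0x800, 0), as at 80482c5: one datagram per call.
 * On a raw IPv4 socket the data begins with the IP header. */
static ssize_t next_packet(int sock, unsigned char *buf)
{
    return recv(sock, buf, 0x800, 0);
}
```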
    80482c5: e8 7a e8 00 00 call <recv> (0x8056b44)
    80482ca: 89 c6 mov %eax,%esi
    80482cc: 83 c4 10 add $0x10,%esp
    80482cf: 8b 95 30 bb ff ff mov 0xffffbb30(%ebp),%edx
    80482d5: 80 7a 09 0b cmpb $0xb,0x9(%edx)
    80482d9: 0f 85 d9 0b 00 00 jne 8048eb8
    80482df: 8b 8d 2c bb ff ff mov 0xffffbb2c(%ebp),%ecx
    80482e5: 80 39 02 cmpb $0x2,(%ecx)
    80482e8: 0f 85 ca 0b 00 00 jne 8048eb8
    80482ee: 81 fe c8 00 00 00 cmp $0xc8,%esi
    80482f4: 0f 8e be 0b 00 00 jle 8048eb8
    80482fa: 8b 95 20 bb ff ff mov 0xffffbb20(%ebp),%edx
    8048300: 52 push %edx
    8048301: 8b 8d 28 bb ff ff mov 0xffffbb28(%ebp),%ecx
    8048307: 51 push %ecx
    8048308: 8d 46 ea lea 0xffffffea(%esi),%eax
    804830b: 50 push %eax
    804830c: e8 d7 1e 00 00 call 804a1e8
    8048311: 83 c4 0c add $0xc,%esp
    8048314: 0f b6 85 01 f0 ff ff movzbl 0xfffff001(%ebp),%eax
    804831b: 48 dec %eax
    804831c: 83 f8 0b cmp $0xb,%eax
    804831f: 0f 87 93 0b 00 00 ja 8048eb8
    8048325: ff 24 85 2c 83 04 08 jmp *804832c(,%eax,4)
    804832c:
After saving the number of bytes read in esi, we examine the byte at offset 9 from bb30, which was initialized to the start of the receive buffer f800. Byte 9 of the IP header (it's a raw socket, so we get the whole packet, header included) is the protocol field (see RFC 791). If it is not 11, we skip down to the bottom. Likewise, bb2c points to the byte at offset 0x14, which is the start of the IP payload when the header is the usual 20 bytes with no options. That byte must be equal to 2, and the number of bytes read must be greater than 200 (0xc8), or again the program skips down to the bottom.
Once we are sure we want to continue, we set up a call to 804a1e8. That address is close to main() rather than up among the library routines, so it is probably one of the author's routines. Since arguments are pushed last to first, the call is 804a1e8(esi - 0x16, bb28, bb20), with the first argument computed using the common technique of doing arithmetic with the lea instruction. Let's look at what is being passed. bb28 points 0x16 bytes into the receive buffer at f800, two bytes past the 2 we checked for at offset 0x14, and those 0x16 bytes are also subtracted from the number of bytes read to form the first argument. The last argument, bb20, points to f000. Immediately upon returning, we use a byte from the area pointed to by the last argument, and subsequently other bytes from that area. This looks like it may be the decode subroutine. We will check into that soon.
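Taken together, the checks at 80482d5-80482f4 and the argument setup for 804a1e8 amount to the following C. It is a sketch with an invented name (accept_packet). It tests the length first so that it never reads past a short buffer; the binary tests the protocol byte first, but since its buffer is always 0x800 bytes the outcome is the same.

```c
#include <stddef.h>
#include <sys/types.h>

/* Returns 1 and sets *data / *datalen (the arguments passed to the
 * suspected decoder 804a1e8) if the packet passes main()'s checks.
 * Offsets assume a 20-byte IPv4 header, as the binary does. */
static int accept_packet(const unsigned char *pkt, ssize_t len,
                         const unsigned char **data, ssize_t *datalen)
{
    if (len <= 0xc8)              /* cmp $0xc8,%esi; jle: need > 200 bytes */
        return 0;
    if (pkt[9] != 11)             /* cmpb $0xb,0x9(%edx): IP protocol field */
        return 0;
    if (pkt[0x14] != 2)           /* cmpb $0x2,(%ecx): first payload byte */
        return 0;
    *data = pkt + 0x16;           /* bb28 */
    *datalen = len - 0x16;        /* lea 0xffffffea(%esi),%eax */
    return 1;
}
```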
As noted above, the code then examines the second byte of the area at f000 that was passed as the last argument (the movzbl at 8048314 reads f001). The next three instructions subtract one from that byte and make sure the result is not bigger than 11 (0xb); since ja is an unsigned comparison, a byte of 0 wraps around and fails the test as well. The jmp then uses the result as an index into a jump table that immediately follows. Therefore the next 48 bytes (12 addresses x 4 bytes/address) need to be interpreted as addresses, not as instructions. This is a standard way of implementing a switch statement. Taking the decrement into account, the values being looked for are 1 to 12. These will likely correspond to different commands. I have annotated the display below with the command numbers and actual addresses.
    804832c: 5c 83 04 08 ; 1 0804835c
    8048330: f0 83 04 08 ; 2 080483f0
    8048334: 90 85 04 08 ; 3 08048590
    8048338: 1c 87 04 08 ; 4 0804871c
    804833c: c8 87 04 08 ; 5 080487c8
    8048340: 94 88 04 08 ; 6 08048894
    8048344: cc 8a 04 08 ; 7 08048acc
    8048348: 58 8b 04 08 ; 8 08048b58
    804834c: 80 8b 04 08 ; 9 08048b80
    8048350: 34 8c 04 08 ; 10 08048c34
    8048354: 08 8d 04 08 ; 11 08048d08
    8048358: e4 8d 04 08 ; 12 08048de4
Time prevents going into further detail on all of the disassembly analysis. One function that I have converted to C-like syntax is the function at 0x8049564, which implements the DNS2 function. It can be found in the files as dns2.c.
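Returning to the jump table at 804832c: the decrement, unsigned bounds check and indexed jump are what gcc typically emits for a dense switch statement over the values 1 to 12. The C sketch below makes the table explicit, using the addresses listed above; dispatch is a hypothetical name, and returning 0 stands in for the jump to 8048eb8.

```c
#include <stdint.h>

/* Handler addresses copied from the jump table at 0x804832c. */
static const uint32_t cmd_table[12] = {
    0x0804835c, 0x080483f0, 0x08048590, 0x0804871c,
    0x080487c8, 0x08048894, 0x08048acc, 0x08048b58,
    0x08048b80, 0x08048c34, 0x08048d08, 0x08048de4,
};

/* Mirror of 8048314-8048325.  Because "ja" compares unsigned, cmd == 0
 * wraps to 0xffffffff and is rejected along with cmd > 12. */
static uint32_t dispatch(unsigned char cmd)
{
    uint32_t idx = (uint32_t)cmd - 1;   /* movzbl; dec %eax */

    if (idx > 11)                       /* cmp $0xb,%eax; ja 8048eb8 */
        return 0;
    return cmd_table[idx];              /* jmp *804832c(,%eax,4) */
}
```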
The full details of the commands and their formats are in command.html. I also implemented a tool to test the command formats by sending them to the program; those tests are discussed in test.html.