strings (Linux)
file (Linux)
objdump (Linux)
grep (Linux)
calc.exe (Windows)
UltraEdit-32 v8.0 (Windows)
TLD.pl - our custom disassembler (Linux)
IA-32 Intel Architecture Software Developer’s Manual Volume 2: Instruction Set Reference
RFC 791 – Internet Protocol (IP) Specification
After the binary had been copied to a Linux machine, the first step was to run the ‘file’ command on it to identify the file type. ‘file’ reported the following:
the-binary: ELF 32-bit LSB executable, Intel 80386, version 1, statically linked, stripped
This output presented a problem. Because the binary was statically linked, all of the library functions it uses were compiled into the program rather than being linked in dynamically by the system at run time. Because the binary was stripped, its symbols had been removed.
Either symbols or dynamically linked functions can normally be used to name the functions in an assembly listing. That would have let us read large portions of the code as assembly interspersed with named calls to known functions, making the reverse-engineering task significantly easier. With neither available, every library call appears only as a bare address.
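Every property in the ‘file’ output above can be read directly from the ELF headers. The following Python sketch is our own illustration, not part of the original analysis, and the name ‘describe_elf32’ is hypothetical. It decodes a 32-bit little-endian ELF image and approximates the verdicts of ‘file’ with two simplifications: the image counts as dynamically linked if it has a PT_INTERP or PT_DYNAMIC program header, and as stripped if no SHT_SYMTAB section survives.

```python
import struct

ET_NAMES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core file"}
EM_NAMES = {3: "Intel 80386"}  # EM_386; other machines are not named here
PT_DYNAMIC, PT_INTERP = 2, 3
SHT_SYMTAB = 2


def describe_elf32(data: bytes) -> str:
    """Summarize a 32-bit little-endian ELF image roughly as 'file' does."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1 or data[5] != 1:  # EI_CLASS == ELFCLASS32, EI_DATA == ELFDATA2LSB
        raise ValueError("sketch only handles 32-bit little-endian ELF")
    # Elf32_Ehdr fields that follow the 16-byte e_ident array.
    (e_type, e_machine, e_version, _entry, e_phoff, e_shoff, _flags, _ehsize,
     e_phentsize, e_phnum, e_shentsize, e_shnum, _shstrndx) = struct.unpack_from(
        "<HHIIIIIHHHHHH", data, 16)
    # p_type is the first word of each program header.
    p_types = [struct.unpack_from("<I", data, e_phoff + i * e_phentsize)[0]
               for i in range(e_phnum)]
    # sh_type is the second word of each section header.
    sh_types = [struct.unpack_from("<I", data, e_shoff + i * e_shentsize + 4)[0]
                for i in range(e_shnum)]
    dynamic = PT_INTERP in p_types or PT_DYNAMIC in p_types
    stripped = SHT_SYMTAB not in sh_types
    return ", ".join([
        "ELF 32-bit LSB " + ET_NAMES.get(e_type, "unknown type"),
        EM_NAMES.get(e_machine, "machine %d" % e_machine),
        "version %d" % e_version,
        "dynamically linked" if dynamic else "statically linked",
        "stripped" if stripped else "not stripped",
    ])
```

Called on the contents of the-binary, for example describe_elf32(open("the-binary", "rb").read()), it should produce a line in the same form as the ‘file’ output above. Real ‘file’ applies more checks than this sketch does.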
Using ‘strings’ on the binary revealed the following useful piece of information.
@(#) The Linux C library 5.3.12
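‘strings’ simply prints runs of printable characters found in a file. The Python sketch below is a rough equivalent. It scans the whole file, much like ‘strings -a’, and does not reproduce the options of GNU strings. It also picks out the C library version marker. Both helper names are hypothetical.

```python
import re
from typing import List, Optional


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def find_libc_version(data: bytes) -> Optional[str]:
    """Return the first string that looks like the libc version marker."""
    for s in printable_strings(data):
        if "Linux C library" in s:
            return s
    return None
```

The "@(#)" prefix is the traditional SCCS "what" marker, which is why version strings like this one survive stripping. Stripping removes the symbol table, not string data.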
We decided to try identifying the library functions by comparing their bytes against the compiled objects from libc 5.3.12. A copy of libc-5.3.12.bin.tar.gz was retrieved and extracted, and an arbitrary object file from the package, socket.o, was used to test this idea.
Running ‘objdump -xd socket.o’ printed the following disassembly for the libc function ‘socket’ (beginning shown):
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 0c sub $0xc,%esp
6: 53 push %ebx
7: 8b 55 0c mov 0xc(%ebp),%edx
a: 8b 4d 10 mov 0x10(%ebp),%ecx
d: 8b 45 08 mov 0x8(%ebp),%eax
10: 89 45 f4 mov %eax,0xfffffff4(%ebp)
13: 89 55 f8 mov %edx,0xfffffff8(%ebp)
16: 89 4d fc mov %ecx,0xfffffffc(%ebp)
19: ba 01 00 00 00 mov $0x1,%edx
1e: 8d 4d f4 lea 0xfffffff4(%ebp),%ecx
21: b8 66 00 00 00 mov $0x66,%eax
26: 89 d3 mov %edx,%ebx
The byte values in the left-hand column were entered into UltraEdit's hex search and matched at exactly one place in the binary, file offset 0xECF4. This gave us a byte-for-byte match for the function using nothing more than the information objdump had printed for socket.o. Running objdump against our binary and examining its section headers (below) showed that this offset lies in the .text section: 0xECF4 falls between the file offsets of .text (0x00000090) and __libc_subinit (0x0001f5cc). When the image is loaded into memory, this offset is mapped to the virtual address VMA + match offset - file offset, that is, 0x08048090 + 0xECF4 - 0x90.
Sections:
Idx Name            Size      VMA       LMA       File off  Algn
  0 .init           00000008  08048080  08048080  00000080  2**4
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text           0001f53c  08048090  08048090  00000090  2**4
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 __libc_subinit  00000004  080675cc  080675cc  0001f5cc  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
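The search and address calculation described above can be scripted. This Python sketch uses the section values from the objdump header output and the socket() bytes from the socket.o listing. The helper names are hypothetical. It assumes the bytes are an exact, relocation-free match, as they were for the start of socket().

```python
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str
    size: int
    vma: int
    file_off: int


# Size, VMA and file offset copied from the objdump section headers above.
SECTIONS = [
    Section(".init", 0x00000008, 0x08048080, 0x00000080),
    Section(".text", 0x0001F53C, 0x08048090, 0x00000090),
    Section("__libc_subinit", 0x00000004, 0x080675CC, 0x0001F5CC),
]

# The first 0x28 bytes of socket() from the socket.o listing.
SOCKET_SIG = bytes.fromhex(
    "55 89e5 83ec0c 53 8b550c 8b4d10 8b4508 8945f4 8955f8 894dfc"
    " ba01000000 8d4df4 b866000000 89d3"
)


def find_all(data: bytes, sig: bytes) -> List[int]:
    """Return every file offset where sig occurs (overlaps included)."""
    hits = []
    i = data.find(sig)
    while i != -1:
        hits.append(i)
        i = data.find(sig, i + 1)
    return hits


def offset_to_vma(off: int, sections: List[Section] = SECTIONS) -> Optional[int]:
    """Map a file offset to its load address: VMA + (offset - file offset)."""
    for s in sections:
        if s.file_off <= off < s.file_off + s.size:
            return s.vma + (off - s.file_off)
    return None
```

find_all is a scripted form of the UltraEdit hex search. Run against the binary, it should return only offset 0xECF4. offset_to_vma first finds the section containing the offset, which replaces the manual check against the table, and then applies the VMA + match offset - file offset rule described above.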
Evaluating 0x08048090 + 0xECF4 - 0x90 with ‘calc.exe’ in scientific (hexadecimal) mode gives the address 0x8056cf4. Back on the Linux machine, running ‘objdump -dx the-binary | grep 0x8056cf4’ produced the following call sites:
8048262: e8 8d ea 00 00 call 0x8056cf4
8048906: e8 e9 e3 00 00 call 0x8056cf4
8048fa9: e8 46 dd 00 00 call 0x8056cf4
8049213: e8 dc da 00 00 call 0x8056cf4
8049657: e8 98 d6 00 00 call 0x8056cf4
8049ac7: e8 28 d2 00 00 call 0x8056cf4
8049e22: e8 cd ce 00 00 call 0x8056cf4
804a602: e8 ed c6 00 00 call 0x8056cf4
804ebf8: e8 f7 80 00 00 call 0x8056cf4
804eee6: e8 09 7e 00 00 call 0x8056cf4
8055312: e8 dd 19 00 00 call 0x8056cf4
8063baf: e8 40 31 ff ff call 0x8056cf4
806435e: e8 91 29 ff ff call 0x8056cf4
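Each of these lines is a 5-byte E8 call. Its operand is a signed 32-bit displacement relative to the next instruction, so the target is the call's address + 5 + rel32. For the first line, 0x8048262 + 5 + 0x0000ea8d = 0x8056cf4. At 0x8063baf the displacement 0xffff3140 is negative (-0xcec0) and lands on the same address. The Python sketch below decodes such calls and adds a naive byte scan for call sites. Both helper names are hypothetical. A raw scan can report false positives where an 0xE8 byte sits inside another instruction, so its hits should be checked against a real disassembly.

```python
import struct
from typing import List


def call_target(addr: int, insn: bytes) -> int:
    """Return the target of a 5-byte 'call rel32' (opcode E8) at addr."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("not an E8 rel32 call")
    (rel,) = struct.unpack("<i", insn[1:5])  # signed little-endian displacement
    # The displacement is relative to the next instruction, i.e. addr + 5.
    return (addr + 5 + rel) & 0xFFFFFFFF


def find_calls_to(text: bytes, text_vma: int, target: int) -> List[int]:
    """Return addresses of E8 bytes in text whose rel32 lands on target.

    Naive byte scan: an 0xE8 inside another instruction can give a false
    positive, so hits should be checked against a real disassembly.
    """
    hits = []
    for i in range(len(text) - 4):
        if text[i] == 0xE8 and call_target(text_vma + i, text[i:i + 5]) == target:
            hits.append(text_vma + i)
    return hits
```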
We were on our way to producing an unstrip tool! Each of these call sites could now be read as a call to socket(), and the same technique would let us identify arbitrary library functions in the program by their byte signatures. At this point we began to automate the process in Perl; the eventual result of these efforts can be found in the archive accompanying this text.
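Our Perl tool is in the accompanying archive. The Python sketch below only illustrates the general idea and is not a port of it. The helper names are hypothetical, and so is the signature format. We assume each signature is a list of byte values taken from a libc object file, with None marking bytes that a relocation may change in the linked binary (for example call displacements). That masking scheme is our assumption; the socket() test above did not need it. Signatures that match zero or several times are skipped rather than guessed. The annotate helper assumes the ‘call 0x...’ line format shown above.

```python
import re
from typing import Dict, List, Optional

# Hypothetical signature format: a list of byte values, where None marks a
# byte that a relocation may change in the linked binary (our assumption).
Signature = List[Optional[int]]


def compile_signature(sig: Signature):
    """Build a bytes regex in which None matches any single byte."""
    parts = [b"." if b is None else re.escape(bytes([b])) for b in sig]
    return re.compile(b"".join(parts), re.DOTALL)


def label_functions(image: bytes, sigs: Dict[str, Signature],
                    text_off: int, text_vma: int) -> Dict[int, str]:
    """Map load address -> function name for signatures that match exactly once.

    Assumes every match lies in .text, whose file offset and VMA are given.
    """
    labels = {}
    for name, sig in sigs.items():
        hits = [m.start() for m in compile_signature(sig).finditer(image)]
        if len(hits) == 1:  # skip missing or ambiguous signatures
            labels[text_vma + hits[0] - text_off] = name
    return labels


# Matches the "call 0x..." form of the objdump lines shown above.
CALL_RE = re.compile(r"call\s+0x([0-9a-f]+)")


def annotate(line: str, labels: Dict[int, str]) -> str:
    """Append <name> to a call line whose target has been identified."""
    m = CALL_RE.search(line)
    if m:
        name = labels.get(int(m.group(1), 16))
        if name is not None:
            return f"{line} <{name}>"
    return line
```

For this binary, text_off and text_vma would be 0x90 and 0x08048090 from the section headers. A socket() signature would then label 0x8056cf4, and each of the call lines above would gain a trailing <socket>.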
The disassembly this produced, with almost all of the libc functions the program uses named in the listing, was then reviewed manually and annotated with comments, and notes were taken along the way. These notes can also be found in the accompanying archive. As the binary was not packed or encrypted, this was an arduous but relatively straightforward task.
The binary was never executed at any point during the disassembly. The disassembly tool, TLD.pl, was written specifically for the Reverse Challenge, during the challenge period.