Tools Used


strings (Linux)

file (Linux)

objdump (Linux)

grep (Linux)

calc.exe (Windows)

UltraEdit-32 v8.0 (Windows)

TLD.pl - Our custom disassembler (Linux)

 

Documents Referenced


IA-32 Intel Architecture Software Developer’s Manual Volume 2: Instruction Set Reference

RFC 791 – Internet Protocol Specification

 

File Analysis


The first step, after copying the binary to a Linux machine, was to run the ‘file’ command on it to identify the file type. ‘file’ reported the following:

 

the-binary: ELF 32-bit LSB executable, Intel 80386, version 1, statically linked, stripped

 

This presented a problem. Because the binary was statically linked, all library functions had been compiled into the program rather than being resolved by the dynamic linker at run time. Because the binary was stripped, its symbols had been removed.

Symbols and dynamically linked functions can both be used to identify the names of functions in an assembly listing. Either would have allowed us to read significant portions of the code as assembly intermixed with named calls to known functions, making the task of reverse engineering significantly easier. With neither available, the function names had to be recovered some other way.
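As a rough illustration of where the first part of that ‘file’ report comes from, the sketch below decodes the class, byte order, type and machine fields from the start of an ELF header. It is a minimal sketch, not how ‘file’ is implemented: the function name describe_elf and the sample header are hypothetical, and it does not attempt to detect static linking or stripping.

```python
import struct

def describe_elf(header: bytes) -> str:
    """Summarise the first 20 bytes of an ELF file (hypothetical helper)."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    # e_ident[4] is the class, e_ident[5] the byte order.
    ei_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    ei_data = {1: "LSB", 2: "MSB"}.get(header[5], "unknown byte order")
    endian = "<" if header[5] == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after e_ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    kind = {1: "relocatable", 2: "executable",
            3: "shared object", 4: "core file"}.get(e_type, "unknown type")
    machine = {3: "Intel 80386"}.get(e_machine, f"machine {e_machine}")
    return f"ELF {ei_class} {ei_data} {kind}, {machine}"

# Synthetic header for illustration only: 16 bytes of e_ident,
# then e_type = 2 (executable) and e_machine = 3 (Intel 80386).
sample = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 3)
print(describe_elf(sample))  # ELF 32-bit LSB executable, Intel 80386
```

To try it on a real file, pass the first 20 bytes read in binary mode.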

Running ‘strings’ on the binary revealed the following useful piece of information:

 

@(#) The Linux C library 5.3.12
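For readers without ‘strings’ to hand, the idea behind it can be sketched in a few lines: scan the file for runs of printable characters of some minimum length. This is only an approximation of the real tool; the function name printable_strings and the sample data are made up for illustration.

```python
import re

def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII characters (or tabs)."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Illustrative bytes only: a version banner surrounded by non-printable data.
blob = b"\x00\x01@(#) The Linux C library 5.3.12\x00\x7fELF\x02"
for s in printable_strings(blob):
    print(s)
```

The three-character run "ELF" in the sample is skipped because it is shorter than the minimum length.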

 

It was decided to try identifying the library functions by byte comparison against the binaries from libc 5.3.12. A copy of libc-5.3.12.bin.tar.gz was retrieved and extracted. An arbitrary object file from the libc package, socket.o, was used to test this theory.

Running the command ‘objdump -xd socket.o’ printed the following output for the libc function ‘socket’:

 

   0:   55                      push   %ebp

   1:   89 e5                   mov    %esp,%ebp

   3:   83 ec 0c                sub    $0xc,%esp

   6:   53                      push   %ebx

   7:   8b 55 0c                mov    0xc(%ebp),%edx

   a:   8b 4d 10                mov    0x10(%ebp),%ecx

   d:   8b 45 08                mov    0x8(%ebp),%eax

  10:   89 45 f4                mov    %eax,0xfffffff4(%ebp)

  13:   89 55 f8                mov    %edx,0xfffffff8(%ebp)

  16:   89 4d fc                mov    %ecx,0xfffffffc(%ebp)

  19:   ba 01 00 00 00          mov    $0x1,%edx

  1e:   8d 4d f4                lea    0xfffffff4(%ebp),%ecx

  21:   b8 66 00 00 00          mov    $0x66,%eax

  26:   89 d3                   mov    %edx,%ebx

 

The byte values on the left were entered into the hex search in UltraEdit and were found at exactly one location, file offset 0xECF4. This gave us a byte-for-byte match for the function, based solely on the information provided by objdump. By looking at the section headers that ‘objdump’ reported for our binary image, we could see that this offset was in the .text section: 0xECF4 lies between the file offsets of the .text (0x00000090) and __libc_subinit (0x0001f5cc) sections. When the binary image is loaded into memory, this offset is mapped to the address 0x08048090 + 0xECF4 - 0x90 (VMA + byte match offset - File off.).

 

Sections:

Idx Name          Size      VMA       LMA       File off  Algn

  0 .init         00000008  08048080  08048080  00000080  2**4

                  CONTENTS, ALLOC, LOAD, READONLY, CODE

  1 .text         0001f53c  08048090  08048090  00000090  2**4

                  CONTENTS, ALLOC, LOAD, READONLY, CODE

  2 __libc_subinit 00000004  080675cc  080675cc  0001f5cc  2**2

                  CONTENTS, ALLOC, LOAD, READONLY, DATA
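The offset-to-address arithmetic described above can be written out as a short sketch. The section values are copied from the objdump listing; the table layout and the function name file_offset_to_vma are our own illustrative choices, not part of any tool used in the analysis.

```python
# (name, size, VMA, file offset), copied from the objdump section headers.
SECTIONS = [
    (".init",          0x00000008, 0x08048080, 0x00000080),
    (".text",          0x0001F53C, 0x08048090, 0x00000090),
    ("__libc_subinit", 0x00000004, 0x080675CC, 0x0001F5CC),
]

def file_offset_to_vma(offset, sections=SECTIONS):
    """Map a file offset to (section name, virtual address)."""
    for name, size, vma, file_off in sections:
        if file_off <= offset < file_off + size:
            # Same formula as in the text: VMA + offset - File off.
            return name, vma + (offset - file_off)
    raise ValueError(f"offset {offset:#x} is not inside a listed section")

name, address = file_offset_to_vma(0xECF4)
print(name, hex(address))  # .text 0x8056cf4
```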

 

Breaking out ‘calc.exe’ in scientific mode, we found that this gives the address 0x8056cf4. Running ‘objdump -dx the-binary | grep 0x8056cf4’ back on our Linux machine produced the following:

 

 8048262:       e8 8d ea 00 00          call   0x8056cf4

 8048906:       e8 e9 e3 00 00          call   0x8056cf4

 8048fa9:       e8 46 dd 00 00          call   0x8056cf4

 8049213:       e8 dc da 00 00          call   0x8056cf4

 8049657:       e8 98 d6 00 00          call   0x8056cf4

 8049ac7:       e8 28 d2 00 00          call   0x8056cf4

 8049e22:       e8 cd ce 00 00          call   0x8056cf4

 804a602:       e8 ed c6 00 00          call   0x8056cf4

 804ebf8:       e8 f7 80 00 00          call   0x8056cf4

 804eee6:       e8 09 7e 00 00          call   0x8056cf4

 8055312:       e8 dd 19 00 00          call   0x8056cf4

 8063baf:       e8 40 31 ff ff          call   0x8056cf4

 806435e:       e8 91 29 ff ff          call   0x8056cf4
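Each line in this listing is a five-byte near call: the opcode e8 followed by a signed 32-bit little-endian displacement, taken relative to the address of the next instruction. The sketch below recomputes the target from the raw bytes as a check on the listing. The function name call_target is hypothetical, and only this one call encoding is handled.

```python
def call_target(address: int, encoding: bytes) -> int:
    """Compute the destination of an 'e8 rel32' call located at address."""
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("expected a 5-byte e8 rel32 call")
    # The displacement is signed, so backward calls appear as ff ff ...
    rel32 = int.from_bytes(encoding[1:5], "little", signed=True)
    # The displacement is relative to the instruction after the call.
    return (address + 5 + rel32) & 0xFFFFFFFF

# Two lines taken from the listing: one forward call, one backward call.
print(hex(call_target(0x8048262, bytes.fromhex("e88dea0000"))))  # 0x8056cf4
print(hex(call_target(0x8063BAF, bytes.fromhex("e84031ffff"))))  # 0x8056cf4
```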

 

We were on our way to producing an unstrip tool! This approach allowed us to identify arbitrary functions in the program by their byte signatures. At this point we began to automate the process in Perl; the eventual result of these efforts can be found in the archive accompanying this text.
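The core of the signature idea can be sketched as follows. This is not the Perl tool from the archive, only a minimal illustration in Python under simplifying assumptions: each signature is an exact run of bytes, a name is assigned only when the signature occurs exactly once, and bytes that the linker would patch through relocations are not handled. The names find_all and label_functions and the padded test image are hypothetical.

```python
def find_all(haystack: bytes, needle: bytes):
    """Return every offset at which needle occurs in haystack."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits

def label_functions(image: bytes, signatures: dict):
    """Map file offset -> function name for signatures found exactly once."""
    labels = {}
    for name, signature in signatures.items():
        hits = find_all(image, signature)
        if len(hits) == 1:  # ambiguous or missing signatures are skipped
            labels[hits[0]] = name
    return labels

# Opening bytes of 'socket', taken from the objdump listing of socket.o.
SOCKET_SIG = bytes.fromhex(
    "5589e583ec0c538b550c8b4d108b4508"
    "8945f48955f8894dfcba010000008d4df4b86600000089d3"
)

# Made-up image for illustration: padding, the signature, then a ret.
image = b"\x90" * 16 + SOCKET_SIG + b"\xc3"
print(label_functions(image, {"socket": SOCKET_SIG}))  # {16: 'socket'}
```

Requiring a unique match mirrors the check made by hand in the hex editor, where the bytes were found at one location only.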

The resulting disassembly, in which almost all of the libc functions used had been replaced with their names, was then reviewed by hand and annotated with comments and notes. These notes can also be found in the accompanying archive. As the binary was not packed or encrypted, this was an arduous but relatively simple task.

The binary was not executed at any point during the disassembly process. The disassembly tool, tld.pl, was created specifically for the reverse challenge, during the period of the challenge.

 
