Chapter 3. Analysis

Introduction

Reverse engineering undocumented, untrusted and possibly hostile code is a demanding task. The security community needs more knowledge of the tools and techniques for reverse engineering, and I hope that this challenge will be a good way to introduce more people to this skill set.

Before I start with my analysis I would like to point out a great resource for reverse engineers. Ironically, most of the publicly available information about reverse engineering is found on sites dedicated to software piracy and cracking software protection schemes. For years the best resource for crackers was Fravia's website. While most of the articles on the site deal with cracking registration codes for shareware programs, its collection of tutorials and well-documented reverse engineering techniques makes it invaluable for the aspiring reverse engineer.

The Tools

The two main approaches to program analysis are the active approach and the passive approach. The active approach involves running the program and monitoring its interaction with the environment. An excellent tool for doing this under Linux is Fenris by Michal Zalewski. The passive approach involves disassembling the program and figuring out the entire program logic before running it. This approach is slower, but the analysis is more thorough. Which approach you choose is a matter of personal preference and the nature of the program. Since the Honeynet binary is a complex program and we have absolutely no idea what it does, I decided to go with the passive approach.

The best commercially available disassembler is IDA Pro. Unfortunately none of the open source tools come even close to its functionality (but the Bastard Disassembly Environment project is worth keeping an eye on). IDA Pro supports many different CPU architectures and file formats. Its analysis engine identifies subroutines, local and global variables, and Linux system calls, and it recognizes library routines via fingerprinting. This last feature is very important for the analysis of the Honeynet binary.

An evaluation version of IDA Pro is available for download on the Datarescue website. It supports x86 and ELF files. If you are new to IDA, read the Getting Started Manual and Gij's IDA Tutorial. A lot of information about IDA is also available at Fravia's website. IDA Pro does not have a Linux version, but it runs fine under Wine.

FLAIR

FLAIR is a collection of tools for generating library function signatures which are later used by IDA to identify library functions in statically compiled binaries. FLAIR38 used to be available for free download from Datarescue, but the latest version is not. You can still find it with Google.

Rpat UNIX Libraries Preprocessor for IDA Pro

The FLAIR signature generator processes .PAT files, which are generated from the libraries by a preprocessor. The original FLAIR package includes preprocessors only for DOS and Windows libraries. Rpat has written an ELF library preprocessor for FLAIR. Read the article in Russian or English. The source is included.

Analysis

Fun with strings

We'll start the analysis by running objdump -t on the binary.

$ objdump -t the-binary

the-binary:     file format elf32-i386

objdump: the-binary: no symbols

The program is statically linked. The output of strings contains the line

@(#) The Linux C library 5.3.12

which indicates the libc version the program is linked with. A quick search on Google shows that Red Hat 4.x used this version of libc. Download all libc-devel packages from Red Hat 4.x. They will be useful later.

Because the binary is statically linked, the output of strings is cluttered with strings from libc code. libstrings.pl is a quick and dirty Perl script that runs strings on all .o files from libc.a and stores the strings in a hash. The script then prints out all strings found in the binary, prefixing the ones that occur in libc.a with the name of the .o file where they were found. We can filter the output of the script to see only the strings that are part of the program code, not the library functions. There are very few of them, and some look very promising:

                    [mingetty]
                    /tmp/.hj237349
                    /bin/csh -f -c "%s" 1> %s 2>&1
                    TfOjG
                    /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:.
                    HISTFILE
                    linux
                    TERM
                    /bin/csh -f -c "%s" 
                    %c%s
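libstrings.pl itself is not reproduced here. The filtering idea described above can be sketched in Python as follows. This is only an illustration, assuming the object files have already been extracted from libc.a (for example with ar x) and read into a dictionary; the function names are hypothetical and not taken from the original script.

```python
import re


def extract_strings(data, min_len=4):
    """Return printable ASCII runs of at least min_len bytes, roughly like strings(1)."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def build_library_index(objects, min_len=4):
    """Map each string found in a library object to the first .o file containing it.

    `objects` is a dict of {object file name: raw bytes} (an assumed input format).
    """
    index = {}
    for name, data in objects.items():
        for s in extract_strings(data, min_len):
            index.setdefault(s, name)
    return index


def label_strings(binary, index, min_len=4):
    """Pair every string in the binary with its library object, or None if unknown."""
    return [(index.get(s), s) for s in extract_strings(binary, min_len)]


def program_strings(binary, index, min_len=4):
    """Keep only the strings that do not occur in any library object."""
    return [s for owner, s in label_strings(binary, index, min_len) if owner is None]
```

Calling program_strings() with the contents of the-binary and an index built from the libc objects would leave only the strings that belong to the program itself, which is the filtered view shown in the list above.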

        mov     edx, 1
        lea     ecx, [ebp+var_C]
        mov     eax, 66h
        mov     ebx, edx
        int     80h             ; LINUX - sys_socketcall

Linux system calls on x86 pass their arguments in registers, fastcall-style: the syscall number (0x66, sys_socketcall) goes in eax and the first syscall parameter in ebx. For sys_socketcall this parameter specifies the network function we are calling and is defined in libc-5.3.12/include/sys/socketcall.h. Number 1 is SYS_SOCKET, so we can identify the function above as socket(). Search the disassembled code for all calls to sys_socketcall and identify the remaining socket functions in the same way.
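The lookup can be kept at hand as a small table. The sketch below uses the sub-function numbers of the i386 socketcall interface as defined in the Linux kernel headers; the header shipped with libc 5.3.12 should agree, but check it against your copy. The helper name decode_socketcall is made up for this example.

```python
# Value loaded into eax before int 80h for sys_socketcall on i386.
SYS_SOCKETCALL = 0x66

# Sub-function numbers passed in ebx (Linux socketcall interface).
SOCKETCALL_NAMES = {
    1: "socket",
    2: "bind",
    3: "connect",
    4: "listen",
    5: "accept",
    6: "getsockname",
    7: "getpeername",
    8: "socketpair",
    9: "send",
    10: "recv",
    11: "sendto",
    12: "recvfrom",
    13: "shutdown",
    14: "setsockopt",
    15: "getsockopt",
    16: "sendmsg",
    17: "recvmsg",
}


def decode_socketcall(eax, ebx):
    """Return the socket function name for a sys_socketcall, or None if it is not one."""
    if eax != SYS_SOCKETCALL:
        return None
    return SOCKETCALL_NAMES.get(ebx)
```

For the listing above, decode_socketcall(0x66, 1) gives "socket".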

Example: Identifying a Libc Function

sub_8049174 is a function called by main(). It opens a raw socket and sends some data, so we can be certain that it is not a libc function. At address 08049278 we have a call to sub_80556CC. This function sits between other libc functions in the binary and calls lots of internal libc functions like __libc_sigprocmask, __sigaction, __libc_alarm, sigsuspend, etc. These two observations lead us to believe that it is also a libc function. We need to identify it. We open the cross-references window and look at each of the functions calling sub_80556CC. Unfortunately none of them are libc functions. If they were, we could figure out what sub_80556CC is just by looking at the source of the function calling it.

We'll try to find a function call with specific arguments in this function and then grep the libc source. A good example is at .text:08055728:

        push    0Eh
        call    __sigaction

A quick consultation with the sigaction manpage and /usr/include/linux shows that signal number 0Eh (14) is SIGALRM, which turns this into a call to sigaction(SIGALRM, ...).
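If a Linux machine with Python is at hand, the same lookup can be done without digging through the headers. This is a minimal sketch that assumes the host uses the same signal numbering as the i386 target, which holds for Linux on x86; signal_name is a hypothetical helper.

```python
import signal


def signal_name(number):
    """Map a signal number seen as an immediate operand to its symbolic name."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # Not a valid signal number on this host.
        return None
```

On such a host, signal_name(0x0E) returns "SIGALRM", matching the push 0Eh in the listing.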

$ grep -r sigaction * | grep SIGALRM
libc/posix/sleep.c:  if (__sigaction (SIGALRM, &action, &oldaction) < 0)
libc/posix/sleep.c:    (void) __sigaction (SIGALRM, &oldaction, (struct sigaction *) NULL);
libc/posix/sleep.c:    (void) __sigaction (SIGALRM, &oldaction, (struct sigaction *) NULL);
libc/pwd/lckpwdf.c:	if (sigaction(SIGALRM, &act, &oldact) == -1)
libc/pwd/lckpwdf.c:	sigaction(SIGALRM, &oldact, NULL);
libc/pwd/lckpwdf.c:	sigaction(SIGALRM, &oldact, NULL);

We have to look at these two source files and try to identify the function by the sequence of subroutine calls. The first function I checked was sleep() and the source matched the disassembled code perfectly. We can rename sub_80556CC to 'sleep' and go back to sub_8049174 - the function that called sleep().

Example: Identifying a Global Libc Variable

During the analysis of one of the subroutines in the binary we came across the global variable dword_8078B14. The cross-reference window reveals that there are a lot of references to this variable. Go to the fourth reference, at .text:0804E647. It is in an unidentified function.

        mov     ebx, dword_8078B14
        test    byte ptr dword_807854C, 2
        jz      short loc_804E682
        push    eax
        call    sub_80566A4
        push    eax
        mov     ax, [ebp+arg_E]
        xchg    al, ah
        and     eax, 0FFFFh
        push    eax
        mov     eax, [ebp+arg_10]
        push    eax
        call    inet_ntoa
        add     esp, 4
        push    eax
        push    esi
        push    offset aRes_sendSS_US ; "res_send: %s ([%s].%u): %s\n"
        push    edi
        call    _IO_fprintf
loc_804E682:                ; CODE XREF: Aerror+1C
        mov     dword_8078B14, ebx

The string "res_send:" is found in libc/inet/res_send.c, line 134. The function is Aerror():

        static void
        Aerror(file, string, error, address)
            FILE *file;
            char *string;
            int error;
            struct sockaddr_in address;
        {
            int save = errno;

            if (_res.options & RES_DEBUG) {
                fprintf(file, "res_send: %s ([%s].%u): %s\n",
                    string,
                    inet_ntoa(address.sin_addr),
                    ntohs(address.sin_port),
                    strerror(error));
            }
            errno = save;
        }

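The disassembly saves dword_8078B14 in ebx at the start of the function and writes it back at loc_804E682, which matches "int save = errno;" and "errno = save;" in the source. We can therefore identify dword_8078B14 as errno.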
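Another detail that supports the match is the port argument. On little-endian x86, ntohs() is a byte swap of a 16-bit value, and the pair xchg al, ah / and eax, 0FFFFh in the listing appears to be exactly that, inlined. The following Python sketch emulates the two instructions; the function name is made up for this example.

```python
def xchg_al_ah(eax):
    """Emulate `xchg al, ah` followed by `and eax, 0FFFFh` on a register value."""
    al = eax & 0xFF          # low byte of ax
    ah = (eax >> 8) & 0xFF   # high byte of ax
    # After the exchange the old al is the high byte and the old ah the low byte;
    # the mask clears the upper 16 bits of eax.
    return ((al << 8) | ah) & 0xFFFF
```

For example, port 23281 is stored in network byte order as the bytes 5A F1. Loaded into ax on x86 this reads as 0F15Ah, and xchg_al_ah(0xF15A) gives back 23281.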

Results

The following files and tools were produced during the analysis of the binary:

the-binary Disassembly

the-binary.asm

This is the binary disassembly, produced by IDA Pro.

the-binary IDA Database

the-binary.idb

This is the IDA Pro disassembly database.

the-binary Server

the-binary.c

This is the decompiled source of the binary. It compiles but hasn't been tested. It is a good reference for the program architecture and features.

the-binary Traffic Decoder

decoder.c

The traffic decoder can be run on a live network interface or on a tcpdump file.

$ ./decoder -h
./decoder: invalid option -- h
the-binary Traffic Decoder
Syntax: the-binary [options]
  -i <iface>     Listens on a interface
  -r <dumpfile>  Reads in a tcpdump file

the-binary Client

client.c

The client can send commands to the backdoor. It supports all functions of the backdoor and has been tested with the real binary.

$ ./client 
the-binary Client
Syntax: the-binary-client <command> [options]
To change the IP addresses of the client and the backdoor, edit the source

Commands:
  init: initializes the client address list
    --type <type>             type of address list
    --ip <a.b.c.d>            client ip (if type=2, spiecify 10 addresses)
  status: returns status information
  kill: kills the DoS or shell process
    no parameters
  exec: execute a command and discard the output
    --cmd <string>            command line
  exec_output: execute a command and return the output
    --cmd  <string>           command line
  bind_shell: bind a shell on port 23281
    no parameters
  udp_flood: launch udp flood attack
    --src <a.b.c.d>           source ip address
    --dst <a.b.c.d>           destination ip address
    --hostname <hostname>     destination hostname
    --d_port <port>           destination port for the packet
  icmp_flood: launch icmp ping flood/smurf attack
    --src <a.b.c.d>           source ip address
    --dst <a.b.c.d>           destination ip address
    --hostname <hostname>     destination hostname
  syn_flood: launch syn flood attack
    --src <a.b.c.d>           source ip address (if not supplied, use a random ip)
    --dst <a.b.c.d>           destination ip address
    --hostname <hostname>     destination hostname
    --d_port <port>           destination port for the SYN packet
    --sleep_after <number>    sleep after number packets have been sent (optional)
  dns_flood: launch a dns query flood attack
    --src <a.b.c.d>           source ip address (if not supplied, use a random ip)
    --dst <a.b.c.d>           destination ip address
    --hostname <hostname>     destination hostname
    --s_port <port>           source port for the queries (optional)
    --sleep_after <number>    sleep after number packets have been sent (optional)
  dns_smurf: launch dns smurf attack
    --ip <a.b.c.d>            victim ip address
    --hostname <hostname>     victim hostname
    --s_port <port>           source port for the queries (optional)
    --sleep_after <number>    sleep after number packets have been sent (optional)

The client does not display the responses from the backdoor. Use the decoder in sniffing mode to see them.