Abstract
The Honeynet Project's Scan of the Month for November 2002 requires the analysis of a file obtained from a compromised honeypot. The file turns out to be a gzip-compressed GNU tar archive that contains two C source files. I found out that these files contain the source code for a variant of the Slapper worm that hit the Internet on September 13, 2002 and that exploited the OpenSSL SSLv2 malformed client key remote buffer overflow vulnerability. In this paper I examine how the worm operates, what its capabilities are, and how it propagates and infects other machines.
This is a submission to the Honeynet Project November 2002 Scan of the Month. Here I analyze a variant of the Slapper worm that hit the Net on September 13, 2002 and that exploited the OpenSSL SSLv2 malformed client key remote buffer overflow vulnerability.
The analysis is in some parts very detailed. If you are a grader, have lots of submissions to read, and can't go over all the details, or are just a casual reader, feel free to go directly to Section 5, which contains the answers to all the questions of this challenge. The questions (and answers) provide a good summary of the most important aspects of the worm. However, just reading the answers is not the best way to understand some of the details or the process I followed to analyze the worm, so I would encourage you to read the whole submission.
The first thing I need to do after downloading the only file (called .unlock) that the Honeynet Project has given us is to determine what type of file I am dealing with. The easiest way to do this is by running the Unix file command on it:
peloy@canaima:~$ file .unlock
.unlock: gzip compressed data, from Unix
This tells me that the file was compressed using Lempel-Ziv coding (LZ77). Now I can re-run the file command, this time specifying the -z switch to try to look inside the compressed file:
peloy@canaima:~$ file -z .unlock
.unlock: GNU tar archive (gzip compressed data, from Unix)
The file command is telling me that the compressed file contains a GNU tar archive.
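The two checks that file reported can be approximated by hand. The following is a minimal sketch, not how file itself is implemented: it assumes the gzip magic bytes 0x1f 0x8b at the start of the stream and the "ustar" magic at offset 257 of the first tar header block. The function names looks_like_gzip() and looks_like_tar() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with the gzip magic bytes 0x1f 0x8b. */
int looks_like_gzip(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

/* Returns 1 if a decompressed tar header block carries the "ustar"
 * magic at offset 257, as both POSIX and GNU tar archives do. */
int looks_like_tar(const unsigned char *block, size_t len)
{
    return len >= 262 && memcmp(block + 257, "ustar", 5) == 0;
}
```

The first function would be applied to the first bytes of .unlock as stored on disk, and the second to the first 512 bytes of the decompressed stream.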
Finally, to find out the date on which the .unlock file was generated (information we will need to answer one of the Scan of the Month questions) I can use the ls command. As we can see below, the .unlock file was created on September 22, 2002 at 1:06 PM (we don't know the time zone).
peloy@canaima:~$ ls -l .unlock
-rw-r--r-- 1 peloy peloy 17973 2002-09-22 13:06 .unlock
With this new information we can now decompress (gzip's -d switch) the file to standard output (gzip's -c switch) and pipe the output to the tar command. We use the tar command's -t switch to list the contents of the archive:
peloy@canaima:~$ gzip -dc .unlock | tar tvf -
-rw-r--r-- root/wheel 70981 2002-09-20 09:28:11 .unlock.c
-rw-r--r-- root/wheel 2792 2002-09-19 17:57:48 .update.c
Bingo! Now we know that we might be dealing with two C source files, one called .unlock.c and the other one called .update.c. We can even see the dates these two files were last modified. To extract the contents of the archive we just need to run the tar command with the -x switch.
Analysis of this month's Scan of the Month is a lot easier than analysis of previous Scans of the Month and Honeynet Project's Challenges like Scan 22 and the Reverse Challenge. The reason is that this time we are getting the actual source code of the program we are concerned with. In the two other challenges I mentioned above, it was pretty hard to do the analysis because we were only given the binaries (executable files), so we needed to reconstruct symbol tables and decompile the programs. This took a considerable amount of time given that the process is highly manual and there are no good tools for reverse-engineering Unix binaries (save Dion Mendel's tools, which I used for Scan 22).
To analyze what the worm does, how it propagates to other machines, how it operates, what capabilities it offers, and other details, I will go over the worm's source code. The format I will use will present a source code segment with callouts to comments that follow the code and that explain different features of the code segment. The number right before the comment is a hyperlink, and clicking on it will take you to the specific line the comment refers to.
Just to provide a general idea, or 20,000-foot view, the program structure is something like:
main()
{
    initialize();
    while (1) {
        select(timeout=2secs);
        every_60secs_task;
        every_3secs_task;
        every_10mins_task;
        scan_and_infect();
        peer_to_peer_network_housekeeping;
        switch (command) {
            command 1: handle_command1; break;
            command 2: handle_command2;
            udp DoS: do_udp_flood;
            tcp DoS: do_tcp_flood;
            dns DoS: do_dns_flood;
            . . . etc.
        }
    }
}
I'll go over each one of these parts in the next sections.
In the lines following the call to the audp_listen() function (lines 1785 to 1798) several structures used by the worm are initialized. One interesting structure that is initialized here is the array of IP addresses cpbases, which is initialized with the list of IP addresses that is passed to the worm on the command line:
1799 dup2(null,0); dup2(null,1); dup2(null,2);
1800 if (fork()) return 1;
In line 1802 the worm calls the function mailme(), which does the following:
Line 1803 just wipes out the string pointed to by argv[0], which is the program name. The name is wiped out by writing zeroes to each byte in the string.
Line 1804 also wipes a string, but in this case the one pointed to by argv[1], which is the first parameter passed to the worm when it was invoked.
In line 1805 the worm tries to obfuscate the program name by overwriting argv[0] with the string "httpd ". This way an administrator running the ps command would think that an HTTP server process is running.
As we will see later, the worm scans other networks to try to find other vulnerable hosts to which it can spread. In line 1807 the worm initializes the first 16 bits (the first two octets) of the IP networks it will scan. It does this by choosing a random number from the classes[] array for the first octet, and by choosing a completely random value for the second octet.
Finally, in line 1808 the worm assigns the signal handler nas() to signals SIGCHLD and SIGHUP. nas() does not do anything, so in effect these two signals are ignored if they are received.
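The last step above, routing two signals to a handler that does nothing, can be sketched as follows. This is an illustration and not the worm's code: the handler name nas() follows the text, while the helper install_noop_handlers() and its use of signal() are assumptions.

```c
#include <signal.h>

/* Empty handler: the signal is delivered, but nothing is done with it. */
static void nas(int sig)
{
    (void)sig;
}

/* Hypothetical helper: route SIGCHLD and SIGHUP to the empty handler.
 * Returns 0 on success, -1 if either call to signal() fails. */
int install_noop_handlers(void)
{
    if (signal(SIGCHLD, nas) == SIG_ERR)
        return -1;
    if (signal(SIGHUP, nas) == SIG_ERR)
        return -1;
    return 0;
}
```

One practical difference from using SIG_IGN is that a blocking call such as select() can still return early with EINTR when the empty handler runs.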
Here ends the initialization section of the worm's code. In the next section I will go in detail over the main loop of the worm.
The main loop begins in line 1809. It is a big "while" loop that never exits. The first thing the worm does inside the main loop is to set a file descriptor set (stored in the variable read, declared as fd_set inside the main loop) so several sockets can be monitored with the select() function call:
After select() is called in line 1830, the worm will execute three pieces of code depending on whether specific time intervals have elapsed. The first piece of code is executed every 60 seconds, the second every 3 seconds, and the third every 10 minutes.
The code that is executed every 60 seconds is the following:
If the worm does not know of other peers in the network (that is, if the variable links is NULL or if the variable numlinks is 0) the worm will do the same thing it did in line 1796 (see Section 3.1 above), which is to send a packet with the data {tag=0x70, id=0, len=0} to another IP in the virtual network. Note that there is an off-by-one bug in this loop: the array cpbases[] was initialized with argc elements, but the loop runs from 1 to argc + 1 (bases equals argc + 1 because of a previous operation). The result is that there is one extra UDP packet that is sent, but since it goes to 0.0.0.0 it ends up going to the local machine.
If the worm does not know of other peers in the network and if the worm knows its IP address (variable myip is not zero), the worm will send a packet with the data {tag=0x74, id=0, len=0}.
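The effect of the off-by-one bound can be reproduced in isolation. In this sketch the array stands in for cpbases[], with a zeroed spare slot so that the demonstration stays within bounds. The element type and the helper count_unset() are assumptions made for illustration and are not taken from the worm.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts how many of the first `bound` entries are 0.0.0.0, i.e. slots
 * that were never filled in. A loop bound that is one too large shows up
 * here as exactly one such entry. */
size_t count_unset(const uint32_t *addrs, size_t bound)
{
    size_t zeros = 0;
    size_t i;

    for (i = 0; i < bound; i++)
        if (addrs[i] == 0)
            zeros++;
    return zeros;
}
```

With two addresses stored in a three-element, zero-initialized array, a bound of 2 visits no unset entries, while a bound of 3 visits one, which corresponds to the extra packet addressed to 0.0.0.0.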
The code that is executed every 3 seconds (lines 1869 to 1893) handles the sending of messages in the message queue. This is because the worm maintains a queue of messages to send to other peers it knows about. In some cases the worm just sends the messages immediately but in others messages are just queued for later transmission.
The code that is executed every 10 minutes (lines 1896 to 1900) just sends information about the peer-to-peer network of infected machines, from the point of view of the sending machine, to a random peer. The work is done by the function broadcast().
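The pattern of running tasks at fixed intervals from inside a loop that wakes up on a short select() timeout can be sketched as follows. This is a generic illustration, not the worm's code; struct periodic and periodic_due() are hypothetical names.

```c
#include <time.h>

/* One periodic task: when it last ran and how often it should run. */
struct periodic {
    time_t last;      /* time of the previous run */
    time_t interval;  /* seconds between runs */
};

/* Returns 1 and records the run if at least `interval` seconds have
 * passed since the last run; returns 0 otherwise. */
int periodic_due(struct periodic *p, time_t now)
{
    if (now - p->last < p->interval)
        return 0;
    p->last = now;
    return 1;
}
```

A loop would keep three such records, with intervals of 60, 3 and 600 seconds, and call periodic_due() on each of them with time(NULL) after every return from select().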
In lines 1903 and 1905, the worm just checks if any of the sockets select() is watching has any data, i.e. if data has been received. If there is data these two lines just set the len of the structures the worm uses to keep track of connection state to AREAD.
Next, in line 1907, and extending to line 1938, the worm searches for remote machines it can infect. I go into the details of how the worm does this in Section 3.3.
After worm propagation has been taken care of, the worm seems to do some housekeeping tasks related to the peer-to-peer network. There can be up to 128 peers the worm is in touch with, and from line 1939 to line 2006, the worm performs tasks like adding and deleting peers to and from the internal list that keeps track of all connections. This list is stored in the array clients. I must confess that due to lack of time I did not go into the details of how the peer-to-peer network capabilities of the worm work.
Finally, in line 2008 the last logical section of the main loop is started. This last section reads any command received on UDP port 4156 and processes it. One of the features of the worm is that it provides backdoor capabilities that support a variety of tasks. For example, people who know that the worm is executing on a specific machine can request that the machine launch UDP, TCP or DNS Denial of Service attacks against any specific host, that the worm run a command on the infected machine, that the worm scan all files on the infected machine and send back a list of all e-mail addresses found, etc.
Appendix B contains a list of the commands the worm understands.
There are three aspects to the propagation of the worm to other machines: 1) search of remote machines to exploit (scanning of remote machines), 2) exploitation of a known vulnerability to get access to the remote host, and 3) replication of the worm to successfully compromised remote machines. I will go over each one of these aspects in the following sections.
As we shall see, the way the worm scans for other vulnerable hosts is simple. The code that scans remote hosts the worm can exploit begins in line 1907 and extends until line 1938. Here's what the scanning code does:
The function exploit() is very important because that is the one that launches an exploitation attempt, and spreads the worm if the attack is launched and is successful. The function begins in line 1697. Let's see what this function does (I'll just include the most important parts of it):
The first thing exploit() does is to call the GetAddress() function. GetAddress() in turn establishes a TCP connection with port 80 of the remote host. Once the connection is established, it sends the bogus string "GET / HTTP/1.1\r\n\r\n". It is bogus because it should not send a second "\r\n" pair, since HTTP version 1.1 requires sending "Host: <hostname>" after the "GET" request. But this doesn't matter, because the end goal is to make the remote web server spit out an error message that can be used to identify what the web server software is. When the remote web server returns the error message, GetAddress() will look for the string "Server: xxxx", and return a pointer to "xxxx". The idea is that exploit() will then decide, based on whether the remote web server is Apache, if it will launch the exploitation attempt. An example of what a remote web server running Apache would return is:
If the remote host is not running Apache, the worm gives up and exits (but remember that this is a forked process, and the parent is still running and scanning other hosts).
Here the worm is trying to maximize its chances of success by determining what specific version of Apache, and what Linux distribution, are running on the remote host. By identifying precisely these two the worm can tune one exploit-specific parameter that will improve the chances of success. The different "architectures" the worm knows about are stored in the architectures[] array. There are 23 different combinations of Linux distributions and Apache versions. The Linux distributions are Gentoo, Debian, RedHat, SuSE, Mandrake, and Slackware. Apache versions range from 1.3.6 to 1.3.26. The actual list of architectures is:
Notice here that if the worm cannot accurately identify a specific architecture, it defaults to the 9th entry in the architectures array, which is RedHat and Apache 1.3.23.
Here the worm is getting into exploit-specific territory: the worm exploits the OpenSSL vulnerability that was announced by the CERT/CC on July 30, 2002 (see the references in Appendix C for links to online documents that contain more details). So, the worm needs to connect to port 443 (HTTPS) to be able to exploit the vulnerability. What the worm is doing here is attempting to open a connection to port 443 of the remote host. It will retry 20 times, waiting 100 milliseconds between retries. Inside the function connect_host() we can see that if the connection fails, the worm will exit.
Between lines 1724 and 1753 is where the actual exploitation of the OpenSSL vulnerability takes place. I won't go into details here because this is better explained elsewhere (again, please refer to Appendix C for links to online documents that explain the details of this vulnerability well).
Finally, here the remote host has been compromised! The worm is in and now it is time to perpetuate the species! The function sh() is responsible for propagating the worm to the compromised machine. I will explain what this function does in the next section.
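The fingerprinting step at the start of exploit() boils down to finding the value of the "Server:" header in the response and checking whether it names Apache. The following is a minimal sketch of that parsing only, not the worm's GetAddress() function. The names server_banner() and is_apache() are hypothetical, and the match on the header name is case-sensitive for simplicity.

```c
#include <stddef.h>
#include <string.h>

/* Copies the value of the "Server:" header from an HTTP response into
 * out (at most outlen - 1 characters). Returns 1 if found, 0 if not. */
int server_banner(const char *response, char *out, size_t outlen)
{
    static const char key[] = "Server: ";
    const char *p = strstr(response, key);
    size_t n;

    if (p == NULL || outlen == 0)
        return 0;
    p += sizeof key - 1;
    n = strcspn(p, "\r\n");          /* the value ends at the line break */
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}

/* Returns 1 if the banner names Apache, 0 otherwise. */
int is_apache(const char *banner)
{
    return strstr(banner, "Apache") != NULL;
}
```

The same banner string is what an administrator can shorten or suppress in the server configuration to make this kind of identification less precise.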
Once a vulnerable host has been successfully compromised it is time for the worm to preserve the species. Worm propagation is done by the sh() function, which is called after the OpenSSL attack has been successful. At this point, the worm has an open shell on the remote host, and is ready to start sending the commands that will propagate the worm. Let's see what it does in detail:
With this, I have finally covered all the details regarding how the worm propagates to other machines.
After I studied the worm and had a good understanding of what it does I decided to have a little bit of fun with it. For this I ran the worm on a machine connected to a network that is disconnected from the Internet. I also wrote a small C program to control the worm. The control program allowed me to send commands to the worm, and then I observed the activity generated by the worm with a sniffer like tcpdump.
Using the control program and observing the network traffic I was able to discover some of the bugs I have pointed out elsewhere in this document.
Here's an example of how the control program looks:
Agent address is 10.10.10.16
[0] Enter agent IP address
[1] Run a command on the agent
[2] UDP flood
[3] TCP flood
[4] DNS flood
[5] Scan remote files for e-mail addresses
[6] Exit
Enter option:
Q: Which is the type of the .unlock file? When was it generated?
A: As I showed in Section 2, the .unlock file is a compressed GNU tar archive. The tar archive contains only two files, called .unlock.c and .update.c, which are C source files. The compressed GNU tar archive was generated on September 22, 2002 at 1:06 PM.
Q: Based on the source code, who is the author of this worm? When it was created? Is it compatible with the date from question 1?
A: By examining the .unlock.c file we can guess that the author of the worm is an individual that uses the IRC alias contem on the EFNet IRC network. We can see that in the first lines of the program:
1 /******************************************************************
2 * *
3 * Peer-to-peer UDP Distributed Denial of Service (PUD) *
4 * by contem@efnet *
5 * *
[...]
However, the file we are looking at was modified by an individual that goes by the alias "aion", and whose e-mail address is aion@ukr.net.
[...]
37 * *
38 * some modification done by aion (aion@ukr.net) *
39 ******************************************************************/
[...]
The .unlock.c file was generated on September 20, 2002 at 9:28 AM, and the .update.c file was generated on September 19, 2002 at 5:57 PM. Since the compressed tar archive was generated two days later (on September 22, 2002) I would say that the creation dates of the C source files are compatible with the creation date of the tar archive (in addition to the timestamps of the files, in the .unlock.c file, the symbol VERSION is declared in line 71 as "20092002", which seems to imply that the version number was chosen based on the day the code was released - September 20, 2002.)
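Reading VERSION as a date assumes a DDMMYYYY layout, which is my interpretation and is not stated anywhere in the source. Under that assumption the string can be split as in the following sketch, where parse_ddmmyyyy() is a hypothetical helper:

```c
#include <stdio.h>

/* Splits a DDMMYYYY string such as "20092002" into day, month and year.
 * Returns 1 on success, 0 if the string does not have that shape. */
int parse_ddmmyyyy(const char *s, int *day, int *month, int *year)
{
    if (sscanf(s, "%2d%2d%4d", day, month, year) != 3)
        return 0;
    return *day >= 1 && *day <= 31 && *month >= 1 && *month <= 12;
}
```

For "20092002" this yields day 20, month 9 and year 2002, which matches the modification date of .unlock.c shown by tar.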
Q: Which process name is used by the worm when it is running?
A: Line 78 of .unlock.c contains the following symbol definition:
78 #define PSNAME "httpd "
As I mentioned in Section 3.1, in lines 1803-1804, very close to the beginning of main(), the worm clears the program name as well as all the parameters passed through the command line by zeroing out the strings pointed by the pointers in the argv array.
Then, in line 1805, the worm calls the strcpy() C library function to copy the symbol PSNAME to the string pointed by the argv[0] pointer, which happens to be the program's name.
The end result is that the worm will be obfuscating the name of its process so an administrator would only see the process name "httpd" when the ps command is run.
Q: In which format the worm copies itself to the new infected machine? Which files are created in the whole process? After the worm executes itself, which files remain on the infected machine?
A: As I explained in Section 3.3.3, the worm uses the command cat and the "here document" syntax of the bash shell to copy the compressed GNU tar archive that contains the worm's source code to the newly compromised machine. Since the worm is just sending the data to the standard input of the remote shell, it can't just copy the tar archive as it is, because it contains binary data that could be interpreted as shell meta-characters or terminal control data. For this reason, the worm uuencodes the file and transmits the resulting data, which is just regular ASCII text.
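To show why uuencoded data is safe to push through a shell, here is a minimal sketch of how one line of uuencoded output is produced: every 3 input bytes become 4 printable characters in the range 32 to 95, preceded by a length character. This is the classic encoding and not code taken from the worm. The name uu_encode_line() is hypothetical, and some uuencode implementations emit a backquote instead of a space for the value 0.

```c
#include <stddef.h>

/* Maps a 6-bit value to its uuencode character (offset by 32). */
static char uu_char(unsigned v)
{
    return (char)((v & 0x3f) + 32);
}

/* Encodes up to 45 bytes of input as one uuencoded line, without the
 * trailing newline. out must hold at least 4 * ((len + 2) / 3) + 2
 * bytes. Returns the number of characters written, or 0 if len > 45. */
size_t uu_encode_line(const unsigned char *in, size_t len, char *out)
{
    size_t i, o = 0;

    if (len > 45)
        return 0;
    out[o++] = uu_char((unsigned)len);               /* length prefix */
    for (i = 0; i < len; i += 3) {
        unsigned b0 = in[i];
        unsigned b1 = (i + 1 < len) ? in[i + 1] : 0; /* zero padding */
        unsigned b2 = (i + 2 < len) ? in[i + 2] : 0;

        out[o++] = uu_char(b0 >> 2);
        out[o++] = uu_char((b0 << 4) | (b1 >> 4));
        out[o++] = uu_char((b1 << 2) | (b2 >> 6));
        out[o++] = uu_char(b2);
    }
    out[o] = '\0';
    return o;
}
```

Encoding the three bytes "Cat" with this function produces the five characters #0V%T, where # is the length prefix for 3 bytes.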
The files that are created during the worm propagation process are:
After the worm executes itself in the remote machine, the only file that remains is /tmp/.unlock, the compressed GNU tar archive that contains the worm's source code. All other files are deleted.
Q: Which port is scanned by the worm?
A: The worm scans for TCP port 80, the Hypertext Transfer Protocol port. If this port is found open, i.e. a TCP connection was successfully established, the worm proceeds to launch the exploit. The actual scan takes place in line 1923, where the worm executes atcp_sync_connect(&clients[n],srv,SCANPORT). SCANPORT is a symbol defined as "80" at the beginning of the worm's C source file.
Q: Which vulnerability the worm tries to exploit? In which architectures?
A: The worm exploits the OpenSSL SSLv2 malformed client key buffer overflow vulnerability, which, as we have seen, allows remote exploitation. I will not go into details here since excellent references to this vulnerability are available on the web, and they explain the problem better than I could. Check the references in Appendix C.
Once a host has been found to have port 80 open, the worm tries to exploit the vulnerability by launching an attack against the HTTPS port, which on most Apache implementations uses the OpenSSL libraries.
As for the "architectures" the worm tries to exploit, "architectures" is not the correct word (although that is the word used in the C source code.) The exploit the worm uses works only on the Intel i386 family, no Sparcs, no PowerPCs, no ia64, no anything else (the worm will try to exploit all other architectures as long as it finds open TCP port 80, but the exploit will not succeed.) Now, there are several "targets" the worm knows about and that guarantee the success of the exploitation. For these known "targets", the worm knows it can tune an exploitation parameter so the exploit succeeds. The different "targets" the worm knows about are stored in the architectures[] array. There are 23 different combinations of Linux distributions and Apache versions. The Linux distributions are Gentoo, Debian, RedHat, SuSE, Mandrake, and Slackware. Apache versions range from 1.3.6 to 1.3.26 (see Section 3.3.2 or line 1241 of .unlock.c for the actual declaration of the architectures[] array.)
Q: What kind of information is sent by the worm by email? To which account?
A: As I mentioned in Section 3.1, the worm sends an e-mail to the address <aion@ukr.net>. It sends the e-mail by establishing a direct TCP connection to port 25 (SMTP) of the host freemail.ukr.net, and by pretending to be <test@microsoft.com>.
The information sent by the worm is just:
hostid: (decimal number)
hostname: (string)
att_from: (string)
hostid and hostname are obtained via the gethostid() and gethostname() C library functions, and they refer to the host executing the worm. att_from is the only parameter passed to the mailme() function, and represents the first argument passed to the worm from the command line. This argument is an IP address.
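The layout of that report can be sketched as below. The three field labels come from the message described above. The function format_report() is hypothetical, and the use of one line per field is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats the three report fields into buf. Returns the number of
 * characters that snprintf() would have written. */
int format_report(char *buf, size_t buflen, long hostid,
                  const char *hostname, const char *att_from)
{
    return snprintf(buf, buflen,
                    "hostid: %ld\nhostname: %s\natt_from: %s\n",
                    hostid, hostname, att_from);
}
```

On a real system the first two arguments would be the values returned by gethostid() and filled in by gethostname().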
Q: Which port (and protocol) is used by the worm to communicate to other infected machines?
A: The worm uses UDP port 4156 to talk to other peers. In the C source code, the symbol PORT is used, and it is defined as "4156" at the beginning of the C source file.
Q: Name 3 functionalities built in the worm to attack other networks.
A: The worm can be remotely programmed to generate three types of denial of service (DoS) attacks. The three types are UDP flood, TCP flood, and DNS flood. The UDP and TCP floods are intended to be used against any host, and the DNS flood is intended to be used against DNS servers since it sends DNS queries to the DNS port (UDP 53) of the specified IP address.
Because of the way the worm communicates with other infected machines, it is easy to use these attacks to create a major Distributed Denial of Service Attack (DDoS), where hundreds or thousands of machines create chaos by DoS'ing one or more hosts.
I personally tested the three attacks, as I mentioned in Section 4. The UDP and TCP attacks worked fine (well, the program is a bit buggy, but the attacks worked more or less). The DNS attack seemed to have a few problems.
Q: What is the purpose of the .update.c program? Which port does it use?
A: .update.c is a little program, written not by the original worm author but by aion <aion@ukr.net>, the (apparently 21-year-old) person who modified the original worm. It just provides a shell on demand on TCP port 1052. To get a shell on a machine running this program one needs to provide the password "aion1981" as soon as the TCP connection with port 1052 is established. This is in theory, though, since the program as it is has a critical bug:
52   for(stimer=time(NULL);(stimer+UPTIME)>time(NULL);)
53   {
54     soc_cli = accept(soc_des,
55         (struct sockaddr *) &client_addr, sizeof(client_addr));
56     if (soc_cli > 0)
57     {
58       if (!fork()) {
The accept() function requires that the last parameter be a pointer to an integer that is initially set to the size of the struct sockaddr structure. In this case our buddy aion <aion@ukr.net> is not passing a pointer but an integer directly. You need to be more careful when coding, <aion@ukr.net>.
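For comparison, a correct call passes the address of a socklen_t variable that holds the buffer size on entry. The following is a minimal sketch of that call, not a patch for .update.c. The wrapper accept_client() is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Accepts one connection on listen_fd and stores the peer address in
 * *peer. The third argument of accept() points to a socklen_t that
 * holds the buffer size on entry and the address size on return.
 * Returns the new descriptor, or -1 on error. */
int accept_client(int listen_fd, struct sockaddr_in *peer)
{
    socklen_t len = sizeof(*peer);

    return accept(listen_fd, (struct sockaddr *)peer, &len);
}
```

Passing the size by value, as in line 55 above, makes accept() treat that number as a memory address.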
Now, update does not provide a shell on demand on TCP port 1052 of the host running the compiled version of .update.c at all times: the server is programmed to listen for just 10 seconds and then shuts down for 5 minutes. See the next question for details.
There isn't really anything else to say about .update.c. It is a very small program that can be understood in 2 minutes. It is pretty obvious what it does.
Q (Bonus Question) What is the purpose of the SLEEPTIME and UPTIME values in the .update.c program?
A: SLEEPTIME is a symbol defined at the beginning of the file as "300", and UPTIME is another symbol defined as "10". As I mentioned in the previous question, when update is run, it will open TCP port 1052 and will provide a shell on demand for UPTIME seconds. After UPTIME seconds have passed update will shut down the TCP server for SLEEPTIME seconds.
My guess is that this feature is provided to prevent system administrators from running the netstat command and finding that a strange process is listening on a non-standard port.
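How narrow the listening window is can be seen with a little arithmetic. The sketch below uses the UPTIME and SLEEPTIME values quoted above and assumes that the up and down phases simply alternate forever, which is an assumption of this example. The function port_open_at() is hypothetical.

```c
/* Values quoted from .update.c: the listener is up for UPTIME seconds
 * and then down for SLEEPTIME seconds. */
#define UPTIME    10
#define SLEEPTIME 300

/* Returns 1 if the port would be open `elapsed` seconds after a cycle
 * started, assuming the two phases alternate without drift. */
int port_open_at(long elapsed)
{
    return elapsed >= 0 && (elapsed % (UPTIME + SLEEPTIME)) < UPTIME;
}
```

Under that assumption the port is open for 10 out of every 310 seconds, or roughly 3% of the time, so a single run of netstat would usually miss it.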
The following files were generated during this Scan of the Month:
Worm source code: .unlock.c and .update.c. I am not including these files here since it is very easy to generate them: just download the file provided for Scan of the Month November 2002 and follow the procedure I presented in Section 2.
control.c: a program that allows controlling the worm analyzed in this document. Please note that not all commands are implemented, and that some commands have bugs in the worm source, so they might not work at all.
XML sources for this document: this HTML document was generated using DocBook XML. This directory contains all the files used in the generation of this document.
The following table contains the worm commands that are provided as a backdoor. The source code contains a few comments that give an idea of what some of the commands do. Other commands required study of the source code to figure out what they do. I tested some of the commands by writing a small program that remotely controlled the backdoor.
Table B.1. Worm Commands
Command Code | Function Performed | Comments |
---|---|---|
0x20 | Get information | Information about current status of the worm (version, IP address, etc.) |
0x21 | Open a bounce | Related to the peer-to-peer network. I believe it allows the worm to proxy connections for another host |
0x22 | Close a bounce | |
0x23 | Send message to a bounce | |
0x24 | Run a command | The received packet includes, in addition to the 0x24 command code, the command that the attacker wants the infected machine to execute. The worm has code to send back the output of the command to a programmed (also in the received packet) IP address. However, there is a critical bug in the code that makes a forked worm process crash when it tries to zero 3000 bytes in an array that only holds about 12 bytes. The bug is due to declaration of a variable with the same name of another in another context, making it invisible from the current scope. I tested this code and the command is executed, although nothing is returned because of the bug. |
0x25 | Not implemented, does nothing | |
0x26 | Route | Seems related to management of the peer-to-peer network |
0x27 | Not implemented, does nothing | |
0x28 | List | Apparently, used to get a list of links to other infected machines. Seems related to management/monitoring of the peer-to-peer network |
0x29 | UDP flood | Starts a Denial of Service against another host. UDP is used and the IP address and port to use, as well as the duration of the attack, are specified in the packet. I tested this and it works. |
0x2a | TCP flood | Starts a Denial of Service attack against another host. TCP is used and the IP address and port to use, as well as the duration of the attack, are specified in the packet. I tested this and it works. |
0x2b | IPv6 TCP flood | Starts a Denial of Service attack against another host. This is for IPv6. It is not enabled in the worm source code (disabled with a #undef.) |
0x2c | DNS flood | Starts a Denial of Service attack against a DNS server. DNS requests are sent. The DNS server to use, as well as the duration of the attack, are specified in the packet. I tested this and the DNS query performed is broken. |
0x2d | E-mail scan | Runs find / -type f and searches every file found for e-mail addresses. Sends the addresses it finds to UDP port ESCANPORT (defined as 10100 at the beginning of the file) of a host specified in the incoming packet. |
0x70 | Incoming client | Handles registration of new infected machine |
0x71 | Receive the list | |
0x72 | Send the list | |
0x73 | Get my IP | |
0x74 | Transmit their IP | Sends the IP address of the incoming client to other registered clients |
0x41 to 0x47 | Relay to client | Resends received packet to all registered clients |
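
When looking at captured traffic, it helps to translate command codes back into the names used in Table B.1. The following lookup is a minimal sketch for that purpose. The codes and names are copied from the table, while command_name(), the struct layout, and the assumption that the code fits in a single byte are mine.

```c
#include <stddef.h>

struct cmd_name {
    unsigned char code;
    const char   *name;
};

/* Codes and names copied from Table B.1. */
static const struct cmd_name cmd_names[] = {
    { 0x20, "Get information" },
    { 0x21, "Open a bounce" },
    { 0x22, "Close a bounce" },
    { 0x23, "Send message to a bounce" },
    { 0x24, "Run a command" },
    { 0x25, "Not implemented" },
    { 0x26, "Route" },
    { 0x27, "Not implemented" },
    { 0x28, "List" },
    { 0x29, "UDP flood" },
    { 0x2a, "TCP flood" },
    { 0x2b, "IPv6 TCP flood" },
    { 0x2c, "DNS flood" },
    { 0x2d, "E-mail scan" },
    { 0x70, "Incoming client" },
    { 0x71, "Receive the list" },
    { 0x72, "Send the list" },
    { 0x73, "Get my IP" },
    { 0x74, "Transmit their IP" },
};

/* Returns the table name for a command code, or "Unknown". */
const char *command_name(unsigned char code)
{
    size_t i;

    if (code >= 0x41 && code <= 0x47)
        return "Relay to client";
    for (i = 0; i < sizeof cmd_names / sizeof cmd_names[0]; i++)
        if (cmd_names[i].code == code)
            return cmd_names[i].name;
    return "Unknown";
}
```

A packet decoder could call command_name() on the command code of each datagram seen on UDP port 4156 to label the traffic.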
Information about the OpenSSL vulnerability exploited by the worm:
Advisory from CERT: CERT Advisory CA-2002-23 Multiple Vulnerabilities In OpenSSL.
Bugtraq information: OpenSSL SSLv2 Malformed Client Key Remote Buffer Overflow Vulnerability.
Media coverage of the worm:
A Wired article: Linux Worm Hits the Network
A CNET article: Slapper worm smarting less.
Understanding of the TCP/IP protocols and of Unix network programming using the BSD sockets API is necessary to understand a worm like the one I analyzed in this paper. The following are my favorite books on these topics:
Stevens, W.R. TCP/IP Illustrated Vol 1. 1994 Addison Wesley.
Stevens, W.R. Unix Network Programming Vol 1. 2nd Ed. 1998 Prentice Hall.
Thanks to ...
Chapu for being so patient while her husband was lost in bits, bytes and lines of C source code, and for bringing so much joy to my life. This is dedicated to you; you deserve it a thousand times.
The Honeynet Project for coming up with these highly educational exercises, and for taking the time to go over all the submissions. That must be a lot of work! Keep 'em coming!