The Honeynet Project: Scan of the Month #16

Overview
Analysis
Questions and Answers

Overview

This month's challenge was to decrypt and analyze a file which was found on a compromised system.

The information given is:

The host was compromised in March, 2001
The compromised host was running Solaris
The file and encryption method are part of a "security toolkit"
A copy of the file itself

Analysis of the file

In this analysis, I trace my steps as I gather enough information to answer the questions from the challenge. The questions and my answers are included below.

Retrieve the file

The first step is to download and verify the file.

$ wget http://project.honeynet.org/scans/scan16/somefile.tgz
$ md5sum somefile.tgz
f7964d9860cbf8135ef64bcf5b96facb somefile.tgz

The hash of the downloaded file checked out ok, so I proceeded to untar it.

$ tar xzvf somefile.tgz
somefile
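The same check can be scripted when md5sum is not available. The following is a minimal Python sketch, not part of the original analysis; the function names md5_of_file and verify are my own, and the expected digest is simply the one shown in the md5sum output above.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Digest copied from the md5sum output above.
EXPECTED_MD5 = "f7964d9860cbf8135ef64bcf5b96facb"

def verify(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Calling verify("somefile.tgz") should return True for an intact download.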

Initial inspection

Since the file has lots of unprintable characters in it, it is easier to look at in hex.

$ od -t x1 somefile
0000000 a4 99 96 93 9a a2 f5 99 96 91 9b c2 d0 9b 9a 89
0000020 d0 8f 8b 8c d0 cf ce d0 9d 96 91 d0 99 96 91 9b
0000040 f5 9b 8a c2 d0 9b 9a 89 d0 8f 8b 8c d0 cf ce d0
0000060 9d 96 91 d0 9b 8a f5 93 8c c2 d0 9b 9a 89 d0 8f
0000100 8b 8c d0 cf ce d0 9d 96 91 d0 93 8c f5 99 96 93
0000120 9a a0 99 96 93 8b 9a 8d 8c c2 cf ce d3 93 9d 93
0000140 96 9d 8f 8c d1 8c 90 d3 8c 91 d1 93 d3 8f 8d 90
0000160 92 d3 9c 93 9a 9e 91 9a 8d d3 9b 90 8c d3 8a 9c
0000200 90 91 99 d1 96 91 89 d3 8f 8c 9d 91 9c d3 93 8f
0000220 9e 9c 9c 8b d3 aa ac ba ad f5 f5 a4 8f 8c a2 f5
0000240 8f 8c c2 d0 9b 9a 89 d0 8f 8b 8c d0 cf ce d0 9d
0000260 96 91 d0 8f 8c 8d f5 8f 8c a0 99 96 93 8b 9a 8d
0000300 8c c2 93 8f 8e d3 93 8f 8c 9c 97 9a 9b d3 8c 97
0000320 ce 8b d3 8f 8c 8d d3 8c 8c 97 9b cd d3 93 8f 8c
0000340 9a 8b d3 93 8f 9e 9c 9c 8b d3 9d 91 9c 93 8f d3
0000360 93 8f 8c 86 8c f5 93 8c 90 99 a0 99 96 93 8b 9a
0000400 8d 8c c2 93 8f d3 8a 9c 90 91 99 d1 96 91 89 d3
0000420 8f 8c 91 96 99 99 d3 8f 8c 8d d3 c5 ce cc cf cf
0000440 cf d3 c5 cd ca cf cf cf d3 c5 c9 c9 c9 c7 d3 c5
0000460 c9 c9 c9 c8 d3 d0 9b 9a 89 d0 8f 8b 8c d0 cf ce
0000500 d3 8c 91 d1 93 d3 8f 8d 90 92 d3 93 8c 90 99 d3
0000520 8f 8c 9d 91 9c f5 f5 a4 91 9a 8b 8c 8b 9e 8b a2
0000540 f5 91 9a 8b 8c 8b 9e 8b c2 d0 9b 9a 89 d0 8f 8b
0000560 8c d0 cf ce d0 9d 96 91 d0 91 9a 8b 8c 8b 9e 8b
0000600 f5 91 9a 8b a0 99 96 93 8b 9a 8d 8c c2 cb c8 cf
0000620 ce c7 d3 c9 c9 c9 c7 f5 f5 a4 93 90 98 96 91 a2
0000640 f5 8c 8a a0 93 90 9c c2 d0 9b 9a 89 d0 8f 8b 8c
0000660 d0 cf ce d0 9d 96 91 d0 8c 8a f5 8f 96 91 98 c2
0000700 d0 9b 9a 89 d0 8f 8b 8c d0 cf ce d0 9d 96 91 d0
0000720 8f 96 91 98 f5 8f 9e 8c 8c 88 9b c2 d0 9b 9a 89
0000740 d0 8f 8b 8c d0 cf ce d0 9d 96 91 d0 8f 9e 8c 8c
0000760 88 9b f5 8c 97 9a 93 93 c2 d0 9d 96 91 d0 8c 97
0001000 f5 f5 8c 8a a0 8f 9e 8c 8c c2 93 cc cc 8b 97 cb
0001020 87 cf 8d f5
0001024

Notice that there is a great deal of repetition in the file, which can aid in cryptanalysis. For example, the sequence "c2 d0 9b 9a 89 d0 8f 8b 8c d0 cf ce d0 9d 96 91 d0" appears 8 times.
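Repetition like this can also be found mechanically rather than by eye. The sketch below is an illustration in Python, not a script I used at the time; count_ngrams is a hypothetical helper, and SAMPLE holds only the first 48 bytes of the dump above, so counts on it are lower than on the full file.

```python
from collections import Counter

def count_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping n-byte window in `data`."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

# First 48 bytes of the od dump above (offsets 0000000 through 0000057).
SAMPLE = bytes.fromhex(
    "a49996939aa2f59996919bc2d09b9a89"
    "d08f8b8cd0cfced09d9691d09996919b"
    "f59b8ac2d09b9a89d08f8b8cd0cfced0"
)

# Show the most frequent 8-byte windows with their counts.
for ngram, hits in count_ngrams(SAMPLE, 8).most_common(3):
    print(ngram.hex(" "), hits)
```

To check the 17-byte sequence quoted above against the whole file, read somefile in binary mode and look it up in count_ngrams(data, 17).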

Frequency analysis

At this point, I wrote up a small perl script, count.pl, to count all of the occurrences of different characters in the file, and plot them. The three columns in the output are, from left to right, the hex value of the character, the number of times that it occurs, and a bar, with one tick per occurrence.

$ ./count.pl < somefile
86      1       +
87      1       +
88      2       ++
89      11      +++++++++++
8a      7       +++++++
8b      28      ++++++++++++++++++++++++++++
8c      52      ++++++++++++++++++++++++++++++++++++++++++++++++++++
8d      11      +++++++++++
8e      1       +
8f      34      ++++++++++++++++++++++++++++++++++
90      10      ++++++++++
91      29      +++++++++++++++++++++++++++++
92      2       ++
93      28      ++++++++++++++++++++++++++++
96      24      ++++++++++++++++++++++++
97      6       ++++++
98      3       +++
99      14      ++++++++++++++
9a      24      ++++++++++++++++++++++++
9b      18      ++++++++++++++++++
9c      12      ++++++++++++
9d      14      ++++++++++++++
9e      9       +++++++++
a0      6       ++++++
a2      4       ++++
a4      4       ++++
aa      1       +
ac      1       +
ad      1       +
ba      1       +
c2      14      ++++++++++++++
c5      4       ++++
c7      3       +++
c8      2       ++
c9      9       +++++++++
ca      1       +
cb      2       ++
cc      3       +++
cd      2       ++
ce      13      +++++++++++++
cf      18      ++++++++++++++++++
d0      45      +++++++++++++++++++++++++++++++++++++++++++++
d1      5       +++++
d3      30      ++++++++++++++++++++++++++++++
f5      22      ++++++++++++++++++++++

Two things really stand out in this output. First, the distribution is very uneven, much like that of normal text. Second, every single character in the file has its 8th (high) bit set. This last fact got me thinking about likely algorithms that would yield output with the high bits set. One possibility that came to mind was that the high bit was simply toggled, with no other change to the text. Another possibility was that the plaintext was XORed with some repeating key, which would also produce similar characteristics.
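The count.pl script is not reproduced in this write-up. As a stand-in, here is a minimal Python sketch that produces the same three-column layout and checks the high-bit observation; the function names are my own and the details of the original Perl may differ.

```python
from collections import Counter

def histogram_lines(data: bytes):
    """Yield 'hex<TAB>count<TAB>bar' rows, one per byte value present."""
    counts = Counter(data)
    for value in sorted(counts):
        yield f"{value:02x}\t{counts[value]}\t{'+' * counts[value]}"

def all_high_bits_set(data: bytes) -> bool:
    """True if every byte in `data` has bit 7 (0x80) set."""
    return all(byte & 0x80 for byte in data)
```

Reading somefile in binary mode and printing "\n".join(histogram_lines(data)) should give a table in the format shown above, and all_high_bits_set(data) should report True for it.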

Experimentation

In order to test the hypotheses described above, I wrote a couple more small perl scripts in which I could try various tests.

Strip high bits

The first variation I tried was stripping off the high bit from every character. This did not yield recognizable plaintext.
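To illustrate why this fails, here is a small Python sketch (my Perl test script is not shown; strip_high_bit is a hypothetical name) applied to the first seven bytes of the dump above.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte (AND with 0x7F)."""
    return bytes(byte & 0x7F for byte in data)

# First seven bytes of the od dump above.
HEAD = bytes.fromhex("a49996939aa2f5")

stripped = strip_high_bit(HEAD)
print(stripped.hex(" "))  # prints "24 19 16 13 1a 22 75"
```

Four of those seven values (19, 16, 13, 1a) are ASCII control characters, so the result is clearly not readable text.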

Exclusive OR

The next test I tried was to XOR each character of the text with a single key character. The script, xor.pl, takes each character from its input and XORs it with a hard-coded key value. It was my intention to try various values for the key character to see if this was a likely solution. This is where I got a little lucky. The first key character I tried, 0xFF, happened to be the right one.

$ ./xor.pl < somefile
[file]
find=/dev/pts/01/bin/find
du=/dev/pts/01/bin/du
ls=/dev/pts/01/bin/ls
file_filters=01,lblibps.so,sn.l,prom,cleaner,dos,uconf.inv,psbnc,lpacct,USER
[ps]
ps=/dev/pts/01/bin/psr
ps_filters=lpq,lpsched,sh1t,psr,sshd2,lpset,lpacct,bnclp,lpsys
lsof_filters=lp,uconf.inv,psniff,psr,:13000,:25000,:6668,:6667,/dev/pts/01,sn.l,prom,lsof,psbnc
[netstat]
netstat=/dev/pts/01/bin/netstat
net_filters=47018,6668
[login]
su_loc=/dev/pts/01/bin/su
ping=/dev/pts/01/bin/ping
passwd=/dev/pts/01/bin/passwd
shell=/bin/sh
su_pass=l33th4x0r

Success! This output clearly indicates that I had found both the correct encryption algorithm and the "key." By XORing each character with 0xFF, the script simply inverts all of the bits in the file.
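The xor.pl script is not reproduced here. The Python sketch below shows the same single-byte XOR, plus the kind of key search I had intended to do by hand had 0xFF not worked on the first try. The scoring rule (fraction of printable ASCII bytes) and all function names are assumptions of this sketch, not part of the original script.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of `data` with the single-byte `key`."""
    return bytes(byte ^ key for byte in data)

def printable_fraction(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or a newline."""
    if not data:
        return 0.0
    good = sum(1 for b in data if b == 0x0A or 0x20 <= b <= 0x7E)
    return good / len(data)

def best_single_byte_key(data: bytes) -> int:
    """Try all 256 keys and return the one with the most printable output."""
    return max(range(256), key=lambda k: printable_fraction(xor_bytes(data, k)))

# First 32 bytes of the od dump above.
SAMPLE = bytes.fromhex(
    "a49996939aa2f59996919bc2d09b9a89"
    "d08f8b8cd0cfced09d9691d09996919b"
)

key = best_single_byte_key(SAMPLE)
print(hex(key), xor_bytes(SAMPLE, key))
# prints: 0xff b'[file]\nfind=/dev/pts/01/bin/find'
```

Because XOR with 0xFF is its own inverse, running xor_bytes a second time with the same key restores the obfuscated bytes.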

Determine source of file

Once I'd recovered the plaintext, I really wanted to find the "security toolkit" that it came from. I tried searching Google for some of the strings in the recovered plaintext, such as "uconf.inv." While this didn't turn up any direct links to the toolkit, I did find several references on mailing lists to other Solaris boxes that had been compromised and on which similar files had been found. Interestingly, there were no references to the encryption/obfuscation of this file. Still, this further confirmed my suspicion that this was a config file for a rootkit.

At this point I started looking through all of the rootkits that I could find which would work on Solaris. I looked through six others before I found URK, the "Universal Root Kit." This rootkit included the ability to use a config file in which all of the characters have been inverted, like the one found on the honeypot. From the README:

...
-DHIDE will enable hidden urk.conf support, what this dose is make the binaries
read a urk.conf that has been modified by the inv program. Ok, example, run
inv urk.conf urk.conf.inv, then place the urk.conf.inv someplace and edit urk.h
to point to it, then the information there will be readable from the binaries,
but I dont think most people will be able to ;)
...

And from the source code of inv.c, which is included with URK:

int main (int argc, char **argv)
{
   int c;
   FILE *file1,*file2;
   /* simple error checking */
   if(argc <= 1) {
      printf("Inverses the bit's in a file to make it unreadable.\n");
      printf("inv [file1] [file2]\n");
      return -1;
   }
...

This further confirms that this is probably the rootkit which was installed on the compromised honeypot.

Questions and Answers

The following are the questions for this month's challenge.

Question 1: Identify the encryption algorithm used to encrypt the file.

The bits of each character in the file are inverted, which is the same as XORing each character with 0xFF.

Question 2: How did you determine the encryption method?

Basic cryptanalysis and a little luck. See the description above.

Question 3: Decrypt the file, be sure to explain how you decrypted the file.

The decrypted file can be found in uconf.txt. See the description above for more detail about how this was done.

Question 4: Once decrypted, explain the purpose/function of the file and why it was encrypted.

The file is a config file for the URK rootkit. It was probably installed on the honeypot as uconf.inv, though the original name from the rootkit distribution is urk.conf. From the URK README:

urk.conf
        This file is what defines what to filter and where the locations
        of the original binaries are. So a good item for the file_filters may
        be urk.conf itself ;) Now the urk.conf file looks like a windows ini
        file so it should be familure to most of you.

The file is encrypted because it gives away the location of all of the rootkit binaries, and its purpose as a rootkit config file is fairly obvious. Even encryption this weak protects against casual discovery of the file and/or rootkit.
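Since the README describes the format as INI-like, the decrypted text can be inspected with a stock INI parser. The Python sketch below is only an illustration; URK itself reads the file from its own C binaries, and parse_urk_conf is a hypothetical helper. DECRYPTED is an excerpt of the plaintext recovered in the analysis above.

```python
import configparser

# Excerpt of the decrypted text shown in the analysis above.
DECRYPTED = """\
[file]
find=/dev/pts/01/bin/find
du=/dev/pts/01/bin/du
ls=/dev/pts/01/bin/ls
[ps]
ps=/dev/pts/01/bin/psr
ps_filters=lpq,lpsched,sh1t,psr,sshd2,lpset,lpacct,bnclp,lpsys
[login]
su_loc=/dev/pts/01/bin/su
shell=/bin/sh
su_pass=l33th4x0r
"""

def parse_urk_conf(text: str) -> configparser.ConfigParser:
    """Parse INI-style text; interpolation is off so values stay literal."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser

conf = parse_urk_conf(DECRYPTED)
ps_filters = conf["ps"]["ps_filters"].split(",")
print(conf["file"]["ls"], ps_filters)
```

Laid out this way, the file lists both where the original binaries were moved and which names the trojaned tools filter, which is exactly the information an investigator wants and the intruder wants hidden.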

Question 5: What lesson did you learn from this challenge?

Techniques such as the one illustrated in this month's challenge, in which files that indicate the presence of a rootkit are obfuscated and/or encrypted, will probably become more common. When such techniques are used, it is much more difficult for the owner of a compromised system to locate and identify the rootkit. This highlights the importance of host-based integrity checking software, such as Tripwire, as part of the security process.

Question 6: How long did this challenge take you?

Decrypting the file took about 25 minutes. I spent a couple of hours finding the original rootkit, URK. Documentation took another couple of hours.

Bonus Question: This encryption method and file are part of a security toolkit. Can you identify this toolkit?

Universal Root Kit, by K2 <ktwo@ktwo.ca>.

June 9, 2001 / Jeremiah Shirk <jshirk@roguish.org>