John Edward Scott <jes@daedalus-soln.demon.co.uk>
Once again, apologies if it seems superfluous to anyone.
copper% wget http://project.honeynet.org/scans/scan16/somefile.tgz
copper% md5sum somefile.tgz
f7964d9860cbf8135ef64bcf5b96facb somefile.tgz
copper% tar -zxvf somefile.tgz
somefile
copper% ls -al somefile
-rw-r----- 1 jes users 532 Jun 4 11:15 somefile
copper% file somefile
somefile: data
Using the Unix od command we dump the contents of the file into a (semi) readable format.
copper% od -c somefile
0000000 231 226 223 232 231 226 221 233 233 232 211
0000020 217 213 214 235 226 221 231 226 221 233
0000040 233 212 233 232 211 217 213 214
0000060 235 226 221 233 212 223 214 233 232 211 217
0000100 213 214 235 226 221 223 214 231 226 223
0000120 232 240 231 226 223 213 232 215 214 223 235 223
0000140 226 235 217 214 214 220 214 221 223 217 215 220
0000160 222 234 223 232 236 221 232 215 233 220 214 212 234
0000200 220 221 231 226 221 211 217 214 235 221 234 223 217
0000220 236 234 234 213 217 214
0000240 217 214 233 232 211 217 213 214 235
0000260 226 221 217 214 215 217 214 240 231 226 223 213 232 215
0000300 214 223 217 216 223 217 214 234 227 232 233 214 227
0000320 213 217 214 215 214 214 227 233 223 217 214
0000340 232 213 223 217 236 234 234 213 235 221 234 223 217
0000360 223 217 214 206 214 223 214 220 231 240 231 226 223 213 232
0000400 215 214 223 217 212 234 220 221 231 226 221 211
0000420 217 214 221 226 231 231 217 214 215
0000440
0000460 233 232 211 217 213 214
0000500 214 221 223 217 215 220 222 223 214 220 231
0000520 217 214 235 221 234 221 232 213 214 213 236 213
0000540 221 232 213 214 213 236 213 233 232 211 217 213
0000560 214 235 226 221 221 232 213 214 213 236 213
0000600 221 232 213 240 231 226 223 213 232 215 214
0000620 223 220 230 226 221
0000640 214 212 240 223 220 234 233 232 211 217 213 214
0000660 235 226 221 214 212 217 226 221 230
0000700 233 232 211 217 213 214 235 226 221
0000720 217 226 221 230 217 236 214 214 210 233 233 232 211
0000740 217 213 214 235 226 221 217 236 214 214
0000760 210 233 214 227 232 223 223 235 226 221 214 227
0001000 214 212 240 217 236 214 214 223 213 227
0001020 207 215
0001024
We can also use the Unix strings command to search for readable strings within the file (even though from the previous output we don't expect to see any, but it's good to confirm these things).
copper% strings somefile
// nothing is returned from this command, confirming what we thought
It's a good idea at this stage to also look at the file in other formats, we again use the od command, but this time tell it to display the file in both Hex (Base 16) and Octal (Base 8) formats. Note I've included the output from these commands here, since I will refer to them later on.
copper% od -b somefile
0000000 244 231 226 223 232 242 365 231 226 221 233 302 320 233 232 211
0000020 320 217 213 214 320 317 316 320 235 226 221 320 231 226 221 233
0000040 365 233 212 302 320 233 232 211 320 217 213 214 320 317 316 320
0000060 235 226 221 320 233 212 365 223 214 302 320 233 232 211 320 217
0000100 213 214 320 317 316 320 235 226 221 320 223 214 365 231 226 223
0000120 232 240 231 226 223 213 232 215 214 302 317 316 323 223 235 223
0000140 226 235 217 214 321 214 220 323 214 221 321 223 323 217 215 220
0000160 222 323 234 223 232 236 221 232 215 323 233 220 214 323 212 234
0000200 220 221 231 321 226 221 211 323 217 214 235 221 234 323 223 217
0000220 236 234 234 213 323 252 254 272 255 365 365 244 217 214 242 365
0000240 217 214 302 320 233 232 211 320 217 213 214 320 317 316 320 235
0000260 226 221 320 217 214 215 365 217 214 240 231 226 223 213 232 215
0000300 214 302 223 217 216 323 223 217 214 234 227 232 233 323 214 227
0000320 316 213 323 217 214 215 323 214 214 227 233 315 323 223 217 214
0000340 232 213 323 223 217 236 234 234 213 323 235 221 234 223 217 323
0000360 223 217 214 206 214 365 223 214 220 231 240 231 226 223 213 232
0000400 215 214 302 223 217 323 212 234 220 221 231 321 226 221 211 323
0000420 217 214 221 226 231 231 323 217 214 215 323 305 316 314 317 317
0000440 317 323 305 315 312 317 317 317 323 305 311 311 311 307 323 305
0000460 311 311 311 310 323 320 233 232 211 320 217 213 214 320 317 316
0000500 323 214 221 321 223 323 217 215 220 222 323 223 214 220 231 323
0000520 217 214 235 221 234 365 365 244 221 232 213 214 213 236 213 242
0000540 365 221 232 213 214 213 236 213 302 320 233 232 211 320 217 213
0000560 214 320 317 316 320 235 226 221 320 221 232 213 214 213 236 213
0000600 365 221 232 213 240 231 226 223 213 232 215 214 302 313 310 317
0000620 316 307 323 311 311 311 307 365 365 244 223 220 230 226 221 242
0000640 365 214 212 240 223 220 234 302 320 233 232 211 320 217 213 214
0000660 320 317 316 320 235 226 221 320 214 212 365 217 226 221 230 302
0000700 320 233 232 211 320 217 213 214 320 317 316 320 235 226 221 320
0000720 217 226 221 230 365 217 236 214 214 210 233 302 320 233 232 211
0000740 320 217 213 214 320 317 316 320 235 226 221 320 217 236 214 214
0000760 210 233 365 214 227 232 223 223 302 320 235 226 221 320 214 227
0001000 365 365 214 212 240 217 236 214 214 302 223 314 314 213 227 313
0001020 207 317 215 365
0001024
copper% od -x somefile
0000000 99a4 9396 a29a 99f5 9196 c29b 9bd0 899a
0000020 8fd0 8c8b cfd0 d0ce 969d d091 9699 9b91
0000040 9bf5 c28a 9bd0 899a 8fd0 8c8b cfd0 d0ce
0000060 969d d091 8a9b 93f5 c28c 9bd0 899a 8fd0
0000100 8c8b cfd0 d0ce 969d d091 8c93 99f5 9396
0000120 a09a 9699 8b93 8d9a c28c cecf 93d3 939d
0000140 9d96 8c8f 8cd1 d390 918c 93d1 8fd3 908d
0000160 d392 939c 9e9a 9a91 d38d 909b d38c 9c8a
0000200 9190 d199 9196 d389 8c8f 919d d39c 8f93
0000220 9c9e 8b9c aad3 baac f5ad a4f5 8c8f f5a2
0000240 8c8f d0c2 9a9b d089 8b8f d08c cecf 9dd0
0000260 9196 8fd0 8d8c 8ff5 a08c 9699 8b93 8d9a
0000300 c28c 8f93 d38e 8f93 9c8c 9a97 d39b 978c
0000320 8bce 8fd3 8d8c 8cd3 978c cd9b 93d3 8c8f
0000340 8b9a 93d3 9e8f 9c9c d38b 919d 939c d38f
0000360 8f93 868c f58c 8c93 9990 99a0 9396 9a8b
0000400 8c8d 93c2 d38f 9c8a 9190 d199 9196 d389
0000420 8c8f 9691 9999 8fd3 8d8c c5d3 ccce cfcf
0000440 d3cf cdc5 cfca cfcf c5d3 c9c9 c7c9 c5d3
0000460 c9c9 c8c9 d0d3 9a9b d089 8b8f d08c cecf
0000500 8cd3 d191 d393 8d8f 9290 93d3 908c d399
0000520 8c8f 919d f59c a4f5 9a91 8c8b 9e8b a28b
0000540 91f5 8b9a 8b8c 8b9e d0c2 9a9b d089 8b8f
0000560 d08c cecf 9dd0 9196 91d0 8b9a 8b8c 8b9e
0000600 91f5 8b9a 99a0 9396 9a8b 8c8d cbc2 cfc8
0000620 c7ce c9d3 c9c9 f5c7 a4f5 9093 9698 a291
0000640 8cf5 a08a 9093 c29c 9bd0 899a 8fd0 8c8b
0000660 cfd0 d0ce 969d d091 8a8c 8ff5 9196 c298
0000700 9bd0 899a 8fd0 8c8b cfd0 d0ce 969d d091
0000720 968f 9891 8ff5 8c9e 888c c29b 9bd0 899a
0000740 8fd0 8c8b cfd0 d0ce 969d d091 9e8f 8c8c
0000760 9b88 8cf5 9a97 9393 d0c2 969d d091 978c
0001000 f5f5 8a8c 8fa0 8c9e c28c cc93 8bcc cb97
0001020 cf87 f58d
0001024
Considering the size of the encrypted file, it is *fairly* unlikely that the file is an executable program. Even if the file was compressed before it was encrypted, it's unlikely that it would be only 532 bytes. I checked this by using the Unix find command on my system to look for any executable files smaller than 2K in size and could find none. Therefore, I proceeded with the assumption that the unencrypted text was likely to be either plain text, a script file or some sort of control file. I was well aware that at this stage my assumptions could prove to be incorrect, so I was very wary of drawing any conclusions too early on.
So, once again, making quite a large assumption that the encryption method *may* be a substitution cipher, we look again at the encrypted text. Note that from now on, we will refer to the encrypted text as the ciphertext and the contents of the encrypted file before it was encrypted as the plaintext.
I decided at this stage to perform what is commonly known as frequency analysis on the encrypted text. What this actually means is that we check the number of occurrences of each distinct "letter" in the encrypted text. This may help us to draw some conclusions about the contents. To do this, I coded the following simple program written in C.
copper% cat freq.c
#include
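A minimal sketch of a byte counter of the kind described here, offered as an illustration rather than as the original freq.c. The function names count_bytes and print_counts are hypothetical, and the output layout is an assumption modelled on the results shown in this write-up.

```c
#include <assert.h>
#include <stdio.h>

/* Tally how often each of the 256 possible byte values occurs in buf. */
void count_bytes(const unsigned char *buf, size_t len, unsigned long counts[256])
{
    assert(buf != NULL || len == 0);
    for (int i = 0; i < 256; i++)
        counts[i] = 0;
    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;
}

/* Print one line for every byte value that actually occurs, giving the
 * value in decimal, hex and octal followed by its count. */
void print_counts(FILE *out, const unsigned long counts[256])
{
    for (int i = 0; i < 256; i++)
        if (counts[i] > 0)
            fprintf(out, "Ascii %d(Hex: %x, Octal:%o): %lu\n",
                    i, (unsigned)i, (unsigned)i, counts[i]);
}
```

A small main that reads standard input into a buffer, calls count_bytes and then print_counts(stdout, counts) would give a filter that can be run as ./freq < somefile.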
This program will read the encrypted file and for each character in the file, it will maintain a count of the number of times that character is used. Running the program gives the following output.
copper% ./freq < somefile > somefile.freq
copper% cat somefile.freq
Ascii 134(Hex: 86, Octal:206): 1
Ascii 135(Hex: 87, Octal:207): 1
Ascii 136(Hex: 88, Octal:210): 2
Ascii 137(Hex: 89, Octal:211): 11
Ascii 138(Hex: 8a, Octal:212): 7
Ascii 139(Hex: 8b, Octal:213): 28
Ascii 140(Hex: 8c, Octal:214): 52
Ascii 141(Hex: 8d, Octal:215): 11
Ascii 142(Hex: 8e, Octal:216): 1
Ascii 143(Hex: 8f, Octal:217): 34
Ascii 144(Hex: 90, Octal:220): 10
Ascii 145(Hex: 91, Octal:221): 29
Ascii 146(Hex: 92, Octal:222): 2
Ascii 147(Hex: 93, Octal:223): 28
Ascii 150(Hex: 96, Octal:226): 24
Ascii 151(Hex: 97, Octal:227): 6
Ascii 152(Hex: 98, Octal:230): 3
Ascii 153(Hex: 99, Octal:231): 14
Ascii 154(Hex: 9a, Octal:232): 24
Ascii 155(Hex: 9b, Octal:233): 18
Ascii 156(Hex: 9c, Octal:234): 12
Ascii 157(Hex: 9d, Octal:235): 14
Ascii 158(Hex: 9e, Octal:236): 9
Ascii 160(Hex: a0, Octal:240): 6
Ascii 162(Hex: a2, Octal:242): 4
Ascii 164(Hex: a4, Octal:244): 4
Ascii 170(Hex: aa, Octal:252): 1
Ascii 172(Hex: ac, Octal:254): 1
Ascii 173(Hex: ad, Octal:255): 1
Ascii 186(Hex: ba, Octal:272): 1
Ascii 194(Hex: c2, Octal:302): 14
Ascii 197(Hex: c5, Octal:305): 4
Ascii 199(Hex: c7, Octal:307): 3
Ascii 200(Hex: c8, Octal:310): 2
Ascii 201(Hex: c9, Octal:311): 9
Ascii 202(Hex: ca, Octal:312): 1
Ascii 203(Hex: cb, Octal:313): 2
Ascii 204(Hex: cc, Octal:314): 3
Ascii 205(Hex: cd, Octal:315): 2
Ascii 206(Hex: ce, Octal:316): 13
Ascii 207(Hex: cf, Octal:317): 18
Ascii 208(Hex: d0, Octal:320): 45
Ascii 209(Hex: d1, Octal:321): 5
Ascii 211(Hex: d3, Octal:323): 30
Ascii 245(Hex: f5, Octal:365): 22
This output makes very interesting reading. We can clearly see that certain characters are present far more often than others. For example, Ascii 140 is used 52 times, Ascii 208 is used 45 times, etc. Now, assuming a simple substitution cipher has been used, each plaintext character is always replaced by the same ciphertext character, so the frequency of each character in the plaintext should carry over to its replacement in the ciphertext.
Again, we have to be very careful about drawing any dramatic conclusions, given that we have made some assumptions along the way. The high incidence of certain characters could correspond with either letters associated with high frequencies in the English language (for example E being the most common), or in the case of a script or control file, with some sort of delimiter or marker text.
I decided at this point to look on the Internet to find out the letter frequencies used in standard English (again note the possibly huge assumption that the text is in fact in English). After a little searching I found an analysis of the King James Bible, which gives the following frequencies in order:
e,t,h,o,a,i,s,n,r,l,d,m,u,f,y,g,w,c,p,b,v,k,j,q,x,z
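Pairing this ordering with the ciphertext counts can be sketched as follows. This is only an illustration under stated assumptions: rank_bytes and guess_for_rank are hypothetical names, the counts array is assumed to hold one tally per byte value, and the resulting pairs are first guesses to be tested, not a decoding.

```c
#include <assert.h>
#include <stddef.h>

/* English letters from most to least frequent, as listed above. */
static const char ENGLISH_ORDER[] = "ethoaisnrldmufygwcpbvkjqxz";

/* Fill ranked[] with the byte values that occur, most frequent first.
 * Returns how many distinct byte values were found.
 * Bytes with equal counts stay in ascending byte order. */
size_t rank_bytes(const unsigned long counts[256], unsigned char ranked[256])
{
    assert(counts != NULL && ranked != NULL);
    size_t n = 0;
    for (int b = 0; b < 256; b++)
        if (counts[b] > 0)
            ranked[n++] = (unsigned char)b;

    /* Insertion sort by descending count; n is at most 256. */
    for (size_t i = 1; i < n; i++) {
        unsigned char key = ranked[i];
        size_t j = i;
        while (j > 0 && counts[ranked[j - 1]] < counts[key]) {
            ranked[j] = ranked[j - 1];
            j--;
        }
        ranked[j] = key;
    }
    return n;
}

/* First-guess letter for the byte at the given rank, or '?' once the
 * 26 letters in the list have been used up. */
char guess_for_rank(size_t rank)
{
    return rank < sizeof ENGLISH_ORDER - 1 ? ENGLISH_ORDER[rank] : '?';
}
```

Since the ciphertext may contain digits, punctuation and delimiters as well as letters, the ranks are unlikely to line up exactly with the English ordering, which is why each pairing still has to be tried and inspected by hand.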
Using these frequencies, I created the following C program to substitute the most frequent characters in the ciphertext for the most frequent characters shown above.
copper% cat swap.c
#include
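A minimal sketch of the substitution step, offered as an illustration rather than as the original swap.c. The function name swap_bytes is hypothetical, and it works on a buffer in memory instead of on standard input and output.

```c
#include <assert.h>
#include <stddef.h>

/* Replace every occurrence of byte `from` in buf with byte `to`, in place.
 * Returns the number of bytes that were changed. */
size_t swap_bytes(unsigned char *buf, size_t len,
                  unsigned char from, unsigned char to)
{
    assert(buf != NULL || len == 0);
    size_t changed = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == from) {
            buf[i] = to;
            changed++;
        }
    }
    return changed;
}
```

Wrapping this in a main that takes the two values as decimal arguments and filters standard input to standard output would give the ./swap 140 69 < somefile > somefile.out style of invocation used in this write-up.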
Using this program, I tried substituting each of the frequently found English letters (e, t, h etc.) in order for the highest ranking frequencies in the ciphertext. The swap program expects the first parameter to be the decimal value of the character you wish to locate and the second parameter to be the decimal value of the character you wish to replace it with. For example:
copper% ./swap 140 69 < somefile > somefile.out
This command replaced every occurrence of ASCII 140 with the capital letter "E". At this stage I wasn't really concerned with being case-sensitive; I was purely searching to see if any patterns would emerge. The substitution makes it far easier to see any patterns forming in the text (although it may not look like it at first glance). Using the od command we look at the resulting output file (this time looking in ASCII format).
copper% od -c somefile.out
0000000 231 226 223 232 231 226 221 233 233 232 211
0000020 217 213 E 235 226 221 231 226 221 233
0000040 233 212 233 232 211 217 213 E
0000060 235 226 221 233 212 223 E 233 232 211 217
0000100 213 E 235 226 221 223 E 231 226 223
0000120 232 240 231 226 223 213 232 215 E 223 235 223
0000140 226 235 217 E E 220 E 221 223 217 215 220
0000160 222 234 223 232 236 221 232 215 233 220 E 212 234
0000200 220 221 231 226 221 211 217 E 235 221 234 223 217
0000220 236 234 234 213 217 E
0000240 217 E 233 232 211 217 213 E 235
0000260 226 221 217 E 215 217 E 240 231 226 223 213 232 215
0000300 E 223 217 216 223 217 E 234 227 232 233 E 227
0000320 213 217 E 215 E E 227 233 223 217 E
0000340 232 213 223 217 236 234 234 213 235 221 234 223 217
0000360 223 217 E 206 E 223 E 220 231 240 231 226 223 213 232
0000400 215 E 223 217 212 234 220 221 231 226 221 211
0000420 217 E 221 226 231 231 217 E 215
0000440
0000460 233 232 211 217 213 E
0000500 E 221 223 217 215 220 222 223 E 220 231
0000520 217 E 235 221 234 221 232 213 E 213 236 213
0000540 221 232 213 E 213 236 213 233 232 211 217 213
0000560 E 235 226 221 221 232 213 E 213 236 213
0000600 221 232 213 240 231 226 223 213 232 215 E
0000620 223 220 230 226 221
0000640 E 212 240 223 220 234 233 232 211 217 213 E
0000660 235 226 221 E 212 217 226 221 230
0000700 233 232 211 217 213 E 235 226 221
0000720 217 226 221 230 217 236 E E 210 233 233 232 211
0000740 217 213 E 235 226 221 217 236 E E
0000760 210 233 E 227 232 223 223 235 226 221 E 227
0001000 E 212 240 217 236 E E 223 213 227
0001020 207 215
0001024
For clarity, I've highlighted in red where the substitutions took place. Nothing particularly stands out at this point, so I carry on, replacing character 140 with each of the other characters in the list in turn. It's only when we get to replacing it with "s" that we see something interesting.
copper% ./swap 140 83 < somefile > somefile.out
copper% od -c somefile.out
At this point, I spotted two lines (commencing positions 720 and 1000) with two "S" characters side by side. Even more interesting was the fact that the two preceding characters were the same in both places (i.e. octal 217 and 236), so we've found a pattern here. The first word that jumped into my mind when I saw this was "PASS". To verify this, we need to replace 217 with the letter "P" and 236 with the letter "A":
copper% ./swap 140 83 < somefile > somefile.out
copper% ./swap 143 80 < somefile.out > somefile.tmp
copper% ./swap 158 65 < somefile.tmp > somefile.out
copper% od -c somefile.out
0000000 231 226 223 232 231 226 221 233 233 232 211
0000020 P 213 S 235 226 221 231 226 221 233
0000040 233 212 233 232 211 P 213 S
0000060 235 226 221 233 212 223 S 233 232 211 P
0000100 213 S 235 226 221 223 S 231 226 223
0000120 232 240 231 226 223 213 232 215 S 223 235 223
0000140 226 235 P S S 220 S 221 223 P 215 220
0000160 222 234 223 232 A 221 232 215 233 220 S 212 234
0000200 220 221 231 226 221 211 P S 235 221 234 223 P
0000220 A 234 234 213 P S
0000240 P S 233 232 211 P 213 S 235
0000260 226 221 P S 215 P S 240 231 226 223 213 232 215
0000300 S 223 P 216 223 P S 234 227 232 233 S 227
0000320 213 P S 215 S S 227 233 223 P S
0000340 232 213 223 P A 234 234 213 235 221 234 223 P
0000360 223 P S 206 S 223 S 220 231 240 231 226 223 213 232
0000400 215 S 223 P 212 234 220 221 231 226 221 211
0000420 P S 221 226 231 231 P S 215
0000440
0000460 233 232 211 P 213 S
0000500 S 221 223 P 215 220 222 223 S 220 231
0000520 P S 235 221 234 221 232 213 S 213 A 213
0000540 221 232 213 S 213 A 213 233 232 211 P 213
0000560 S 235 226 221 221 232 213 S 213 A 213
0000600 221 232 213 240 231 226 223 213 232 215 S
0000620 223 220 230 226 221
0000640 S 212 240 223 220 234 233 232 211 P 213 S
0000660 235 226 221 S 212 P 226 221 230
0000700 233 232 211 P 213 S 235 226 221
0000720 P 226 221 230 P A S S 210 233 233 232 211
0000740 P 213 S 235 226 221 P A S S
0000760 210 233 S 227 232 223 223 235 226 221 S 227
0001000 S 212 240 P A S S 223 213 227
0001020 207 215
0001024
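The "PASS" pattern was found by eye. A mechanical check for the same thing, counting how often a given pair of bytes is followed by a doubled byte, might look like the sketch below. The function name count_pattern is hypothetical and the approach is an assumption about how such a search could be automated, not something used in the analysis itself.

```c
#include <assert.h>
#include <stddef.h>

/* Count the places where the four-byte sequence a, b, c, c occurs in buf.
 * With a = 0217, b = 0236 and c = 0214 this counts every doubled 0214
 * that is preceded by the same two bytes. */
size_t count_pattern(const unsigned char *buf, size_t len,
                     unsigned char a, unsigned char b, unsigned char c)
{
    assert(buf != NULL || len == 0);
    size_t hits = 0;
    for (size_t i = 0; i + 3 < len; i++)
        if (buf[i] == a && buf[i + 1] == b &&
            buf[i + 2] == c && buf[i + 3] == c)
            hits++;
    return hits;
}
```

A sequence that repeats with identical neighbours is much stronger evidence of a real word than a doubled character on its own, which is why the matching pair in front of the two "S" characters matters.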
This is starting to look better, but it's hardly conclusive (after all, we've still assumed quite a few things to get us this far). However, we can see the occurrence of "PS" in quite a few locations in the file; given that ps is a Unix command, and one that is commonly trojaned during a compromise, this is highly interesting. We can also see in a number of places the sequence "P 213 S", so again it's time to chance another assumption. The dev directory and its subdirectories are commonly used by toolkits to hide control files. If we make the assumption that character 213 could be the substitution for the letter "T" (which would make "PTS", which is commonly found under the /dev directory), we get the following:
copper% ./swap 139 84 < somefile.out > somefile.tmp
copper% od -c somefile.tmp
0000000 231 226 223 232 231 226 221 233 233 232 211
0000020 P T S 235 226 221 231 226 221 233
0000040 233 212 233 232 211 P T S
0000060 235 226 221 233 212 223 S 233 232 211 P
0000100 T S 235 226 221 223 S 231 226 223
0000120 232 240 231 226 223 T 232 215 S 223 235 223
0000140 226 235 P S S 220 S 221 223 P 215 220
0000160 222 234 223 232 A 221 232 215 233 220 S 212 234
0000200 220 221 231 226 221 211 P S 235 221 234 223 P
0000220 A 234 234 T P S
0000240 P S 233 232 211 P T S 235
0000260 226 221 P S 215 P S 240 231 226 223 T 232 215
0000300 S 223 P 216 223 P S 234 227 232 233 S 227
0000320 T P S 215 S S 227 233 223 P S
0000340 232 T 223 P A 234 234 T 235 221 234 223 P
0000360 223 P S 206 S 223 S 220 231 240 231 226 223 T 232
0000400 215 S 223 P 212 234 220 221 231 226 221 211
0000420 P S 221 226 231 231 P S 215
0000440
0000460 233 232 211 P T S
0000500 S 221 223 P 215 220 222 223 S 220 231
0000520 P S 235 221 234 221 232 T S T A T
0000540 221 232 T S T A T 233 232 211 P T
0000560 S 235 226 221 221 232 T S T A T
0000600 221 232 T 240 231 226 223 T 232 215 S
0000620 223 220 230 226 221
0000640 S 212 240 223 220 234 233 232 211 P T S
0000660 235 226 221 S 212 P 226 221 230
0000700 233 232 211 P T S 235 226 221
0000720 P 226 221 230 P A S S 210 233 233 232 211
0000740 P T S 235 226 221 P A S S
0000760 210 233 S 227 232 223 223 235 226 221 S 227
0001000 S 212 240 P A S S 223 T 227
0001020 207 215
0001024
At this point, I have to admit that my heart started to beat a little faster, because suddenly I spotted the sequence "TSTAT" in a few places (see the line commencing 0000520 for example). A program that is commonly trojaned on compromised systems is the netstat command. This command is used to display network connections and various network interface statistics, so it's a favorite program to trojan to make sure the blackhat can't be seen. So, looking at the output, in order to make this sequence read as "NETSTAT" we need to use the swap program to convert octal 221 to "N" and octal 232 to "E".
copper% ./swap 145 78 < somefile.tmp > somefile.out
copper% ./swap 154 69 < somefile.out > somefile.tmp
copper% od -c somefile.tmp
0000000 231 226 223 E 231 226 N 233 233 E 211
0000020 P T S 235 226 N 231 226 N 233
0000040 233 212 233 E 211 P T S
0000060 235 226 N 233 212 223 S 233 E 211 P
0000100 T S 235 226 N 223 S 231 226 223
0000120 E 240 231 226 223 T E 215 S 223 235 223
0000140 226 235 P S S 220 S N 223 P 215 220
0000160 222 234 223 E A N E 215 233 220 S 212 234
0000200 220 N 231 226 N 211 P S 235 N 234 223 P
0000220 A 234 234 T P S
0000240 P S 233 E 211 P T S 235
0000260 226 N P S 215 P S 240 231 226 223 T E 215
0000300 S 223 P 216 223 P S 234 227 E 233 S 227
0000320 T P S 215 S S 227 233 223 P S
0000340 E T 223 P A 234 234 T 235 N 234 223 P
0000360 223 P S 206 S 223 S 220 231 240 231 226 223 T E
0000400 215 S 223 P 212 234 220 N 231 226 N 211
0000420 P S N 226 231 231 P S 215
0000440
0000460 233 E 211 P T S
0000500 S N 223 P 215 220 222 223 S 220 231
0000520 P S 235 N 234 N E T S T A T
0000540 N E T S T A T 233 E 211 P T
0000560 S 235 226 N N E T S T A T
0000600 N E T 240 231 226 223 T E 215 S
0000620 223 220 230 226 N
0000640 S 212 240 223 220 234 233 E 211 P T S
0000660 235 226 N S 212 P 226 N 230
0000700 233 E 211 P T S 235 226 N
0000720 P 226 N 230 P A S S 210 233 233 E 211
0000740 P T S 235 226 N P A S S
0000760 210 233 S 227 E 223 223 235 226 N S 227
0001000 S 212 240 P A S S 223 T 227
0001020 207 215
0001024
Pulling out one line that looks interesting here -
0000040 233 212 233 E 211 P T S
If our earlier assumption about "PTS" is correct, then this REALLY looks like it could read /dev/pts (ignoring the fact that we have the case wrong for a moment). So we try the following substitutions:
copper% ./swap 155 68 < somefile.tmp > somefile.out // swap 155 for "D"
copper% ./swap 137 86 < somefile.out > somefile.tmp // swap 137 for "V"
copper% ./swap 208 47 < somefile.tmp > somefile.out // swap 208 for "/"
copper% od -c somefile.out
0000000 231 226 223 E 231 226 N D / D E V
0000020 / P T S / / 235 226 N / 231 226 N D
0000040 D 212 / D E V / P T S / /
0000060 235 226 N / D 212 223 S / D E V / P
0000100 T S / / 235 226 N / 223 S 231 226 223
0000120 E 240 231 226 223 T E 215 S 223 235 223
0000140 226 235 P S S 220 S N 223 P 215 220
0000160 222 234 223 E A N E 215 D 220 S 212 234
0000200 220 N 231 226 N V P S 235 N 234 223 P
0000220 A 234 234 T P S
0000240 P S / D E V / P T S / / 235
0000260 226 N / P S 215 P S 240 231 226 223 T E 215
0000300 S 223 P 216 223 P S 234 227 E D S 227
0000320 T P S 215 S S 227 D 223 P S
0000340 E T 223 P A 234 234 T 235 N 234 223 P
0000360 223 P S 206 S 223 S 220 231 240 231 226 223 T E
0000400 215 S 223 P 212 234 220 N 231 226 N V
0000420 P S N 226 231 231 P S 215
0000440
0000460 / D E V / P T S /
0000500 S N 223 P 215 220 222 223 S 220 231
0000520 P S 235 N 234 N E T S T A T
0000540 N E T S T A T / D E V / P T
0000560 S / / 235 226 N / N E T S T A T
0000600 N E T 240 231 226 223 T E 215 S
0000620 223 220 230 226 N
0000640 S 212 240 223 220 234 / D E V / P T S
0000660 / / 235 226 N / S 212 P 226 N 230
0000700 / D E V / P T S / / 235 226 N /
0000720 P 226 N 230 P A S S 210 D / D E V
0000740 / P T S / / 235 226 N / P A S S
0000760 210 D S 227 E 223 223 / 235 226 N / S 227
0001000 S 212 240 P A S S 223 T 227
0001020 207 215
0001024
It was now fairly clear that the case I've used is incorrect; it's better to use lower case for directory structures (although at this point we don't know whether this is in fact correct, perhaps the program that uses this file isn't case sensitive or automatically converts to lowercase). However, I chose at this point to rerun all the conversions so far, using lowercase instead. I also noticed the sequence "/ 235 226 N /" in a few places (for example the line commencing 0000560), which is highly suggestive of being "/bin/". So together with our new lowercase substitutions and our two new "guesses", we have -
copper% cat swaps // This is a script I used to automate things
# s
./swap 140 115 < somefile > somefile.out
# p
./swap 143 112 < somefile.out > somefile.tmp
# a
./swap 158 97 < somefile.tmp > somefile.out
# t
./swap 139 116 < somefile.out > somefile.tmp
# n
./swap 145 110 < somefile.tmp > somefile.out
# e
./swap 154 101 < somefile.out > somefile.tmp
# d
./swap 155 100 < somefile.tmp > somefile.out
# v
./swap 137 118 < somefile.out > somefile.tmp
# /
./swap 208 47 < somefile.tmp > somefile.out
# b
./swap 157 98 < somefile.out > somefile.tmp
# i
./swap 150 105 < somefile.tmp > somefile.out
After running this new script, the output becomes
copper% od -c somefile.out
0000000 231 i 223 e 231 i n d / d e v
0000020 / p t s / / b i n / 231 i n d
0000040 d 212 / d e v / p t s / /
0000060 b i n / d 212 223 s / d e v / p
0000100 t s / / b i n / 223 s 231 i 223
0000120 e 240 231 i 223 t e 215 s 223 b 223
0000140 i b p s s 220 s n 223 p 215 220
0000160 222 234 223 e a n e 215 d 220 s 212 234
0000200 220 n 231 i n v p s b n 234 223 p
0000220 a 234 234 t p s
0000240 p s / d e v / p t s / / b
0000260 i n / p s 215 p s 240 231 i 223 t e 215
0000300 s 223 p 216 223 p s 234 227 e d s 227
0000320 t p s 215 s s 227 d 223 p s
0000340 e t 223 p a 234 234 t b n 234 223 p
0000360 223 p s 206 s 223 s 220 231 240 231 i 223 t e
0000400 215 s 223 p 212 234 220 n 231 i n v
0000420 p s n i 231 231 p s 215
0000440
0000460 / d e v / p t s /
0000500 s n 223 p 215 220 222 223 s 220 231
0000520 p s b n 234 n e t s t a t
0000540 n e t s t a t / d e v / p t
0000560 s / / b i n / n e t s t a t
0000600 n e t 240 231 i 223 t e 215 s
0000620 223 220 230 i n
0000640 s 212 240 223 220 234 / d e v / p t s
0000660 / / b i n / s 212 p i n 230
0000700 / d e v / p t s / / b i n /
0000720 p i n 230 p a s s 210 d / d e v
0000740 / p t s / / b i n / p a s s
0000760 210 d s 227 e 223 223 / b i n / s 227
0001000 s 212 240 p a s s 223 t 227
0001020 207 215
0001024
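The chain of ./swap invocations in the script can equivalently be expressed as a single 256-entry lookup table that is applied in one pass. The sketch below is one way to do that; the names table_init, table_set and table_apply are hypothetical and this is not how the script itself works.

```c
#include <assert.h>
#include <stddef.h>

/* Start with a table that maps every byte value to itself. */
void table_init(unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
}

/* Record one guess: ciphertext byte `cipher` decodes to `plain`. */
void table_set(unsigned char table[256],
               unsigned char cipher, unsigned char plain)
{
    table[cipher] = plain;
}

/* Decode buf in place by passing every byte through the table once.
 * Bytes with no guess recorded are left unchanged. */
void table_apply(const unsigned char table[256],
                 unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

Chaining the swaps one after another is safe for this file because every ciphertext byte in the frequency output is 134 or higher while every replacement is a 7-bit ASCII character, so a replacement can never be mistaken for a byte that is still waiting to be swapped. A table makes that independence explicit and avoids the alternating temporary files.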
Looking at the first line:
0000000 231 i 223 e 231 i n d / d e v
I drew a couple of conclusions. Firstly, the sequence "231 i n d" was possibly a reference to the Unix find command; again this is a program which is often trojaned. Secondly, the character after the "231 i n d" sequence (octal 302) appears quite a few times, usually just preceding a line which looks like a path to something. Here I took a leap of faith: if this was indeed a control file, as I was beginning to suspect, then perhaps it followed the format of "trojaned command = <path to read details from>". This means we need to replace octal 231 with "f" and octal 302 with "=". Trying these substitutions gives us
0000000 f i 223 e f i n d = / d e v
0000020 / p t s / / b i n / f i n d
0000040 d 212 = / d e v / p t s / /
0000060 b i n / d 212 223 s = / d e v / p
0000100 t s / / b i n / 223 s f i 223
0000120 e 240 f i 223 t e 215 s = 223 b 223
0000140 i b p s s 220 s n 223 p 215 220
0000160 222 234 223 e a n e 215 d 220 s 212 234
0000200 220 n f i n v p s b n 234 223 p
0000220 a 234 234 t p s
0000240 p s = / d e v / p t s / / b
0000260 i n / p s 215 p s 240 f i 223 t e 215
0000300 s = 223 p 216 223 p s 234 227 e d s 227
0000320 t p s 215 s s 227 d 223 p s
0000340 e t 223 p a 234 234 t b n 234 223 p
0000360 223 p s 206 s 223 s 220 f 240 f i 223 t e
0000400 215 s = 223 p 212 234 220 n f i n v
0000420 p s n i f f p s 215
0000440
0000460 / d e v / p t s /
0000500 s n 223 p 215 220 222 223 s 220 f
0000520 p s b n 234 n e t s t a t
0000540 n e t s t a t = / d e v / p t
0000560 s / / b i n / n e t s t a t
0000600 n e t 240 f i 223 t e 215 s =
0000620 223 220 230 i n
0000640 s 212 240 223 220 234 = / d e v / p t s
0000660 / / b i n / s 212 p i n 230 =
0000700 / d e v / p t s / / b i n /
0000720 p i n 230 p a s s 210 d = / d e v
0000740 / p t s / / b i n / p a s s
0000760 210 d s 227 e 223 223 = / b i n / s 227
0001000 s 212 240 p a s s = 223 t 227
0001020 207 215
0001024
Now this is really starting to look promising. We still have to prove that this is the correct algorithm; however, it's a good check to see how many (successful) substitutions we can perform first. Firstly we can see from the first line
0000000 f i 223 e f i n d = / d e v
that the sequence "f i 223 e" is a good candidate to be the word "file", since there are very few other words that would fit and it's also a computer related term (of course it could be any other gibberish such as "f1le"). So we'll try substituting Octal 223 for "l". We can also see a trend here: in the pattern of "<keyword>=<path>" there usually appears to be the Octal value 365 immediately preceding the <keyword> part. What could this mean? Well, if you think about the typical layout of a configuration file, we would expect the <keyword> part to begin on a new line, for example something like
passwordfile=/etc/password
shadowfile=/etc/shadow
So, once again making another assumption, that Octal 365 represents a newline (it could simply be some other sort of delimiter; for all we know the entire text may be on one line), we make the following substitutions
copper% ./swap 147 108 < somefile.out > somefile.tmp // substitute Octal 223 for "l"
copper% ./swap 245 10 < somefile.tmp > somefile.out // substitute Octal 365 for NewLine
copper% od -c somefile.out
0000000 f i l e \n f i n d = / d e v
0000020 / p t s / / b i n / f i n d
0000040 \n d 212 = / d e v / p t s / /
0000060 b i n / d 212 \n l s = / d e v / p
0000100 t s / / b i n / l s \n f i l
0000120 e 240 f i l t e 215 s = l b l
0000140 i b p s s 220 s n l p 215 220
0000160 222 234 l e a n e 215 d 220 s 212 234
0000200 220 n f i n v p s b n 234 l p
0000220 a 234 234 t \n \n p s \n
0000240 p s = / d e v / p t s / / b
0000260 i n / p s 215 \n p s 240 f i l t e 215
0000300 s = l p 216 l p s 234 227 e d s 227
0000320 t p s 215 s s 227 d l p s
0000340 e t l p a 234 234 t b n 234 l p
0000360 l p s 206 s \n l s 220 f 240 f i l t e
0000400 215 s = l p 212 234 220 n f i n v
0000420 p s n i f f p s 215
0000440
0000460 / d e v / p t s /
0000500 s n l p 215 220 222 l s 220 f
0000520 p s b n 234 \n \n n e t s t a t
0000540 \n n e t s t a t = / d e v / p t
0000560 s / / b i n / n e t s t a t
0000600 \n n e t 240 f i l t e 215 s =
0000620 \n \n l 220 230 i n
0000640 \n s 212 240 l 220 234 = / d e v / p t s
0000660 / / b i n / s 212 \n p i n 230 =
0000700 / d e v / p t s / / b i n /
0000720 p i n 230 \n p a s s 210 d = / d e v
0000740 / p t s / / b i n / p a s s
0000760 210 d \n s 227 e l l = / b i n / s 227
0001000 \n \n s 212 240 p a s s = l t 227
0001020 207 215 \n
0001024
Now that we have the line breaks in, it's worth looking at the file in the normal way, just to see how things look
copper% cat somefile.out
file
find=/dev/pts//bin/find
d=/dev/pts//bin/d
ls=/dev/pts//bin/ls
filefiltes=lblibpsssnlpӜleanedsӊnfinvpsbnlpatӪ
ps
ps=/dev/pts//bin/ps
psfiltes=lplpsedstpsssdlpsetlpatbnlplpss
lsffiltes=lpӊnfinvpsniffps/dev/pts/snlplsfpsbn
netstat
netstat=/dev/pts//bin/netstat
netfiltes=
lin
sl=/dev/pts//bin/s
pin=/dev/pts//bin/pin
passd=/dev/pts//bin/passd
sell=/bin/s
spass=ltˇύ
This is really looking promising now that it's in a fairly readable format. There are a few relatively obvious substitutions we can make. In the line
0000760 210 d \n s 227 e l l = / b i n / s 227
substituting Octal 227 for "h" would make it read "shell=/bin/sh", which looks like a realistic substitution to make.
We can also see the sequence "f i l t e 215 s" in a few places. A quick search of a dictionary reveals that the only likely candidate for this would be the word "filters", so we can also attempt to substitute Octal 215 for "r".
Also since most of the <keyword> sections seem to be Unix command names, the lines with the sequences "p i n 230" and "p a s s 210 d" are likely to be "ping" and "passwd" respectively.
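The dictionary lookup for a partially decoded word can be sketched as a simple wildcard match. In this illustration '?' stands for one undecoded byte; the function name fits_pattern is hypothetical, and the sketch does not check that the letter filling a '?' is one that has not already been assigned to another byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if word fits pattern, where '?' in pattern stands for exactly
 * one undecoded character and every other character must match literally.
 * Return 0 otherwise. */
int fits_pattern(const char *pattern, const char *word)
{
    assert(pattern != NULL && word != NULL);
    size_t n = strlen(pattern);
    if (strlen(word) != n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (pattern[i] != '?' && pattern[i] != word[i])
            return 0;
    return 1;
}
```

Running every entry of a word list or a list of Unix command names through fits_pattern with patterns such as "filte?s", "pin?" and "pass?d" yields the candidate words; where more than one word fits, the surrounding context has to decide.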
Trying these new substitutions all in one go, gives us
copper% cat swaps
./swap 140 115 < somefile > somefile.out #s
./swap 143 112 < somefile.out > somefile.tmp #p
./swap 158 97 < somefile.tmp > somefile.out #a
./swap 139 116 < somefile.out > somefile.tmp #t
./swap 145 110 < somefile.tmp > somefile.out #n
./swap 154 101 < somefile.out > somefile.tmp #e
./swap 155 100 < somefile.tmp > somefile.out #d
./swap 137 118 < somefile.out > somefile.tmp #v
./swap 208 47 < somefile.tmp > somefile.out #/
./swap 157 98 < somefile.out > somefile.tmp #b
./swap 150 105 < somefile.tmp > somefile.out #i
./swap 153 102 < somefile.out > somefile.tmp #f
./swap 194 61 < somefile.tmp > somefile.out #=
./swap 147 108 < somefile.out > somefile.tmp #l
./swap 245 10 < somefile.tmp > somefile.out #NewLine
./swap 151 104 < somefile.out > somefile.tmp #h
./swap 141 114 < somefile.tmp > somefile.out #r
./swap 152 103 < somefile.out > somefile.tmp #g
./swap 136 119 < somefile.tmp > somefile.out #w
copper% sh swaps
copper% od -c somefile.out
0000000 f i l e \n f i n d = / d e v
0000020 / p t s / / b i n / f i n d
0000040 \n d 212 = / d e v / p t s / /
0000060 b i n / d 212 \n l s = / d e v / p
0000100 t s / / b i n / l s \n f i l
0000120 e 240 f i l t e r s = l b l
0000140 i b p s s 220 s n l p r 220
0000160 222 234 l e a n e r d 220 s 212 234
0000200 220 n f i n v p s b n 234 l p
0000220 a 234 234 t \n \n p s \n
0000240 p s = / d e v / p t s / / b
0000260 i n / p s r \n p s 240 f i l t e r
0000300 s = l p 216 l p s 234 h e d s h
0000320 t p s r s s h d l p s
0000340 e t l p a 234 234 t b n 234 l p
0000360 l p s 206 s \n l s 220 f 240 f i l t e
0000400 r s = l p 212 234 220 n f i n v
0000420 p s n i f f p s r
0000440
0000460 / d e v / p t s /
0000500 s n l p r 220 222 l s 220 f
0000520 p s b n 234 \n \n n e t s t a t
0000540 \n n e t s t a t = / d e v / p t
0000560 s / / b i n / n e t s t a t
0000600 \n n e t 240 f i l t e r s =
0000620 \n \n l 220 g i n
0000640 \n s 212 240 l 220 234 = / d e v / p t s
0000660 / / b i n / s 212 \n p i n g =
0000700 / d e v / p t s / / b i n /
0000720 p i n g \n p a s s w d = / d e v
0000740 / p t s / / b i n / p a s s
0000760 w d \n s h e l l = / b i n / s h
0001000 \n \n s 212 240 p a s s = l t h
0001020 207 r \n
0001024
copper% cat somefile.out
file
find=/dev/pts//bin/find
d=/dev/pts//bin/d
ls=/dev/pts//bin/ls
filefilters=lblibpsssnlprӜleanerdsӊnfinvpsbnlpatӪ
ps
ps=/dev/pts//bin/psr
psfilters=lplpshedshtpsrsshdlpsetlpatbnlplpss
lsffilters=lpӊnfinvpsniffpsr/dev/pts/snlprlsfpsbn
netstat
netstat=/dev/pts//bin/netstat
netfilters=
lgin
sl=/dev/pts//bin/s
ping=/dev/pts//bin/ping
passwd=/dev/pts//bin/passwd
shell=/bin/sh
spass=lthˇr
We can see from the marked lines that there are only a couple of other logical substitutions we can make now. If the first marked line (d=/dev/pts//bin/d) follows the convention that it is a Unix command, then it should be either du or df. We have already used a substitution for "f", and choosing "u" also makes the last line begin with "su", so Octal 212 is most likely "u". We can also see that Octal 220 is probably a substitution for "o", making "lgin" read "login".
After adding these two substitutions to the script, we run it again to give us
copper% sh swaps
copper% cat somefile.out
file
find=/dev/pts//bin/find
du=/dev/pts//bin/du
ls=/dev/pts//bin/ls
filefilters=lblibpssosnlproӜleanerdosuonfinvpsbnlpatӪ
ps
ps=/dev/pts//bin/psr
psfilters=lplpshedshtpsrsshdlpsetlpatbnlplpss
lsoffilters=lpuonfinvpsniffpsr/dev/pts/snlprolsofpsbn
netstat
netstat=/dev/pts//bin/netstat
netfilters=
login
sulo=/dev/pts//bin/su
ping=/dev/pts//bin/ping
passwd=/dev/pts//bin/passwd
shell=/bin/sh
supass=lthˇr
We can clearly see that, with the substitutions we've made so far, this does look like a reasonable control file of some sort. We have built up a table of substitutions that transforms the ciphertext into this (almost) plaintext; however, we still have no general formula for getting from a particular plaintext character to a particular ciphertext character. To find one, we need to analyse the results so far.
Cipher Value (decimal) | Plaintext Value (decimal)
---|---
140 | 115 |
143 | 112 |
158 | 97 |
139 | 116 |
145 | 110 |
158 | 97 |
The decimal values on their own do not make the relationship obvious. However, if we look at the binary representations of these characters, things start to look a little different.
Cipher Value (decimal) | Plaintext Value (decimal) | Cipher Value (binary) | Plaintext Value (binary)
---|---|---|---
140 | 115 | 10001100 | 01110011 |
143 | 112 | 10001111 | 01110000 |
158 | 97 | 10011110 | 01100001 |
139 | 116 | 10001011 | 01110100 |
145 | 110 | 10010001 | 01101110 |
158 | 97 | 10011110 | 01100001 |
From this sample of substitutions we can see a clear correlation between the cipher value and the plaintext value: each bit in the value appears to be flipped to go from ciphertext to plaintext and vice versa.
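The observation can be checked mechanically rather than by eye. The sketch below is illustrative only (the function and array names are assumptions, not the author's code): it XORs each cipher byte from the table with its plaintext byte and confirms that all eight bits come out set, which is only true when one byte is the bitwise complement of the other.

```c
#include <stddef.h>

/* Cipher/plaintext pairs taken from the table above (decimal values). */
const unsigned char cipher_sample[] = { 140, 143, 158, 139, 145 };
const unsigned char plain_sample[]  = { 115, 112,  97, 116, 110 };

/* Returns 1 if every cipher byte is the bitwise complement of the
   plaintext byte at the same index, 0 otherwise. */
int all_complements(const unsigned char *cipher,
                    const unsigned char *plain, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* A byte XORed with its own complement has all eight bits set. */
        if ((cipher[i] ^ plain[i]) != 0xFF)
            return 0;
    }
    return 1;
}
```

An equivalent way to state the same relationship is that each pair of decimal values sums to 255.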
In order to test this theory, I wrote a small C program to flip the bits in each character of the encrypted file and write out a new file.
copper% cat flip.c
#include
SUCCESS!!!! I have to admit at this point I felt elated. Not only had I found the method of encryption, but the logic and the assumptions I'd made to get to this point had proven to be correct. We can clearly see from this file that it is indeed a control file, as we had assumed. We can also see that we were correct about the physical layout of the file, with blocks of commands and a general format of <keyword>=<value>
The file was encrypted by flipping each binary bit of each character, so each 0 becomes a 1 and each 1 becomes a 0. This operation is commonly known as the (one's) complement or bitwise NOT.
The encryption method was determined initially by performing frequency analysis on the encrypted file. Using this method, we were able to repeatedly try substituting the most common letters in the English language for characters in the ciphertext. The big break initially came from recognizing that the sequence "217 236 S S" was likely to be "PASS".
The file was 90% decrypted using the method described above; however, total decryption and proof that the method was correct came from writing a program to perform bitwise NOT operations on each character in the encrypted file. The program is shown below.
#include
Note that as a double check, we can encrypt the plaintext using the above program and compare the output with the original encrypted file.
copper% ./flip < somefile.out > somefile.enc
copper% diff somefile.enc somefile // returns no output (i.e. files are the same)
copper% md5sum somefile.enc
643be63febcba6f2a9d24a370a861e00 somefile.enc
copper% md5sum somefile
643be63febcba6f2a9d24a370a861e00 somefile
The file is a control file used by the various programs that make up a rootkit which was installed during the compromise. Some lines in the file detail where the trojaned copies of the binaries installed by the rootkit live; for example, the trojaned version of the find command can be found at /dev/pts/01/bin/find. The file also contains lines which are to be used as filters for some of the trojaned programs; for example, the line that reads net_filters=47018,6668 indicates that ports 47018 and 6668 should be hidden from view if the user runs the trojaned netstat command. This allows the intruder to install programs that listen on these ports without them showing up.
There is also a line which details what the supervisor password is (more likely for another account with UID 0 rather than a change to the usual root password). It is set to "l33th4x0r", which in script-kiddie language basically means "elite hacker".
The file is encrypted simply because it would be extremely incriminating if it were accidentally found. Rootkits go to great lengths to install trojan binaries that will not show any evidence of intrusion, so if this file were left lying around unencrypted and it was found, a sharp System Administrator would immediately suspect something (at least you would hope so!).
There were a few useful lessons learnt from this challenge. Firstly I was surprised how much information can be derived when initially it looks like none is available. For example initially I knew almost nothing about the file itself, yet by making some quick deductions that it was unlikely to be a binary due to its size alone, I then began to work on the premise that it was going to be some sort of plaintext file.
Secondly, I learnt that sometimes you have to make assumptions (or if you prefer leaps-of-faith) and be prepared to invest quite a bit of time before you can prove any of your assumptions to be correct. I must admit, I was prepared at quite a few stages to hit a brick wall and find that all my work had been in vain and that I'd have to start from scratch again, however I was fortunate that the path I took and the reasoning that I stuck with were correct (in the long run). Sometimes it is very daunting to be faced with a seemingly impossible (to the average person) task, however if you break it down into manageable chunks and know what sort of results you expect from each chunk then it is possible to approach the problem in a structured way.
Thirdly, I learnt that in this sort of investigation there is usually more than one way to get to the correct answer. After completing the challenge I thought of another couple of ways I could have approached it (for example, later on I wrote a program that showed me that the leftmost bit of each character was set, which is suggestive of either an OR/XOR operation or bitwise negation), which could have led me to the answer sooner. I look forward to seeing how other people approached the problem, because whilst it's important to achieve the right result, it's often how you get there that's more interesting.
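The program mentioned in that parenthesis is not reproduced here. As a hedged sketch of what such a check could look like (the function name `count_high_bit` is hypothetical), the following counts how many bytes in a buffer have their most significant bit set. Printable ASCII and newlines all lie below 128, so a plain text file gives a count of zero, while its bitwise complement gives a count equal to the file length.

```c
#include <stddef.h>

/* Count how many of the n bytes in buf have their most significant
   (leftmost) bit set. */
size_t count_high_bit(const unsigned char *buf, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] & 0x80)
            count++;
    }
    return count;
}
```

If the count equals the number of bytes read from the encrypted file, every character has its top bit set, which is the hint described above.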
I won't pretend it came to me in a flash, because it didn't. The initial stages of analysis were spent doodling with a piece of paper and a pen for a long time. I'd break the time down as -
Initial Investigation (pen and paper time) - 30 minutes
Frequency Analysis - 1 hour
Manual Character Substitutions - 2 hours
Realizing and Coding the bitwise negation - 30 minutes
Writing Report - 2 hours
Total Time Taken - 6 hours
A quick search on a search engine for "/dev/pts/01" (I used this since it's quite a specific phrase) turned up a description of a security advisory at Stanford University, which details an snmpXdmid exploit associated with a rootkit that installs trojaned binaries in "/dev/pts/01". The port numbers listed in the advisory (namely 47018 and 6668) also match those listed in our decrypted file (in the net_filters section).
After quite a bit of digging around, searching for phrases like "solaris" and "rootkit", I found a number of rootkits which I downloaded to my machine. It took quite a while to examine each one and determine what it did and whether it met the criteria that I would expect (namely that it installs trojans in "/dev/pts/01" and also includes encrypted files). I stumbled across a rootkit known as the "Universal Root Kit", with a filename of urk-dist-0.9.8.tar.gz (note that I haven't provided the URL for this file; it's easily found, but I didn't want to promote the website where I found it). Looking inside this archive shows a couple of interesting things. Firstly, there is a file called urk.conf, which is shown below:
[file]
find=/usr/man/man1/xxxxxxbin/find
du=/usr/man/man1/xxxxxxbin/du
ls=/usr/local/bin/ls.gnu
file_filters=xxxxxx,yyyyyy,aaaaaa,mmmmmmmmm
[ps]
ps=/usr/man/man1/xxxxxxbin/ps
ps_filters=nedit,bash
[netstat]
netstat=/usr/man/man1/xxxxxxbin/netstat
net_filters=innu.org
[login]
su_pass=h4x0r
su_loc=/usr/man/man1/xxxxxxbin/su
ping=/usr/man/man1/xxxxxxbin/ping
passwd=/usr/man/man1/xxxxxxbin/passwd
shell=/usr/man/man1/xxxxxxbin/bash
This file looks remarkably similar to our decrypted file. Apart from the obvious difference in the directory paths, most of the key names are identical, and the value associated with the su_pass keyword is very similar (h4x0r compared with l33th4x0r).
Secondly and even more importantly, the rootkit contains a file called inv.c, the contents of which are shown below.
#include
This file basically performs the same function as the flip program I wrote earlier, in that it performs bitwise negation on an input file, effectively encrypting it. In fact, we can test this (I haven't shown it here) by using this inv.c file to encrypt our plaintext file and comparing the result with the original encrypted file, which shows that they are identical.
This leads me to believe that the rootkit used was a variant of the "Universal Root Kit", i.e. someone has used the "Universal Root Kit" as a base and modified it for their needs.