Honeynet "Scan of the Month 24" Analysis
by Artjom Grudnitsky
25th October, 2002

The system used for the analysis was a Linux system (kernel 2.4.19) with
loopback device support compiled into the kernel. OpenOffice was used to
read the Excel spreadsheets and the MS Office file.

-----------------------------

Check the MD5 sum:

    $ md5sum image.zip
    b676147f63923e1f428131d59b1d6a72  image.zip
    $

Unzip the disk image:

    $ unzip image.zip
    Archive:  image.zip
      inflating: image
    $

Quick check of the file format:

    $ file image
    image: x86 boot sector, system MSDOS5.0, FAT (12 bit)
    $

Let's try a loop mount of the image and check its contents:

    $ sudo mount image /mnt/dos1 -t msdos -o loop
    $ ls -la /mnt/dos1
    total 28
    drwxr-xr-x    2 root     root         7168 Jan  1  1970 ./
    drwxr-xr-x   10 root     root         4096 Sep 21 09:20 ../
    -rwxr-xr-x    1 root     root        15585 Sep 11 08:30 coverp~1.jpg
    -rwxr-xr-x    1 root     root         1000 May 24 08:20 schedu~1.exe
    $

Using file(1) we can check whether these two files really are a JPEG and a
DOS/Windows executable, as their filename extensions suggest:

    $ file /mnt/dos1/*
    /mnt/dos1/coverp~1.jpg: PC formatted floppy with no filesystem
    /mnt/dos1/schedu~1.exe: Zip archive data, at least v2.0 to extract
    $

Running xxd(1) on coverp~1.jpg shows that the file contains only 0xf6 and
0x00 values. JPEG images are compressed (except for some JPEG2000), so this
surely isn't a JPEG. Trying

    $ unzip -t /mnt/dos1/schedu~1.exe

fails, so probably only the first few bytes (the signature) suggest that it
is a zip file.

A few words about the FAT filesystem[1] used on the disk image: Files on
the disk are broken into chunks called clusters. The starting cluster of a
file is given in its directory entry. The number of this starting cluster
is also an index into a special area on the disk called the FAT (File
Allocation Table). The FAT is divided into 12-, 16- or 32-bit entries (for
FAT12, FAT16 and FAT32 respectively). Each entry represents one cluster on
the disk.
By reading the value of an entry you get either

 - the number of the next cluster of the file, or
 - a value saying it is the last cluster of a file, or one of a few other
   meanings (see [1] or the official FAT specification for details).

Regarding the corrupt files we saw when mounting the image, there are a few
possible explanations:

 - the starting cluster in the directory entries of the files was altered,
 - the FAT chains were altered, or
 - the file data was corrupted or deleted.

In the last case, one would have to analyse the physical disk. However,
let's see if we can recover the information using only the image file. I
wrote a Perl script (fat-recover.pl) which helps with manipulating FAT
chains and using them to read data from the image.

    $ ./fat-recover.pl
    > open image
    open image ok
    > init
    Imagefile: image
    FAT Type: FAT12
    Boot sector:
      jb_addr: eb3c90
      OEM_name: MSDOS5.0
      bytes_per_sec: 512
      secs_per_clus: 1
      reserved_secs: 1
      FATs: 2
      root_ents: 224
      total_secs: 2880
      media_desc: f0
      secs_per_FAT: 9
      secs_per_track: 18
      heads: 2
      hidd_secs: 0
    Boot sector signature: 55aa
    reading FAT 0; starting at 512...
    reading FAT 1; starting at 1024...
    got 2 FAT chains
    init ok
    >

The script has opened the image file and read the boot sector, which
contains information about the layout of the filesystem. It then read the
FATs (there are two of them) and retrieved two FAT chains from them. Let's
see which clusters are used by those chains.
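For reference, the way FAT12 entries are packed and followed can be sketched
in a few lines of Python. This is only an illustration of the on-disk format,
not the code of fat-recover.pl; the function names fat12_entries and
follow_chain are made up for this sketch, and the tiny table at the bottom is
a hand-built example rather than data from the image.

```python
def fat12_entries(fat: bytes) -> list[int]:
    """Unpack a FAT12 table: every 3 bytes hold two 12-bit entries."""
    entries = []
    for i in range(0, len(fat) - 2, 3):
        b0, b1, b2 = fat[i], fat[i + 1], fat[i + 2]
        entries.append(b0 | ((b1 & 0x0F) << 8))  # even-numbered entry
        entries.append((b1 >> 4) | (b2 << 4))    # odd-numbered entry
    return entries


def follow_chain(entries: list[int], start: int) -> list[int]:
    """Collect cluster numbers from `start` until the chain ends.

    The walk stops at reserved, bad-cluster and end-of-chain values
    (0xFF0 and above), at free or out-of-range entries, and when a
    cluster repeats (which would mean a corrupted, looping chain).
    """
    chain, seen, cur = [], set(), start
    while 2 <= cur < len(entries) and cur < 0xFF0 and cur not in seen:
        chain.append(cur)
        seen.add(cur)
        cur = entries[cur]
    return chain


if __name__ == "__main__":
    # Hand-built table (hypothetical): entries 0 and 1 are reserved,
    # then the chain 2 -> 3 -> 4 -> end-of-chain, then one free entry.
    demo_fat = bytes.fromhex("f0ffff 034000 ff0f00")
    table = fat12_entries(demo_fat)
    print([hex(e) for e in table])
    print(follow_chain(table, 2))
```

A chain that stops earlier than the file's real length, as suspected here,
would simply show up as an end-of-chain value stored in an entry too soon.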
    > fcedit show
    chain 0: 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72
    chain 1: 73, 74, 75, 76, 77
    fcedit show ok
    >

Using the FAT chains, the script can read the appropriate clusters from the
image and dump them to files:

    > clusdump
    dumping to clusfile0, starting at 0x5400
    ok, finished at 0x9200
    dumping to clusfile1, starting at 0x9200
    ok, finished at 0x9C00
    clusdump ok
    >

Let's see what we recovered:

    $ ls -l clusfile*
    -rw-r--r--    1 artjom   users       15872 Oct 26 00:49 clusfile0
    -rw-r--r--    1 artjom   users        2560 Oct 26 00:49 clusfile1
    $ file clusfile*
    clusfile0: data
    clusfile1: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96
    $

Trying to view clusfile1 with ImageMagick's "display" gives us only a small
fraction of the picture. The text says

    POT SMOKERS MONTHLY
    Your monthly g

We also get the error message

    display: Corrupt JPEG data: premature end of data segment (clusfile1).
    display: Corrupt JPEG data: premature end of data segment (clusfile1).

This shows that although the FAT chain starts correctly, it is cut off. Now
we could try extending the FAT chains incrementally, looking at a hexdump
of the file and guessing where the file might end, or looking for the "End
of Image" marker, which terminates a JPEG image. Although the last approach
is the best in this case, it won't work for file formats that don't have a
terminator. Another problem is that we would be able to recover two files
at most (since there are only two FAT chains). What if there are other
files whose FAT chains were removed completely?

So we can try the following: dump every cluster into a separate file, then
use file(1) on these files to determine which clusters are the beginnings
of files. As files have to be aligned on cluster boundaries (because the
FAT only gives the starting address of a cluster), this method should work
quite well.
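The idea can also be sketched without writing thousands of small files: walk
the image in 512-byte steps and compare the first bytes of each block against
a few well-known signatures. The Python below is a rough stand-in for the
"dump everything and run file(1)" approach, not what fat-recover.pl does; the
name find_file_starts and the very short signature table are assumptions made
for this example, and file(1) recognises far more formats.

```python
BLOCK_SIZE = 512  # bytes_per_sec from the boot sector; one sector per cluster

# A tiny, illustrative subset of magic numbers.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "Zip archive (local file header)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound document (MS Office)",
}


def find_file_starts(image: bytes, block_size: int = BLOCK_SIZE):
    """Return (block number, description) for every block that begins
    with one of the known signatures. Blocks are counted from the start
    of the image, so block N begins at byte offset N * block_size."""
    hits = []
    for number in range(len(image) // block_size):
        start = number * block_size
        block = image[start:start + block_size]
        for magic, label in SIGNATURES.items():
            if block.startswith(magic):
                hits.append((number, label))
    return hits


if __name__ == "__main__":
    # Synthetic 8-block image (not the challenge image): a JPEG
    # signature at block 3 and a Zip signature at block 5.
    demo = bytearray(8 * BLOCK_SIZE)
    demo[3 * BLOCK_SIZE:3 * BLOCK_SIZE + 3] = b"\xff\xd8\xff"
    demo[5 * BLOCK_SIZE:5 * BLOCK_SIZE + 4] = b"PK\x03\x04"
    for number, label in find_file_starts(bytes(demo)):
        print(number, label)
```

On the real image one would pass in the contents of the file "image" instead
of the synthetic buffer. A short signature list like this produces fewer
false positives than file(1), but it also misses every format not listed.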
    > clusdump each
    [lots of output removed]
    clusdump each ok
    >

We now have 2876 clusters in separate files. Using file(1) on them and
sorting numerically (the filenames have the format Neachclus, where N is
the cluster number):

    $ file *eachclus | sort -n > clusters.file
    $

Examining clusters.file, we see that from cluster 109 onwards there is
nothing relevant ("PC formatted floppy with no filesystem"). The "data"
entries are also just clusters within files, not at their beginning. So
let's filter the summary file to get a better overview:

    $ grep -v '^[0-9]\+eachclus: \+\(data\|PC formatted floppy with no filesystem\)$' < clusters.file
    10eachclus: SysEx File -
    33eachclus: Microsoft Office Document
    38eachclus: Non-ISO extended-ASCII English text, with CR line terminators
    39eachclus: Non-ISO extended-ASCII English text, with CR line terminators
    46eachclus: DBase 3 data file (327682 records)
    70eachclus: Hitachi SH big-endian COFF object, not stripped
    73eachclus: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96
    75eachclus: DBase 3 data file with memo(s)
    104eachclus: Zip archive data, at least v2.0 to extract
    $

Remember chain 1 we had when examining the FAT chains in fat-recover.pl? It
had clusters 73-77. Here we see a JPEG file starting at cluster 73. The
DBase 3 entry at cluster 75 is just JPEG data misinterpreted by file(1). So
the next file starts at cluster 104. Let's try dumping a file with a FAT
chain starting at cluster 73 and ending at 103.
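The byte range covered by such a run can be checked by hand. Judging from the
offsets printed by clusdump (0x5400 for 42, 0x9200 for 73), the numbers used
in the transcript behave like 512-byte block numbers counted from the start
of the image, so block N begins at N * 512. The Python sketch below relies on
that observation and on the boot sector values shown by "init"; the helper
names block_offset, run_range and extract_run are invented for this example.
Note that the FAT specification itself numbers data clusters from 2, starting
at the data area, which is a different numbering.

```python
BYTES_PER_SECTOR = 512   # bytes_per_sec
RESERVED_SECTORS = 1     # reserved_secs
NUM_FATS = 2             # FATs
SECTORS_PER_FAT = 9      # secs_per_FAT
ROOT_ENTRIES = 224       # root_ents, 32 bytes per directory entry

# Layout derived from the boot sector values above.
ROOT_DIR_OFFSET = (RESERVED_SECTORS + NUM_FATS * SECTORS_PER_FAT) * BYTES_PER_SECTOR
DATA_AREA_OFFSET = ROOT_DIR_OFFSET + ROOT_ENTRIES * 32


def block_offset(number: int) -> int:
    """Byte offset of block `number`, counted from the start of the image
    (the numbering the transcript's offsets are consistent with)."""
    return number * BYTES_PER_SECTOR


def run_range(first: int, last: int) -> tuple[int, int]:
    """Start (inclusive) and end (exclusive) byte offsets of a
    contiguous run of blocks first..last."""
    return block_offset(first), block_offset(last + 1)


def extract_run(image: bytes, first: int, last: int) -> bytes:
    """Copy a contiguous run of blocks out of the image."""
    start, end = run_range(first, last)
    return image[start:end]


if __name__ == "__main__":
    print(f"root directory at {ROOT_DIR_OFFSET:#x}, data area at {DATA_AREA_OFFSET:#x}")
    for first, last in [(73, 103), (33, 72), (104, 108)]:
        start, end = run_range(first, last)
        print(f"{first}-{last}: {start:#x} .. {end:#x} ({end - start} bytes)")
```

The root directory offset computed this way is 0x2600 and the data area
starts at 0x4200, which is also the offset of block 33.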
    > fcedit newchain
    new chain 2 created
    fcedit newchain ok
    > fcedit add 2 73-103
    adding 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103 at pos 0
    fcedit add 2 73-103 ok
    > clusdump 2
    dumping to clusfile2, starting at 0x9200
    ok, finished at 0xD000
    clusdump 2 ok
    >

Let's check this file:

    $ file clusfile2
    clusfile2: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96
    $ identify clusfile2
    clusfile2 JPEG 208x199 DirectClass 8-bit 15872b 0.0u 0:01
    $ display clusfile2
    $

We have recovered the complete JPEG file (actually we have more than that,
but since the display program only reads up to the "End of Image" marker,
the image is shown as if the file had its correct size).

Let's try the same with the other files. For the SysEx file we get only
garbage: another "false positive" by file(1). Using clusters 33-72 (the
entries in between have also been misinterpreted by file(1)) gives us the
MS Office file (Joe's letter to "Jimmy Jungle").

    > fcedit newchain
    new chain 3 created
    fcedit newchain ok
    > fcedit add 3 33-72
    adding 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72 at pos 0
    fcedit add 3 33-72 ok
    > clusdump 3
    dumping to clusfile3, starting at 0x4200
    ok, finished at 0x9200
    clusdump 3 ok
    >

The last file left is the Zip archive. It starts at cluster 104 and ends at
108, as the "PC formatted..." entries start at 109.

    > fcedit newchain
    new chain 4 created
    fcedit newchain ok
    > fcedit add 4 104-108
    adding 104, 105, 106, 107, 108 at pos 0
    fcedit add 4 104-108 ok
    > clusdump 4
    dumping to clusfile4, starting at 0xD000
    ok, finished at 0xDA00
    clusdump 4 ok
    >

    $ unzip -t clusfile4
    Archive:  clusfile4
    [clusfile4] Scheduled Visits.xls password: ^C
    $

Hmm, the Zip archive is encrypted. However, PKZIP encryption is weak, and
several known-plaintext attack tools exist.
Judging from the filename extension, the file in the archive is an MS Excel
spreadsheet, so we could look up the Excel file format specification and
use common header values and signatures in a known-plaintext attack.
However, none of this is necessary for a relatively small file like this:

    $ strings image | less

Looking at the printable parts of the disk image, as shown by strings(1),
we discover the following in line 847:

    pw=goodtimes

It seems the password has been stored along with the encrypted
information... Let's try opening the archive, using goodtimes as the
password:

    $ unzip clusfile4
    Archive:  clusfile4
    [clusfile4] Scheduled Visits.xls password:
      inflating: Scheduled Visits.xls
    $

So we have the last file, an Excel spreadsheet with Joe's schedule of
school visits.

-----------------------------

Questions:

1. Who is Joe Jacob's supplier of marijuana and what is the address listed
   for the supplier?

Joe's letter (the MS Office file, clusfile3) is addressed to:

    Jimmy Jungle
    626 Jungle Ave Apt 2
    Jungle, NY 11111

2. What crucial data is available within the coverpage.jpg file and why is
   this data crucial?

coverpage.jpg seems to be the cover page of a marijuana smokers' magazine
("High Times Magazine", as mentioned in the letter). It says:

    POT SMOKERS MONTHLY
    Your monthly guide to the best pot on the plant!
    This month's featured pot grower, smoker and seller is Jimmy Jungle.

This is another hint (besides the letter) that Jimmy Jungle grows marijuana
and sells it. If he is a "featured" grower, he probably sells in large
quantities, so Joe is probably not his only distributor.

3. What (if any) other high schools besides Smith Hill does Joe Jacobs
   frequent?

From "Scheduled Visits.xls" we have:

 - Smith Hill High School
 - Key High School
 - Leetch High School
 - Birard High School
 - Richter High School
 - Hull High School

4. For each file, what processes were taken by the suspect to mask them
   from others?
The starting sector and the file lengths of coverpage.jpg and "Scheduled
Visits.xls", as stated in the root directory entries, had been altered.
(The latter had the filename extension of a DOS executable, so it was
probably a "self-extracting zip archive".) The FAT was also manipulated, so
that both files were truncated after a certain length. The letter to Jimmy
Jungle had no FAT chain, and its directory entry had been marked as deleted
(first character 0xe5). (The root directory is at 0x2600 in the hexdump of
the image file.)

5. What processes did you (the investigator) use to successfully examine
   the entire contents of each file?

I described the recovery process in the analysis part above. In short:
every sector was dumped to a separate file, the file(1) program was used to
determine which sectors could be the beginnings of files, and each such
beginning sector together with all sectors up to the next beginning sector
formed a file. If the files had been fragmented, the recovery would have
been much harder, if not impossible.

-----------------------------

Footnotes:

[1] Information about the FAT filesystem was taken from:
    http://www.win.tue.nl/~aeb/linux/fs/fat/fat-1.html and
    http://www.nondot.org/sabre/os/files/FileSystems/fatFilesystem.txt