For convenience, the Word file has been converted to text format: Jimmy_Jungle.txt
Jimmy Jungle
626 Jungle Ave Apt 2
Jungle, NY 11111
Furthermore, the directory entry for this file was damaged: its data pointer was set to an unused portion of the disk. It seems unlikely that this was the result of accidental damage to the disk, since nothing else was changed. It is more likely to be a deliberate attempt to prevent the contents of the file from being found. Those contents were still present on the disk, though, so it was possible to recover them.
This file's directory entry was also damaged, but in a different way. The length field was changed, making the file appear to be shorter than it actually is, thereby preventing the entire content of the file from being read. Since the actual data on the disk was not damaged, it was possible to recover this file.
As mentioned earlier, the contents of this file were protected with the zip encryption feature. However, the file password was carelessly left on the disk where it could be found. This allowed the encrypted file to be opened easily.
After downloading the image.zip file, the first step is to verify the MD5 checksum given on the web page, to ensure that we have an undamaged copy. I don't like to compare checksums by eye, because that is tedious and error-prone. Instead, I used md5sum's --check option and pasted the checksum line from the web page into its standard input. The command is shown below. It's very important that there are exactly two spaces between the checksum and "image.zip"!
$ md5sum --check
<paste>b676147f63923e1f428131d59b1d6a72  image.zip</paste>
image.zip: OK
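If md5sum's input format is a nuisance, the same comparison can be scripted. Below is a minimal Python sketch using the standard hashlib module. The helper names are my own. Calling `md5_matches("image.zip", "b676147f63923e1f428131d59b1d6a72")` should return True for an undamaged download.

```python
import hashlib

def md5_hex(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def md5_matches(path, expected_hex):
    """Compare against a pasted checksum, ignoring case and stray whitespace."""
    return md5_hex(path) == expected_hex.strip().lower()
```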
Good, that checked out. Next, I extracted the contents of the compressed zip archive file.
$ unzip image.zip
Archive: image.zip
  inflating: image
$ ls -l image
-rw-r--r-- 1 bob users 1474560 Sep 18 09:50 image
The size of this file, 1,474,560 bytes, is exactly the capacity of a 1.44 MB floppy disk (80 tracks × 2 sides × 18 sectors × 512 bytes). I used the Unix "file" utility to check what was in the file.
$ file image
image: x86 boot sector, system MSDOS5.0, FAT (12 bit)
Evidently, this is an MSDOS FAT filesystem. (FAT stands for File Allocation Table, in case anyone's interested. Twelve bits is the usual FAT entry size used for small floppies. A hard disk would use 16 or 32 bits.) Since "file" isn't foolproof, I also took a look at the file contents myself. I didn't see any reason to think that there was a mistake.
$ hexdump -C image |less
00000000  eb 3c 90 4d 53 44 4f 53  35 2e 30 00 02 01 01 00  |.<.MSDOS5.0.....|
00000010  02 e0 00 40 0b f0 09 00  12 00 02 00 00 00 00 00  |...@............|
00000020  00 00 00 00 00 00 29 cf  cd b1 c4 4e 4f 20 4e 41  |......)....NO NA|
00000030  4d 45 20 20 20 20 46 41  54 31 32 20 20 20 33 c9  |ME    FAT12   3.|
00000040  8e d1 bc f0 7b 8e d9 b8  00 20 8e c0 fc bd 00 7c  |....{.... .....||
00000050  38 4e 24 7d 24 8b c1 99  e8 3c 01 72 1c 83 eb 3a  |8N$}$....<.r...:|
...
(The rest of the first page is gibberish, so I've omitted it here.)
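The identifying strings that "file" relies on can also be pulled out of the boot sector directly. Below is a minimal Python sketch. It assumes a DOS 4.0-style boot sector with an extended BPB, which the 0x29 byte at offset 0x26 in the dump suggests. The function name is my own. Usage would be `describe_boot_sector(open("image", "rb").read(512))`. These strings are only labels written by the formatting program, so they are a hint rather than proof of the filesystem type.

```python
def describe_boot_sector(sector):
    """Extract the identifying fields that `file` reports from a FAT boot sector.

    Offsets assume a DOS 4.0+ boot sector with an extended BPB. This is a
    quick sanity check, not a validator.
    """
    if len(sector) < 62:
        raise ValueError("need at least the first 62 bytes of the boot sector")
    return {
        "short_jump": sector[0] == 0xEB and sector[2] == 0x90,  # JMP rel8; NOP
        "oem_name": sector[3:11].decode("ascii", "replace"),
        "extended_bpb": sector[38] == 0x29,                    # extended boot signature
        "volume_label": sector[43:54].decode("ascii", "replace").rstrip(),
        "fs_type_label": sector[54:62].decode("ascii", "replace").rstrip(),
    }
```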
Next, I decided to take a quick look at this filesystem to see what was there. One way to do that is to copy the image onto a floppy disk, using the command dd if=image of=/dev/fd0 bs=512. Instead, I used the "loopback" feature to mount the image file as if it were a real block I/O device like a disk. Neither of these methods shows deleted files or other hidden information; I looked for those later.
As root, I did the following. The "ro" option instructs mount that the filesystem should be read-only, to prevent accidentally changing something during the investigation.
# mount -o ro,loop image /mnt
# ls -la /mnt
drwxr-xr-x 2 root root 7168 Dec 31 1969 ./
drwxr-xr-x 21 root root 4096 Oct 12 15:30 ../
-rwxr-xr-x 1 root root 15585 Sep 11 08:30 cover\ page.jpgc\ \ \ \ \ \ \ \ \ \ \ *
-rwxr-xr-x 1 root root 1000 May 24 08:20 schedu~1.exe*
My "ls" command is actually an alias that uses the -b option, which causes special characters to be escaped with a backslash. That's lucky, because otherwise I might not have noticed the trailing spaces in the first filename.
The second filename contains "~1", suggesting that this is actually the short version of a long filename. The file's long name entry may have been damaged somehow, or the file may have been processed by a piece of software that doesn't understand long filenames.
I grabbed a copy of the files from the disk, then unmounted it.
# mkdir files
# cp /mnt/* files
# umount /mnt
Once again, I used the "file" utility to see what these files are.
$ file files/*
files/cover page.jpgc : PC formatted floppy with no filesystem
files/schedu~1.exe: Zip archive data, at least v2.0 to extract
The description "PC formatted floppy" didn't make sense to me, so I took a look at the contents of the file.
$ hexdump -C "files/cover page.jpgc " |less
00000000  f6 f6 f6 f6 f6 f6 f6 f6  f6 f6 f6 f6 f6 f6 f6 f6  |................|
*
00000200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003ce0  00                                                |.|
00003ce1
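The * means that lines identical to the previous one have been omitted. The first 512 bytes are all 0xf6, which is the fill value a floppy format writes into empty sectors. This is probably why "file" described the data as a formatted floppy. The rest is zeros. There is nothing useful in this file. Someone may have wiped the contents out, or the disk may have been damaged somehow.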
The second file is named with the .exe extension, but the "file" utility claims it's a Zip file, not a .exe program file.
$ hexdump -C files/schedu~1.exe |less
00000000  50 4b 03 04 14 00 01 00  08 00 98 5a b7 2c c7 55  |PK.........Z.,.U|
00000010  60 8d ea 08 00 00 00 42  00 00 14 00 00 00 53 63  |`......B......Sc|
00000020  68 65 64 75 6c 65 64 20  56 69 73 69 74 73 2e 78  |heduled Visits.x|
00000030  6c 73 94 c8 31 2a e3 49  0b db a8 10 c2 70 9d fc  |ls..1*.I.....p..|
00000040  10 03 31 a2 8e 48 e8 3c  4b 81 75 c9 8b 86 51 af  |..1..H.<K.u...Q.|
...
Sure enough, "file" is right. A real .exe file would start with the letters MZ. This file starts with PK, which incidentally are the initials of Phil Katz, the author of the PKZIP utility. It looks like the first file in the Zip archive is an Excel spreadsheet called "Scheduled Visits". I tried listing the contents of the archive to see what else is there.
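The bytes above reveal more than the filename once they are decoded field by field. Below is a minimal Python sketch of a ZIP local file header parser, following the field layout in PKWARE's application note. The function name is my own. Applied to the first 64 bytes of the dump, it reports the following:

- an encrypted, deflated "Scheduled Visits.xls" of 16896 bytes;
- a CRC of 0x8d6055c7;
- 2282 bytes of compressed data.

Together with the 50 bytes of header and name, that one entry needs 2332 bytes, far more than the 1000 bytes in this file.

```python
import struct

# Fixed 30-byte part of a ZIP local file header (PKWARE APPNOTE), little-endian.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")

def parse_local_header(data):
    """Decode the local file header at the start of `data`."""
    (sig, version, flags, method, mod_time, mod_date, crc,
     comp_size, uncomp_size, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    name_end = LOCAL_HEADER.size + name_len
    return {
        "version_needed": version,               # 20 means "v2.0 to extract"
        "encrypted": bool(flags & 0x0001),       # bit 0: traditional PKWARE encryption
        "method": method,                        # 8 = deflate
        "crc32": crc,
        "compressed_size": comp_size,            # includes the 12-byte encryption header
        "uncompressed_size": uncomp_size,
        "name": data[LOCAL_HEADER.size:name_end].decode("cp437"),
        # MS-DOS date/time: year-1980, month, day; hour, minute, seconds/2
        "modified": ((mod_date >> 9) + 1980, (mod_date >> 5) & 0x0F, mod_date & 0x1F,
                     mod_time >> 11, (mod_time >> 5) & 0x3F, (mod_time & 0x1F) * 2),
        "data_offset": name_end + extra_len,
    }
```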
$ unzip -v files/schedu~1.exe
Archive: schedu~1.exe
  End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
Bad luck, the file is damaged. I tried repairing it.
$ mkdir fix
$ cp files/schedu~1.exe fix
$ cd fix
$ zip -F schedu~1.exe
zip: reading Scheduled Visits.xls
zip warning: schedu~1.exe would be truncated.
Retry with option -qF to truncate, with -FF to attempt full recovery
$ zip -FF schedu~1.exe
zip: reading Scheduled Visits.xls
compressed size 2282, actual size 950 for Scheduled Visits.xls
zip warning: schedu~1.exe has been truncated.
Unfortunately, I don't have the whole file. The header says the compressed data is 2282 bytes long, but only 950 bytes of it are present. I attempted to get what I could out of it, though.
$ unzip -v schedu~1.exe
Archive: schedu~1.exe
 Length   Method    Size  Ratio   Date    Time   CRC-32    Name
--------  ------  ------- -----   ----    ----   ------    ----
   16896  Defl:N      938  94%  05-23-02 11:20  8d6055c7  Scheduled Visits.xls
--------          -------  ---                             -------
   16896              938  94%                             1 file
$ unzip schedu~1.exe
Archive: schedu~1.exe
[schedu~1.exe] Scheduled Visits.xls password:
The zip file is encrypted with a password. I took a few guesses, but all I got was this:
password incorrect--reenter:
    skipping: Scheduled Visits.xls incorrect password
It is sometimes possible to guess the password of an encrypted file; see below for more information. However, that process is likely to take a long time. Instead, I decided to examine the disk image more closely, as described in the next section. This marks the end of the first phase of the investigation, which did not succeed in recovering the spreadsheet.
Next, I examined the disk image directly, hoping to find files that were inaccessible before. I wanted to concentrate on looking for pieces of readable text in the image file. To that end, I used hexdump with a custom format that displays only printable ASCII characters, not hexadecimal notation. Here is the command I used:
$ hexdump -e '"%06_ax " 64/1 "%_p" "\n"' image |less
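The custom format prints each 64-byte row as a six-digit hex offset followed by the bytes as characters. Non-printable bytes are shown as dots. Like hexdump, it also collapses runs of identical rows into a single `*`. Below is a rough Python approximation, offered as a sketch. It treats only bytes 0x20 to 0x7e as printable, and the function name is my own.

```python
def printable_dump(data, width=64):
    """Yield rows in the style of the hexdump custom format used above.

    Each row is a six-digit hex offset and `width` bytes shown as characters,
    with non-printable bytes as '.'. Runs of identical rows collapse to '*'.
    """
    previous = None
    collapsed = False
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        if row == previous:
            if not collapsed:
                yield "*"
                collapsed = True
            continue
        previous, collapsed = row, False
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in row)
        yield f"{offset:06x} {text}"
```

Usage would be `for line in printable_dump(open("image", "rb").read()): print(line)`.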
Much of the output is gibberish, so I'll just skip over it in this presentation. Here's something that looks meaningful.
002600 .d.o.c...........................J.i.m.m.y.... .J.u.n.g.l...e...
002640 .IMMYJ~1DOC .h8F+-+-..Ou.,...P..Bg.c. . . .... . . . . . ... . .
002680 .c.o.v.e.r.... .p.a.g.e.....j.p.COVERP~1JPG .mMF+-+-...C+-...<..
0026c0 Bi.t.s...e....x.e. . . . ... . ..S.c.h.e.d....u.l.e.d. .V...i.s.
002700 SCHEDU~1EXE .SSF+-+-...B.,I.....................................
002740 ................................................................
*
This is the root directory, which contains the list of files on the disk. There are two files we already know about, "cover page.jpgc" and "schedu~1.exe". Note that a long name entry does appear to be present for the latter, even though the mounted filesystem showed only its short name. There's also a reference to a third file, "Jimmy Jungle.doc". The first letter of its short name has been replaced with a non-printable character (FAT marks a deleted entry by overwriting that byte with 0xE5), indicating that the file has been deleted. There may have been other deleted files on the disk as well, but if there were, their directory entries have been overwritten.
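Decoding directory entries by hand is tedious. Below is a minimal Python sketch of the 32-byte FAT12/16 short-name entry:

- the name is at offsets 0 to 10;
- the attribute byte is at 11;
- the first cluster is at 26;
- the file size is at 28.

The sketch assumes 0xE5 as the deleted marker and attribute 0x0F for long-name entries, and it skips the timestamp fields. `decode_dir_entry` is a name I made up. To read the whole root directory, slice the image into 32-byte pieces starting at offset 0x2600, where the dump shows the directory.

```python
import struct

DELETED_MARKER = 0xE5   # first byte of a deleted entry's short name
ATTR_LONG_NAME = 0x0F   # attribute value used by long-filename (LFN) entries

def decode_dir_entry(entry):
    """Decode one 32-byte FAT12/16 directory entry (timestamps ignored)."""
    if len(entry) != 32:
        raise ValueError("FAT directory entries are 32 bytes long")
    attributes = entry[11]
    if attributes == ATTR_LONG_NAME:
        return {"kind": "long_name", "sequence": entry[0]}
    if entry[0] == 0x00:
        return {"kind": "end"}   # this and all later entries are unused
    deleted = entry[0] == DELETED_MARKER
    # The deleted marker destroys the first character, so show it as '?'.
    base = (b"?" + entry[1:8] if deleted else entry[0:8]).decode("cp437").rstrip()
    ext = entry[8:11].decode("cp437").rstrip()
    first_cluster, size = struct.unpack_from("<HI", entry, 26)
    return {
        "kind": "deleted" if deleted else "file",
        "name": f"{base}.{ext}" if ext else base,
        "attributes": attributes,
        "first_cluster": first_cluster,
        "size": size,
    }
```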
I returned to this directory listing later, to look more closely and make sure I had interpreted it correctly. Continuing with the image dump, I found this:
004c00 Jimmy Jungle.626 Jungle Ave Apt 2.Jungle, NY 11111..Jimmy:..Dude
004c40 , your pot must be the best . it made the cover of High Times Ma
004c80 gazine! Thanks for sending me the Cover Page. What do you put in
004cc0 your soil when you plant the marijuana seeds? At least I know y
004d00 our growing it and not some guy in Columbia.. .These kids, they
004d40 tell me marijuana isn.t addictive, but they don.t stop buying fr
004d80 om me. Man, I.m sure glad you told me about targeting the high s
004dc0 chool students. You must have some experience. It.s like a guara
004e00 nteed paycheck. Their parents give them money for lunch and they
004e40 spend it on my stuff. I.m an entrepreneur. Am I only one you se
004e80 ll to? Maybe I can become distributor of the year!..I emailed yo
004ec0 u the schedule that I am using. I think it helps me cover myself
004f00 and not be predictive. Tell me what you think. To open it, use
004f40 the same password that you sent me before with that file. Talk
004f80 to you later...Thanks,..Joe ....................................
This looks very interesting, not to mention incriminating. After this is more gibberish mixed up with fragments of text, such as this example:
006880 ............,.......8.......D.......P.......X.......`.......h...
0068c0 ................Jimmy Jungle..o..........imm........0000. Ju....
006900 .....STC.........STC........Normal.u........0000tl.u........9.TC
006940 ........Microsoft Word 10.0.@...........@......P....@......_....
006980 ................................................................
This appears to be part of the deleted file "Jimmy Jungle.doc" which was mentioned in the directory. Moving along again.
009200 ......JFIF.....`.`.....C................................... $.'
009240 ",#..(7),01444.'9=82<.342...C...........2!.!22222222222222222222
009280 222222222222222222222222222222..........."......................
0092c0 ......................................}........!1A..Qa."q.2....#
JFIF stands for "JPEG File Interchange Format." This looks like the beginning of a JPEG file, possibly the actual data from "cover page.jpgc". After this, there are many more nonsense characters (probably compressed JPEG data), and then something truly interesting.
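Spotting headers like this by eye works on a floppy, but it is easy to automate. Below is a minimal Python sketch of signature scanning, the first step of file carving. It lists every offset where a few well-known magic byte sequences occur. The signature table is my own short selection, not a complete catalogue.

```python
# A few byte signatures relevant to this image (not an exhaustive list).
SIGNATURES = {
    "jpeg_soi": b"\xff\xd8\xff",      # JPEG start-of-image marker
    "jfif": b"JFIF\x00",              # JFIF APP0 identifier
    "zip_local": b"PK\x03\x04",       # ZIP local file header
    "zip_central": b"PK\x01\x02",     # ZIP central directory entry
    "zip_end": b"PK\x05\x06",         # ZIP end-of-central-directory record
}

def find_signatures(data, signatures=SIGNATURES):
    """Return sorted (offset, name) pairs for every occurrence of each signature."""
    hits = []
    for name, magic in signatures.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)
```

Usage would be `find_signatures(open("image", "rb").read())`. Because FAT allocates whole clusters, a hit at a multiple of 512 is a good candidate for the start of a file. A JPEG normally begins with the FF D8 FF marker a few bytes before the "JFIF" string.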
00cec0 ...(...(...(...(...(...(...(....................................
00cf00 ................................pw=goodtimes....................
00cf40 ................................................................
The letters "pw" suggested "password." Could this be the password for the zip file I had found? I tried it out right away.
$ unzip schedu~1.exe
Archive: schedu~1.exe
[schedu~1.exe] Scheduled Visits.xls password:
  inflating: Scheduled Visits.xls
  error: invalid compressed data to inflate
$ ls -l "Scheduled Visits.xls"
-rwxr-xr-x 1 bob users 0 May 23 11:20 Scheduled\ Visits.xls*
Well, it didn't say that the password was wrong, but evidently I didn't have enough of the file to recover anything. Too bad. Disappointed, I returned to the image file.
00d000 PK.........Z.,.U`......B......Scheduled Visits.xls..1*.I.....p..
00d040 ..1..H.<K.u...Q..*6.$..~uF..NVO....`6T....#....R......#-4..HT.b.
00d080 ^.?.Rr..f.J ....x.5kUM....a_...SA#.;.Qk.........I....;.2.VS....t
...
00d900 ...N(.}.H.-......#.vQ..!.!.qPK...........Z.,.U`......B..........
00d940 .. .......Scheduled Visits.xlsPK..........B.....................
00d980 ................................................................
This looked familiar -- it's the zip file I had been working on. However, notice that the data here is significantly longer than the 1000 bytes found before. This might be the complete contents of the file. There's nothing of interest after this in the image file, so this phase is now over. I now had some good leads to follow up on. In the next phase, I recovered the files that were found.
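How much of the image belongs to the archive? A ZIP file ends with an end-of-central-directory record (signature PK\x05\x06). Its fixed part is 22 bytes, and its last two bytes give the length of an optional comment that follows. Below is a minimal Python sketch that uses this to find where an archive starting at a known offset ends. The function name is hypothetical, and a stray signature inside compressed data would fool it.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FIXED_SIZE = 22   # end-of-central-directory record without its comment

def zip_extent(data, start):
    """Return the offset just past a ZIP archive that begins at `start`.

    Uses the first end-of-central-directory record after `start`; a stray
    signature inside compressed data would fool this sketch.
    """
    eocd = data.find(EOCD_SIGNATURE, start)
    if eocd == -1 or eocd + EOCD_FIXED_SIZE > len(data):
        raise ValueError("no complete end-of-central-directory record found")
    (comment_length,) = struct.unpack_from("<H", data, eocd + 20)
    return eocd + EOCD_FIXED_SIZE + comment_length
```

Here is my arithmetic from the header shown earlier, assuming no extra fields or comment:

- 30 + 20 + 2282 bytes for the local entry;
- 46 + 20 bytes for its central directory entry;
- 22 bytes for the end record.

That makes the archive at 0xd000 about 2420 bytes long, which fits inside five 512-byte sectors.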
I used the Linux-based dosfsck program to try to recover the files. Another possibility would be to copy the image to a floppy (using the command given above) and use Windows-based recovery tools like ScanDisk. Here are the steps I followed, as root.
# cp image image.fix
# losetup /dev/loop0 image.fix
# dosfsck -u /jimmyj~1.doc -f -r /dev/loop0
dosfsck 2.8, 28 Feb 2001, FAT32, LFN
Undeleting JIMMYJ~1.DOC
Wrong checksum for long file name "Scheduled Visits.exe ".
(Short name SCHEDU~1.EXE may have changed without updating the long name)
1: Delete LFN
2: Leave it as it is.
3: Fix checksum (attaches to short name SCHEDU~1.EXE)
? 3
/cover page.jpgc
  Contains a free cluster (420). Assuming EOF.
/cover page.jpgc
  File size is 15585 bytes, cluster chain length is 0 bytes.
  Truncating file to 0 bytes.
/Scheduled Visits.exe
  File size is 1000 bytes, cluster chain length is > 1024 bytes.
  Truncating file to 1000 bytes.
Reclaimed 31 unused clusters (15872 bytes) in 1 chain.
Perform changes ? (y/n) y
/dev/loop0: 4 files, 73/2847 clusters
# losetup -d /dev/loop0
# mount -o ro,loop image.fix /mnt
# ls -la /mnt
total 48
drwxr-xr-x 2 root root 7168 Dec 31 1969 ./
drwxr-xr-x 21 root root 4096 Oct 12 15:30 ../
-rwxr-xr-x 1 root root 1000 May 24 08:20 Scheduled\ Visits.exe\ \ \ \ \ \ *
-rwxr-xr-x 1 root root 0 Sep 11 08:30 cover\ page.jpgc\ \ \ \ \ \ \ \ \ \ \ *
-rwxr-xr-x 1 root root 15872 Dec 31 1979 fsck0000.rec*
-rwxr-xr-x 1 root root 20480 Apr 15 2002 jimmyj~1.doc*
# mkdir fix2
# cp /mnt/* fix2
# umount /mnt
# file fix2/*
fix2/Scheduled Visits.exe : Zip archive data, at least v2.0 to extract
fix2/cover page.jpgc : empty
fix2/fsck0000.rec: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96
fix2/jimmyj~1.doc: Microsoft Office document data
The long file name for "schedu~1.exe" was recovered (note the trailing spaces), but not the complete contents of the file. An unattached chain of disk blocks was found, which appears to contain a JPEG image. That could be the original contents of "cover page.jpgc". Finally, the deleted file "jimmyj~1.doc" was recovered (there doesn't seem to be any way to make dosfsck restore the long name, "Jimmy Jungle.doc").
In order to check that the files were recovered correctly, I scanned through them with hexdump, but did not notice anything out of the ordinary. Next, I opened "fsck0000.rec" with an image viewer, and it looked fine. I renamed it to "cover_page.jpg", correcting the extension and removing the spaces, since spaces in filenames sometimes cause trouble.
I used Star Office to open "jimmyj~1.doc". It contained the same text I found before, but nicely formatted. I saved a copy as "Jimmy_Jungle.txt" and renamed the Word file to "Jimmy_Jungle.doc".
Next, I made another attempt at recovering the zip file. This time, I simply copied a slice right out of the image file using this command:
$ dd if=image of=Scheduled_Visits.zip bs=512 skip=104 count=5
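The skip value is just the archive's offset in the dump divided by the sector size: 0xd000 / 512 = 104. Below is a Python sketch of the same slice, in case dd isn't handy. Both helper names are my own. `carve_sectors("image", offset_to_sector(0xd000), 5)` would return the same bytes dd wrote to Scheduled_Visits.zip.

```python
SECTOR_SIZE = 512

def offset_to_sector(offset, sector_size=SECTOR_SIZE):
    """Turn a byte offset from the dump into a dd skip= value."""
    sector, remainder = divmod(offset, sector_size)
    if remainder:
        raise ValueError(f"offset {offset:#x} is not sector-aligned")
    return sector

def carve_sectors(image_path, first_sector, count, sector_size=SECTOR_SIZE):
    """Same bytes as: dd if=IMAGE bs=512 skip=FIRST count=COUNT."""
    with open(image_path, "rb") as f:
        f.seek(first_sector * sector_size)
        return f.read(count * sector_size)
```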
To verify that the new file matches the partial one I found before, I used the following command to take the first 1000 bytes of the new file and compare them with the old file.
$ head -1000c Scheduled_Visits.zip |diff - files/schedu~1.exe
No output means no differences. Good. Next I unzipped the contents, using the password discovered earlier.
$ unzip -v Scheduled_Visits.zip
Archive: Scheduled_Visits.zip
 Length   Method    Size  Ratio   Date    Time   CRC-32    Name
--------  ------  ------- -----   ----    ----   ------    ----
   16896  Defl:N     2270  87%  05-23-02 11:20  8d6055c7  Scheduled Visits.xls
--------          -------  ---                             -------
   16896             2270  87%                             1 file
$ unzip Scheduled_Visits.zip
Archive: Scheduled_Visits.zip
[Scheduled_Visits.zip] Scheduled Visits.xls password:
  inflating: Scheduled Visits.xls
$ ls -l "Scheduled Visits.xls"
-rw-rw-rw- 1 bob users 16896 May 23 11:20 Scheduled\ Visits.xls
$ file "Scheduled Visits.xls"
Scheduled Visits.xls: Microsoft Office document data
$ mv "Scheduled Visits.xls" Scheduled_Visits.xls
It worked! Zip archives store a CRC-32 checksum of each file's uncompressed contents, and unzip checks it during extraction. Since there was no error message about a CRC mismatch, it is extremely unlikely that this file was recovered incorrectly. I opened "Scheduled_Visits.xls" with Star Office and saved a copy in text format.
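The CRC check can be repeated independently on the extracted file. Below is a short Python sketch using zlib.crc32, which computes the same CRC-32 that ZIP headers store. The helper name is my own. `hex(crc32_of_file("Scheduled_Visits.xls"))` should give 0x8d6055c7 if the file matches the CRC-32 column in the unzip -v listing. Python's zipfile module can also run the check itself: call `ZipFile.setpassword()`, then `ZipFile.testzip()`, which returns None when every CRC matches.

```python
import zlib

def crc32_of_file(path, chunk_size=65536):
    """CRC-32 of a file's contents; zlib.crc32 uses the same CRC as ZIP headers."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```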
All of the files on the disk had now been recovered. However, to be more certain that everything was done correctly, I next went back to look more closely at the disk image.
In this phase, I wanted to verify that my interpretation of the disk image was correct. Instead of scratching my head over a lot of hexadecimal numbers, I wrote a Perl script to do the decoding. Since the MSDOS FAT filesystem is fairly simple, this was not a difficult task. For information on the filesystem structure, I turned to the Linux kernel source code. Most of the information came from the msdos_fs.h header file. The full output of the script is here: readimg.txt.
The filesystem consists of one or more reserved sectors (including the boot sector), followed by one or more copies of the File Allocation Table, then the root directory, and finally the data area. Since each of these has a variable size, the script starts out by reading the boot sector, which contains enough information to calculate the sizes. The root directory was calculated to start at offset 0x2600, which confirms my conclusion from phase two.
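The same layout calculation can be sketched in a few lines of Python. This is not the Perl script mentioned above. It is a minimal re-implementation for FAT12/16 boot sectors that reads the BIOS Parameter Block fields at offsets 11 to 23. If the 16-bit sector count is zero, it falls back to the 32-bit count at offset 32.

```python
import struct

def fat_layout(boot_sector):
    """Compute where the regions of a FAT12/16 volume start, from its BPB."""
    (bytes_per_sector, sectors_per_cluster, reserved_sectors, fat_count,
     root_entries, total_sectors, _media, sectors_per_fat) = struct.unpack_from(
        "<HBHBHHBH", boot_sector, 11)
    if total_sectors == 0:  # larger volumes keep a 32-bit count at offset 32
        (total_sectors,) = struct.unpack_from("<I", boot_sector, 32)
    fat_start = reserved_sectors * bytes_per_sector
    root_start = fat_start + fat_count * sectors_per_fat * bytes_per_sector
    # 32 bytes per root directory entry, rounded up to whole sectors
    root_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    data_start = root_start + root_sectors * bytes_per_sector
    data_sectors = total_sectors - data_start // bytes_per_sector
    return {
        "fat_start": fat_start,
        "root_start": root_start,
        "data_start": data_start,          # byte offset of cluster 2
        "cluster_size": bytes_per_sector * sectors_per_cluster,
        "cluster_count": data_sectors // sectors_per_cluster,
    }
```

The hexdump gives these values: 512-byte sectors, 1 reserved sector, two 9-sector FATs, 224 root entries and 2880 sectors. With them, the sketch puts the root directory at 0x2600 and cluster 2 at 0x4200, and counts 2847 clusters, the same total dosfsck reported. Microsoft's FAT specification chooses FAT12 for volumes with fewer than 4085 clusters, which agrees with what "file" said.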
The root directory contains several entries for each file: a number of long name entries, and a short name entry which contains the rest of the information about the file. For some reason, the long name entries are stored in reverse order. The filename entries all appear to be as expected.
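Each long-name entry carries a one-byte checksum, at offset 13, of the 11-byte short name it belongs to. This is how dosfsck noticed the mismatch for "Scheduled Visits.exe" earlier. The checksum is a rotate-right-and-add over the short name as stored in the directory, for example "SCHEDU~1EXE". Below is a Python sketch; the function name is my own.

```python
def lfn_checksum(short_name):
    """Checksum stored in each long-name entry, computed over the 11-byte
    short name as it appears in the directory (8 name bytes + 3 extension)."""
    if len(short_name) != 11:
        raise ValueError("short name must be exactly 11 bytes")
    total = 0
    for byte in short_name:
        # rotate the 8-bit running sum right by one bit, then add the next byte
        total = (((total & 1) << 7) + (total >> 1) + byte) & 0xFF
    return total
```

If something changes the short name without rewriting the long-name entries, the stored value no longer matches. That is the situation dosfsck described as "Short name SCHEDU~1.EXE may have changed without updating the long name".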
All of the cluster lists that were found consist of consecutive numbers. This means that the filesystem has not become fragmented through heavy use. Everything is neat and orderly, which makes recovery of files easier. The starting address and length of the chain for the "Scheduled Visits.exe" file agree with the values used to recover that file. The lists for "Jimmy Jungle.doc" and "cover page.jpgc" end with question marks. In the first case, that is because the chain of clusters was dismantled when the file was deleted. In the second case, the list starts at a cluster that is marked "unused", so there's no chain to follow. It is interesting that the number given for the first cluster, 420, is exactly ten times 42, the starting cluster of the unconnected chain that was found. This seems to point to a deliberate modification of the starting address.
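FAT12 packs two 12-bit entries into every three bytes, which makes chains awkward to read by eye. Below is a minimal Python sketch with three helpers, all with names of my own:

- `fat12_entry` reads a single FAT entry;
- `cluster_chain` follows a chain the way the cluster lists above are described, marking a chain that breaks off with "?";
- `cluster_offset` maps a cluster number to its byte offset.

The sketch assumes cluster 2 starts at 0x4200 and clusters are 512 bytes, values I derived from the boot sector. The FAT itself starts at 0x200, just after the single reserved sector.

```python
DATA_START = 0x4200   # byte offset of cluster 2 on this floppy (from the boot sector)
CLUSTER_SIZE = 512    # one 512-byte sector per cluster

def cluster_offset(cluster, data_start=DATA_START, cluster_size=CLUSTER_SIZE):
    """Byte offset in the image where data cluster `cluster` begins (numbering starts at 2)."""
    return data_start + (cluster - 2) * cluster_size

def fat12_entry(fat, cluster):
    """Read the 12-bit FAT entry for `cluster`; two entries share every three bytes."""
    offset = cluster + cluster // 2
    pair = fat[offset] | (fat[offset + 1] << 8)
    return pair >> 4 if cluster & 1 else pair & 0x0FFF

def cluster_chain(fat, first_cluster, limit=4096):
    """Follow a FAT12 chain; a trailing '?' marks a chain that breaks off."""
    chain, cluster = [], first_cluster
    while len(chain) < limit:
        chain.append(cluster)
        following = fat12_entry(fat, cluster)
        if following >= 0xFF8:                   # end-of-chain marker
            return chain
        if following < 2 or following >= 0xFF0:  # free, reserved or bad cluster
            chain.append("?")
            return chain
        cluster = following
    raise ValueError("chain longer than limit; the FAT may contain a loop")
```

`cluster_offset(42)` gives 0x9200, exactly where the JFIF header turned up in the dump. Cluster 420 lands at 0x38600, deep in the unused part of the disk.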
The file lengths are as expected. The file "cover page.jpgc" has a slightly shorter length than the file that was recovered by dosfsck. This is because space is allocated to files one complete block (cluster) at a time; on this disk a cluster is one 512-byte sector. If part of the last block is not needed by the file, the unused space left over is called "slack." The dosfsck program did not know about the slack space left by this file, since it was only looking at an unattached chain of blocks, not the directory entry it was once attached to. After truncating the file at the proper length (head -15585c cover_page.jpg >cover_page2.jpg), it no longer contains the password that was discovered. Thus, the password was in the slack space and not in the file itself. It might have been left there by another file which was deleted before "cover page.jpgc" was copied onto the floppy, or it might have been present in a buffer in the computer's memory when "cover page.jpgc" was written.
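The slack calculation is simple arithmetic. Below is a Python sketch. It assumes the file is stored contiguously, as all the chains here were, and that "cover page.jpgc" really starts at cluster 42 (offset 0x9200), which is my inference from the dump. For 15585 bytes starting at 0x9200, it gives slack from 0xcee1 up to 0xd000, or 287 bytes. The "pw=goodtimes" string in the dump sits at about 0xcf20, inside that range.

```python
def slack_region(file_start, file_size, cluster_size=512):
    """(start, end) offsets of the slack after a contiguously stored file."""
    clusters = -(-file_size // cluster_size)        # ceiling division
    return file_start + file_size, file_start + clusters * cluster_size
```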
During this analysis, nothing was found to cast doubt on the previous conclusions. This brings the investigation to a successful completion.
It is somewhat unsatisfying that the success of this investigation appears to depend on the chance of finding the zip file password stored on the captured floppy disk. What if the password had not been found? There are a number of programs available which try to guess the password for an encrypted file. Some of them are commercial or shareware software, while others are free. However, since the guessing process is quite simple, I chose to write my own program instead. I found information on zip encryption in the PKZIP Application Note which is available from the PKWARE Web Site.
There is an extremely large number of possible passwords, so it is important to decide which ones are the most likely, and focus attention on those. If the two suspects, Joe Jacobs and Jimmy Jungle, were in the habit of sampling their wares, it might be expected that they would have a hard time remembering a complicated password. Some simple passwords that come to mind are single words, words with numbers added, and combinations of two words. (Indeed, the password they chose, "goodtimes", is composed of two English words, but there's no way to know that a priori.) I used the word list found in /usr/share/dict/words on my machine, which contains about 40,000 words. More specialized word lists have been created for this purpose. In this case, including drug terms in the list would seem to be a good idea.
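The candidate order described above can be expressed as a generator. Below is a minimal Python sketch. The `max_number` limit is my own assumption, since the text does not say which numbers were appended. With 40,000 words and the numbers 0 to 99, it yields 40,000 + 4,000,000 + 1,600,000,000 candidates, and the two-word combinations dominate.

```python
from itertools import product

def candidates(words, max_number=99):
    """Yield single words, then word+number, then every two-word combination.

    `max_number` is an assumed limit on the appended numbers.
    """
    for word in words:
        yield word
    for word in words:
        for n in range(max_number + 1):
            yield f"{word}{n}"
    for first, second in product(words, repeat=2):
        yield first + second
```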
The next problem is how to reject incorrect passwords. Encrypted zip files store a byte (or sometimes two) with a known value in the encryption header, as a quick way of detecting mistyped passwords. However, about one in 256 incorrect passwords can be expected to decrypt this known byte to the proper value just by chance. In a dictionary of 40,000 words, over 150 such false positives would be expected, and I'm going to test far more passwords than that. Therefore, it is necessary to have several more known bytes to distinguish the right password from the wrong ones. I created and compressed several small Excel spreadsheet files, and found that the first few bytes always came out the same. (More sophisticated techniques are possible, such as attempting to decompress the decrypted file with each guessed password. However, that would increase the complexity of the program greatly.)
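The header check described above can be sketched in Python from the PKZIP Application Note's description of traditional ("ZipCrypto") encryption. It works as follows:

- three 32-bit keys start from fixed constants;
- each password byte, and then each plaintext byte, updates the keys;
- a keystream byte is derived from the third key.

This is my own sketch, not the C program mentioned below, and it implements only the one-byte check. The names are hypothetical. For this archive the check byte would be 0x8d, the high byte of the CRC, because bit 3 of the general-purpose flags (0x0001 here) is clear.

```python
def _make_crc_table():
    """Standard CRC-32 table (reflected polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRC_TABLE = _make_crc_table()

def crc_step(crc, byte):
    """One CRC-32 update step, without the pre/post inversion (as APPNOTE uses it)."""
    return (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]

class ZipCryptoKeys:
    """Traditional PKWARE ("ZipCrypto") key state, per the PKZIP Application Note."""

    def __init__(self, password):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for byte in password:
            self.update(byte)

    def update(self, byte):
        self.k0 = crc_step(self.k0, byte)
        self.k1 = ((self.k1 + (self.k0 & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc_step(self.k2, self.k1 >> 24)

    def keystream_byte(self):
        temp = (self.k2 | 2) & 0xFFFF
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

    def decrypt(self, data):
        out = bytearray()
        for c in data:
            p = c ^ self.keystream_byte()
            self.update(p)          # keys advance on the plaintext byte
            out.append(p)
        return bytes(out)

    def encrypt(self, data):
        out = bytearray()
        for p in data:
            out.append(p ^ self.keystream_byte())
            self.update(p)
        return bytes(out)

def passes_header_check(password, encrypted_header, check_byte):
    """Quick filter: the 12th decrypted header byte must equal the check byte.
    Roughly 1 in 256 wrong passwords also pass, so survivors need more tests."""
    return ZipCryptoKeys(password).decrypt(encrypted_header[:12])[11] == check_byte
```

The password is passed as bytes. A guesser would call `passes_header_check` for each candidate, then test survivors against further known bytes of the compressed data, as described above.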
I wrote a small C program to guess passwords for the recovered zip file. It is not heavily optimized, but is able to test around 300,000 passwords per second on my machine (a 350MHz Pentium-II). At that rate, it was able to try over one billion one-word, word-plus-number, and two-word passwords in under two hours. It located the correct password, and stumbled on only one incorrect password: "implyinspected". That mistake can easily be eliminated, because it does not decrypt the zip file successfully.
If the suspects had chosen a stronger password, this password-guessing attack would be much more difficult or even impossible. Running times quickly reach into months or years as more possible passwords are considered. Faster computers could be used, or networks of computers working in parallel. However, the computing resources available to a local police department are limited, and they have many other cases that cannot be neglected. Faced with a sufficiently strong password, it may not be possible to recover the password using these techniques.
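How quickly "months or years" arrives can be checked with a little arithmetic at the measured rate of 300,000 guesses per second. Below is a Python sketch that gives worst-case times for exhaustive searches; the helper names are my own. At that rate:

- every two-word combination from a 40,000-word list takes about 1.5 hours;
- all eight-letter lowercase strings take about 8 days;
- ten lowercase letters take about 15 years;
- eight mixed-case letters and digits take roughly 23 years.

```python
RATE = 300_000  # passwords per second, the figure measured in the text

def search_seconds(candidates, rate=RATE):
    """Worst-case time to try `candidates` passwords at `rate` per second."""
    return candidates / rate

def human(seconds):
    """Round a duration to the largest convenient unit."""
    units = (("years", 365 * 86400), ("days", 86400), ("hours", 3600), ("minutes", 60))
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} seconds"
```

Usage would be, for example, `human(search_seconds(26 ** 8))`, which returns "8.1 days".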
Other attacks against zip file contents are possible, such as the more advanced known-plaintext attack used by the pkcrack program. However, it requires more known plaintext bytes than I had available in this case.