The folks from the Digital Forensic Research WorkShop have created a unique challenge for you. Your mission is to analyze a recovered floppy and answer the questions below. What makes this challenge unique is that you will need to read the police report before continuing. Just like an investigation in the real world, you will have some background information and some evidence, but it's up to you and your technical skills to dig up the answers. Below is the dd image of the recovered floppy. This is the image that will provide you the answers, provided you can 'extract' the data.
In any forensic analysis, it is critical to keep a proper chain of custody form in order to prove that the evidence being analyzed has not been tampered with. Since this is a simulation, however, I will assume that this has been properly kept.
The first thing after downloading the floppy image is to verify the MD5 signature.
$ md5sum image.zip
The MD5 hash matches, so we can unzip the file using the "unzip" command. For integrity purposes, an MD5 hash of the unzipped floppy image is created so we can later verify that the image had not changed at all during forensic analysis. To prevent accidental modification, the file is also chmod'd to be read-only for everyone. Finally, the file is copied to a morgue directory and the MD5 hash is verified one last time.
$ unzip image.zip
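The same integrity check can be scripted. The following is a minimal sketch, not the procedure actually used above; the function names are illustrative, and it assumes the expected digest has been recorded somewhere trustworthy beforehand.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so a large image need not fit in memory."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches(path, expected_hex):
    """Compare a file's digest with a previously recorded one."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Calling `matches("floppy.image", recorded_digest)` before and after each analysis step gives the same assurance as rerunning md5sum by hand.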
Forensic analysis can now begin. Analysis will be done using The @Stake Sleuth Kit (TASK). As per its README file, TASK is "an open source forensic toolkit for analyzing Microsoft and UNIX file systems...TASK integrates the file system analysis tools of The Coroner's Toolkit (TCT) by Wietse Venema and Dan Farmer, with TCTUTILS and adds new features." What TASK gives us over TCT and TCTUTILS is support for FAT file systems.
The first thing to do is to run the strings command against the floppy image to pull out any interesting strings that may be present. This will be helpful in determining what file system the image is and may give us some clues later on.
$ strings floppy.image > floppy_strings.txt
Looking through the strings output, it appears that the image came from a FAT12 file system. The strings output also contains something that looks like an email or document from the suspect to someone named Jimmy and the string "pw=goodtimes" which may be a password.
To verify that the file system is a FAT12 file system, fsstat is run against the image. Fsstat is a program from TASK that will give information on the file system used in an image. Since we believe it is a FAT12 file system, the FAT12 option is passed to the program.
$ fsstat -f fat12 ./floppy.image
The output is located here and verifies that this is a FAT12 file system. It also tells us that each sector is 512 bytes and each cluster is 512 bytes. Data storage in the FAT file system is divided into allocation units called clusters, and each cluster is made up of one or more sectors, the smallest addressable blocks on the disk. Since the cluster size on the floppy image is equal to the sector size, each cluster contains exactly one sector. This will be useful for recovering files later on.
Additionally, the output gives us the contents of the FAT table.
The FAT12 file system is the file system Microsoft Windows uses on floppy disks. There are two other FAT subtypes, FAT16 and FAT32. According to Microsoft, "The basic difference in these FAT sub types, and the reason for the names, is the size, in bits, of the entries in the actual FAT structure on the disk. There are 12 bits in a FAT12 FAT entry, 16 bits in a FAT16 FAT entry and 32 bits in a FAT32 FAT entry."
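Because a 12-bit entry does not align to byte boundaries, two FAT12 entries are packed into every three bytes. The sketch below shows one way to decode entry `n` from the raw FAT bytes; the function name is illustrative and the sample bytes in the usage note are synthetic, not taken from the floppy image.

```python
def fat12_entry(fat, n):
    """Decode the 12-bit FAT entry number n from raw FAT bytes."""
    offset = n + n // 2                       # floor(n * 1.5)
    pair = fat[offset] | (fat[offset + 1] << 8)   # 16-bit little-endian read
    # Even entries use the low 12 bits, odd entries the high 12 bits.
    return pair >> 4 if n & 1 else pair & 0x0FFF
```

For the six synthetic bytes `F0 FF FF 03 40 00`, entries 0 through 3 decode to 0xFF0, 0xFFF, 3 and 4.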
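The FAT12 file system is composed of four important regions: the Reserved Region (which holds the boot sector), the FAT Region, the Root Directory Region, and the File and Directory Data Region.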
The two regions that contain file information are the FAT and Root Directory Region. The Root Directory region contains a list of all of the files and information associated with them, including file name, time stamps, file size, and most importantly, the first sector the file is located in. By knowing the first sector, we can look at the FAT table to determine every sector the file has data in.
The FAT table is a table that contains an entry for every sector on the disk. Each entry holds either a link to the next sector containing the file or an End Of File (EOF) marker, meaning that sector is the last sector for the file. Unused sectors have a zero value in their FAT table entries.
When the operating system wants to access a file, it will first look up the file's first sector in the Root Directory Region. That sector will then be looked up in the FAT table to find the list of sectors the file is stored in. Once this list is complete, the file can be completely accessed.
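The lookup described above amounts to walking a linked list. Here is a minimal sketch, assuming the FAT has already been decoded into a list of integers indexed by cluster number (on this image one cluster is one sector). The names are illustrative; the end-of-chain range 0xFF8 to 0xFFF is the standard FAT12 convention.

```python
FAT12_EOF_MIN = 0xFF8  # values 0xFF8..0xFFF mark the end of a chain


def follow_chain(entries, first_cluster):
    """Return the list of clusters that make up a file."""
    chain, seen = [], set()
    cluster = first_cluster
    # Clusters 0 and 1 are reserved; 'seen' guards against corrupt loops.
    while 2 <= cluster < len(entries) and cluster not in seen:
        chain.append(cluster)
        seen.add(cluster)
        nxt = entries[cluster]
        if nxt >= FAT12_EOF_MIN or nxt == 0:
            break  # end of file, or an entry that has been zeroed
        cluster = nxt
    return chain
```

Note that when an entry has been zeroed, the walk stops after the first cluster, so the FAT alone can no longer say how long the file was.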
We can see the FAT table's contents in the output from fsstat. At the bottom of the output, the FAT contents show that sectors 73-103 and 104-108 are used for files. This will be helpful later on when we try to extract the files.
To make forensic analysis easier, we will use the Autopsy Forensic Browser. Autopsy is an HTML-based graphical interface to TASK and some standard UNIX utilities created by the developers at @Stake. Anything we can do using Autopsy can be done at the command line with TASK, but using the browser makes analysis easier. To set Autopsy up, we need to add a line to the fsmorgue configuration file so Autopsy knows what image to look at. The following line is added:
floppy.image fat12 A: EST5EDT
The first entry is the image's filename, the second is the file system type, the third is the original mount point and the last is the time zone. Since the original mount point is not known, A: is put down because that is typically the floppy drive letter on Windows machines.
Finally, we become root and run the following command:
# autopsy 8888 localhost
We can now connect to the URL that Autopsy prints and start the forensic analysis.
The first thing to do is to run an Integrity Check to verify that the MD5 of the image is still good. Selecting the "Image Integrity" mode line and clicking on the "Generate" button does this. The output comes back showing that the MD5 value is still good for the image, so we can continue. From now on, we will be able to verify the MD5 at a touch of a button.
Next, the "File Browsing" tool is run to see what files are present in the image. It does this by looking in the Root Directory Region of the file system and retrieving all of the relevant information, including information on deleted files that still show up in the region. The output is located here.
The output shows we have three files, one of which was deleted. The three files are "cover page.jpgc", "Jimmy Jungle.doc", and "Scheduled Visits.exe".
It may be helpful to look at the time associated with all of the files. In a FAT12 file system, three dates and times are associated with files. They are Creation Time, Last Write Time and Last Access Time. The Creation Time is the date and time the file was created or copied to a disk. The Last Write Time is the date and time that the file was last modified. The Last Access Time is actually only the date that the file was last accessed.
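FAT directory entries store dates and times as packed 16-bit words, which is why write times have only two-second resolution and why the last access field, which stores a date word only, has no time of day. The following sketch decodes the two word formats; the function names are illustrative, and the sample words in the usage note were encoded by hand for this example rather than read from the image.

```python
def decode_fat_date(word):
    """Decode a FAT date word into (year, month, day)."""
    year = 1980 + (word >> 9)      # bits 15-9: years since 1980
    month = (word >> 5) & 0x0F     # bits 8-5
    day = word & 0x1F              # bits 4-0
    return year, month, day


def decode_fat_time(word):
    """Decode a FAT time word into (hour, minute, second)."""
    hour = word >> 11              # bits 15-11
    minute = (word >> 5) & 0x3F    # bits 10-5
    second = (word & 0x1F) * 2     # bits 4-0 count two-second units
    return hour, minute, second
```

For example, the date word 0x2D2B decodes to (2002, 9, 11) and the time word 0x43DA decodes to (8, 30, 52).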
To get these dates/time quickly, we can jump to the command line and run the fls and mactime commands, both from TASK. Fls will list all of the files within a forensic image and mactime will create a timeline based upon this list.
$ fls -f fat12 -m A: ./floppy.image | mactime > times.txt
The actual output from above is located in times.txt but is summarized below.
Date / Time | Time Attribute | File Name
Mon Apr 15 2002 14:42:30 | Modified | A:/Jimmy Jungle.doc (_IMMYJ~1.DOC) (deleted)
Fri May 24 2002 08:20:32 | Modified | A:/Scheduled Visits.exe (SCHEDU~1.EXE)
Wed Sep 11 2002 00:00:00 | Accessed | A:/Scheduled Visits.exe (SCHEDU~1.EXE)
Wed Sep 11 2002 00:00:00 | Accessed | A:/Jimmy Jungle.doc (_IMMYJ~1.DOC) (deleted)
Wed Sep 11 2002 00:00:00 | Accessed | A:/cover page.jpgc (COVERP~1.JPG)
Wed Sep 11 2002 08:30:52 | Modified | A:/cover page.jpgc (COVERP~1.JPG)
Wed Sep 11 2002 08:49:48 | Created | A:/Jimmy Jungle.doc (_IMMYJ~1.DOC) (deleted)
Wed Sep 11 2002 08:50:26 | Created | A:/cover page.jpgc (COVERP~1.JPG)
Wed Sep 11 2002 08:50:38 | Created | A:/Scheduled Visits.exe (SCHEDU~1.EXE)
We now have a timeline for the files. Above we see that "Jimmy Jungle.doc" was last modified on Apr 15, 2002 at 14:42:30, was last accessed on Sept 11, and was created on Sept 11 at 8:49:48. The creation time is actually the time the file was copied to the floppy, not the time it was originally created. Unfortunately, the FAT12 file system does not keep track of when a file is deleted, but we can probably assume that it was shortly after the creation date.
According to the output, "Scheduled Visits.exe" was last modified on May 24 at 08:20:32, last accessed on Sept 11 and was created on Sept 11 at 8:50:38. Again, this is when the file was copied to the floppy, not its actual creation date.
Finally, the output shows that "cover page.jpgc" was last accessed on Sept 11, last modified on Sept 11 at 8:30:52 and was created on Sept 11 at 8:50:26.
We can look at the first file, "cover page.jpgc", by clicking on its name and then clicking on "ASCII Report". This gives us a summary of the file located here. It should be noted that the file is reported as being a "PC formatted floppy with no filesystem", not as a JPEG image. The file is then exported by clicking on the "Export" link and saving it to disk.
Looking at the ASCII report of the deleted file, "Jimmy Jungle.doc", we see that it is reported to be a Microsoft Office Document. The name of the file is also reported as _immyj~1.doc. This is because of the way that FAT file systems delete files. When a file is deleted in a FAT file system, the system overwrites the first character of the file name in the file's Root Directory entry with a sigma character (hex E5) to mark the entry as no longer used, and sets the FAT table entries for the file's sectors to 0. This is what allows us to easily recover deleted files. Nothing is actually done to the file data itself, only to its directory and FAT table entries.
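To make the deletion marker concrete, the sketch below parses a single 32-byte short-name directory entry and reports whether it is marked deleted. It is only an illustration: the function name and returned keys are hypothetical, it covers just the fields discussed here (name, extension, first cluster, size), and it ignores long file name entries.

```python
import struct

DELETED_MARKER = 0xE5  # first name byte of a deleted entry


def parse_dir_entry(entry):
    """Parse the core fields of one 32-byte FAT short-name directory entry."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    deleted = entry[0] == DELETED_MARKER
    raw_name = entry[0:8]
    if deleted:
        # The original first character is lost; show a placeholder instead.
        raw_name = b"_" + raw_name[1:]
    name = raw_name.decode("ascii", "replace").rstrip()
    ext = entry[8:11].decode("ascii", "replace").rstrip()
    # Offset 26: first cluster (low word); offset 28: file size in bytes.
    first_cluster, size = struct.unpack_from("<HI", entry, 26)
    return {
        "short_name": name + ("." + ext if ext else ""),
        "deleted": deleted,
        "first_cluster": first_cluster,
        "size": size,
    }
```

The file size and starting location survive deletion, which is what makes recovery of a contiguous deleted file possible.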
This file is also exported to disk.
Looking at the ASCII report for the final file, "Scheduled Visits.exe", we see that the file is not reported as an executable, but as a "Zip archive data, at least v2.0 to extract". This probably means that the suspect changed the extension of the file to try to hide its true nature from investigators. Like the previous two, this file is exported.
Back at the command line, we see that the extracted copies of "Jimmy Jungle.doc" and "cover page.jpgc" do not match the file sizes as reported in the ASCII report for each file. The file size of "Jimmy Jungle.doc" is reported as 20480 bytes and the extracted file is only 512 bytes. The file size of "cover page.jpgc" is reported as 15585 bytes and the extracted file is only 512 bytes. "Scheduled Visits.exe" is reported and extracted with the same number of bytes.
This is due to the fact that Autopsy only saw that the files were contained in one or two sectors. To extract the full files, we need to first do a little math.
We know from the ASCII report (and Root Directory) that "Jimmy Jungle.doc" begins in sector 33 and is 20480 bytes long. Each sector is 512 bytes. Dividing 20480 by 512, we get 40, the number of sectors the full document is stored on. Now that we know how many sectors the file occupies, we can use the dd command to extract the full file. Note that the FAT table contents displayed in the fsstat output did not show these sectors as being used because the file was deleted, and its sectors were therefore marked as unused in the FAT table.
$ dd if=floppy.image bs=512 count=40 skip=33 of=jimmy.doc
The file has now been fully extracted and matches the size as reported in the Root Directory. Since the size of the file is evenly divisible by 512, the file used all of the space in the last sector that contained it. If it had not divided evenly, then it would have only occupied a portion of the sector and there would be slack space available to examine.
Slack space is any unused space at the end of the last sector occupied by a file. Slack space occurs because the FAT12/16 file system only allocates whole sectors to files, not partial ones. So, if a file does not take up the entire amount of space in a sector, the remaining portion is not assigned to other files and is wasted. This is slack space. Slack space is important to examine for two reasons: someone could hide valuable data in this area, where it would never normally be seen, and data from previous files that resided in that sector could still be present.
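The sector and slack arithmetic can be written down in a few lines. This is a small sketch with illustrative function names, using the 512-byte sector size reported by fsstat and the two file sizes reported by Autopsy.

```python
def sectors_and_slack(file_size, sector_size=512):
    """Return (sectors occupied, slack bytes in the last sector)."""
    sectors = -(-file_size // sector_size)   # ceiling division
    slack = sectors * sector_size - file_size
    return sectors, slack


def split_slack(run, file_size):
    """Split a run of whole sectors into (file data, slack bytes)."""
    return run[:file_size], run[file_size:]


print(sectors_and_slack(20480))  # (40, 0): no slack
print(sectors_and_slack(15585))  # (31, 287): 287 slack bytes
```

So a 15585-byte file leaves 287 bytes of slack at the end of its 31st sector, while a 20480-byte file leaves none.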
The next file that needs to be extracted, "cover page.jpgc", is reported as being 15585 bytes long and as starting in sector 451. Dividing 15585 by 512 gets us 30.4, which means that the file is contained in 31 sectors and has some slack space in the last sector. To extract the exact size of the file, the dd command will be run a little differently. The file starts in sector 451, so if we want to pull the exact size of the file, we need to skip 451*512=230,912 bytes.
$ dd if=floppy.image bs=1 count=15585 skip=230912 of=coverpage.jpg
However, running the file command against the extracted file still reports it as a "PC formatted floppy with no filesystem". Unless the file is somehow encrypted, we did not pull the right data out of the image. Looking in the output from the strings command against the floppy image, we see the string "JFIF". The JFIF string is what is always seen at the beginning of a JPEG file. Therefore, we probably started pulling the image from the wrong sector. To find out what the correct sector to pull the image from is, we can run the following command:
$ strings -t d floppy.image | grep JFIF
This shows us the byte offset in the floppy image at which the string appears. Dividing 37382 by 512, we get 73.01. This means that the image file probably begins in sector 73. This matches the fsstat output, which told us that sectors 73-103 were in use. To test this theory, we use the following dd command to pull the file:
$ dd if=floppy.image bs=1 count=15585 skip=37376 of=coverpage2.jpg
That seemed to do it! Since the file size did not divide evenly into 512, there is some slack space associated with the file. In order to grab that file with all of its slack space included, the following command is used:
$ dd if=floppy.image bs=512 count=31 skip=73 of=coverpage_jpg.slack
Now the entire file, including its slack space, can be forensically analyzed.
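The signature search used to find the real start of the JPEG can also be sketched in code. The function below is illustrative only. It relies on the fact that in a JFIF file the "JFIF" identifier sits 6 bytes into the file, after the FF D8 start-of-image marker, the FF E0 APP0 marker and a 2-byte length, so the file start is the match offset minus 6.

```python
JFIF_OFFSET_IN_FILE = 6  # SOI (2) + APP0 marker (2) + segment length (2)


def locate_jfif_starts(image, sector_size=512):
    """Return (sector, offset within sector) for each plausible JFIF file start."""
    hits = []
    pos = image.find(b"JFIF")
    while pos != -1:
        start = pos - JFIF_OFFSET_IN_FILE
        # Only keep matches that are preceded by the JPEG start-of-image marker.
        if start >= 0 and image[start:start + 2] == b"\xff\xd8":
            hits.append(divmod(start, sector_size))
        pos = image.find(b"JFIF", pos + 1)
    return hits
```

Applied to the offset reported by strings, `divmod(37382 - 6, 512)` gives `(73, 0)`: the file starts exactly at the beginning of sector 73.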
Since we had trouble with the extraction of the other files from the image, we should test that the extracted copy of "Scheduled Visits.exe" is intact. Since it is being reported as a zip archive, we can use the unzip command to verify that it is a valid zip archive. For convenience, the extracted file is copied to a filename that is easier to type.
$ md5sum floppy.dd-A..Scheduled.Visits.exe........SCHEDU.1.EXE..raw
As suspected, the file was not completely extracted.
Jumping back into Autopsy, we can use the Data Browsing tool to view the sector that the file starts on, sector 104. Entering 104 in the "Sector Number" field, leaving the type as "Regular (dd)" and hitting the Display button pulls up the ASCII contents of sector 104. Clicking on the Hex display link gives us a hex view of the sector, which is what we want. Clicking the Next link shows the next sector's hex display. We keep hitting Next until we see a run of hex 00 bytes, which typically indicates padding after the end of the file's data. This occurs in sector 108, displayed here. The last few hundred bytes of this sector are all hex value 00. This leads us to believe that the file is stored in sectors 104-108 (5 sectors). Again, this matches the fsstat output that told us that sectors 104-108 were in use. We can use the following dd command to extract the file:
$ dd if=floppy.image bs=512 count=5 skip=104 of=sched.exe
This seems to work as we now see that the file contains a supposed Excel spreadsheet named "Scheduled Visits.xls". Since we extracted the file to the end of the last sector, there is no extra slack space to get.
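Looking for trailing zero bytes is a heuristic. For a zip archive there is a more direct check, sketched below as an alternative to what was done above: a zip file ends with an End of Central Directory record, which begins with the signature `PK\x05\x06`, is 22 bytes long, and is followed only by an optional comment whose length is stored in its last two bytes. The function name is illustrative.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FIXED_SIZE = 22  # size of the record without the trailing comment


def zip_length(data):
    """Return the archive length implied by the last EOCD record, or None."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos == -1 or pos + EOCD_FIXED_SIZE > len(data):
        return None  # no complete record: the data is truncated or not a zip
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    return pos + EOCD_FIXED_SIZE + comment_len
```

A result of `None` would indicate an incomplete extraction, and any bytes beyond the returned length are padding or slack rather than archive data.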
The individual files can now be examined. This is discussed in question 5 below.
After examining all of the files, the MD5 hashes are confirmed one more time to verify that no changes have been made to any of the files. The hashes are verified to be correct and our forensic analysis is complete.
This document was extracted as jimmy.doc. First, we verify the MD5 hash on the file to make sure that it has not changed since being extracted from the floppy image.
$ md5sum jimmy.doc
b775eb6a4ccc319759d9aaae1e340acc jimmy.doc
This matches the original MD5 hash so we can proceed. Next, we run the file command to see what type of file it actually is.
$ file jimmy.doc
jimmy.doc: Microsoft Office Document
This verifies that the file is actually a Microsoft Office document. A forensic analyst should never rely upon the extension of a file as it is too easy to modify the file's extension to try to mask what the file truly is.
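The file command works by inspecting content rather than names. A much-reduced sketch of the same idea follows, covering only the three file types that appear in this analysis; the table and function name are illustrative and are not how file itself is implemented.

```python
# Leading "magic" bytes for the three formats seen on the floppy.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "Zip archive"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "Microsoft Office (OLE2) document"),
]


def guess_type(data):
    """Guess a file type from its first bytes, ignoring the file name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

A file named with an .exe extension whose first bytes are `PK\x03\x04` would be reported as a zip archive, regardless of its name.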
Next we run strings against the file to pull out anything that may be interesting to us. The output from this shows some of the contents of what was written in the document as well as the strings "Microsoft Word 10.0" and "Word.Document.8". These strings are usually indicative of Microsoft Word 2000 or Word XP being used to create a document.
Now we open a copy of the file in Microsoft Word to see its contents. The file contains a letter from the suspect to his supplier, Jimmy Jungle, which discusses selling pot to high school kids. The document also says to use the same password as sent before to open up the schedule. This is important as it gives us clues as to what to look for in order to unzip the Scheduled Visits spreadsheet.
Looking at the properties of the file does not give us much information except that the title of the document is "Jimmy Jungle", the author is "0000" and the company is "OOOO".
Opening up the file in a hex editor does not reveal any more information as the metadata that is usually associated with Word documents, such as the MAC address of the computer it was created on, does not exist.
The file was extracted from the image into two files, coverpage2.jpg and coverpage_jpg.slack. Coverpage2.jpg is the actual image and coverpage_jpg.slack is the image with its slack space. Once more we verify the MD5 hashes on both files.
$ md5sum coverpage2.jpg
e30e8ecec4500678f7270e96b1d5663b coverpage2.jpg
$ md5sum coverpage_jpg.slack
28cfe7fe68f5b13071a2ce0b87ff1e9b coverpage_jpg.slack
The hashes match so we can continue. Running the file command against coverpage2.jpg reveals that it is a "JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96". Running strings against coverpage2.jpg reveals "JFIF", the signature of a JPEG image.
Loading coverpage2.jpg into KDE's KView reveals that the image is a cover of "Pot Smokers Monthly" which is "Your monthly guide to the best pot on the planet!". The cover also tells us that "This month's featured pot grower ,smoker and seller is Jimmy Jungle". This is the cover page that the suspect refers to in his document to Jimmy Jungle. Opening this file up in a hex editor does not reveal any more information.
Since coverpage_jpg.slack contains the same data as coverpage2.jpg, the file command reveals it is a JPEG file. However, running the strings command against coverpage_jpg.slack gives us something interesting. Within the output from strings is the string "pw=goodtimes". This is probably the password that is referred to in "Jimmy Jungle.doc". Opening coverpage_jpg.slack in a hex editor reveals that the string is indeed at the end of the file in the slack space.
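To look at the slack bytes alone, the known file size can be used to cut the sector run in two before searching for text. The sketch below imitates the default behaviour of strings (runs of at least four printable ASCII characters); the function names are illustrative and the sample data in the usage note is synthetic.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len characters long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def slack_strings(run, file_size, min_len=4):
    """Return printable strings found only in the bytes past the file's end."""
    return printable_strings(run[file_size:], min_len)
```

For instance, `slack_strings(b"\xff\xd8" + bytes(10) + b"pw=example" + bytes(5), 12)` returns `["pw=example"]`, because only the bytes after offset 12 are searched.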
This file was extracted from the image as sched.exe. Again, the first thing we do is verify the MD5 hash of the file.
$ md5sum sched.exe
4e0be275e3040701145e3235dd43ea4a sched.exe
The hash matches, so we can continue. Running the file command against sched.exe reveals that the file is a "Zip archive data, at least v2.0 to extract", not an executable as the extension would lead us to believe. Running the strings command against the file reveals only the string "Scheduled Visits.xls", which may be a file contained within the archive. Since it is a zip archive, we can use the Linux unzip utility to extract any files in the archive.
$ unzip sched.exe
Archive: sched.exe
[sched.exe] Scheduled Visits.xls password:
inflating: Scheduled Visits.xls
When we ran the unzip utility we were prompted for a password. Using the password we got from the coverpage.jpg slack space, goodtimes, we were able to unzip "Scheduled Visits.xls". To verify that the file is actually a spreadsheet, we run the file command against it:
$ file Scheduled\ Visits.xls
Scheduled Visits.xls: Microsoft Office Document
It looks like the file is an actual spreadsheet. Running the strings command against the xls file reveals a number of high school names as well as the string "Microsoft Excel".
Since this seems to be an Excel spreadsheet, we make a copy of the file and open it up in Microsoft Excel. The file contains a schedule of high schools that the suspect plans to visit or has visited already. This is the same schedule mentioned in the "Jimmy Jungle.doc" file. The properties of the file do not give us any other information except that the author of the file is "CSTC" and the company is "ARFL".
Loading the file into a hex editor does not reveal any more information as, like the Word document, most of the metadata is missing.
The key to figuring out what Microsoft program created the cover page JPEG is found by first looking at the JPEG file format. According to the JPEG specs, after the initial APP0 marker used to identify the file as JFIF, additional APP0 marker segments may be used by applications to hold application-specific information. The information stored here by applications is typically used for internal use and identification. Most graphics programs will put their name in there for identification. For example, as seen here, Adobe Photoshop 3.0 puts its name as well as a number of application-specific options in the header.
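Listing the application segments in a JPEG header is straightforward to script. The following sketch walks the marker segments after the start-of-image marker and reports the identifier string at the start of each APPn segment. The function name is illustrative, it stops at the start-of-scan marker, and it assumes a well-formed header in which every segment before that point carries a length field.

```python
import struct


def list_app_segments(data):
    """Return (segment name, identifier) for each APPn segment in a JPEG header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing JPEG start-of-image marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:
            break  # start of scan: compressed image data follows
        # The big-endian length counts its own two bytes but not the marker.
        (length,) = struct.unpack_from(">H", data, pos + 2)
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            ident = payload.split(b"\x00", 1)[0].decode("ascii", "replace")
            segments.append(("APP%d" % (marker - 0xE0), ident))
        pos += 2 + length
    return segments
```

A header with no application-specific segment yields only `("APP0", "JFIF")`, which gives no hint about the creating program.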
The coverpage2.jpg file header, shown here, does not contain any information that reveals what the application that created the image is. However, since we know a Microsoft program created the file (from the question itself), the best way to figure this out then is to hunt down all of the Microsoft programs that can create JPEG images.
The only two I could find were PowerPoint and Microsoft Paint. In PowerPoint, you are able to create a JPEG image from one of the slides, but you are not able to control the size of the image. This makes it unlikely that PowerPoint created the image, as the image is only 208x199 pixels. Additionally, any JPEG I created with PowerPoint contained the string "Microsoft Powerpoint" within the header.
Microsoft Paint, however, allows you to specify the size of the image, so an image with the same dimensions as coverpage.jpg was created. Examining this file in a hex editor shows that with the exception of 3 characters (0000:000F-0000:0011), the header is exactly the same as coverpage2.jpg's header.
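The header comparison done by eye in the hex editor can be reproduced with a short byte-by-byte comparison. This is only a sketch with an illustrative function name; the sample headers in the usage note are synthetic stand-ins, not the real file contents.

```python
def differing_offsets(a, b, length=None):
    """Return the offsets at which two byte strings differ."""
    limit = min(len(a), len(b))
    if length is not None:
        limit = min(limit, length)  # optionally compare only a leading header
    return [i for i in range(limit) if a[i] != b[i]]
```

For two 32-byte synthetic headers that differ only at offsets 0x0F through 0x11, the function returns `[15, 16, 17]`.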
Since it is highly unlikely that any other software would create a header like this, coverpage.jpg was in all likelihood created by Microsoft Paint. The difference of the 3 characters is probably due to different versions of Microsoft Paint.