Scan 17 Results

Analysis provided by Jeffery Stutzman, Cisco Corporate Information Security
The Honeynet Project

The Challenge:
In November, 2000 the Honeynet Project collected every probe, attack, and exploit launched against the Honeynet. The challenge is to analyze this month's worth of data and analyze the blackhat's tools, tactics, and motives.

For this exercise, I chose to use the Snort logs to determine what early warning signs might tip us off to an impending attack. Instead of compiling all of the logs together, I chose to use a methodology known as Statistical Process Control (SPC). SPC is very commonly used in the manufacturing world to measure defects in products on the factory floor. Very simply, SPC looks at each process individually and then compares it to the aggregate by watching for defects and setting control limits that determine how many defects are normal (or can be tolerated). By looking at each rule reported by Snort individually, and comparing the rules to each other, we can easily identify trends that might indicate a future attack. Here's how it works.

Initial Parsing
The first step I performed in doing the IDS analysis was to simply parse the data. I first parsed by fixed segments at the Month, Day, Year, and after the Snort IDS rule. I then performed "text to columns" (in Excel) to parse by ":", then "-", and removed all miscellaneous characters. You can find the results as a tab-delimited text file called

Nov_Data.txt
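
For anyone who would rather script this step than use Excel, here is a minimal sketch of reading a tab-delimited file of parsed alerts. It is not the process I used. The column positions (day_col, rule_col) and the sample rows are hypothetical and would need to be adjusted to the real layout of the file.

```python
import csv
import io

def load_alerts(lines, day_col=1, rule_col=3):
    """Yield (day_of_month, rule_name) pairs from tab-delimited rows.

    day_col and rule_col are hypothetical column positions; adjust them to
    the real layout of the file being read.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) <= max(day_col, rule_col):
            continue  # skip blank or short rows
        day = row[day_col].strip()
        if not day.isdigit():
            continue  # skip a header row or anything without a numeric day
        yield int(day), row[rule_col].strip()

# Made-up rows for illustration only, not taken from Nov_Data.txt.
sample = io.StringIO(
    "Nov\t4\t2000\tRule A\n"
    "Nov\t6\t2000\tRule B\n"
    "Nov\t6\t2000\tRule B\n"
)
print(list(load_alerts(sample)))
```

To read a real file instead of the in-memory sample, pass an open file object, for example one opened with open(path, newline="").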

Total reports by day
Once all of the Snort data was parsed, I counted the number of occurrences of each Snort rule reported per day. By counting only the number of Snort alerts, I was able to both maintain an apples-to-apples count and maintain sanity. The first thing I like to look at (at least with Honeynet) is the RPC activity. RPC attacks have been very popular in the past, and the numbers have reflected that activity in our intrusions. As you look at the table below, you'll see some unfamiliar terms. Let me explain. Before I go much further, though, I must point out that I'm no statistician, so please spare me the flames.

Here's how it works: Look first at the portion of the table labeled “RPC”. The Snort alert rules appear first, followed by something called 3DMA. 3DMA means 3-day moving average. I used the current day and the two preceding days to create a moving average to help identify trends. In the next portion of the analysis those trends will become readily apparent. Next, at the end of the table you’ll see a column labeled “UCL”. UCL stands for upper control limit. I derived the UCL by taking the standard deviation of the 30-day sample and multiplying it by two. The UCL gives us a marker, if you will. If the 3DMA goes above the UCL, we should begin paying attention to that indicator. Also, if the 3DMA increases for 3 or more days, we should pay attention. In statistics, this is called a run. The results of the data analysis can be found in the file

Nov_Parsed_Daily.txt
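
The arithmetic described above can be sketched in a few lines of Python. This is an illustration, not the spreadsheet I actually used. It assumes the sample standard deviation is taken over the raw daily counts, and it leaves the first two days of the 3DMA empty because they have no full three-day window. Neither detail is spelled out above, and the function names are my own.

```python
from collections import Counter
from statistics import stdev

def daily_counts(alerts, rule, days=30):
    """Alerts per day for one rule; alerts is an iterable of (day, rule)."""
    tally = Counter(day for day, name in alerts if name == rule)
    return [tally[day] for day in range(1, days + 1)]

def moving_average_3day(counts):
    """3DMA: mean of the current day and the two preceding days.

    Assumption: days without two preceding days are reported as None.
    """
    return [
        None if i < 2 else sum(counts[i - 2:i + 1]) / 3
        for i in range(len(counts))
    ]

def upper_control_limit(counts):
    """UCL as described in the text: two times the standard deviation."""
    # Assumption: sample standard deviation of the raw daily counts.
    return 2 * stdev(counts)

def days_above_ucl(ma, ucl):
    """1-based day numbers whose 3DMA exceeds the UCL."""
    return [i + 1 for i, v in enumerate(ma) if v is not None and v > ucl]

def rising_runs(ma, min_days=3):
    """(first_day, last_day) pairs, 1-based, covering stretches where the
    3DMA rose on min_days or more consecutive days."""
    runs, rises = [], 0
    for i in range(1, len(ma) + 1):
        rising = (
            i < len(ma)
            and ma[i - 1] is not None
            and ma[i] is not None
            and ma[i] > ma[i - 1]
        )
        if rising:
            rises += 1
        else:
            if rises >= min_days:
                runs.append((i - rises + 1, i))
            rises = 0
    return runs

# Made-up daily counts for illustration only, not the November data.
counts = [0, 0, 3, 3, 6, 9, 0, 0, 0, 0]
ma = moving_average_3day(counts)
print(ma)
print(days_above_ucl(ma, upper_control_limit(counts)))
print(rising_runs(ma))
```

Each Snort rule gets its own list of daily counts, its own 3DMA, and its own UCL, which matches the SPC idea of looking at each process individually before comparing them.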

A picture speaks a thousand words.
The next step in the process is to put everything together in a graphic. Why a graphic? A picture speaks a thousand words. Graphics make it so simple even my boss can understand it. I can take the pictures and show them without explaining beyond the fact that this is a statistical process, and the numbers showed us a warning. Here we go. The first thing I like to look at is the port scanning activity. So, I take the port scanning, calculate the 3DMA and UCL, and then plot them out. Figure 1 is a graph of the port scanning activity. Notice the two runs from days 3-7 and 27-30.

Figure 1

Next, I plotted the rest of the RPC related activity individually, again with a 3DMA and UCL. Looking at the RPC data together with the port scans shown above, it becomes pretty clear that an RPC related attack is coming. I can say this with relative certainty, because I know that Red Hat 6.2 servers were placed in service on the 4th and 25th of November, and Sparc 2.6 boxes were placed in action on the 5th and 25th. Now, the blackhat knows there are systems online, and will likely know what type they are. Next he’ll try to identify open ports, someplace he knows how to hack. In this case, I’m looking at port 111, RPC. SYN-FIN scanning has become a popular means of identifying open ports. The graphic of SYN-FIN scanning to port 111 is shown in Figure 2.

Figure 2

Again, I find it curious that there is a large amount of activity on the 6th. This amount is well above the UCL and should cause concern. So, let's continue the quest for other indicators. However, it's pretty safe to say at this point that an attack on port 111 is imminent: the trend in the scans matches the trend in the activity at port 111. The next Snort rule I looked at was Portmap status queries, shown in Figure 3. Again, activity was noted in the form of one Snort report on the 4th and another on the 7th. My guess is that someone is checking to see if the port is in fact alive. The UCL on this graphic was 0.4. So, now we have three charts with out-of-bounds activity from the 3rd through the 7th of November. So, I’m going to take the hard road and make the call that we will very likely see an RPC attack around the 7th or 8th, and will likely see something else around the 30th. (The second call is still just a WAG; it is based only on the scanning noted in Figure 1 and no further information.)

Figure 3
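
The "three charts agree" reasoning above can also be written down as a simple tally of how many indicators are out of bounds on each day. The sketch below is only an illustration: the indicator labels and flagged days are made up to resemble the three charts, and the threshold of two indicators is an arbitrary assumption.

```python
from collections import Counter

def corroborated_days(flagged_days_by_rule, min_rules=2):
    """Days flagged by at least min_rules different indicators."""
    tally = Counter(
        day for days in flagged_days_by_rule.values() for day in set(days)
    )
    return sorted(day for day, n in tally.items() if n >= min_rules)

# Hypothetical flags for illustration only, not the measured results.
flags = {
    "port scans": [3, 4, 5, 6, 7],
    "SYN-FIN scan to port 111": [6],
    "portmap status query": [4, 7],
}
print(corroborated_days(flags))  # prints [4, 6, 7]
```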

Putting it all together

What trends did you identify? See Figure 4. Although this activity is a bit hard to read because of the amount of data in the chart, one can quickly see there are trends leading up to each and every attack.
What does this activity tell us about the blackhat community? This activity demonstrates just how active and random the blackhat community is. These systems have little value, yet they were probed and attacked almost daily. They may have little value to the owner, but they have great value to an attacker. This also demonstrates that certain blackhats target the 'easy kill'. Regardless of who you are, if you are connected to the Internet, you are a target.
What, if anything, happened in the firewall and IDS logs that gave us a clue of what was coming? Could any of the attacks have been predicted ahead of time? If so, how? In the case of the Honeynet, each attack was indeed predicted. Because the blackhats had no reason to be on the network other than to attack it, and because the numbers were so small, we easily saw three days' warning on each attack. The exception would have been during the period of the "worms at war", when worms battled for control over the box.
What data did you find more valuable, the Snort alerts or the firewall logs of unique scans? Why? I preferred using the Snort rules. Because the Honeynet is pure, meaning everyone that comes there is there for only one reason, false positives are minimal. Had I spent more time on the analysis, I would have performed the same analysis using the firewall logs. I'd bet they would have looked the same.
What lesson did you learn from this? On a small (nano) scale, hacks can be predicted. I'm dying to try this out on enterprise wide data.
How long did this challenge take you? The challenge took me approximately 6 hours, plus editing. Had I performed the analysis through scripting, I could have shortened the time considerably.
Bonus Question: What successful attack was missed? The exploited system we missed was a Win98 honeypot put online in October and left online during the beginning of November, specifically the system outlined in Know Your Enemy: Worms at War. We missed this attack for two reasons. First, we had disabled NetBIOS scans due to the large volume of logs. As such, we did not log, nor were we alerted to, the probes and attacks against the Win98 system on this port. Second, Snort did not detect the attack because it had no signature for it. The attack was nothing more than a worm transferring some randomly named files, which demonstrates how a signature-based IDS sensor can fail you. So, we had two layers of data capture (firewall logs and IDS sensor) and both failed us. We did NOT detect this successful attack until the compromised honeypot attempted to scan the Internet.

Convinced? Not yet? OK, well, doing post-attack early warning analysis is like doing a crossword puzzle with the answers in the back of the book. Rest assured, the process works. Figure 4 is a trace of all of the noted activity associated with RPC. Figure 4 brings it all together. For simplicity, I’ve left off the 3DMA and UCL lines. The one interesting thing about this graphic is that it shows very clearly that the activity across all rule sets reported by Snort correlates exactly on the 6th. Each of the RPC rules reported on the 6th peaked on the graph. On the 7th, the Red Hat 6.2 box was compromised using an rpc.statd vulnerability, and a backdoor was installed. Interestingly enough, on the 26th, another RH 6.2 box was placed in service. On the 27th, we noted an immediate increase in scanning on the Honeynet, followed again by a compromise on the 30th, again at rpc.statd, with a backdoor installed. On a side note, I found it interesting that a Windows 98 box was placed in service on October 30th, and was compromised on the first of November by a worm. The attacks from various worms battling each other for control over the box lasted four days.

Figure 4

Conclusions:
In this exercise, I used only the Snort logs for the analysis. Firewall logs can be analyzed the same way, but with less detail. The one lesson learned about using Snort at a Honeynet is that there are no false positives. As a result, we get a pretty clear picture of what happened before and after each attack. In this case, we had about 4 days' notice that something was coming. As the days progressed, the picture became clear that someone was going to hack port 111. We had 4 days to ensure our patches were up to date and the box was as secure as we could make it. At Honeynet we really wanted to find out what would happen, so the box remained a default load, but in real world applications, wouldn't a couple of days' notice be nice? As a qualification, we realize the Honeynet is a VERY small sample (nano sample?), and that in the real world we're talking about gigabytes of information per day. Also, I only tested this month using RPC data, but Know Your Enemy: Statistics pulls the top 10 Snort rules reported at Honeynet and examines them as well. At a nano level, the process works. However, bear in mind that at this point the process is a proof of concept only, and is still under testing by Honeynet members on enterprise wide data. Please feel free to add to the analysis; we would love to hear from you.

Comments are always welcome.

Take care,
Jeff Stutzman
The Honeynet Project