Project.Honeynet.org

Scan of the Month Challenge - Scan 23 - Beginner Analyst

------------

Writeup:  Curtis Sloan (csloan@masters.ab.ca)


The Challenge:
Members from the South Florida Honeynet team manually generated five different types of portscans from the Internet to a single honeypot. These are not portscans captured from the wild. The term "the wild" is used to describe any host we don't know about outside of our network. In other words, any host other than our own connected to the Internet involved in reconnaissance, an intrusion, and/or system compromise is a system in the wild. During each scan, our network intrusion detection sensor captured each scan and saved it to a binary log file. We used snort to capture each scan in tcpdump format. It's important to note that tcpdump and snort use the libpcap library to capture and store packets from off the wire. So you can learn more about the packet capture technologies used to capture the portscans during this challenge, we have provided links to help get you on the right foot. It is up to you-the beginner analyst-to pull the binary file into a packet decoder such as tcpdump, or ethereal to analyze each scan. Your mission, if you choose to accept it is to answer the questions below the best that you can. 


Questions and Answers:

1) What is a binary log file and how is one created?

A binary log file is a log file (which is a record of a program's or system's actions or activities) recorded in binary format (i.e. the data which is logged is recorded without conversion). The data is captured in binary form and perhaps marked up by the program with some log formatting in binary format, e.g. date and time, sequencing numbers, etc. 
Binary logs usually take up less space than a text or ASCII version of the same information, since ASCII characters represent data instead of actually _being_ the data (for instance, it generally takes 2 or more bytes in ASCII to represent 1 byte of binary). Binary logs capture the actual data, as opposed to an ASCII representation of the data; e.g. when you capture actual packets and their payloads, any binary data (programs, files, etc.) transmitted are actually stored in the log, instead of ASCII translations of those bytes into human readable "English" format. 
The binary format is useful in that the data can be manipulated in a number of ways in its native format, including, but not limited to: viewing, replay, extraction, saving, reverse engineering, compilation, and execution. In contrast, a text rendering of the binary data is no longer the actual data; it is simply letters representing what the data was before we converted it. It would still be viewable, and sometimes more easily so, but it would lose much of the functionality of its previous format. In any event, a binary log is much more compact than the equivalent text log.
In our specific scenario, snort (www.snort.org) was used in packet logging mode to capture each scan in tcpdump binary format. tcpdump (www.tcpdump.org) is a very common program, and its binary capture format is also very commonly used in packet capturing. tcpdump format is a binary log of packet headers and data recorded using the libpcap library. Also, when snort logs in binary mode, the entire packet is logged, not just sections of it. 
How is a binary log file created? By configuring a program to write the data it handles to a binary file, appending as it goes. In our example, snort (which logs in text by default) is configured to log to a binary tcpdump-format file. Here is an example: ./snort -l ./log -b. The -l switch tells snort which directory to store the file in. The -b switch tells snort not to log in the default ASCII text mode, but rather to log in binary tcpdump format. (http://www.snort.org/docs/writing_rules/chap1.html#tth_sEc1.3) 
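
To make "binary format" a little more concrete, here is a minimal sketch of reading a tcpdump/libpcap file by hand. It assumes the classic little-endian pcap layout (a 24-byte file header followed by 16-byte per-packet record headers). The function names parse_pcap and build_sample are made up for illustration, and the sample capture is synthetic, not data from the challenge log.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # magic number of a classic pcap file


def parse_pcap(data):
    """Parse a little-endian pcap byte string into (file_info, packets)."""
    # File header: magic, version major/minor, timezone offset,
    # timestamp accuracy, snapshot length, link-layer type (24 bytes).
    magic, vmaj, vmin, _tz, _sigfigs, snaplen, linktype = struct.unpack_from(
        "<IHHiIII", data, 0
    )
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian pcap file")

    packets = []
    offset = 24
    while offset + 16 <= len(data):
        # Record header: seconds, microseconds, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            "<IIII", data, offset
        )
        offset += 16
        # The packet bytes are stored as-is, with no conversion to text.
        packets.append((ts_sec, ts_usec, data[offset:offset + incl_len]))
        offset += incl_len

    info = {"version": (vmaj, vmin), "snaplen": snaplen, "linktype": linktype}
    return info, packets


def build_sample():
    """Build a tiny synthetic capture holding one 8-byte 'packet'."""
    header = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    payload = bytes(range(8))
    record = struct.pack("<IIII", 1031000000, 0, len(payload), len(payload))
    return header + record + payload


if __name__ == "__main__":
    info, packets = parse_pcap(build_sample())
    print(info, len(packets), packets[0][2].hex())
```

In practice tcpdump or Ethereal does this decoding for you; the point is only that the log stores the raw packet bytes plus a small amount of binary bookkeeping.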


2) What is MD5 and what value does it provide?

MD5 is a digest algorithm that is used to verify data integrity through the creation of a 128-bit message digest from variable data input (which may be a message of any length). The message digest is claimed to be as unique to the specific variable data input as a fingerprint is to the specific individual. (http://searchsecurity.techtarget.com/sDefinition/0,,sid14_gci527453,00.html) 
Basically, MD5 takes input of any length, reads it, and uses a special algorithm to create a "fingerprint" that reflects the unique combination of data contained in the input. For instance, if you specified a filename as the input to a program utilizing the MD5 algorithm, it would produce a 128-bit "fingerprint" (usually printed as 32 hexadecimal characters) based on the file's contents. Here is an example: 
md5sum.exe SotM_September_submit.txt returns: dae8878d48dad096f7fa12441c4d8a3d *SotM_September_submit.txt 
The program md5sum.exe took the contents of the file SotM_September_submit.txt and created a unique fingerprint based on the contents of the file at the moment of processing. That was before I wrote the previous two sentences. Here is the result now: 
md5sum.exe SotM_September_submit.txt returns: 841a006d49e11c3692b978859fda57d0 *SotM_September_submit.txt. 
Because the contents changed, the fingerprint changed to reflect the unique combination of letters and numbers saved in my text document.
MD5 is commonly used with large compressed and/or encrypted archives of sensitive data slated for transfer over the Internet. Comparing the published digest against one computed locally shows whether the data has changed in any way since the author made it available, either by accidental or malicious modification or corruption. In our specific case, MD5 lets us check that the scan's binary log file has not been changed since it was posted by the Honeynet Project team.
The value of MD5 is that it provides a consistent, secure method of "fingerprint checking" on large or sensitive files. 
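
The same fingerprint check can be sketched with Python's standard hashlib module. This is only an illustration of the idea: the helper names md5_hex and md5_file are made up here, and the sample strings are not the contents of my submission file.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


def md5_file(path: str, chunk_size: int = 65536) -> str:
    """Digest a file in chunks so large logs need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    before = md5_hex(b"draft answer")
    after = md5_hex(b"draft answer, plus two more sentences")
    # Any change to the input produces a different fingerprint.
    print(before)
    print(after)
    print(before != after)
```

To check a download, run md5_file on the local copy and compare the result with the digest published alongside the file.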


3) What is the attacker's IP address?

192.168.0.9

It is unlikely that the IP address has been spoofed, given both the private IP address block in use and the recorded TTLs (later on we will see that the attacker's connect() scan RSTs have a TTL=255, which probably means the attacker was on the same physical network as the scan recipient in this case). Later, the attacker also spoofs the following addresses as decoys:

192.168.0.1
192.168.0.199
192.168.0.254

4) What is the destination IP address?

192.168.0.99

As we will see later, this can be verified: 192.168.0.99 never initiates any communication during the captured scan. It only responds to communications initiated from the attacker's IP addresses. 


5) We scanned the honeypot using five different methods. Can you identify the five different scanning methods, and describe how each of the five works?

Method #1: SYN "half-scan"

"TCP SYN scanning: This technique is often referred to as "half-open" scanning, because you don't open a full TCP connection. You send a SYN packet, as if you are going to open a real connection and wait for a response. A SYN|ACK indicates the port is listening. An RST is indicative of a non-listener." (http://www.insecure.org/nmap/nmap_doc.html#syn)

SYN scanning, or "half-scanning" as it is called, is somewhat stealthy due to the fact that a full 3-way TCP handshake is never completed. However, SYN scanning is far from being clandestine. The term 'stealthy' is being used in these descriptions of scan types as a relative term between each of the scan types. 

Method #2: NULL scan

FIN, NULL, and XMAS scans are all classified as "negative" scans since they look for the absence of a response in order to determine whether a port is open or not. They are considered stealthy since they do not set up the initial portion of a 3-way TCP handshake. However, as noted in the previous description, NULL scans are probably easily identified on most firewalls and/or network intrusion detection systems these days. 
A NULL scan sets all TCP flags to 0, which is abnormal TCP behaviour. Certain firewalls have difficulty detecting or filtering these packets, and let them through. 

"FIN (-sF), NULL (-sN) and XMAS (-sX) scans are all similar. They ... work by getting either a RST back (closed port) or a dropped packet (open port). Of course, the other situation where you might get back a dropped packet is if you've got a packet filter blocking access to that port. In that case you will get back a ton of false open ports." (http://www.insecure.org/nmap/data/nmap_manpage.html) 

Method #3: XMAS scan
See above for background information. 
An XMAS scan is the converse of a NULL scan in that all (or most of) TCP flags are set to 1 (FIN, PSH, URG). The effect is the same as the NULL scan. The name "XMAS" comes (presumably) from the fact that so many TCP flags are set, comparable to "being lit up like a Christmas tree".
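
As a rough sketch of how the flag combinations described so far can be told apart, the snippet below labels a probe by its TCP flags byte. The bit values are the standard TCP flag bits; the function name classify_probe and its labels are made up for illustration. Note that a lone SYN looks the same on the wire whether it begins a half-open scan or a full connect(), so the flags byte alone cannot separate those two.

```python
# Standard TCP flag bits (low six bits of the flags byte).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def classify_probe(flags: int) -> str:
    """Label a probe packet by its TCP flags (illustrative labels only)."""
    flags &= 0x3F  # ignore the two ECN-related bits above URG
    if flags == 0:
        return "NULL"  # no flags set at all
    if flags == (FIN | PSH | URG):
        return "XMAS"  # "lit up like a Christmas tree"
    if flags == FIN:
        return "FIN"
    if flags == SYN:
        return "SYN"  # half-open or connect(); indistinguishable here
    return "other"


if __name__ == "__main__":
    for value in (0x00, 0x29, 0x02, 0x01, 0x12):
        print(hex(value), classify_probe(value))
```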

Method #4: SYN connect()
Also known as vanilla connect() scanning, or just vanilla scanning, this method gets its name from being very run of the mill compared to other scanning techniques. This is because connect() scanning uses a standard OS system call to attempt to complete a full 3-way TCP handshake in order to discover whether a port is open or closed. This method is not stealthy, but it is fast and effective.
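
To show how ordinary the technique is, here is a minimal sketch of a connect() check using Python's standard socket module. It is not how nmap implements the scan; connect_scan is a made-up name, the loop is sequential rather than parallel, and it should only be pointed at hosts you are authorized to test (the example uses the local machine).

```python
import socket


def connect_scan(host, ports, timeout=0.5):
    """Try a full TCP connect() to each port; True means it was accepted."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 when the 3-way handshake completes,
            # and an error code (e.g. connection refused) otherwise.
            results[port] = sock.connect_ex((host, port)) == 0
    return results


if __name__ == "__main__":
    # Check a few well-known ports on the local machine only.
    print(connect_scan("127.0.0.1", [22, 80, 443]))
```

Because every successful probe completes the handshake, the target's services see (and can log) each connection, which is why this method is easy to detect.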

"If the port is listening, connect() will succeed, otherwise the port isn't reachable. While making a separate connect() call for every targeted port in a linear fashion would take ages over a slow connection, you can hasten the scan by using many sockets in parallel. Using non-blocking I/O allows you to set a low time-out period and watch all the sockets at once. This is the fastest scanning method supported by nmap, and is available with the -t (TCP) option." (http://www.insecure.org/nmap/nmap_doc.html#connect)

Method #5: Decoy scan (XMAS scan with -D "decoy" option)
Decoy scans use additional real or spoofed IP addresses as source IP addresses in order to confuse the scan recipient as to the true source IP address of the scan. Decoy scans are somewhat stealthy in that they provide "camouflage" for the attacker. In addition, they can sometimes be effective in eluding network intrusion detection systems since the source IP address changes so much -- thereby not triggering alerts that a similar scan from a single source IP address would have set off. It is likely that this capability has been largely negated in current NIDS rulesets, but it is also likely that counter-strategies are being invented and implemented as well. :-)

 -D <decoy1 [,decoy2][,ME],...>
              Causes  a decoy scan to be performed which makes it
              appear to the remote  host  that  the  host(s)  you
              specify  as  decoys are scanning the target network
              too.  

              Note  that the hosts you use as decoys should be up
              or you might accidently  SYN  flood  your  targets.
              Also it will be pretty easy to determine which host
              is scanning if only one is actually up on the
              network.  You  might want to use IP addresses instead
              of names (so the decoy networks don't  see  you  in
              their nameserver logs).

              It  is  worth noting that using too many decoys may
              slow your scan and potentially even  make  it  less
              accurate.   Also,  some  ISPs  will filter out your
              spoofed packets, although many (currently most)  do
              not restrict spoofed IP packets at all.

(http://www.insecure.org/nmap/data/nmap_manpage.html)


6) Which scanning tool was used to scan our honeypot? How were you able to determine this?

Nmap. Just a good guess. ;-)

There were 3 correlating factors that confirmed my initial educated guess regarding the scanning tool that was used (and which host operating systems were in use):
#1 - Common usage and feature set. Nmap is a standard utility for port scanning/OS fingerprinting (at least in the Open Source world as I know it). ;-)
#2 - Scan types used (and OS fingerprinting). I recognized the scan type options being used in the scan from my very limited experience with nmap.
#3 - Confirming evidence in a couple of general port scanning overviews.

I started by brainstorming and researching a list of a number of well-known and other various port scanners:

- telnet - definitely not telnet :)
- nc (netcat) - I haven't used nc for real, but from my reading on nc usage, I don't think netcat is capable of this kind of diversity in techniques.
- SATAN - again, another tool I haven't used for real, but I don't think SATAN's got all these features; I think nmap got the features from SATAN, et al. :)
- hping - hping is a powerful utility from what I've read on hping usage, but by the same token, this isn't hping.
- NetScanTools 
- IPEye
- SuperScan
- Atelier Web Security Port Scanner (AWSPS)
- PortScan Plus
- wGateScan
- YAPS (Yet Another Port Scanner)
- PortScanner
- UltraScan
- SiteScan (Rhino9)
- CyberCop scanner
- SARA (Security Auditor's Research Assistant)
- strobe - Class B only
- Port Scanner
- PortFlash
- knocker
- WUPS - Windows UDP Port Scanner
- Blue's Port Scanner
- Local Port Scanner

Then I cross-referenced the scan types (and OS fingerprint tests) from Question #5 with available switches for nmap (http://www.insecure.org/nmap/data/nmap_manpage.html). The likelihood of a -sS -O type option scan became evident as I analyzed the OS fingerprinting techniques being used in between a couple of the scan types.

During my research, the following overviews of nmap's capabilities confirmed my suspicions:

"NMAP does three things. First, it will ping a number of hosts to determine if they are alive or not. Second, it will portscan hosts to determine what services are listening. Third, it will attempt to determine the OS of hosts. The default behavior of NMAP is to do both an ICMP ping sweep (the usual kind of ping) and a TCP port 80 ACK ping sweep." (http://www.insecure.org/nmap/lamont-nmap-guide.txt)

"Starting TTL and source port numbers can also give us a hint of what port scanner type (for 'stealth' scans) or operating system (for full TCP connection scans) is used by the attacker. We can never be sure though. For example, nmap sets TTL to 255 and source port to 49724, while Linux kernel sets TTL to 64." (http://www.cs.wright.edu/~pmateti/Courses/499/Probing)

Here is the location of a couple of the OS fingerprinting examples or -O option in nmap being used in conjunction with each of a few of the different scans:

Packet #150639 - #150672 - The SYN|ECN caught my eye. After some reading on ECN, I read on a web board somewhere that the ECN bits are often used in port scans because certain firewalls basically ignore ECN under the wrong and outdated assumption that they are not used. Next, a NULL packet followed by an XMAS packet. Then it spends a lot of time on port 22 (that's SSH - and, it was shown to be an open port earlier...). Only a few different source ports being used, but the initial TCP seq # is changing... I expect there are a number of different modifications to each different TCP packet... maybe a fingerprint? Yup! That's nmap OS fingerprinting alright. :)
Packet #153149 - #153165 - Another one of those ECN and then SYN packets - heh, strange to see that bit set. ;-) Looks like more fingerprinting.


7) What is the purpose of port scanning?

The purpose of port scanning is to determine which ports are open, and hence which services running on a system are available to the attacker. This result is used for good by network and system administrators as part of network security audits, and for evil by attackers who wish to compromise a box by using an exploit against one of the services discovered on an open port. Port scanning also has the added bonus of possibly revealing what OS a system is using (due to the inconsistent or peculiar responses that each OS's implementation of the TCP/IP stack returns). Its additional applications can also tell us what hosts are up on a network and various other network topological details, such as IP addressing, MAC addressing, router and gateway filtering, firewall rules, IP-based trust relationships, etc.


8) What ports were found open on our honeypot?

In order to determine which ports were open, I used a display filter in Ethereal (tcp.flags == 18) to find all the SYN|ACKs sent by the honeypot.

The following ports were found to be open: 
22 (SSH)
53 (DNS domain name service)
80 (http)
111 (sunrpc)
443 (https - SSL)
32768 (in this case, rpc.statd, the Network Status Monitor (NSM) RPC service used for NFS file locking)

SYN|ACKs represent an open port because the system is responding to an initial Synchronization packet with a Synchronization Acknowledgement packet (which is the 2nd stage of the TCP 3-way handshake). This essentially is saying, "I'm here -- I got your request to start a conversation and I'm listening on this port. Go ahead with what you want to say". Also, since our honeypot is presumably not trojaned (yet?) :-), it shouldn't be initiating any outbound connections, just replying to inbound connection requests. Consistent with this, all of the source IPs for the SYN|ACKs were 192.168.0.99. No SYNs were recorded from 192.168.0.99. Any ACK|FIN|PSH|NULL|URG? Nope. Nothing that wasn't an expected response (not an exhaustive search, but conclusive).
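
The value 18 in the display filter is simply SYN (2) plus ACK (16). As a small sketch of the same filtering logic outside Ethereal, the snippet below collects the source ports of SYN|ACK packets sent by the honeypot. The open_ports helper and the handful of sample tuples are made up for illustration; they are not packets copied from the scan log.

```python
SYN, ACK = 0x02, 0x10
SYN_ACK = SYN | ACK  # 0x12, i.e. 18 as used in tcp.flags == 18


def open_ports(packets, honeypot):
    """Return sorted source ports of SYN|ACK replies sent by the honeypot.

    packets is an iterable of (source_ip, source_port, tcp_flags) tuples.
    """
    return sorted(
        {sport for src, sport, flags in packets
         if src == honeypot and flags == SYN_ACK}
    )


if __name__ == "__main__":
    # Hypothetical sample: two SYN|ACKs, one RST|ACK, one attacker SYN.
    sample = [
        ("192.168.0.99", 22, 0x12),
        ("192.168.0.99", 23, 0x14),
        ("192.168.0.99", 80, 0x12),
        ("192.168.0.9", 52198, 0x02),
    ]
    print(open_ports(sample, "192.168.0.99"))
```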

Some information on the service running on port 32768:

"This rpc based service for nfs has the following function:

The rpc.statd server implements the NSM (Network Status Monitor)
RPC protocol. This service is somewhat misnomed, since it
doesn't actually provide active monitoring as one might suspect;
instead, NSM implements a reboot notification service. It is used
by the NFS file locking service, rpc.lockd, to implement lock
recovery when the NFS server machine crashes and reboots."
(http://online.securityfocus.com/archive/91/210862)


9) Bonus Question: What operating system was the attacker using?

The attacker is using a machine running Linux using a kernel v2.4.1 - v2.4.14.

I started out by looking at what information the attacker was disclosing during the scan. This was primarily the RST packets in the vanilla connect() scan which had a TTL=255. 

SYN = 48 (attacker)
SYN/ACK = 64
RST = 255 (attacker)

After turning up very little information using Google, I was initially led to believe that this might be a Solaris box attacking. However, I eventually realized that the RST has TTL=255 because the attacker is on the same physical network segment as the scan recipient (which is the other scenario where the RST packet will have a TTL=255).

I decided to narrow it down by answering the question: "What OSs does nmap run on?"

"The primary focus of Nmap development is on free operating systems such as Linux, FreeBSD, NetBSD, and OpenBSD). Solaris is also a first-tier supported platform because Sun now offers source code to their OS, and also because the Sun GESS Security Team sent me a new Ultra workstation. ... Mac OS X, SunOS, HP-UX, AIX, Digital UNIX, and Cray UNICOS" (http://www.insecure.org/nmap)

This helped to narrow down the candidates, but didn't provide any specifics. So I did some research into learning exactly how nmap implements OS fingerprinting.

After that, I found I was caught in a feast or famine scenario: no specific information available via Google, but a plethora of specific information available in the scan if I wanted to evaluate the nmap fingerprint templates manually. :-P

I narrowed my search on Google for specific information regarding TTLs, Window Sizes, etc., and I eventually found this document via Google: http://www.giac.org/practical/Matthew_Fiddler_GCIA.doc

This document had enough specific examples to make an educated guess from the data in the scan log. So I did some more Ethereal display filtering and found that most of the attacker's TTLs are 64, which indicates a Linux install. Finally, I used this display filter: tcp.window_size == 5840 && ip.src == 192.168.0.9. It matched enough of the attacker's packets with a Window Size of 5840, which the document associates with Linux kernel versions 2.4.1 - 2.4.14.
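
The reasoning above amounts to a small passive-fingerprint lookup, sketched below. The single rule (TTL 64 with window size 5840 pointing to Linux 2.4.1 - 2.4.14) is the one I took from the referenced document; the list of common starting TTLs and the function names are assumptions for illustration, and a real tool such as p0f uses far more signals than these two.

```python
# Commonly seen starting TTL values (an assumption, not an exhaustive list).
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def initial_ttl(observed: int) -> int:
    """Round an observed TTL up to the nearest common starting value."""
    for candidate in COMMON_INITIAL_TTLS:
        if observed <= candidate:
            return candidate
    raise ValueError("TTL must be between 0 and 255")


def guess_os(observed_ttl: int, window_size: int) -> str:
    """Apply the one rule used in this writeup; anything else is unknown."""
    if initial_ttl(observed_ttl) == 64 and window_size == 5840:
        return "Linux 2.4.1 - 2.4.14"
    return "unknown"


if __name__ == "__main__":
    print(guess_os(64, 5840))
    print(guess_os(128, 16384))
```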

In addition, I think the scan recipient is running Linux, too, but I haven't examined the results of the scan log's OS fingerprinting closely enough to confirm (I ran out of time to do that). Suffice it to say that it's running a *nix/*BSD style OS because of the NFS rpc.statd running. :-P


Tools used:  Ethereal, Google.  I tried p0f, but didn't use it in the analysis. I also referred a lot to nmap's own documentation.


P.S.  I could do a lot more analysis work on the road if someone ported WinDump to the Pocket PC.  ;-)