Question 1

Identify and explain the purpose of the binary.

The file "the-binary" is a statically linked (against libc 5.3.12) and stripped ELF executable. It is a multi-purpose denial-of-service agent and system backdoor. An attacker communicates with it through scrambled network packets, instructing it either to launch automated attacks against specified targets or to execute commands on the host itself.

Question 2

Identify and explain the different features of the binary. What are its capabilities?

The binary requires root privileges, primarily so that it can open raw network sockets. On startup it checks that its effective UID is 0 and exits if it is not running as root.

It changes argv[0] to the string "[mingetty]", in order to hide its appearance in the process list.

It then enters a command loop, listening on a raw socket for packets with the IP protocol field set to 11. When a packet arrives, the binary checks that byte 0 of the payload following the IP header has the value 2 and that the total packet size exceeds 200 bytes. If either condition is not met, the loop begins again.

It is important to note that the source address of this packet may be (and probably is) forged, which makes it difficult to identify the true source of this traffic from the packet alone.

The packet body, starting at byte 2 of the payload, is then descrambled; the descrambling algorithm is described in the answer to Question 3. Byte 2 of the descrambled output is checked to lie between 1 and 12, and if it falls outside this range, the loop begins again.

Depending on the value of byte 2, one of the following commands is executed, after which the loop begins again. Note that packet contents from payload byte 3 (the command byte) onwards are scrambled, and any bytes following the last byte listed in each format are padding that may contain arbitrary content.

Command 1 - Status

Packet Format:

Returns a scrambled packet of the form:

The returned packet is a status packet. The entire response is scrambled, from the start of the payload onwards.

Counting from byte 0, byte 1 of the payload is set to 1 and byte 2 to 7; byte 3 indicates whether a command is in progress, and byte 4 identifies the command currently in progress.

Command 2 – New Master

Packet Format:

This sets the IP address that receives all master traffic (that is, all traffic other than denial-of-service traffic). This traffic may be the output of shell commands executed on the compromised machine, or status packets from Command 1 above. The IP address is held in network byte order in the descrambled payload, as indicated by m[0]-m[3] above. If the fourth byte (the decoy flag) has the value 2, the binary builds a list of 10 IP addresses: the new master's address is placed at one randomly selected position, and the other nine are randomly generated decoys. All outbound master communication is then sent to all 10 addresses, making it difficult to identify the true master. If the decoy flag is 0, no decoy addresses are used.

The binary also takes the destination address of this packet and uses it as the source address for subsequent outbound communication, so the code never needs to determine the host's own IP address.

Command 3 – Execute Shell Command and Return Result

Packet Format:

This command executes a null-terminated shell command of at most 24 bytes. The command is run using ‘/bin/csh -f -c’, with its output redirected to the file ‘/tmp/.hj237349’. The contents of the output file are then scrambled and sent back to the master, split across multiple packets if necessary. If decoy hosts are enabled, these packets are also sent to all 9 decoy hosts.

The reply packets have the entire payload scrambled, as in Command 1.

Command 4 – DNS Flood

Packet Format:

Command 4 causes a DNS flood against the hostname or IP address (S[0]-S[3]) provided. If byte 10 of the payload is set to 1, a hostname is used rather than the IP address S. S is in network byte order. S_HI and S_LOW are the high and low bytes of the target's UDP port; this port is used as the source port of the forged queries, so it is the port that is flooded with DNS replies.

The DNS flood consists of forged queries sent to a list of DNS servers embedded in the binary. Because the source address of each query is forged to be the target's address, and each reply carries far more data than the query sent to the DNS server, the net effect is traffic amplification and a denial of service against the target.

A list of 8000 DNS servers is embedded in the binary, along with pre-configured queries requesting the SOA records for .com, .net, .de, .edu, .org, .usc.edu, .es, .gr, and .ie. The flood cycles through all of these queries for each server in the list.

The DNS flood continues indefinitely unless killed by Command 8. If a target hostname was provided, the flood also re-resolves it every 40,000 iterations, so that flooding continues even if the target switches IP address to avoid the traffic.

Command 5 – Packet Flood

Packet Format:

This is a combined UDP and ICMP flooder. If payload byte 4 (type) is 0, the flood consists of a stream of ICMP ping packets directed at the target from the specified source address. If the type is 1, the flood sends UDP packets from random source ports to the destination port given in payload byte 5. This destination port can only take a value between 1 and 255, as it is held in a single byte. The source (S[0]-S[3]) and destination (D[0]-D[3]) addresses are in network byte order. If the resolve flag in payload byte 13 is set, the hostname is used instead.

The hostname is re-resolved every 40,000 iterations. The source address of the flood packets is also spoofed.

Command 6 – Create Shell

Packet Format:

Command 6 creates a password-protected shell on port 23281. The shell can be accessed by telnet by typing the password ‘SeNiF’ followed by carriage returns. The password is hard-coded into the binary, but it is stored with each character shifted, and the comparison subtracts 1 from each stored character. As a result, the password cannot be found directly with ‘strings’ or a similar tool.
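The obfuscation described above can be illustrated with a minimal C sketch. The function name `unshift_by_one` and the stored string "TfOjG" are hypothetical: "TfOjG" is not quoted from the binary; it is simply ‘SeNiF’ with every character incremented by one, which is the stored form the description implies.

```c
#include <stddef.h>

/* Hypothetical helper: recover a string that was stored with every
 * character shifted up by one, by subtracting 1 from each character.
 * Writes at most outlen-1 characters plus a terminating NUL. */
void unshift_by_one(const char *stored, char *out, size_t outlen)
{
    size_t i = 0;

    if (outlen == 0)
        return;
    for (; stored[i] != '\0' && i + 1 < outlen; i++)
        out[i] = (char)(stored[i] - 1);
    out[i] = '\0';
}
```

For example, `unshift_by_one("TfOjG", buf, sizeof buf)` leaves "SeNiF" in `buf`. The point is that a one-byte shift is enough to keep the password out of a naive ‘strings’ listing, while being trivial to undo once spotted.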

Command 7 – Execute Shell Command and Kill

Packet Format:

This command runs a shell command like Command 3, except that the result is not returned. The command is terminated after 20 minutes.

Command 8 – Kill previous Command

Packet Format:

This command terminates a previously started command. It would be used to stop the denial-of-service attacks, which otherwise run indefinitely.

Command 9 – DNS Flood with Resolver Iterations

Packet Format:

Command 9 is identical to Command 4, except that it re-resolves the target hostname every 40,000*'ResTime' iterations.

Command 10 – TCP SYN Flood

Packet Format:

A TCP SYN flood is launched against the target. As with Command 4, the target may be a hostname or an IP address, selected by setting the resolve flag. A target hostname is re-resolved every 40,000 iterations of the flood. The source address of the SYN flood may be specified, or the 'synrand' flag can be set to 0 to use a random source address for each SYN packet.

D_HI and D_LOW are the high and low bytes of the target port to be SYN Flooded.

Command 11 – TCP SYN Flood with Resolver Iterations

Packet Format:

Command 11 is identical to Command 10, except that it re-resolves the target hostname every 40,000*'ResTime' iterations.

Command 12 – DNS Flood with Specified DNS Server

Packet Format:

Command 12 is identical to Command 4, except that it uses a user-specified DNS server (DNS[0]-DNS[3], in network byte order) rather than the DNS servers embedded in the binary. Command 12 also re-resolves the target hostname every 40,000*'ResTime' iterations.

Question 3

The binary uses a network data encoding process. Identify the encoding process and develop a decoder for it.

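The encoding process is a simple stream cipher in which each encoded byte is the sum of the current plaintext byte, the previous encoded byte, and a fixed value (23), taken modulo 256.
The encoding algorithm can be expressed in pseudocode as:

enc[0] = (plain[0] + 23) mod 256
for x in 1..length-1:
    enc[x] = (plain[x] + enc[x-1] + 23) mod 256

Decoding reverses this using only adjacent encoded bytes:

plain[0] = (enc[0] - 23) mod 256
for x in 1..length-1:
    plain[x] = (enc[x] - enc[x-1] - 23) mod 256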
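A minimal, self-contained C sketch of the scheme follows. The function names, the (source, destination, length) signature and the SCRAMBLE_ADD macro are hypothetical choices for illustration; this is not the encode.c supplied in the additional files. It assumes each encoded byte is chained on the previous encoded byte, so that decoding needs only adjacent encoded bytes, and that all arithmetic wraps modulo 256 (hence the casts to unsigned char).

```c
#include <stddef.h>

#define SCRAMBLE_ADD 23u  /* fixed value added at every position */

/* enc[0] = plain[0] + 23; enc[i] = plain[i] + enc[i-1] + 23 (mod 256).
 * Each encoded byte is chained on the previous *encoded* byte. */
void scramble_encode(const unsigned char *plain, unsigned char *enc, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned prev = (i == 0) ? 0u : enc[i - 1];
        /* the cast to unsigned char reduces the sum modulo 256 */
        enc[i] = (unsigned char)(plain[i] + prev + SCRAMBLE_ADD);
    }
}

/* plain[i] = enc[i] - enc[i-1] - 23 (mod 256). Only adjacent encoded
 * bytes are needed. enc and plain must not overlap, because enc[i-1]
 * is read after plain[i-1] has been written. */
void scramble_decode(const unsigned char *enc, unsigned char *plain, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned prev = (i == 0) ? 0u : enc[i - 1];
        /* unsigned wrap-around followed by the cast gives mod 256 */
        plain[i] = (unsigned char)(enc[i] - prev - SCRAMBLE_ADD);
    }
}
```

To read a captured command packet, `scramble_decode` would be applied to the payload starting at byte 2, as described in the answer to Question 2. Because each plaintext byte depends only on two neighbouring ciphertext bytes, the scheme provides no secrecy once the constant 23 is known.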

The encoder is used to encode outbound traffic from both the client and the-binary, and the decoder to decode inbound traffic from either of them. Both the encoder and the decoder can be found in the additional files, in encode.c.

Question 4

Identify one method of detecting this network traffic using a method that is not just specific to this situation, but other ones as well.

The network traffic could generally be identified by logging, at egress routers, outbound traffic whose source address does not belong to the internal networks. This would identify a significant amount of traffic from compromised hosts using forged source IP addresses. This should be considered carefully, however, as the logging may place significant load on border routers.
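The egress check amounts to a prefix-membership test on each outbound packet's source address. The sketch below assumes IPv4 addresses and prefixes in host byte order, and the names `struct prefix` and `is_suspect_egress` are hypothetical; on a real border router the same test would typically be expressed as an access list rather than as code.

```c
#include <stddef.h>
#include <stdint.h>

/* An internal network prefix, e.g. 192.168.0.0/16 is
 * { 0xC0A80000, 0xFFFF0000 }. Both fields are in host byte order. */
struct prefix {
    uint32_t net;
    uint32_t mask;
};

/* Return 1 if the source address of an outbound packet does not fall
 * inside any internal prefix (a likely forged source worth logging),
 * or 0 if it legitimately originates from an internal network. */
int is_suspect_egress(uint32_t src, const struct prefix *internal, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if ((src & internal[i].mask) == internal[i].net)
            return 0;
    return 1;
}
```

The check is independent of any particular tool: the DNS, ICMP/UDP and SYN floods described above all rely on forged source addresses and would be flagged by it.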

The command network traffic could be identified simply by searching for packets greater than 200 bytes with IP protocol set to 11.
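As a sketch of such a signature, the function below checks a raw IPv4 packet (starting at the IP header) for protocol 11, a total length above 200 bytes and, as an extra narrowing taken from the answer to Question 2, the value 2 in the first payload byte. The function and macro names are hypothetical, and using the IP header's total-length field as the "packet size" is an assumption about how the size is measured.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_IP_PROTO    11   /* IP protocol used for commands (NVP-II in the IANA registry) */
#define CMD_MIN_LENGTH  200  /* command packets are larger than this */
#define CMD_TAG         2    /* expected value of payload byte 0 */

/* Return 1 if pkt (caplen bytes, beginning at the IPv4 header) matches
 * the command-packet signature, 0 otherwise. */
int looks_like_command_packet(const uint8_t *pkt, size_t caplen)
{
    if (caplen < 20 || (pkt[0] >> 4) != 4)
        return 0;                                   /* not IPv4 */

    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;       /* header length in bytes */
    if (ihl < 20 || caplen <= ihl)
        return 0;                                   /* malformed or no payload */

    if (pkt[9] != CMD_IP_PROTO)
        return 0;

    /* total length field, stored in network byte order */
    unsigned total_len = ((unsigned)pkt[2] << 8) | pkt[3];
    if (total_len <= CMD_MIN_LENGTH)
        return 0;

    return pkt[ihl] == CMD_TAG;
}
```

Dropping the final tag check turns this into the broader rule stated above (protocol 11 and more than 200 bytes), at the cost of more false positives from any legitimate use of protocol 11.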

Question 5

Identify and explain any techniques in the binary that protect it from being analyzed or reverse engineered.

The binary was stripped, so no symbol information was available, and it was also statically linked. Together these hampered traditional debugging, because none of the library functions could be resolved to a name. The technique we used to resolve the functions to names was effectively binary matching against the libc version identified in the ‘strings’ output.
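The idea behind this kind of binary matching can be shown with a minimal C sketch: search the stripped binary for a byte pattern taken from a known libc function, treating bytes that change with linking (such as call targets and absolute addresses) as wildcards. The function name and the mask convention (0 = wildcard) are hypothetical, and this illustrates the technique rather than the exact procedure we followed.

```c
#include <stddef.h>
#include <stdint.h>

/* Search hay[0..haylen) for sig[0..siglen). mask[i] == 0 marks sig[i]
 * as a wildcard (e.g. a relocated call target or absolute address).
 * Returns the offset of the first match, or -1 if there is none. */
long find_signature(const uint8_t *hay, size_t haylen,
                    const uint8_t *sig, const uint8_t *mask, size_t siglen)
{
    if (siglen == 0 || siglen > haylen)
        return -1;
    for (size_t off = 0; off + siglen <= haylen; off++) {
        size_t i = 0;
        /* advance while each byte is a wildcard or matches exactly */
        while (i < siglen && (!mask[i] || hay[off + i] == sig[i]))
            i++;
        if (i == siglen)
            return (long)off;
    }
    return -1;
}
```

In practice each signature would be extracted from the matching libc 5.3.12 object code, and a hit would let the corresponding address in the-binary be labelled with that function's name.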

In addition to this binary manipulation, the password for the shell in Command 6 was not stored in plain text but slightly encoded, preventing the ‘strings’ output from being used as input to a brute forcer.

The binary also forks frequently, which may make it difficult to analyse under a debugger. As we did not use debugging to reverse the software, this did not cause us any problems.

Question 6

Identify two tools in the past that have demonstrated similar functionality.

Initial indications would suggest this is a distributed denial-of-service tool. However, from this binary alone it was not possible to identify any form of distributed agent/master software required to command the tool. As such, technically it can only be compared with multi-attack denial-of-service tools such as ‘rape’ or ‘targa’. That being said, there is no reason why a distributed client-master-agent relationship could not exist, and it is highly probable that it does. In that case, the fairest comparison would be with tools such as TFN or Stacheldraht.

Bonus Questions

What kind of information can be derived about the person who developed this tool? For example, what is their skill level?

The person or persons responsible for this tool seem to have a slightly better than average knowledge of networking protocols and their uses. They selected an IP protocol that is virtually unused and incorporated packet scrambling into the tool. They used raw sockets to create a stateless connection between the master and the slave. Some features show a more advanced understanding of attacks and their prevention, such as re-resolving the target every 40,000 or so packets to defeat a target that changes the IP address behind its DNS hostname. The code is well written, probably by someone with a good understanding of Unix daemons and network programming. In some instances, however, existing commands could have been reused instead of adding new ones to handle different cases (Commands 9 and 11, for example, differ from Commands 4 and 10 only in their resolver interval).

What advancements in tools with similar purposes can we expect in the future?

Advancements in the scrambling routines - replacing the scrambling with more advanced cryptographic routines to better hide the commands being transmitted between the components of the tool.

Hiding traffic inside higher-level protocols - such as sending commands and responses inside HTTPS or DNS requests and responses, which would appear valid to the casual observer but are never intended to reach any particular server, only the same address space as the tool.

Additional methods of hiding the tool - such as sending the traffic to an address on the same network rather than directly to the tool, or using kernel modules to exploit the kernel's functionality and hide the tool's existence completely from users of the system.

Anti-disassembly techniques could be improved to make discovering the tool's purpose much more difficult; this could include using custom modified libraries to stop tools such as our disassembler from resolving the functions, or packing/encrypting the binary.

More advanced attacks directed at specific OSes - using fewer flood-type attacks and more of the complicated, less easily detected DoS attacks that need less traffic to achieve the same result. ie;