© by Als and Vizzy, 2002.
[Ukraine - United Kingdom] Team
analysis.html
~~~~~~~~~

As the challenge conditions made clear, the binary is of unknown origin. We therefore could not execute it or run it under a debugger before a first static analysis. We checked whether it could be disassembled and found that the binary is not encrypted, so a disassembler should have no problems with it. We did not want to bother setting up a controlled environment for running hostile binaries, and were inspired by a daring idea: to solve the challenge without executing the binary even once...

We had two disassemblers to choose from. The first is an excellent Unix (Linux) program, "ht", by Stefan Weyergraf, stefan@weyergraf.de and Sebastian Biallas, sb@biallas.net (version 0.6.0b). The second, of course, is the interactive disassembler IDA by Datarescue.

First, the binary was analysed in "ht", where we examined the ELF header (it has the usual structure for 32-bit objects: LSB encoding, System V, executable file, Intel 80386) and the ELF section headers, where we found no anomalies. All the common sections are present: .text, .bss, .data, etc. We also confirmed that the binary is statically linked (no dynamic libraries are listed) by running the commands "file ./the-binary" and "ldd ./the-binary".

Studying the disassembly is much more comfortable in IDA (with features such as automatic recognition of variables and function arguments, dynamic comments, cross-references, etc.); we will not describe here why that helps. We just start IDA v4 and open our binary.

We run the automatic analysis and wait for the disassembled code to appear. And here we go...

Function identification
-----------------------


As we said already, the binary is statically linked and all debug info is stripped. For us that means only one thing: we will not see any symbolic procedure names in the disassembled code. Experiencing slight grief.

There are two ways forward. The first is to make signatures of the static libc (the string "The Linux C library 5.3.12" is found in the binary, but we can't rely on it, since it could be faked); the other is manual analysis of the called functions. We picked the second because we didn't have the FLAIR program from the IDA distribution, which is used to create signatures (moreover, as far as we know, it doesn't work with Linux libraries). We gave up trying to find FLAIR and decided to find the function names manually. We describe in detail how this was done.

As is well known, many libc functions (read, geteuid, etc.) are only thin wrappers around the corresponding system calls. On Linux, a system call is made by raising interrupt 80h with the system call number in the EAX register. The system call numbers are defined in the file "/usr/src/linux/include/asm/unistd.h". For instance, system call number 31h is __NR_geteuid, which means that a function issuing it is probably geteuid(). Note also that libc functions mostly return their results in the EAX register, as the C calling convention prescribes.

Starting from the entry point of the ELF binary, we move down through the code and step into every called function to see whether it contains an "int 80h" instruction. This way we find all the functions that are wrappers for system calls. To name some of them:
geteuid(), fork(), setsid(), chdir(), close(), unlink(), time(), kill(), dup2().

Also, many functions refer to system call number 66h (__NR_socketcall). Most of the socket functions go through this call: when int 80h is raised, EAX is set to 66h and the number of the sub-function is passed in EBX. The file "/usr/src/linux/include/linux/net.h" lists the symbolic names of the sub-functions and their numbers. For example, sub-function number 1 is sys_socket(). Now we can identify some more libc functions: socket(), recv(), bind(), listen(), accept(), send(). Moving further down the code we understand (or intuition suggests) that we are already inside libc library code. OK, enough, we stop here.
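The lookup we did by hand can be sketched as two small tables. This is only an illustration, not a tool we used: the numbers are the i386 values from asm/unistd.h and linux/net.h for the calls named above, while the helper names syscall_name() and socketcall_name() are made up for this sketch.

```c
#include <stddef.h>

struct nr_entry { int nr; const char *name; };

/* i386 system call numbers: the value in EAX when "int 80h" is raised. */
static const struct nr_entry sc_table[] = {
    { 0x02, "fork"    }, { 0x06, "close" }, { 0x0A, "unlink" },
    { 0x0C, "chdir"   }, { 0x0D, "time"  }, { 0x25, "kill"   },
    { 0x31, "geteuid" }, { 0x3F, "dup2"  }, { 0x42, "setsid" },
    { 0x66, "socketcall" },
};

/* socketcall sub-function numbers: the value in EBX when EAX = 66h. */
static const struct nr_entry sock_table[] = {
    { 1, "socket" }, { 2, "bind" },  { 4, "listen" },
    { 5, "accept" }, { 9, "send" },  { 10, "recv"  },
};

static const char *lookup(const struct nr_entry *t, size_t n, int nr)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (t[i].nr == nr)
            return t[i].name;
    return NULL;                /* not in our (deliberately short) table */
}

/* Hypothetical helper: probable libc wrapper name for a syscall number. */
const char *syscall_name(int eax)
{
    return lookup(sc_table, sizeof sc_table / sizeof sc_table[0], eax);
}

/* Hypothetical helper: probable libc name for a socketcall sub-function. */
const char *socketcall_name(int ebx)
{
    return lookup(sock_table, sizeof sock_table / sizeof sock_table[0], ebx);
}
```

For example, a function that loads 31h into EAX before "int 80h" would be reported as geteuid, and one that loads 66h into EAX and 1 into EBX as socket.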

Now, exploring the disassembled text, we see known library function names instead of the strange "call sub_xxxxxxx". A little progress. But plenty of unidentified functions remain. We make another pass from the entry point, looking for unknown calls that we suspect could be libc functions. For that, we run another copy of IDA and open the libc binary (a static one would be better, because the binary was linked against it, but we only had libc.so.5.3.12 handy). Now we compare manually, i.e. we look for signatures. Let's take as an example the function located at address .text:8057764h. We try to choose a unique piece of code there that can then be searched for in the libc disassembly. The following instructions look like a good sequence to pick:

.text:0805777E   shl     edx, 8			; Shift Logical Left
.text:08057781   or      eax, edx		; Logical Inclusive OR
.text:08057783   mov     edx, eax
The hex signature for them is: C1 E2 08 09 D0 89 C2.
We look for that signature with a binary search in the libc disassembly and, thank god, find exactly one match. If we now visually compare the function from our binary with the code we found, we see that they are almost identical, so with high probability it is the same function. And its name is "memset"! But it doesn't always go that smoothly. A function in the library and the same function in the binary can differ, depending on how the library was compiled, how it was optimized, etc. (for instance, in the code shown above the EBX register could have been used instead of EDX, which would give a different hex signature, and we would find nothing like memset in the libc disassembly). So the search should be repeated with different signatures or with masks, something like "XX XX ? ? ? XX XX". Finally, after several hours of identifying libc functions, all of them were unambiguously identified. We noticed that the function:
.text:0804A2A8 setenv proc near
is the first libc function, located right after the main program code. This made separating the libc functions much easier.
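The masked search mentioned above ("XX XX ? ? ? XX XX") can be sketched as follows. This is a minimal illustration under our own assumptions, not the search we actually ran inside IDA: the function name find_signature() and the separate mask array (non-zero means "must match", zero means "?") are invented for the example.

```c
#include <stddef.h>

/* Search buf for pat. mask[i] != 0 means pat[i] must match exactly,
   mask[i] == 0 is a wildcard. Returns the offset of the first match,
   or -1 if there is none. */
long find_signature(const unsigned char *buf, size_t buf_len,
                    const unsigned char *pat, const unsigned char *mask,
                    size_t pat_len)
{
    size_t i, j;

    if (pat_len == 0 || pat_len > buf_len)
        return -1;
    for (i = 0; i + pat_len <= buf_len; i++) {
        for (j = 0; j < pat_len; j++)
            if (mask[j] && buf[i + j] != pat[j])
                break;
        if (j == pat_len)       /* every non-wildcard byte matched */
            return (long)i;
    }
    return -1;
}
```

With the signature C1 E2 08 09 D0 89 C2 and the mask 1 0 1 1 0 1 0, the bytes that carry the register number are ignored, so the EBX variant of the same sequence (C1 E3 08 09 D8 89 C3) is found as well. The price is more false positives, so each hit still has to be compared by eye.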

Now the first part of the work is done.

Analysing program behavior
-----------------------------


Function main() was found:
.text:08048134 main       proc near
This is the one suitable function called between the program entry point and the exit code whose address lies in the program code (not in the linked library).
However, there was another call:
.init:08048080 sub_08048080 proc near
.init:08048080 call    sub_80675A8     ; Call Procedure
.init:08048085 retn    0               ; Return Near from Procedure
But sub_80675A8 is located in libc code, and a closer look at this function shows that it is the initialisation stage of ELF execution. So we name it _init_proc and go back to working out what happens in main().

At the beginning, main() initialises pointers to various buffers and data structures, then it checks geteuid() and, if the result is not zero (we are not root), the program terminates. So the program requires root permissions to run. Then it fakes the name of the running process, changing it to "[mingetty]". It sets the SIG_IGN handler for signal SIGUSR2 (this just means the signal will be ignored), the process does fork() and the parent happily terminates. Now, what is the child up to? We continue to follow... Its first action is setsid(0) (does anyone need comments?), then it again sets the SIG_IGN handler for SIGUSR2 and forks again. Another child is spawned. You ask what the heck this is all about? Well, at a stretch we can call it "anti-debugging": if you had to perform the analysis with 'gdb', it would not follow the fork by default. But as we are just reading through the disassembled code, it does nothing to us.

Continuing with the child's actions. It sets the current directory to "/" (chdir("/")), closes stdin, stdout and stderr, and initialises (sets to 0) three variables:
p_id, used to store the pid of created child processes (our current process is the main one now, p_id = 0); p_pid, used to kill fork()'ed system("") processes on timeout; called_feat_number, which keeps the unique number of the called binary feature. Next, the random number generator is initialised with srandom(time()) and a raw socket is opened. The SIG_IGN handler is set for signals SIGHUP, SIGTERM and twice(?) for SIGUSR2. All these signals will be ignored. Then the program enters an infinite cycle, where every iteration ends with a delay:
.text:08048EB8  push    2710h         ; 10000 uS
.text:08048EBD  call    usleep        ; Call Procedure

Main cycle description
-----------------------

- recv(socket,*buffer2,len,flags): the function waits for any IP packet to arrive (raw socket) and stores it in buffer2 (len = 2048 bytes);
- checks the protocol value and options of the IP header. If the protocol is not equal to 0Bh or the options value is not 02, repeat the cycle. Also, the size of the received packet must be more than 200 bytes. If not, repeat the cycle.

- a pointer to the data of the IP packet (offset +16h from the IP header) is passed to the decode_packet(*rcv_buffer,*buffer2+16h,rcv_packet_size-16h) function;
This function performs a simple decoding and will be described later. The decoded packet is now stored in rcv_buffer.

- takes the value of the second byte of rcv_buffer (offset +1), decrements it by 1, checks that it is not greater than 11 and jumps to func_table[value]. As we see, func_table holds the addresses of the implemented features, one of which is called depending on the second byte of the decoded packet when the IP protocol is 0x0B, the options field is set to 02 and the packet size is greater than 200 bytes.
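The last step of the cycle can be sketched like this. It is only an illustration: instead of function addresses the table holds the names we gave to the 12 features (they are our labels from the detailed description, not symbols from the binary), and feature_for() is a made-up helper.

```c
#include <stddef.h>

/* Feature names indexed by (feature number - 1). */
static const char *const feature_names[12] = {
    "transmit_status_data",              /* 0x01 */
    "get_10_IPs_from_packet_or_random",  /* 0x02 */
    "exec_some_csh_cmd_and_send_result", /* 0x03 */
    "dns_flood_0",                       /* 0x04 */
    "udp_icmp_flood",                    /* 0x05 */
    "bind_root_shell_on_port",           /* 0x06 */
    "exec_some_csh_cmd",                 /* 0x07 */
    "kill_called_proc",                  /* 0x08 */
    "dns_flood",                         /* 0x09 */
    "syn_flood_0",                       /* 0x0A */
    "SYN_flood",                         /* 0x0B */
    "dns_flood_on_bunch_domains",        /* 0x0C */
};

/* Hypothetical helper: maps a decoded packet to the feature it selects,
   or NULL when the second byte is outside 0x01..0x0C. */
const char *feature_for(const unsigned char *rcv_buffer)
{
    /* Unsigned arithmetic: a feature byte of 0 wraps to a huge value
       and fails the "not greater than 11" check, as any other bad value. */
    unsigned int idx = (unsigned int)rcv_buffer[1] - 1u;

    if (idx > 11u)
        return NULL;
    return feature_names[idx];
}
```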

Before we go on to the feature descriptions, recall what some of our data variables are:

byte rcv_buffer[]; // incoming packet, received and decoded data;
byte IP_addr[]; // buffer to keep IP addresses;

Detailed description of all 12 features
---------------------------------------


We will describe the features not in order but grouped by similarity and dependency.

1) .text:08048894 bind_root_shell_on_port

Incoming packet structure:
+0: not used
+1: feature number (0x06)
Forks a child which listens on TCP port 23281. Once a connection to this port is established, it spawns another process to receive data, while the parent continues to listen on port 23281. The received data is encoded by incrementing all bytes by one and changing all 0x0D, 0x0A (CR, LF) to 0x00. If the resulting string equals "TfOjG" ("SeNiF" before encoding), the password is accepted and "/bin/sh" is started on the open connection (dup2 on stdin, stdout, stderr). For that, it sets the shell environment variables PATH and TERM and removes the HISTORY variable. When whoever is using the shell exits, that process dies, but the listening process continues to run. It can be killed only with the "kill_called_proc" feature.
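The password check can be illustrated with a few lines of C. This is a sketch of the transformation as we read it, with one assumption stated loudly: we treat CR and LF as being replaced by 0x00 instead of being incremented, so that the trailing line ending terminates the string. The names encode_input() and password_ok() are ours.

```c
#include <stddef.h>
#include <string.h>

/* Encode received bytes in place: CR/LF become 0x00 (assumed order of
   operations), every other byte is incremented by one. */
void encode_input(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0x0D || buf[i] == 0x0A)
            buf[i] = 0x00;
        else
            buf[i] = (unsigned char)(buf[i] + 1);
    }
}

/* Compare the encoded input with the constant found in the binary. */
int password_ok(const unsigned char *encoded)
{
    return strcmp((const char *)encoded, "TfOjG") == 0;
}
```

Feeding "SeNiF" followed by CR LF through encode_input() gives "TfOjG", which is how the clear-text password was recovered from the stored constant (each byte minus one).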


2) .text:08048590 exec_some_csh_cmd_and_send_result
Incoming packet structure:
+0: not used
+1: feature number (0x03)
+2: command for ("/bin/csh -f -c \"%s\" 1> %s 2>&1")
The function forks and the parent continues with the main cycle. The child forks again. The parent sleeps for 10 seconds and kills itself. The child executes: system("/bin/csh -f -c \"%s\" 1> %s 2>&1",rcv_buffer[2],"/tmp/.hj237349");
So it executes the command taken from the received buffer and redirects the output to "/tmp/.hj237349". The data from this file is then copied back to rcv_buffer[2] to be sent back to the client. The data is handled in chunks of size 0x18D, so it may be transmitted in multiple packets. In the first packet rcv_buffer[1] is set to 3 and in every following one to 4. The function calls encode_packet and the result is stored in buffer5. Then it calls transmit_data, and the encoded data from buffer5 is sent to the IP addresses in the IP_addr buffer (which must have been filled with feature 0x02 beforehand). Then the process sleeps for 400000 uS, closes the "/tmp/.hj237349" file, erases it and exits.
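For someone writing a decoder for captured reply traffic, the chunking rule can be put into two tiny helpers. This is a sketch: CHUNK_SIZE comes from the 0x18D constant above, while the helper names and the assumption that the last chunk may be shorter than the others are ours.

```c
#include <stddef.h>

#define CHUNK_SIZE 0x18D   /* 397 bytes of command output per reply packet */

/* Number of reply packets needed for `total` bytes of output
   (assuming the final chunk may be short). */
size_t chunk_count(size_t total)
{
    return (total + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* Value placed in byte +1 of the reply: 3 for the first packet,
   4 for every following one. */
unsigned char chunk_marker(size_t index)
{
    return index == 0 ? 3 : 4;
}
```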

3) .text:08048ACC exec_some_csh_cmd
Incoming packet structure:
+0: not used
+1: feature number (0x07)
+2: command for ("/bin/csh -f -c \"%s\" 1> %s 2>&1")
The function forks and the parent continues with the main cycle. The child forks again. The parent sleeps for 1200 seconds and kills itself. The child executes: system("/bin/csh -f -c \"%s\" ", rcv_buffer[2]); Thus the function is almost the same as the one above: it executes the received command with root privileges, except that it does not return the command output.

4) .text:08048B58 kill_called_proc
Incoming packet structure:
+0: not used
+1: feature number (0x08)
Execution of many features in the binary starts by spawning a process (fork), which then carries on with the feature. The PID of the spawned process is stored in the variable p_id. Some processes do not terminate by themselves, so kill_called_proc is used to kill the process of the previously called feature. It does just "kill -9 p_id" and nothing more.

5) .text:080483F0 get_10_IPs_from_packet_or_random
Incoming packet structure:
+0: not used
+1: feature number (0x02)
+2: IP_flag (0x00, 0x02 or others)
+3: ipA
+4: ipB
+5: ipC
+6: ipD
... etc. 10 IP addresses in total.
...
The function fills the IP_addr buffer with 10 IP addresses, either received from the client or picked randomly. It also stores IP_flag and the destination IP address from the IP header of the packet received from the client.

Algorithm of filling IP_addr is a bit weird:
R = rand(10);
If (IP_flag == 2) {
 Fill IP_addr with 10 IP addresses stored starting from rcv_buffer[3].
 One IP address is skipped, which position number equals to random R.
} 
 else 
{ Fill IP_addr with 10 random IP addresses, one IP is skipped the same way as above. }

if (IP_flag == 2) exit;
if (IP_flag == 0) { Copy first IP address from rcv_buffer[3] to first element of IP_addr }
else { Copy first IP address from rcv_buffer[3] to R element of IP_addr }
Now what conclusions can we draw from this?
1. If the IPs are picked randomly, one of them will always be replaced with the specified IP sent by the client. Since those IPs are used as the recipients of data in features 0x03 and 0x01, it is obvious that the data should reach someone real who asked for it. So the first IP is the IP of the client.

2. The IP_addr structure should be filled with 10 IPs. However, when IP_flag = 2, the IP at position R is skipped and left uninitialised. The same happens when IP_flag = 0, but in that case only the first IP is used by the transmit_data function, so the rest does not really matter. We have no idea what the author was thinking when coding this; it might be just a mistake.

3. Why broadcast data to random IPs? With a non-standard protocol? We can only think that this is a not yet implemented feature, meant to probe and control other hosts running the-binary in order to perform distributed denial-of-service attacks.

6) .text:0804835C transmit_status_data
Incoming packet structure:
+0: not used
+1: feature number (0x01)
This function seems to report the status of the-binary back to the client. It transmits data from buffer2 to one (if IP_flag = 0) or ten IP addresses previously initialised with feature 0x02. The source IP address of the outgoing packet is set to the destination IP address of the packet received from the client, i.e. the IP of our host running the-binary. Buffer2 carries no actual data and is filled only with some constant values; if a feature process is running, it also reports its feature number (stored in the variable called_feat_number). So the reply packet is:
+0: 0x00
+1: 0x01
+2: 0x07
+3: 0x01 if child running, 0x00 otherwise
+4: called_feat_number


The packet is encoded with encode_packet before sending and its size is set to (0x190 + random(0xC8)) bytes. Transmission is performed by the subroutine transmit_data.
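As an illustration of the layout above (before encoding), here is a minimal sketch that fills in the five status bytes and computes the padded size. The function names are ours, and we assume that random(n) yields a value in the range 0..n-1, which we did not verify in the disassembly.

```c
#include <stddef.h>
#include <string.h>

/* Fill the first five bytes of the (not yet encoded) status reply.
   Returns the number of meaningful bytes, or 0 if buf is too small. */
size_t build_status_reply(unsigned char *buf, size_t buf_len,
                          int child_running,
                          unsigned char called_feat_number)
{
    if (buf_len < 5)
        return 0;
    memset(buf, 0, buf_len);
    buf[0] = 0x00;
    buf[1] = 0x01;                        /* reply to feature 0x01 */
    buf[2] = 0x07;                        /* constant */
    buf[3] = child_running ? 0x01 : 0x00;
    buf[4] = called_feat_number;
    return 5;
}

/* Size of the transmitted packet: 0x190 + random(0xC8), with r standing
   in for the random source (assumption: result is reduced modulo 0xC8). */
size_t reply_size(unsigned int r)
{
    return 0x190 + (r % 0xC8);            /* 400..599 bytes */
}
```

The varying size means the status replies cannot be recognised on the wire by a fixed packet length.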

7) .text:08048D08 SYN_flood
Incoming packet structure:
+0: not used
+1: feature number (0x0B)
+2: dst_ipA (destination IP)
+3: dst_ipB
+4: dst_ipC
+5: dst_ipD
+6: dst_portH 
+7: dst_portL
+8: ip_src_not_random_flag
+9: src_ipA (source IP)
+A: src_ipB
+B: src_ipC
+C: src_ipD
+D: repeat counter
+E: 0x01 if destination address in str. format (like '128.0.0.1')
+F: dst_addr_in_str_format
...
This function sends TCP packets to the specified IP and port number. Analysing the TCP packet parameters, we found that each packet is sent with the SYN bit set.

.text:08049EAF mov [ebp+flag2], 2 ; TH_SYN

This bit has meaning only when establishing a connection, i.e. in the handshake procedure. Both sides of the connection need to send this special packet with the SYN flag on. Thus it appears the function is trying to perform a so-called "SYN flood" denial-of-service attack. The source IP, depending on ip_src_not_random_flag, may be real or picked randomly (spoofed). Most of the TCP parameters are picked randomly (src_port, seq, window, ttl, identifier).
As we see, the function was intended to send (40,000 * (repeat_counter+1)) packets, however this was not coded completely and it sends packets in an infinite loop for as long as it is able to create sockets. So the repeat_counter parameter is useless.
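For analysing captured control traffic, the command layout listed above can be turned into a small parser of the decoded packet. This is only a sketch for reading commands, not for sending anything: the struct and function names are ours, and we assume that any non-zero ip_src_not_random_flag means "use the given source IP".

```c
#include <stddef.h>
#include <string.h>

struct syn_cmd {
    unsigned char  dst_ip[4];
    unsigned short dst_port;        /* from dst_portH / dst_portL */
    int            src_not_random;  /* assumed: non-zero = use src_ip */
    unsigned char  src_ip[4];
    unsigned char  repeat_counter;  /* ignored by the binary, see text */
    int            dst_is_string;
    const char    *dst_str;         /* points into the packet, may lack a NUL */
};

/* Parse a decoded feature 0x0B packet. Returns 0 on success, -1 if the
   buffer is too short or carries another feature number. */
int parse_syn_cmd(const unsigned char *p, size_t len, struct syn_cmd *out)
{
    if (len < 0x10 || p[1] != 0x0B)
        return -1;
    memcpy(out->dst_ip, p + 2, 4);
    out->dst_port = (unsigned short)((p[6] << 8) | p[7]);
    out->src_not_random = (p[8] != 0);
    memcpy(out->src_ip, p + 9, 4);
    out->repeat_counter = p[0x0D];
    out->dst_is_string = (p[0x0E] == 0x01);
    out->dst_str = out->dst_is_string ? (const char *)(p + 0x0F) : NULL;
    return 0;
}
```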

8) .text:08048C34 syn_flood_0
Incoming packet structure:
+0: not used
+1: feature number (0x0A)
...
...
This function is identical to the previous one (0x0B) and would differ only in that repeat_counter is always set to 0, thus sending only 40,000 packets. However, it uses the same subroutine as the previous one, where the loop is infinite regardless of the counter.

9) .text:080487C8 udp_icmp_flood
Incoming packet structure:
+0: not used
+1: feature number (0x05)
+2: UDP_flag (0x01 -- UDP, other ICMP flood)
+3: dst_port 
+4: dst_ipA
+5: dst_ipB
+6: dst_ipC
+7: dst_ipD
+8: src_ipA
+9: src_ipB
+A: src_ipC
+B: src_ipD
+C: 0x01 if destination address in str. format (like '128.0.0.1')
+D: dst_addr_in_str_format
.....
The function performs a UDP or ICMP flood. The destination IP is mandatory and dst_port is used only for the UDP flood. The source IP is taken from the incoming packet.
The UDP flood is performed with packet size 900h and a random source port. The flooding loop is infinite, as in every feature.

10) .text:08048B80 dns_flood
Incoming packet structure:
+0: not used
+1: feature number (0x09)
+2: src_ipA
+3: src_ipB
+4: src_ipC
+5: src_ipD
+6: repeat_counter
+7: src_portH
+8: src_portL
+9: TRUE, if source address in str. format (like '128.0.0.1')
+A: src_addr_in_str_format
...
The function sends UDP packets to IP:53, where IP runs over all the IP addresses in the internal address list (.data:0806D22C ip_list). The packets are sent with the specified source IP address and source port (if not specified, the port is picked randomly). So, what are those packets and those IPs?

We will take a closer look at the DNS queries sent by the-binary.

All packets sent to UDP port 53 are stored in the data section of the-binary, starting at offset:
.rodata:080676BC udp_packets db 'Gn'

There are 9 packets in total. They are all identical, except for the domain names: com, net, de, edu, org, usc.edu, es, gs, ie. Let's examine one of them. (N.B.: a lame way to keep 9 copies of the same structure differing only in the domain.)
080676BC udp_packets     db 'Gn' ; ID        
080676BE                 db    1 ; flags_H (RD bit set)
080676BF                 db    0 ; flags_L
080676C0                 db    0 ; QDCOUNT_H
080676C1                 db    1 ; QDCOUNT_L
080676C2                 db    0 ; ANCOUNT_H
080676C3                 db    0 ; ANCOUNT_L
080676C4                 db    0 ; NSCOUNT_H
080676C5                 db    0 ; NSCOUNT_L
080676C6                 db    0 ; ARCOUNT_H
080676C7                 db    0 ; ARCOUNT_L
080676C8                 db    3 ; QNAME: length of label
080676C9 aCom            db 'com',0 ; label + zero-length root label
080676CD                 db    0 ; QTYPE_H
080676CE                 db    6 ; QTYPE_L
080676CF                 db    0 ; QCLASS_H
080676D0                 db    1 ; QCLASS_L
080676D1                 db    0 ;
080676D2                 db    0 ;
...

As described in RFC 1035 (which defines DNS messages), the format of the message is:

 +---------------------+
 |        Header       |
 +---------------------+
 |       Question      | the question for the name server
 +---------------------+
 |        Answer       | RRs answering the question
 +---------------------+
 |      Authority      | RRs pointing toward an authority
 +---------------------+
 |      Additional     | RRs holding additional information
 +---------------------+

Now let's see what we have. Message requires header (see rfc1035). 
Header structure is:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |                      ID                       |
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |                    QDCOUNT                    |
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |                    ANCOUNT                    |
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |                    NSCOUNT                    |
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |                    ARCOUNT                    |
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   ID = "Gn",  doesn't matter in our case;
   QR = 0, our message is a query;
   OPCODE = 0, our message is a standard query (QUERY);
   AA = 0, valid in responses and specifies that NS is authority
        for domain name in request;
   TC = 0, if 1, specifies that this message was truncated due to length greater
        than that permitted on transmission channel;
   RD = 1, asks for recursive query!;
   RA = 0, defined in response (shows whether recursion for that NS available).
        Doesn't matter for us;
   Z = 0, reserved for future use;
   RCODE = 0, defined in response. Response code;
	 
   QDCOUNT = 1, specifies the number of entries in question section;
   ANCOUNT = 0, specifies the number of resource records in the answer section (we have only requests);
   NSCOUNT = 0, specifies the number of name servers resource records in authority records section (doesn't matter);
   ARCOUNT = 0, specifies the number of resource records in additional records section (doesn't matter);

   We now examine the format of the question section only:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |                                               |
 /                     QNAME                     /
 /                                               /
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |                     QTYPE                     |
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 |                     QCLASS                    |
 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   QNAME = 3,'com',0. A domain name represented as a sequence of labels,
   where each label consists of a length octet followed by that number of
   octets. The domain name terminates with the zero length octet for the null
   label of the root. The 16-bit QTYPE and QCLASS fields follow in network
   byte order (high byte first).

   QTYPE = 0x06, specifies the type of the query, 6 - SOA;
   QCLASS = 0x01, specifies the class of the query, 1 - IN;
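The field values above can be double-checked mechanically. Below is a minimal sketch that reads the first 21 bytes of the listing in network byte order, as RFC 1035 specifies; the struct and function names are ours, and the parser deliberately handles only a single uncompressed question.

```c
#include <stddef.h>

/* First query template, copied byte for byte from .rodata:080676BC. */
static const unsigned char query_com[] = {
    'G', 'n', 0x01, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x03, 'c', 'o', 'm', 0x00, 0x00, 0x06, 0x00, 0x01
};

struct dns_query {
    unsigned id, qr, opcode, rd, qdcount, qtype, qclass;
    char qname[64];
};

static unsigned be16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Parse header and first question. Returns 0 on success, -1 on error. */
int parse_dns_query(const unsigned char *m, size_t len, struct dns_query *q)
{
    size_t pos = 12, n = 0;
    unsigned flags;

    if (len < 12)
        return -1;
    q->id = be16(m);
    flags = be16(m + 2);
    q->qr = (flags >> 15) & 1;
    q->opcode = (flags >> 11) & 0xF;
    q->rd = (flags >> 8) & 1;
    q->qdcount = be16(m + 4);

    /* QNAME: length-prefixed labels up to the zero-length root label. */
    while (pos < len && m[pos] != 0) {
        size_t l = m[pos++];
        if (l > 63 || pos + l > len || n + l + 2 > sizeof q->qname)
            return -1;          /* compression or malformed name */
        if (n)
            q->qname[n++] = '.';
        while (l--)
            q->qname[n++] = (char)m[pos++];
    }
    q->qname[n] = '\0';
    if (pos + 5 > len)          /* root label + QTYPE + QCLASS */
        return -1;
    pos++;
    q->qtype = be16(m + pos);
    q->qclass = be16(m + pos + 2);
    return 0;
}
```

Run on query_com, this gives QR = 0, OPCODE = 0, RD = 1, QDCOUNT = 1, the name "com", QTYPE = 6 (SOA) and QCLASS = 1 (IN), in agreement with the values derived by hand.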
Conclusion:
The packet is a standard recursive query for a domain name. The internal IP list consists of hosts that are supposed to be name servers. By sending UDP packets to these name servers with the spoofed source address of the victim, the feature triggers a storm of name server replies aimed back at the victim's IP. With a query of just a few bytes (20-30) it can provoke responses of around 400-500 bytes, so this method gives a real advantage against the victim.
This attack is widely used; we call it "DNS flood" and we know of existing exploits named "DoomDNS", "dnsabuser.c" and others.
As with the previous attacks, the function was intended to send (40,000 * (repeat_counter+1)) packets, but this does not work and the packets are sent in an infinite loop; the repeat_counter parameter is useless.

11) .text:0804871C dns_flood_0
Incoming packet structure:
+0: not used
+1: feature number (0x04)
+2: src_ipA
+3: src_ipB
+4: src_ipC
+5: src_ipD
+6: src_portH
+7: src_portL
+8: TRUE, if source address in str. format (like '128.0.0.1')
+9: src_addr_in_str_format
This function is identical to the previous one (0x09) and would differ only in that repeat_counter is always set to 0, thus sending only 40,000 packets. However, it uses the same subroutine as the previous one, where the loop is infinite regardless of the counter.

12) .text:08048DE4 dns_flood_on_bunch_domains
Incoming packet structure:
+0: not used
+1: feature number (0x0C)
+2: dst_ipA
+3: dst_ipB
+4: dst_ipC
+5: dst_ipD
+6: src_ipA
+7: src_ipB
+8: src_ipC
+9: src_ipD
+A: repeat_counter
+B: src_portH
+C: src_portL
+D: TRUE, if destination address in str. format (like '128.0.0.1')
+E: dst_addr_in_str_format
...
Sends query packets (080676BC udp_packets) to the destination IP, UDP port 53. The source IP, that of the flood victim, is picked randomly if not specified by the client. However, in that case the feature fails to achieve its goal of flooding this random IP, since it generates a new IP for every query packet. The rest is all the same: 40,000 packets, repeat_counter is not used, infinite loop.

Subroutines
-------------

.text:08048ECC transmit_data

Used to transmit data from features 0x01 and 0x03. The destination IP address(es) must have been initialised beforehand with feature 0x02. Depending on the IP_flag value, it sends data to one or ten IP addresses. The communication protocol is 0x0B, the same as for incoming packets.

.text:0804A1E8 decode_packet

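Used to decode incoming packets from the client. The algorithm is very simple:
void decode (unsigned char *dst, const unsigned char *src, int len) {
  int i = len - 1;   /* last valid index; starting at len would read past src */
  int b;

  /* walk backwards: each byte depends on the encoded byte before it */
  while (i >= 0) {
    if (i == 0) {
      b = src[i] - 0x17;
    } else {
      b = (src[i] - src[i-1]) - 0x17;
    }
    while (b < 0) {  /* wrap the result into the range 0..255 */
      b = b + 0x100;
    }
    dst[i] = (unsigned char)b;
    i--;
  }
}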
The strangest thing is that the author used sprintf(buffer, "%c%s", char1, str1) to manipulate the data bytes there. Is that a hard childhood of Visual Basic programming?


.text:0804A194 encode_packet

See above and try to reverse ;)
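For those who would rather check their answer: a minimal sketch of the reverse transform. It is obtained purely by inverting the decoding rule given for decode_packet (each encoded byte is the clear byte plus the previous encoded byte plus 0x17, modulo 0x100); it was not transcribed from the disassembly of encode_packet, and the names encode_sketch() and decode_sketch() are ours.

```c
#include <stddef.h>

/* Decoding rule from the text: clear[i] = enc[i] - enc[i-1] - 0x17
   (mod 0x100), with enc[-1] taken as 0. */
void decode_sketch(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t i;

    for (i = len; i-- > 0; ) {
        unsigned char prev = (i == 0) ? 0 : src[i - 1];
        dst[i] = (unsigned char)(src[i] - prev - 0x17);
    }
}

/* Inverse: enc[i] = clear[i] + enc[i-1] + 0x17 (mod 0x100). */
void encode_sketch(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t i;
    unsigned char prev = 0;     /* previous encoded byte */

    for (i = 0; i < len; i++) {
        prev = (unsigned char)(src[i] + prev + 0x17);
        dst[i] = prev;
    }
}
```

Since every encoded byte depends on all the bytes before it, the same command byte looks different at different positions in a packet, but the scheme is a fixed transformation with no key, so anyone who has the binary can read and forge the traffic.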



The End
--------


Did we miss something? :)