analysis.html
~~~~~~~~~
As the challenge conditions made clear, the binary is of unknown origin.
Thus, executing the binary or running it under a debugger before a first
analysis was out of the question. We checked whether it could be disassembled and
found that the binary is not encrypted, so a disassembler should have no
problems with it. We didn't want to bother setting up a controlled
environment to run hostile binaries, and were inspired by the daring idea
of solving the challenge without executing the binary a single time...
We had two disassemblers to choose from. The first is an excellent
Unix (Linux) program, "ht", by Stefan Weyergraf, stefan@weyergraf.de,
and Sebastian Biallas, sb@biallas.net (version 0.6.0b). The second, of course, is
the interactive disassembler IDA by Datarescue.
First, the binary was analysed in "ht", where we examined the ELF header (it has
the usual structure for 32-bit objects: LSB encoding, System V, executable
file, Intel 80386) and the ELF section headers, where we found no
anomalies. All common sections are present: .text, .bss, .data, etc.
We also confirmed that the binary is statically linked (no dynamic libraries
are listed) with the commands "file ./the-binary" and "ldd ./the-binary".
Disassembly results are much easier to study in IDA (with features such as automatic
recognition of variables and function arguments, dynamic comments, cross-references, etc.), so we will
not describe here why it helps. We just start IDA v4 and open our binary.
We run the automatic analysis and wait for the disassembled code to appear.
And here we go...
Function identification
-----------------------
As we said already, the binary is statically linked and all debug info is stripped.
For us that means only one thing -- we will not see any symbolic procedure names
in the disassembled code. Experiencing slight grief.
There are two ways forward. The first is to make signatures of the static
libc (the string "The Linux C library 5.3.12" is found in the binary; however, we
can't rely on that, since it could be faked); the other is
manual analysis of the called functions. We picked the second because we
don't have the FLAIR program from the IDA distribution, which is used
to create signatures (moreover, as far as we know, it doesn't work with Linux
libraries). We gave up trying to find FLAIR and decided to identify
the function names manually. We describe in detail how this was done.
As is known, many libc functions (read, geteuid, etc.) are
only thin wrappers around the corresponding system calls. System calls in Linux
are performed by invoking interrupt 80h with the system call number in the EAX register.
The definitions of the system call numbers can be found in the file
"/usr/src/linux/include/asm/unistd.h". For instance, the system call with
number 31h is __NR_geteuid. This means that it would be called from the
function geteuid(). Note also that libc functions
mostly return their results in the EAX register, as the
C calling convention prescribes.
Starting from the entrypoint of the ELF binary, we move down through the code and
step into every called function to see whether "int 80h" calls are
present. This way we find all the functions that serve as wrappers
for system calls. Let's name some of them:
geteuid(), fork(), setsid(), chdir(), close(), unlink(), time(), kill(),
dup2().
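The lookup we did by hand can be sketched as a small table. This is only an illustration: the numbers are the i386 values from asm/unistd.h for the calls named above (plus socketcall), and the helper name syscall_name_for is our own invention, not something found in the binary.

```c
#include <stddef.h>

/* A few i386 Linux system call numbers (the value in EAX before "int 80h"),
 * as listed in asm/unistd.h. Only calls mentioned in the text are included. */
struct syscall_name {
    int nr;
    const char *name;
};

static const struct syscall_name i386_syscalls[] = {
    { 0x02, "fork" },    { 0x06, "close" }, { 0x0A, "unlink" },
    { 0x0C, "chdir" },   { 0x0D, "time" },  { 0x25, "kill" },
    { 0x31, "geteuid" }, { 0x3F, "dup2" },  { 0x42, "setsid" },
    { 0x66, "socketcall" },
};

/* Hypothetical helper: map the value loaded into EAX to a libc wrapper name.
 * Returns NULL for numbers that are not in our small table. */
const char *syscall_name_for(int nr)
{
    size_t i;

    for (i = 0; i < sizeof i386_syscalls / sizeof i386_syscalls[0]; i++) {
        if (i386_syscalls[i].nr == nr)
            return i386_syscalls[i].name;
    }
    return NULL;
}
```

A function whose body does little more than "mov eax, 31h / int 80h" would thus be renamed geteuid.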
Also, many functions refer to the system call with number 66h
(__NR_socketcall). Most of the socket functions are performed via this call.
When int 80h is called, register EAX is set to 66h and the number of the sub-function
is passed in EBX. The file "/usr/src/linux/include/linux/net.h" defines the
symbolic names of the sub-functions and their numbers. For example, sub-function
number 1 is a call to sys_socket(). Now we can identify some more
libc function calls: socket(), recv(), bind(), listen(), accept(), send().
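For reference, the sub-function numbers can be written down the same way. A minimal sketch: the values follow the SYS_* definitions in linux/net.h (only the first twelve are listed), while the array and the function socketcall_name_for are hypothetical names used for illustration.

```c
#include <stddef.h>

/* Sub-function numbers passed in EBX to socketcall (EAX = 66h),
 * following the SYS_* constants in linux/net.h. Index = sub-function number. */
static const char *const socketcall_names[] = {
    NULL,          /*  0: not used        */
    "socket",      /*  1: SYS_SOCKET      */
    "bind",        /*  2: SYS_BIND        */
    "connect",     /*  3: SYS_CONNECT     */
    "listen",      /*  4: SYS_LISTEN      */
    "accept",      /*  5: SYS_ACCEPT      */
    "getsockname", /*  6: SYS_GETSOCKNAME */
    "getpeername", /*  7: SYS_GETPEERNAME */
    "socketpair",  /*  8: SYS_SOCKETPAIR  */
    "send",        /*  9: SYS_SEND        */
    "recv",        /* 10: SYS_RECV        */
    "sendto",      /* 11: SYS_SENDTO      */
    "recvfrom",    /* 12: SYS_RECVFROM    */
};

/* Hypothetical helper: name of the libc function behind a socketcall
 * sub-function number, or NULL if the number is outside our table. */
const char *socketcall_name_for(int sub)
{
    int count = (int)(sizeof socketcall_names / sizeof socketcall_names[0]);

    if (sub <= 0 || sub >= count)
        return NULL;
    return socketcall_names[sub];
}
```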
Moving further down the code we understand (or intuition suggests)
that we are already in libc library code. OK, enough, we stop here.
Now, exploring the disassembled text, we see known library function names
instead of the strange "call sub_xxxxxxx". A little progress. But we still
have plenty of unidentified functions. We make another pass, starting from the
entrypoint and looking for unknown calls that we suspect could be libc functions.
For that, we run another copy of IDA and open the libc binary (a static
one would be better, since the binary was linked against it, but we only had
libc.so.5.3.12 handy). Then we compare manually, i.e. look
for signatures. Let's see an example with the function located at address
.text:8057764h. We try to choose some unique code there, which could then be
found in the libc disassembly. The following instructions look like
a good sequence to pick:
.text:0805777E shl edx, 8 ; Shift Logical Left
.text:08057781 or eax, edx ; Logical Inclusive OR
.text:08057783 mov edx, eax
The hex signature for them is: C1 E2 08 09 D0 89 C2.
We look for that byte sequence with a binary-data search in the libc disassembly. And, thank God, we find only one match.
Now, if we visually compare the function from our binary with the code we
found, we see that they are almost identical. Thus, with
high probability, the function can be identified as the same one.
And its name is "memset"! But it's not always that smooth.
Functions from the library and from the binary may differ,
depending on how the library was compiled, how it was optimized,
etc. (for instance, in the example above EBX could be used instead of the EDX register,
which would result in a different hex signature, and we would not find anything like
memset in the libc disassembly). So the search should be made with
several different signatures or with masks, something like "XX XX ? ? ? XX XX".
Finally, after several hours of identifying libc functions, they were all unambiguously identified.
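A masked search of this kind is easy to sketch. The function below is our own illustration (the name find_signature and the mask convention are assumptions, not an IDA feature): a mask byte of 0 plays the role of "?" and a non-zero mask byte means the pattern byte must match exactly.

```c
#include <stddef.h>

/* Search buf[0..len) for the n-byte pattern pat. A position j of the
 * pattern is compared only where mask[j] is non-zero; mask[j] == 0 is a
 * wildcard ("?"). Returns the offset of the first match, or -1 if none. */
long find_signature(const unsigned char *buf, size_t len,
                    const unsigned char *pat, const unsigned char *mask,
                    size_t n)
{
    size_t i, j;

    if (n == 0 || n > len)
        return -1;

    for (i = 0; i + n <= len; i++) {
        for (j = 0; j < n; j++) {
            if (mask[j] && buf[i + j] != pat[j])
                break;          /* mismatch on a byte that matters */
        }
        if (j == n)
            return (long)i;     /* every non-wildcard byte matched */
    }
    return -1;
}
```

With the signature C1 E2 08 09 D0 89 C2, masking out the three register-encoding bytes (E2, D0, C2) would let the same pattern also hit a build that picked a different register.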
We noticed that the function:
.text:0804A2A8 setenv proc near
is the first libc function, located just after the
main program code. This made separating out the libc functions much easier.
Now the first part of the work is done.
Analysing program behavior
-----------------------------
Function main() was found:
.text:08048134 main proc near
It is the only suitable function called between the program entrypoint and the exit part
whose address lies in the program code (not in the linked library).
However, there was another call:
.init:08048080 sub_08048080 proc near
.init:08048080 call sub_80675A8 ; Call Procedure
.init:08048085 retn 0 ; Return Near from Procedure
But sub_80675A8 is located in libc code, and if we take
a closer look at this function, we see that it is the opening
initialisation of ELF execution. So we name it _init_proc
and return to understanding what's going on in main().
At the beginning, main() initialises pointers
to different buffers and data structures, then it checks
geteuid() and if the result is non-zero (we are not root), the program
terminates. So the program requires root permission to be
executed. Then it fakes the name of the running process,
changing it to "[mingetty]". It sets the SIG_IGN handler for
signal SIGUSR2 (this just means the signal will be ignored),
then the process does fork() and the parent happily terminates.
Now, what is the child up to? We continue to follow...
Its first action is setsid(0) (does anyone need comments?),
then it again sets the SIG_IGN handler for SIGUSR2 and fork()s
again. Another child is spawned. You ask what the heck it's
all about?
Well, at a stretch we can call this "anti-debugging".
If you had to perform the analysis with 'gdb', it would not follow the fork by default.
But as we are just reading through disassembled code,
it does nothing to us.
Continuing with the child's actions. It sets the current directory to "/" (chdir("/")),
closes stdin, stdout and stderr, and initialises
(sets to 0) three variables:
p_id, used to store the pid of created child processes (our
current process is the main one now, p_id = 0);
p_pid, used to kill fork()'ed system("") processes on timeout;
called_feat_number, which keeps the unique number of the called binary feature.
Next, the random number generator is initialised with
srandom(time()) and a raw socket is opened.
The SIG_IGN handler is set for signals SIGHUP, SIGTERM and
twice(?) for SIGUSR2. All these signals will be ignored.
Then the program enters an infinite loop, where every iteration
ends with a delay:
.text:08048EB8 push 2710h ; 10000 uS
.text:08048EBD call usleep ; Call Procedure
Main cycle description
-----------------------
- recv(socket,*buffer2,len,flags) - the function waits for any IP packet to arrive (raw socket) and stores the packet in buffer2 (len = 2048 bytes);
- checks the protocol value and options of the IP header. If the protocol
is not equal to 0Bh or the options value is not 02, repeat the cycle.
Also, the size of the received packet should be more than 200 bytes.
If not, repeat the cycle.
- a pointer to the data from the IP packet (offset +16h from the start of the IP header) is passed to the decode_packet(*rcv_buffer,*buffer2+16h,rcv_packet_size-16h) function.
This function performs simple decoding and will be described
later. The decoded packet is now stored in rcv_buffer.
- takes the value of the second byte of rcv_buffer (offset +1),
decrements it by 1, checks that it is not greater than 11 and
jumps to func_table[value]. As we see, func_table keeps the addresses
of the different implemented features, which can be called
depending on the second byte of the decoded packet, when the IP protocol is 0x0B, the options field is set to 02 and the packet size is greater than 200 bytes.
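The filter and the dispatch check described above can be summarised in a short sketch. It is our reconstruction, not the binary's code: the function names are hypothetical, the protocol byte is read from its standard place in the IP header (offset 9), and we assume that the "options" value is the byte right after a 20-byte IP header (offset 14h), which the text does not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_MAGIC   0x0B  /* IP protocol number the binary listens for   */
#define OPT_MAGIC     0x02  /* required "options" value                    */
#define OPT_OFFSET    0x14  /* ASSUMPTION: first byte after 20-byte header */
#define MIN_PKT_LEN   200   /* packet must be longer than this             */
#define NUM_FEATURES  12    /* entries in func_table                       */

/* Returns 1 if a raw packet would be passed on to decode_packet, else 0. */
int packet_is_command(const uint8_t *pkt, size_t len)
{
    if (len <= MIN_PKT_LEN)
        return 0;
    if (pkt[9] != PROTO_MAGIC)       /* protocol field of the IP header */
        return 0;
    if (pkt[OPT_OFFSET] != OPT_MAGIC)
        return 0;
    return 1;
}

/* Index into func_table for a decoded packet, or -1 if out of range.
 * The second byte holds the feature number 1..12; the index is that - 1. */
int feature_index(const uint8_t *decoded)
{
    unsigned idx = (unsigned)decoded[1] - 1u;  /* 0 wraps to a huge value */

    return idx < NUM_FEATURES ? (int)idx : -1;
}
```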
Before we go on to the description of the features, let us recall what some of our data variables are:
byte rcv_buffer[]; // incoming packet, received and decoded data;
byte IP_addr[]; // buffer to keep IP addresses;
Detailed description of all 12 features
---------------------------------------
We will describe the features not in order but rather by their similarity and dependency.
1) .text:08048894 bind_root_shell_on_port
Incoming packet structure:
+0: not used
+1: feature number (0x06)
Forks a child which listens on TCP port 23281. Once a connection to this port is established, it spawns
another process to receive data. The parent continues to listen on port 23281. The child encodes
the received data by incrementing all bytes by one and changing all 0x0D, 0x0A (CR, LF) to 0x00.
If the resulting string equals "TfOjG" ("SeNiF" before encoding), it accepts the password and
starts "/bin/sh" on this open connection (dup2 stdin, stdout, stderr). For that, it sets the shell environment variables PATH and TERM and removes the HISTORY variable.
When someone using this shell exits, that process dies, but the listening process
continues to run. It can be killed only with the "kill_called_proc" feature.
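The password check is simple enough to restate as a sketch. The names encode_password and password_ok are ours, and we assume that CR and LF are zeroed instead of being incremented (the order of the two steps is not spelled out above); the constant "TfOjG" is the one found in the binary.

```c
#include <stddef.h>
#include <string.h>

/* Transform a received line in place: CR and LF become 0x00 (which ends
 * the string), every other byte is incremented by one (modulo 256).
 * ASSUMPTION: CR/LF are replaced rather than incremented. */
void encode_password(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0x0D || buf[i] == 0x0A)
            buf[i] = 0x00;
        else
            buf[i] = (unsigned char)(buf[i] + 1);
    }
}

/* Returns 1 if the line would be accepted as the shell password, else 0. */
int password_ok(const char *line)
{
    unsigned char tmp[64];
    size_t len = strlen(line);

    if (len >= sizeof tmp)
        return 0;               /* too long for our scratch buffer */
    memcpy(tmp, line, len);
    tmp[len] = 0;
    encode_password(tmp, len);
    return strcmp((const char *)tmp, "TfOjG") == 0;
}
```

For example, password_ok("SeNiF\r\n") returns 1, since 'S'+1 = 'T', 'e'+1 = 'f', and so on.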
2) .text:08048590 exec_some_csh_cmd_and_send_result
Incoming packet structure:
+0: not used
+1: feature number (0x03)
+2: command for ("/bin/csh -f -c \"%s\" 1> %s 2>&1")
The function forks and the parent continues with the main cycle. The child forks again.
The parent sleeps for 10 seconds and kills itself. The child executes:
system("/bin/csh -f -c \"%s\" 1> %s 2>&1",rcv_buffer[2],"/tmp/.hj237349");
So, it executes the command taken from the received buffer and redirects the output to "/tmp/.hj237349".
Then the data from this file is copied back to rcv_buffer[2] to be sent back to the client.
The data is handled in chunks of size 0x18D, so it may be transmitted in multiple packets.
In the first packet rcv_buffer[1] is set to 3, and in every following one to 4.
The function calls encode_packet and the result is stored in buffer5.
Then it calls transmit_data, and the encoded data from buffer5 is sent
to the IP addresses from the IP_addr buffer (which must have been filled with feature 0x02 beforehand).
Then the process sleeps for 400000 uS, closes the "/tmp/.hj237349" file, erases it and exits.
3) .text:08048ACC exec_some_csh_cmd
Incoming packet structure:
+0: not used
+1: feature number (0x07)
+2: command for ("/bin/csh -f -c \"%s\" 1> %s 2>&1")
The function forks and the parent continues with the main cycle. The child forks again.
The parent sleeps for 1200 seconds and kills itself. The child executes:
system("/bin/csh -f -c \"%s\" ", rcv_buffer[2]);
Thus, this function is almost the same as the one above: it executes the received command with root privileges, except that it does not return the command output.
4) .text:08048B58 kill_called_proc
Incoming packet structure:
+0: not used
+1: feature number (0x08)
Execution of many features in the binary
starts by spawning a process (fork), which then carries on with the feature.
The PID of the spawned process is stored in the variable p_id.
Some processes do not terminate by themselves, so kill_called_proc is
used to kill the process of the previously called feature. It just does "kill -9 p_id"
and nothing more.
5) .text:080483F0 get_10_IPs_from_packet_or_random
Incoming packet structure:
+0: not used
+1: feature number (0x02)
+2: IP_flag (0x00, 0x02 or others)
+3: ipA
+4: ipB
+5: ipC
+6: ipD
... etc. 10 IP addresses in total.
...
The function fills the IP_addr buffer with 10 IP addresses received from a client
or picked randomly. It also stores IP_flag and the destination IP
address from the IP header of the packet received from the client.
The algorithm for filling IP_addr is a bit weird:
R = rand(10);
if (IP_flag == 2) {
    Fill IP_addr with the 10 IP addresses stored starting from rcv_buffer[3].
    The one IP address whose position number equals the random R is skipped.
} else {
    Fill IP_addr with 10 random IP addresses; one IP is skipped the same way as above.
}
if (IP_flag == 2) exit;
if (IP_flag == 0) {
    Copy the first IP address from rcv_buffer[3] to the first element of IP_addr
} else {
    Copy the first IP address from rcv_buffer[3] to element R of IP_addr
}
Now what conclusions can we draw from this?
1. If the IPs are picked randomly, one will always be replaced by the specified IP sent from the client.
Since those IPs are used as recipients of data in features 0x03 and 0x01,
it is obvious that the data should reach someone real who has requested it.
So, the first IP is the IP of the client.
2. The IP_addr structure should be filled with 10 IPs. However, when
IP_flag = 2, the IP at position R will be skipped and not initialised. The same
occurs when IP_flag = 0, but in this case only the first IP will be used in the
transmit_data function, so the rest is not really important.
We have no idea what the author was thinking when coding this;
it might be just a mistake.
3. Why broadcast data to random IPs? With a non-standard protocol?
We can only think this is a not yet implemented feature, which is supposed
to probe and control other hosts running the-binary, with the intention of performing
distributed denial-of-service attacks.
6) .text:0804835C transmit_status_data
Incoming packet structure:
+0: not used
+1: feature number (0x01)
This function seems to report the-binary's status back to a client.
It transmits data from buffer2 to one (if IP_flag = 0) or ten IP addresses previously initialised with the 0x02 feature.
The source IP address of the outgoing packet is set to the destination IP address of the packet received from the client. Thus, it is the IP of our host running the-binary.
Buffer2 has no actual data inside and is filled only with some constant values;
if any feature process is running, it reports its feature number (stored in the variable called_feat_number).
So the reply packet is:
+0: 0x00
+1: 0x01
+2: 0x07
+3: 0x01 if child running, 0x00 otherwise
+4: called_feat_number
The packet is encoded with encode_packet before sending and its size is set to
(0x190 + random(0xC8)) bytes. Transmission is performed by the subroutine
transmit_data.
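The plaintext of the status reply can be sketched as follows. This is an illustration only: build_status_reply is a hypothetical name, we assume random(0xC8) means a value in the range 0..0xC7 (so the length is 400..599 bytes), and we zero-fill the rest of the packet because the analysis does not say what the padding contains. Encoding with encode_packet would follow as a separate step.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REPLY_BASE   0x190  /* minimum reply size, 400 bytes              */
#define REPLY_RANGE  0xC8   /* random extra length, assumed 0..199 bytes  */

/* Fill buf with the unencoded status reply and return its length.
 * r is any random number supplied by the caller.
 * Returns 0 if buf (capacity cap) is too small. */
size_t build_status_reply(uint8_t *buf, size_t cap, int child_running,
                          uint8_t called_feat_number, unsigned long r)
{
    size_t len = REPLY_BASE + (size_t)(r % REPLY_RANGE);

    if (cap < len)
        return 0;

    memset(buf, 0, len);            /* ASSUMPTION: padding is zero bytes */
    buf[0] = 0x00;
    buf[1] = 0x01;                  /* feature number of this reply      */
    buf[2] = 0x07;
    buf[3] = child_running ? 0x01 : 0x00;
    buf[4] = called_feat_number;
    return len;
}
```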
7) .text:08048D08 SYN_flood
Incoming packet structure:
+0: not used
+1: feature number (0x0B)
+2: dst_ipA (destination IP)
+3: dst_ipB
+4: dst_ipC
+5: dst_ipD
+6: dst_portH
+7: dst_portL
+8: ip_src_not_random_flag
+9: src_ipA (source IP)
+A: src_ipB
+B: src_ipC
+C: src_ipD
+D: repeat counter
+E: 0x01 if destination address in str. format (like '128.0.0.1')
+F: dst_addr_in_str_format
...
This function sends TCP packets to the specified IP and port number.
Analysing the TCP packet parameters, we found that each packet has the SYN bit set.
.text:08049EAF mov [ebp+flag2], 2 ; TH_SYN
This bit has meaning only when establishing a connection, i.e. in the handshake procedure. Both sides of the connection need to send this special packet with the SYN flag on.
Thus, it appears the function is trying to perform a so-called "SYN flood" denial-of-service attack.
The source IP, depending on ip_src_not_random_flag, may be real or picked randomly (spoofed).
Most of the packet parameters are picked randomly (src_port, seq, window, ttl, identifier).
As we see, the function was intended to send (40,000 * (repeat_counter+1)) packets; however, this was not
coded completely and it sends packets in an infinite loop for as long as it is able to create sockets. So the repeat_counter parameter is useless.
8) .text:08048C34 syn_flood_0
Incoming packet structure:
+0: not used
+1: feature number (0x0A)
...
...
This function is identical to the previous one (0x0B) and would differ only
in that repeat_counter is always set to 0, thus sending only 40,000
packets. However, it uses the same subroutine as the previous one,
where the loop is infinite regardless of the counter.
9) .text:080487C8 udp_icmp_flood
Incoming packet structure:
+0: not used
+1: feature number (0x05)
+2: UDP_flag (0x01 -- UDP, other ICMP flood)
+3: dst_port
+4: dst_ipA
+5: dst_ipB
+6: dst_ipC
+7: dst_ipD
+8: src_ipA
+9: src_ipB
+A: src_ipC
+B: src_ipD
+C: 0x01 if destination address in str. format (like '128.0.0.1')
+D: dst_addr_in_str_format
.....
The function performs a UDP or ICMP flood. The destination IP is mandatory and
dst_port is used only for the UDP flood. The source IP is taken from the incoming
packet.
The UDP flood is performed with packet size 900h; the source port is random.
The flooding loop is infinite, as in every feature.
10) .text:08048B80 dns_flood
Incoming packet structure:
+0: not used
+1: feature number (0x09)
+2: src_ipA
+3: src_ipB
+4: src_ipC
+5: src_ipD
+6: repeat_counter
+7: src_portH
+8: src_portL
+9: TRUE, if source address in str. format (like '128.0.0.1')
+A: src_addr_in_str_format
...
The function sends UDP packets to IP:53 (UDP), where IP runs over all IP addresses
in the internal address list (.data:0806D22C ip_list).
Packets are sent with the specified source IP address and source port (if not
specified, they are picked randomly). So, what are those packets and those IPs?
We will take a closer look at the DNS queries sent by the-binary.
All packets to UDP port 53 are stored in the-binary's data section,
starting at offset:
.rodata:080676BC udp_packets db 'Gn'
There are 9 packets in total. They are all identical, except
for the domain names: com, net, de, edu, org, usc.edu, es, gs, ie.
Let's examine one of them.
(N.B.: a lame way to keep 9 copies of the same structure that differ only in the domain.)
080676BC udp_packets db 'Gn' ; ID
080676BE db 1 ; flagsH (QR, Opcode, AA, TC, RD)
080676BF db 0 ; flagsL (RA, Z, RCODE)
080676C0 db 0 ; QDCOUNT_H
080676C1 db 1 ; QDCOUNT_L
080676C2 db 0 ; ANCOUNT_H
080676C3 db 0 ; ANCOUNT_L
080676C4 db 0 ; NSCOUNT_H
080676C5 db 0 ; NSCOUNT_L
080676C6 db 0 ; ARCOUNT_H
080676C7 db 0 ; ARCOUNT_L
080676C8 db 3 ; QNAME: label length
080676C9 aCom db 'com',0 ; label 'com', then the zero-length root label
080676CD db 0 ; QTYPE_H
080676CE db 6 ; QTYPE_L
080676CF db 0 ; QCLASS_H
080676D0 db 1 ; QCLASS_L
080676D1 db 0 ;
080676D2 db 0 ;
...
As we can find in rfc1035 (where DNS messages are described), the format of the message is:
+---------------------+
|        Header       |
+---------------------+
|       Question      | the question for the name server
+---------------------+
|        Answer       | RRs answering the question
+---------------------+
|      Authority      | RRs pointing toward an authority
+---------------------+
|      Additional     | RRs holding additional information
+---------------------+
Now let's see what we have. The message requires a header (see rfc1035).
The header structure is:
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
ID = "Gn", doesn't matter in our case;
QR = 0, our message is a query;
OPCODE = 0, our message is a standard query (QUERY);
AA = 0, valid in responses and specifies that NS is authority
for domain name in request;
TC = 0, if 1, specifies that this message was truncated due to length greater
than that permitted on transmission channel.
RD = 1, asks for recursive query!;
RA = 0, defined in response (shows whether recursion for that NS available).
Doesn't matter for us;
Z = 0, reserved for future use;
RCODE = 0, defined in response. Response code;
QDCOUNT = 1, specifies the number of entries in question section;
ANCOUNT = 0, specifies the number of resource records in the answer section (we have only requests);
NSCOUNT = 0, specifies the number of name servers resource records in authority records section (doesn't matter);
ARCOUNT = 0, specifies the number of resource records in additional records section (doesn't matter);
We now examine the format of the question section only:
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                     QNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QTYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QCLASS                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
QNAME = 3,'com',0. A domain name is represented as a sequence of labels,
where each label consists of a length octet followed by that number of
octets. The domain name terminates with the zero length octet for the null
label of the root.
QTYPE = 0x06, specifies the type of the query, 6 - SOA;
QCLASS = 0x01, specifies the class of the query, 1 - IN;
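To double-check the field-by-field reading, the stored packet can be run through a small parser. This is a sketch under the rfc1035 rules quoted above (16-bit fields in network byte order, name ending at the first zero-length label); the struct and function names are ours, and the parser handles only the first question and no name compression.

```c
#include <stddef.h>
#include <stdint.h>

struct dns_query {
    uint16_t id, flags, qdcount;
    int rd;              /* recursion desired bit            */
    char qname[64];      /* dotted name, "" for the root     */
    uint16_t qtype, qclass;
};

/* Read a 16-bit big-endian (network order) value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse header and first question. Returns 0 on success, -1 if truncated. */
int parse_dns_query(const uint8_t *msg, size_t len, struct dns_query *q)
{
    size_t pos = 12, out = 0;    /* the header is 12 bytes long */

    if (len < 12)
        return -1;
    q->id      = be16(msg);
    q->flags   = be16(msg + 2);
    q->qdcount = be16(msg + 4);
    q->rd      = (q->flags >> 8) & 1;   /* RD is bit 8 of the flags word */

    for (;;) {                   /* QNAME: length octet + label, repeated */
        uint8_t n;

        if (pos >= len)
            return -1;
        n = msg[pos++];
        if (n == 0)
            break;               /* zero-length root label ends the name */
        if (n > 63 || pos + n > len || out + n + 1 >= sizeof q->qname)
            return -1;
        if (out > 0)
            q->qname[out++] = '.';
        while (n--)
            q->qname[out++] = (char)msg[pos++];
    }
    q->qname[out] = '\0';

    if (pos + 4 > len)
        return -1;
    q->qtype  = be16(msg + pos);
    q->qclass = be16(msg + pos + 2);
    return 0;
}
```

Fed the bytes at 080676BC, it yields RD = 1, QDCOUNT = 1, name "com", type 6 (SOA) and class 1 (IN).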
Conclusion:
The packet is a standard recursive query for a domain name.
The internal IP list consists of hosts that are supposed to be name servers.
By sending UDP packets to these name servers with the spoofed
source address of the victim, it initiates a packet storm of name
server replies, targeted back at the victim's IP.
With just a few bytes (20-30) it can provoke responses of around 400-500 bytes.
Thus, this method gives the attacker a real advantage over the victim.
This attack is widely used; we call it
"DNS flood", and we know of existing exploits named "DoomDNS", "dnsabuser.c" and others.
As with the previous attacks, the function was intended to send (40,000 * (repeat_counter+1)) packets,
but this does not work and packets are sent in an infinite loop; the repeat_counter parameter is useless.
11) .text:0804871C dns_flood_0
Incoming packet structure:
+0: not used
+1: feature number (0x04)
+2: src_ipA
+3: src_ipB
+4: src_ipC
+5: src_ipD
+6: src_portH
+7: src_portL
+8: TRUE, if source address in str. format (like '128.0.0.1')
+9: src_addr_in_str_format
This function is identical to the previous one (0x09) and would differ only
in that repeat_counter is always set to 0, thus sending only 40,000
packets. However, it uses the same subroutine as the previous one,
where the loop is infinite regardless of the counter.
12) .text:08048DE4 dns_flood_on_bunch_domains
Incoming packet structure:
+0: not used
+1: feature number (0x0C)
+2: dst_ipA
+3: dst_ipB
+4: dst_ipC
+5: dst_ipD
+6: src_ipA
+7: src_ipB
+8: src_ipC
+9: src_ipD
+A: repeat_counter
+B: src_portH
+C: src_portL
+D: TRUE, if destination address in str. format (like '128.0.0.1')
+E: dst_addr_in_str_format
...
Sends query packets (080676BC udp_packets) to the destination IP,
port UDP 53. The source IP of the flood victim, if not specified
by the client, is picked randomly. However, the feature fails to achieve
its goal of flooding this random IP, since it generates a new IP for
every query packet. The rest is all the same -- 40,000 packets, repeat_counter
is not used, infinite loop.
Subroutines
-------------
.text:08048ECC transmit_data
Used to transmit data from features 0x01 and 0x03.
The destination IP address(es) should be initialised beforehand with the 0x02 feature.
Depending on the IP_flag value, it sends data to one or ten IP addresses.
The communication protocol is 0x0B, the same as for incoming packets.
.text:0804A1E8 decode_packet
Used to decode incoming packets from the client.
Algorithm is very simple:
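void decode (unsigned char *dst, const unsigned char *src, int len) {
    int i;
    int b;
    /* Walk from the last byte (index len - 1) back to the first. */
    for (i = len - 1; i >= 0; i--) {
        if (i == 0) {
            b = src[i] - 0x17;               /* first byte has no predecessor */
        } else {
            b = (src[i] - src[i-1]) - 0x17;  /* difference to previous encoded byte */
        }
        while (b < 0) {
            b = b + 0x100;                   /* wrap into the range 0..255 */
        }
        dst[i] = (unsigned char)b;
    }
}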
The strangest thing is that the author used sprintf(buffer, "%c%s", char1, str1) to manipulate
the data bytes there. A hard childhood of Visual Basic programming?
.text:0804A194 encode_packet
See above and try to reverse ;)
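For the impatient, here is one way to take up that invitation. This sketch is derived purely by inverting the decoding rule (each encoded byte is the plain byte plus the previous encoded byte plus 0x17, modulo 256); it is not transcribed from the disassembly of encode_packet, and the names encode_sketch and decode_sketch are ours. The decoder is restated alongside so the round trip can be checked.

```c
#include <stddef.h>

/* Inverse of the decoding rule:
 *   enc[0] = plain[0] + 0x17
 *   enc[i] = plain[i] + enc[i-1] + 0x17   (all modulo 256)            */
void encode_sketch(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t i;
    unsigned char prev = 0;     /* no predecessor for the first byte */

    for (i = 0; i < len; i++) {
        dst[i] = (unsigned char)(src[i] + prev + 0x17);
        prev = dst[i];
    }
}

/* The decoding rule, restated with unsigned bytes for the round trip:
 *   plain[i] = enc[i] - enc[i-1] - 0x17   (modulo 256)                */
void decode_sketch(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t i;

    for (i = len; i-- > 0; ) {
        unsigned char prev = (i == 0) ? 0 : src[i - 1];
        dst[i] = (unsigned char)(src[i] - prev - 0x17);
    }
}
```

Encoding the bytes 00 03 gives 17 31, and decoding 17 31 gives 00 03 back.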
The End
--------
Did we miss something? :)