Library function identifier based on correlation
Library function identification à la fenris was very useful, but
a number of functions remained unidentified, although we suspected that
some of them were also libc functions. After some investigation we
concluded that this task could still be automated: if we take the first 100
bytes of a function and correlate them with the first 100 bytes of every
library function, the maximum correlation could indicate a real match
for this kind of function.
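As a rough illustration of what "correlate" could mean here, the sketch
below computes a Pearson-style score between two byte sequences. This is
an assumption: the text does not say which correlation measure was
intended. The function name signed_r2 is hypothetical. It returns the
signed square of the Pearson coefficient, so no square root is needed;
ranking candidates by this value gives the same order as ranking by the
coefficient itself.

```c
#include <stddef.h>

/* Hypothetical helper: signed squared Pearson correlation of two byte
 * sequences of length n. The result lies in [-1, 1]. It returns 0 when
 * the coefficient is undefined (n == 0 or one sequence is constant). */
double signed_r2(const unsigned char *x, const unsigned char *y, size_t n)
{
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;

    for (size_t i = 0; i < n; i++) {
        double xi = x[i], yi = y[i];
        sx  += xi;
        sy  += yi;
        sxx += xi * xi;
        syy += yi * yi;
        sxy += xi * yi;
    }

    /* Covariance and variances, each scaled by n*n (the scale cancels). */
    double cov = (double)n * sxy - sx * sy;
    double vx  = (double)n * sxx - sx * sx;
    double vy  = (double)n * syy - sy * sy;

    if (vx <= 0 || vy <= 0)
        return 0.0;

    double r2 = (cov * cov) / (vx * vy);
    return cov < 0 ? -r2 : r2;
}
```

With n set to 100, the library function giving the highest score for an
unknown function would be the candidate match.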
A real application should compute a mathematical correlation, but we
were in a hurry, so we developed some scripts that simply compare these
100 bytes one by one. These are the new utilities:
- afprint2.c, a C program that reads from standard input and generates
the 100 bytes used as the signature. It is not just a pipe, because it
tries to change or remove non-permanent values (just as fenris fprints
does).
- getfprints2, based on fenris getfprints, generates a signature (just
100 bytes processed with fprint2) for each function in a static library.
- checka2, a shell script that, given an address and a binary file,
dumps the first 100 bytes found at that address in the binary, computes
the same kind of 100-byte signature, and compares it with the databases
produced by getfprints2, showing any possible matches (functions where
the number of matching bytes is greater than a certain limit, 80 by
default).
Its performance is really bad: we should rewrite it as a C program
as soon as we have some time.
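A C version of the comparison step might start from something like the
sketch below. It is only an illustration, not the actual checka2 logic:
the struct sig_entry layout, the function names and the in-memory
database are assumptions, since the format of the files produced by
getfprints2 is not described here. Only the 100-byte length and the
default limit of 80 come from the text.

```c
#include <stddef.h>

#define SIG_LEN 100        /* signature length used in the text */
#define DEFAULT_LIMIT 80   /* default match limit mentioned for checka2 */

/* Hypothetical database entry: a function name plus its signature. */
struct sig_entry {
    const char *name;
    unsigned char sig[SIG_LEN];
};

/* Count the positions at which two signatures hold the same byte. */
int count_matching(const unsigned char *a, const unsigned char *b)
{
    int matches = 0;

    for (size_t i = 0; i < SIG_LEN; i++)
        if (a[i] == b[i])
            matches++;
    return matches;
}

/* Return the index of the entry with the most matching bytes, provided
 * that count is greater than `limit`; return -1 if no entry qualifies. */
int best_match(const unsigned char *target, const struct sig_entry *db,
               size_t n, int limit)
{
    int best = -1;
    int best_score = limit;

    for (size_t i = 0; i < n; i++) {
        int score = count_matching(target, db[i].sig);
        if (score > best_score) {
            best_score = score;
            best = (int)i;
        }
    }
    return best;
}
```

A caller would load the signatures into an array of struct sig_entry,
build the signature of the unknown function, and call
best_match(target, db, n, DEFAULT_LIMIT). Reporting every entry above
the limit, rather than only the best one, would be closer to the
"any possible matches" behaviour described above.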