What are we dealing with?
If you have switched to Ghidra from other RE tools, I am sure you sometimes miss solid detection of the most common functions (sprintf, printf, __security_check_cookie, memset and tons of others), which inevitably leads you to reverse engineer the same library code again and again.
Luckily for us, Ghidra offers FunctionID, which can be seen as the equivalent of IDA's F.L.I.R.T. signatures. Apart from that, FunctionID also works well for other use cases, such as fingerprinting common functions observed in malware families. On this topic, Mich (@0x6d696368) was one of the first to explore this approach while analyzing Duqu 2.0 and its XOR decoding function.
The idea
Long story short, I was talking to Mich the other day about this topic and wondering whether Ghidra offers any out-of-the-box option to hash a single function instead of exporting every function with its corresponding hash.
It turned out to be kind of doable, but it requires the user to supply a list of functions to skip while generating the export. It seemed that Ghidra did not really have a “click-and-go” solution.
In addition, even if we live with that, the exported database (.fidb) is not the friendliest format to work with (Java serialization data, version 5), nor is there, as far as I could see, a usable interface to edit its content. There are some libraries around that help you read and write Java serialization data, but I didn't really want to go down that road.
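For reference, the “Java serialization data, version 5” label comes from the stream header: per the Java Object Serialization Specification, a serialization stream starts with the magic value 0xACED (STREAM_MAGIC) followed by the version 0x0005 (STREAM_VERSION), both big-endian 16-bit values. Below is a minimal sketch that checks for that header, assuming it sits at the very start of the file as tools like `file` expect. The helper name is hypothetical, and it only reads the first four bytes; it does not decode the serialized objects.

```python
import struct

# Constants from the Java Object Serialization Specification
STREAM_MAGIC = 0xACED
STREAM_VERSION = 5


def java_serialization_version(path):
    """Return the serialization stream version if `path` starts with a
    Java serialization header, otherwise None.

    Hypothetical helper: it only inspects the 4-byte header and does not
    attempt to parse the objects that follow it.
    """
    with open(path, "rb") as fh:
        header = fh.read(4)
    if len(header) < 4:
        return None
    # Magic and version are both big-endian unsigned 16-bit integers
    magic, version = struct.unpack(">HH", header)
    if magic != STREAM_MAGIC:
        return None
    return version
```

For a file whose first bytes are AC ED 00 05, the helper returns 5, matching what `file` reports. Anything beyond that header is where a proper Java deserialization library would be needed.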
Introducing FID facilitators
In the end I came up with two scripts: one for generating FunctionID hashes on the fly, and the other for quickly scanning the whole binary and checking for FunctionID matches stored in a wannabe database, everything packed in JSON format.
FunctionIdHashFunction.py: once the mouse cursor is placed inside a function body, the script can be invoked with the key binding Shift+H (if In Tool is checked in the Script Manager). The log is shown in the Console and the data is added to the database (if not already present).
Output example
[+] FunctionName: f__Memset FunctionID: 0x7f2af2530168894d added to database
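The “added to database (if not already present)” step boils down to a read-check-write cycle on the JSON file. Here is a minimal plain-Python sketch of that cycle. It assumes a hypothetical fiddb.json layout that maps each FunctionID, written as a 0x-prefixed hex string, to a function name; the real file may be structured differently. Inside Ghidra the hash would come from FidService rather than being passed in as an argument.

```python
import json
import os


def add_to_database(db_path, function_name, function_id):
    """Add a FunctionID -> name entry to a JSON database if it is missing.

    Assumed (hypothetical) layout: {"0x7f2af2530168894d": "f__Memset", ...}.
    Returns True if the entry was added, False if the ID was already present.
    """
    db = {}
    if os.path.exists(db_path):
        with open(db_path) as fh:
            db = json.load(fh)

    key = "0x%x" % function_id  # same hex format as the script's log line
    if key in db:
        return False

    db[key] = function_name
    with open(db_path, "w") as fh:
        json.dump(db, fh, indent=2, sort_keys=True)
    print("[+] FunctionName: %s FunctionID: %s added to database" % (function_name, key))
    return True
```

Calling `add_to_database("fiddb.json", "f__Memset", 0x7f2af2530168894d)` on an empty database prints the log line shown above. A second identical call returns False and leaves the file unchanged. Keying by the hash rather than by the name keeps the duplicate check a single dictionary lookup.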
FunctionIdMatcher.py: reads the fiddb.json file, which acts as the database, and reports any matches within the binary.
Output example
FunctionEntryPoint: 00401370 FunctionID: 0x68ab3a20e0806cb9
OriginalFunctionName: FUN_00401370 NewFunctionName: f__Strcpy
FunctionEntryPoint: 00401660 FunctionID: 0x7ebc9f242301cb7f
OriginalFunctionName: FUN_00401660 NewFunctionName: f__Strlen
FunctionEntryPoint: 004072a0 FunctionID: 0x1c9ee9e0121d1c7a
OriginalFunctionName: FUN_004072a0 NewFunctionName: f__Strncat
FunctionEntryPoint: 00407660 FunctionID: 0xada18d5e21b35f37
OriginalFunctionName: FUN_00407660 NewFunctionName: f__ToUpper
FunctionEntryPoint: 00408dd0 FunctionID: 0xe272f3501413fcbf
OriginalFunctionName: FUN_00408dd0 NewFunctionName: f__Itoa
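On the matching side, each function needs only a single dictionary lookup. The sketch below uses the same hypothetical JSON layout as before. The (entry point, name, FunctionID) tuples stand in for what a Ghidra script would collect by iterating over the program's functions. Renaming the functions inside Ghidra is not shown.

```python
def match_functions(db, functions):
    """Yield (entry_point, original_name, fid_key, new_name) for each hit.

    `db` maps 0x-prefixed hex FunctionIDs to names (hypothetical layout);
    `functions` is an iterable of (entry_point, name, function_id) tuples.
    """
    for entry_point, name, function_id in functions:
        key = "0x%x" % function_id
        new_name = db.get(key)
        if new_name is not None:
            yield entry_point, name, key, new_name


def report(matches):
    """Print matches in the same two-line layout as the example output."""
    for entry_point, name, key, new_name in matches:
        print("FunctionEntryPoint: %s FunctionID: %s" % (entry_point, key))
        print("OriginalFunctionName: %s NewFunctionName: %s" % (name, new_name))
```

For example, with `{"0x68ab3a20e0806cb9": "f__Strcpy"}` as `db` and `("00401370", "FUN_00401370", 0x68ab3a20e0806cb9)` among the functions, `report(match_functions(db, functions))` prints the first two lines of the output above. Functions without a match are silently skipped.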
To start populating the database with common library functions, I referred to the in-depth analysis of the Hermes ransomware (h/t to @AGDCservices), where library functions such as strcmp, memcpy, itoa, strcat, strcpy and many others were recognized at a glance by the researcher.
At the time of writing, fiddb.json looks as follows:
Please note that both scripts currently use a not-so-elegant way of extracting the FunctionID from the binary. Based on my experiments, the proper way to approach this would be something like:
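from ghidra.feature.fid.service import FidService
function = getFunctionContaining(currentAddress)  # function under the cursor
fid_quad = FidService().hashFunction(function)  # may be None for very small functions
if fid_quad is not None:  # getFullHash() is a signed Java long, so mask it to unsigned 64-bit
    print("0x%x" % (fid_quad.getFullHash() & 0xFFFFFFFFFFFFFFFF))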
But for some reason (I could not find the root cause) this does not always work. For example, the number of hashed functions does not match the number generated by Ghidra's built-in FID tool. Instead, calling the hashing function as below does the trick:
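from ghidra.feature.fid.service import FidService
function = getFunctionContaining(currentAddress)  # function under the cursor
fid = FidService().hashFunction(function)  # FidHashQuad; string formatting omitted here
print(fid)  # prints the FidHashQuad's string representation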
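There is one more detail to watch when formatting the hash yourself. FidHashQuad.getFullHash() returns a Java long, which is signed. IDs with the top bit set, such as 0xe272f3501413fcbf in the output above, therefore come back as negative numbers, and a plain "0x%x" would print them with a leading minus sign. Here is a small plain-Python sketch of the conversion; the helper name is hypothetical.

```python
MASK_64 = (1 << 64) - 1


def fid_to_hex(full_hash):
    """Format a 64-bit FunctionID as unsigned hex.

    `full_hash` may be negative because Java's long is signed; masking
    with 2**64 - 1 keeps the same bit pattern but reads it as unsigned.
    """
    return "0x%x" % (full_hash & MASK_64)


# Signed view of 0xe272f3501413fcbf, as a Java long would hold it
signed_value = 0xE272F3501413FCBF - (1 << 64)
print(fid_to_hex(signed_value))  # 0xe272f3501413fcbf
```

Positive hashes, such as 0x7f2af2530168894d, pass through unchanged, so the same helper can be applied to every function.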
One last note: when FunctionIdMatcher.py is run for the first time and does not find a fiddb.json in its own folder, it asks the user for one. The submitted database is then copied into the folder where the script is stored, leaving the original untouched.
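The first-run behaviour can be sketched in plain Python as follows. The function name and the `user_supplied_path` parameter are hypothetical. Inside a Ghidra script, the path would come from a user prompt such as GhidraScript.askFile(), and `script_dir` would be the directory the script is stored in.

```python
import os
import shutil


def ensure_local_database(script_dir, user_supplied_path=None, name="fiddb.json"):
    """Return the path of the database stored next to the script.

    If it does not exist yet, copy the user-supplied database there; the
    original file is only read, never modified or moved.
    """
    local_path = os.path.join(script_dir, name)
    if os.path.exists(local_path):
        return local_path  # later runs reuse the local copy
    if user_supplied_path is None:
        raise IOError("no %s found in %s and none supplied" % (name, script_dir))
    shutil.copy(user_supplied_path, local_path)
    return local_path
```

Copying rather than moving is what keeps the user's original database untouched. Subsequent runs find the local copy and never prompt again.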
You can fetch both scripts here, while fiddb.json is hosted at this repository.
– Enjoy!