Ghidra FunctionID facilitators

August 15, 2021

What are we dealing with?

If you switched to Ghidra from other RE tools, I am sure you sometimes miss solid detection of the most common functions (sprintf, printf, __security_check_cookie, memset and tons of others), which inevitably leads you to reverse engineer the same library code again and again.

Luckily for us, Ghidra offers FunctionID, which can be seen as the equivalent of IDA's F.L.I.R.T. signatures. Apart from that, FunctionID also works well for other use cases, such as fingerprinting common functions observed in malware families. On this topic, Mich (@0x6d696368) was one of the first to explore this approach while analyzing Duqu 2.0 and its XOR decoding function.

The idea

Long story short, I was talking to Mich the other day about this topic and wondering whether Ghidra offers any out-of-the-box option to hash a single function, instead of exporting every function with its corresponding hash.

It turned out to be kind of doable, but it requires the user to supply a list of functions to skip while generating the export. It seemed that Ghidra did not really have a “click-and-go” solution.

In addition, supposing we can live with that, the exported database (.fidb) is not in the friendliest format to work with (Java serialization data, version 5), nor is there, as far as I could see, a usable interface to edit its content. There are some libraries around that help you read and write Java serialization data … but I didn’t really want to go down that road.

Introducing FID facilitators

In the end I came up with two scripts: one for generating FunctionID hashes on the fly, and the other for quickly scanning the whole binary and checking for FunctionID matches stored in a wanna-be database, everything packed in JSON format.

FunctionIdHashFunction.py: once the cursor is placed inside a function body, the script can be called with the keybinding Shift+H (if In Tool is checked). A log line is shown in the Console and the hash is added to the database (if not already present).

Output example

[+] FunctionName: f__Memset	FunctionID: 0x7f2af2530168894d added to database
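The "add if not already present" behaviour can be sketched in plain Python, outside of Ghidra. This is only an illustration under assumptions: the helper name add_function is hypothetical and is not taken from the actual script, and the dictionary layout mirrors the fiddb.json structure used by this post.

```python
import time

def add_function(db, fid, name):
    """Add a hash -> name pair unless the hash is already known.

    Returns True if the entry was added, False if it was already present.
    (Hypothetical helper, not the code of FunctionIdHashFunction.py.)
    """
    functions = db["database"]["functions"]
    if fid in functions:
        return False
    functions[fid] = name
    # Keep the bookkeeping fields in sync with the function table.
    db["database"]["entries"] = len(functions)
    db["database"]["last_update_utc"] = time.strftime(
        "%Y-%m-%dT%H:%M:%S", time.gmtime())
    return True

# Start from an empty database with the same layout as fiddb.json.
db = {"version": "0.1",
      "database": {"last_update_utc": "", "functions": {}, "entries": 0}}

if add_function(db, "0x7f2af2530168894d", "f__Memset"):
    print("[+] FunctionName: f__Memset\t"
          "FunctionID: 0x7f2af2530168894d added to database")
```

Keying the table by hash makes the duplicate check a single dictionary lookup, and lets several hashes point to the same function name.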

FunctionIdMatcher.py reads the fiddb.json file, which acts as the database, and reports any matches within the binary.

Output example

FunctionEntryPoint: 00401370	FunctionID: 0x68ab3a20e0806cb9	
OriginalFunctionName: FUN_00401370	NewFunctionName: f__Strcpy

FunctionEntryPoint: 00401660	FunctionID: 0x7ebc9f242301cb7f
OriginalFunctionName: FUN_00401660	NewFunctionName: f__Strlen

FunctionEntryPoint: 004072a0	FunctionID: 0x1c9ee9e0121d1c7a
OriginalFunctionName: FUN_004072a0	NewFunctionName: f__Strncat

FunctionEntryPoint: 00407660	FunctionID: 0xada18d5e21b35f37
OriginalFunctionName: FUN_00407660	NewFunctionName: f__ToUpper

FunctionEntryPoint: 00408dd0	FunctionID: 0xe272f3501413fcbf
OriginalFunctionName: FUN_00408dd0	NewFunctionName: f__Itoa
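The matching step boils down to a lookup of each function hash in the database. Here is a minimal sketch in plain Python, assuming the hashes have already been computed and collected as (entry point, current name, hash) tuples. The helper find_matches and the sample input are hypothetical, and one hash is made up to show a non-match.

```python
def find_matches(known, hashed_functions):
    """Yield (entry_point, fid, old_name, new_name) for every known hash."""
    for entry_point, old_name, fid in hashed_functions:
        new_name = known.get(fid)
        if new_name is not None:
            yield entry_point, fid, old_name, new_name

# Two entries taken from fiddb.json.
known = {
    "0x68ab3a20e0806cb9": "f__Strcpy",
    "0x7ebc9f242301cb7f": "f__Strlen",
}

# Hypothetical scan result: (entry point, current name, hash).
hashed = [
    ("00401370", "FUN_00401370", "0x68ab3a20e0806cb9"),
    ("00401400", "FUN_00401400", "0x0123456789abcdef"),  # made-up, unknown hash
    ("00401660", "FUN_00401660", "0x7ebc9f242301cb7f"),
]

matches = list(find_matches(known, hashed))
for entry_point, fid, old_name, new_name in matches:
    print("FunctionEntryPoint: %s\tFunctionID: %s" % (entry_point, fid))
    print("OriginalFunctionName: %s\tNewFunctionName: %s\n" % (old_name, new_name))
```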

To start populating the database with common library functions, I referred to the in-depth analysis of the Hermes ransomware (h/t to @AGDCservices), where library functions such as strcmp, memcpy, itoa, strcat, strcpy and many others were recognized at a glance by the researcher.

At the time of writing, fiddb.json looks as follows:

{
  "version": "0.1", 
  "database": {
    "last_update_utc": "2021-08-15T13:30:37.933000", 
    "functions": {
      "0xaaee86a55bbecd83": "f__Strcat", 
      "0x68ab3a20e0806cb9": "f__Strcpy", 
      "0xe272f3501413fcbf": "f__Itoa", 
      "0x21017015108e0c04": "f__Strstr", 
      "0xbf443ceebacff199": "f__Strchr", 
      "0xada18d5e21b35f37": "f__ToUpper", 
      "0x7ebc9f242301cb7f": "f__Strlen", 
      "0xbf90083caac8913d": "f__Itoa", 
      "0x7f2af2530168894d": "f__Memset", 
      "0x31bd1ac1d875fd12": "f__Strcat", 
      "0x4ba5d7eb31aaefa1": "f__Strlen", 
      "0xb1fe34907af36e2d": "f__Strcmp", 
      "0x1209b4d702111d40": "f__Strstr", 
      "0x1c9ee9e0121d1c7a": "f__Strncat", 
      "0x5fee2a4f48a9cc24": "f__Strcpy"
    }, 
    "entries": 15
  }
}
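Since the database is plain JSON, it can be inspected with a few lines of Python. The sketch below works on a three-entry excerpt of the file (embedded as a string so the example is self-contained), checks that the entries counter agrees with the function table, and groups the hashes by name, because the same library function can appear under several hashes. The variable names are illustrative only.

```python
import json

# Excerpt of fiddb.json, reduced to three entries for the example.
FIDDB_TEXT = """
{
  "version": "0.1",
  "database": {
    "last_update_utc": "2021-08-15T13:30:37.933000",
    "functions": {
      "0xe272f3501413fcbf": "f__Itoa",
      "0xbf90083caac8913d": "f__Itoa",
      "0x7f2af2530168894d": "f__Memset"
    },
    "entries": 3
  }
}
"""

db = json.loads(FIDDB_TEXT)
functions = db["database"]["functions"]

# Sanity check: the counter should agree with the table size.
consistent = db["database"]["entries"] == len(functions)

# Reverse index: function name -> list of known hashes.
by_name = {}
for fid, name in functions.items():
    by_name.setdefault(name, []).append(fid)

for name in sorted(by_name):
    print("%s: %s" % (name, ", ".join(sorted(by_name[name]))))
```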

Please note that both scripts currently use a not very elegant way of extracting the FunctionID from the binary. From my experiments, the proper way of approaching this would be something like:

from ghidra.feature.fid.service import FidService

service = FidService()  # hashFunction is an instance method
fid = service.hashFunction(function).getFullHash()
print("0x%x" % fid)

But for some reason (I could not find the root cause) this does not always work: for example, the number of hashed functions does not match the number produced by Ghidra's built-in FID tool. Calling the hashing function as below, instead, does the trick:

from ghidra.feature.fid.service import FidService

service = FidService()  # hashFunction is an instance method
fid = service.hashFunction(function)  # string formatting omitted here
print(fid)
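One detail worth keeping in mind when formatting the hash by hand: a Java long is a signed 64-bit value, so a hash with its top bit set (such as 0xaaee86a55bbecd83 in the database) reaches Python as a negative number. Masking it to 64 bits before formatting gives the unsigned form stored in fiddb.json. This is a general formatting sketch, not a diagnosis of the mismatch described above, and to_unsigned_hex is a hypothetical helper name.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF

def to_unsigned_hex(value):
    """Format a signed 64-bit integer as an unsigned, 0x-prefixed hex string."""
    return "0x%x" % (value & MASK_64)

# Simulate how a hash with the top bit set looks as a signed 64-bit value.
signed_hash = 0xaaee86a55bbecd83 - (1 << 64)

print(signed_hash < 0)               # the signed view is negative
print(to_unsigned_hex(signed_hash))  # unsigned view, as stored in the database
```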

One last note: when FunctionIdMatcher.py runs for the first time and does not find a fiddb.json in its own folder, it asks the user for one. The submitted database is copied into the folder where the script is stored, leaving the original untouched.
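The first-run behaviour can be sketched as follows. This is an assumption-laden illustration and not the script's actual code: ensure_local_db and its ask_for_path callback are hypothetical names. Inside Ghidra the callback could wrap a file prompt such as the script API's askFile, while here it is a plain function so the example runs anywhere.

```python
import os
import shutil
import tempfile

def ensure_local_db(script_dir, ask_for_path, db_name="fiddb.json"):
    """Return the path of the local database, copying one in on first run."""
    local = os.path.join(script_dir, db_name)
    if not os.path.isfile(local):
        source = ask_for_path()          # only prompt when nothing is there yet
        shutil.copyfile(source, local)   # copy, so the original stays untouched
    return local

# Demo with temporary folders standing in for the real locations.
user_dir = tempfile.mkdtemp()
script_dir = tempfile.mkdtemp()
user_db = os.path.join(user_dir, "fiddb.json")
with open(user_db, "w") as handle:
    handle.write('{"version": "0.1"}')

local_db = ensure_local_db(script_dir, lambda: user_db)
print(os.path.isfile(local_db), os.path.isfile(user_db))
```

Copying rather than moving means later additions only touch the script-local copy.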

You can fetch both scripts here, while fiddb.json is hosted at this repository.

– Enjoy!
