There is not yet much written about Ghidra scripting and malware analysis, so here we go: what follows is a 100-foot view of how to quickly put together a malware-analysis-oriented Ghidra Python snippet.
Objectives
Decrypt Artra Downloader v1 encrypted strings with
- Ghidra Jython
- (plus - because, why not) IDAPython
Quick background about Artra Downloader
I am using the same malware label that PaloAltoNetworks’s Unit42 team gave it, to easily identify the sample family, but it might also be known as “CTF Loader”, as ironically reported by @VK_Intel ;).
A full analysis, also covering variants 2 and 3, can be found here.
The sample was first observed in this tweet from @malwrhunterteam. Its decryption routine is fairly simple, which makes it a good test case for the new RE tool.
To The Batmobile!
Sample information:
- file name: winsvc
- md5: 7cc0b212d1b8ceb808c250495d83bae4
- sha1: d2c161ce52240b61d632607a2262890327d82502
- sha256: ef0cb0a1a29bcdf2b36622f72734aec8d38326fc8f7270f78bd956e706a5fd57
(sample pwd commonly used in the malware research sector for sharing samples)
Here are the main steps we have to follow to reach our goal:
- Overcome the string obfuscation
- Identify code referencing the decryption function
- Get function arguments - one will be the offset address of the encrypted string
- Extract encrypted strings from offset address
- Apply string decryption
- Return results
Artra v1 strings decryption function
Graph view vs Pseudo-C view
Ghidra malware analysis scripting 101
First things first
1. Overcome the string obfuscation
This is straightforward - link

def decrypt_string(enc_str):
    mapping = (enc_str, ''.join([chr(ord(char) - 1) for char in enc_str]))
    return(mapping[1])

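As a quick sanity check outside of Ghidra, the routine can be exercised on its own. This is only a minimal sketch: `encrypt_string` is a hypothetical helper (it is not part of the malware or of the script) that applies the inverse shift, and `cmd.exe` is just an illustrative plaintext, not a string claimed to be in the sample.

```python
def decrypt_string(enc_str):
    # every character was shifted up by one; shift it back down
    return "".join(chr(ord(char) - 1) for char in enc_str)


def encrypt_string(plain_str):
    # hypothetical inverse, only used here to build test input
    return "".join(chr(ord(char) + 1) for char in plain_str)


encrypted = encrypt_string("cmd.exe")
print(encrypted)                  # dne/fyf
print(decrypt_string(encrypted))  # cmd.exe
```

Note how the dot becomes a slash and every letter moves one position forward, which is the kind of garbage you will see in the listing before decryption.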
2. Identify code referencing the decryption function
For this particular sample, the decryption function is located at 0x004026b0. For this test, we will simply hardcode that value.

def run():
    xrefs = getReferencesTo(toAddr(0x004026b0))
    extract_encrypted_str(xrefs)

3. and 4. Get function arguments + Extract encrypted strings from offset address

def get_function_args(addr):
    while True:
        # get instruction at given address
        ins = getInstructionBefore(toAddr(addr))
        # get instruction offset address
        ins_addr = ins.getAddress()
        # check pattern
        get_ins = getInstructionAt(toAddr(addr))
        op = get_ins.toString().split()[0]
        if "MOV" == op and get_ins.getDefaultOperandRepresentation(0) == "EAX" and \
           "0x" in get_ins.getDefaultOperandRepresentation(1):
            enc_string = getDataAt(toAddr(get_ins.getDefaultOperandRepresentation(1)))
            if enc_string:
                # map encrypted string and its offset address
                mapping = (toAddr(get_ins.getDefaultOperandRepresentation(1)), enc_string)
                enc_buffer.append(mapping)
            break
        else:
            get_function_args(ins_addr.toString())
            break

def extract_encrypted_str(xrefs):
    for xref in xrefs:
        ref_addr = (xref.getFromAddress())
        get_function_args(ref_addr.toString())
    decrypt_enc_str_and_comment()

The code above walks backward from every call site returned by xref.getFromAddress() and, for each of them, inspects the preceding instructions that set up the call's arguments, looking for the pattern below:

...
MOV EAX, <encrypted_string_addr>
...

If the instruction we are looking for is not found on the first try, we step back to the previous instruction, and so on, until we hit the pattern we are searching for.
Once we have a match, a mapping is created for later use, containing:
- address where the encrypted string is located
- the encrypted string itself

mapping = (toAddr(get_ins.getDefaultOperandRepresentation(1)), enc_string)
enc_buffer.append(mapping)

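The backward scan can also be modelled without Ghidra at all. The sketch below is a toy under stated assumptions: instructions are plain (address, mnemonic, operands) tuples, `find_string_arg` and `max_lookback` are hypothetical names, and every address except the decryption function's is made up. Unlike the recursive version above, it walks back iteratively and gives up after a fixed number of instructions, so a call site with no matching MOV cannot run away.

```python
# Toy listing: (address, mnemonic, operands). Addresses are made up.
LISTING = [
    (0x401000, "PUSH", ("EBP",)),
    (0x401001, "MOV", ("EAX", "0x404010")),
    (0x401006, "PUSH", ("ESI",)),
    (0x401007, "CALL", ("0x004026b0",)),
]


def find_string_arg(listing, call_addr, max_lookback=10):
    """Walk backward from call_addr looking for MOV EAX, <immediate>."""
    addrs = [addr for addr, _, _ in listing]
    idx = addrs.index(call_addr)
    # inspect at most max_lookback instructions before the call, nearest first
    window = listing[max(0, idx - max_lookback):idx]
    for _, mnemonic, operands in reversed(window):
        if mnemonic == "MOV" and operands[0] == "EAX" and operands[1].startswith("0x"):
            return int(operands[1], 16)
    return None


print(hex(find_string_arg(LISTING, 0x401007)))  # 0x404010
```

Returning None instead of recursing further leaves the decision to the caller, which can then log the call site for manual review.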
5. Apply string decryption
Iterating through the pre-filled enc_buffer, the decryption function is called for every gathered string:

def decrypt_enc_str_and_comment():
    for enc_str_addr, enc_str in enc_buffer:
        enc_str = enc_str.toString().split()[1].strip("\"")
        dec_str = decrypt_string(enc_str)
        ...
        ...

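One caveat with `split()[1].strip("\"")`: `split()` cuts at the first whitespace and `strip` removes every leading and trailing quote, so an encrypted string that contains a space or ends with a `"` (a plaintext `!`) would come out truncated. Below is a small sketch of a more tolerant parse, assuming the data's text form looks like `ds "<value>"` with the value unescaped; `parse_ds_repr` is a hypothetical helper name and the inputs are made up.

```python
def parse_ds_repr(text):
    # split only once, so spaces inside the quoted value survive
    parts = text.split(None, 1)
    if len(parts) != 2:
        return None
    value = parts[1]
    # drop exactly one pair of surrounding quotes, if present
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return value


print(parse_ds_repr('ds "dne/fyf"'))  # dne/fyf
print(parse_ds_repr('ds "Ifmmp""'))   # Ifmmp"
```

With the second input, the original one-liner would return `Ifmmp` and the decrypted string would lose its final character.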
6. Return results
Decoded strings are printed to Ghidra’s console, and comments are also placed beside the encrypted strings in the listing view.

def decrypt_enc_str_and_comment():
    for enc_str_addr, enc_str in enc_buffer:
        enc_str = enc_str.toString().split()[1].strip("\"")
        dec_str = decrypt_string(enc_str)
        # add comments
        codeUnit = listing.getCodeUnitAt(toAddr(enc_str_addr.toString()))
        ds_string = getDataAt(toAddr(enc_str_addr.toString()))
        ds_string.setComment(codeUnit.EOL_COMMENT, dec_str)
        # print results to console
        print("Address: %-40s Enc string: %-40s Dec string: %-40s" % \
              (toAddr(enc_str_addr.toString()), enc_str, dec_str))

Console view
Listing view
… some strings …
Putting everything together

# ArtraDownloader v1 - strings decryptor
#
# Ref sample:
# file name: winsvc
# md5: 7cc0b212d1b8ceb808c250495d83bae4
# sha1: d2c161ce52240b61d632607a2262890327d82502
# sha256: ef0cb0a1a29bcdf2b36622f72734aec8d38326fc8f7270f78bd956e706a5fd57
#
# Ref links:
# 2018.12.19 https://twitter.com/malwrhunterteam/status/1075454863008382976
# 2018.12.21 https://gist.github.com/raw-data/14915eca4e5e2963a9056f935442358d
# 2019.02.25 https://unit42.paloaltonetworks.com/multiple-artradownloader\
#            -variants-used-by-bitter-to-target-pakistan/
#@author raw-data
#@category malware strings decryptor
#@keybinding
#@menupath
#@toolbar

import ghidra.app.script.GhidraScript
import exceptions

enc_buffer = []
listing = currentProgram.getListing()

def decrypt_string(enc_str):
    mapping = (enc_str, ''.join([chr(ord(char) - 1) for char in enc_str]))
    return(mapping[1])

def get_function_args(addr):
    while True:
        # get instruction at given address
        ins = getInstructionBefore(toAddr(addr))
        # get instruction offset address
        ins_addr = ins.getAddress()
        # check pattern
        get_ins = getInstructionAt(toAddr(addr))
        op = get_ins.toString().split()[0]
        if "MOV" == op and get_ins.getDefaultOperandRepresentation(0) == "EAX" \
           and "0x" in get_ins.getDefaultOperandRepresentation(1):
            enc_string = getDataAt(toAddr(get_ins.getDefaultOperandRepresentation(1)))
            if enc_string:
                # map encrypted string and its offset address
                mapping = (toAddr(get_ins.getDefaultOperandRepresentation(1)), enc_string)
                enc_buffer.append(mapping)
            break
        else:
            get_function_args(ins_addr.toString())
            break

def extract_encrypted_str(xrefs):
    for xref in xrefs:
        ref_addr = (xref.getFromAddress())
        get_function_args(ref_addr.toString())
    decrypt_enc_str_and_comment()

def decrypt_enc_str_and_comment():
    for enc_str_addr, enc_str in enc_buffer:
        enc_str = enc_str.toString().split()[1].strip("\"")
        dec_str = decrypt_string(enc_str)
        # add comments
        codeUnit = listing.getCodeUnitAt(toAddr(enc_str_addr.toString()))
        ds_string = getDataAt(toAddr(enc_str_addr.toString()))
        ds_string.setComment(codeUnit.EOL_COMMENT, dec_str)
        # print results to console
        print("Address: %-40s Enc string: %-40s Dec string: %-40s" % \
              (toAddr(enc_str_addr.toString()), enc_str, dec_str))

def run():
    xrefs = getReferencesTo(toAddr(0x004026b0))
    extract_encrypted_str(xrefs)

run()

Out of curiosity, I also translated the code above to IDAPython, resulting in:

import idc
from idautils import *
from idc import *

############################################################################
# ArtraDownloader v1 - strings decryptor
#
# Ref sample:
# file name: winsvc
# md5: 7cc0b212d1b8ceb808c250495d83bae4
# sha1: d2c161ce52240b61d632607a2262890327d82502
# sha256: ef0cb0a1a29bcdf2b36622f72734aec8d38326fc8f7270f78bd956e706a5fd57
#
# Ref links:
# 2018.12.19 https://twitter.com/malwrhunterteam/status/1075454863008382976
# 2018.12.21 https://gist.github.com/raw-data/14915eca4e5e2963a9056f935442358d
# 2019.02.25 https://unit42.paloaltonetworks.com/multiple-artradownloader\
#            -variants-used-by-bitter-to-target-pakistan/
#
# Note: written against the pre-7.4 IDAPython function names
# (GetString, GetMnem, GetOpnd, MakeComm, PrevHead, ...)
############################################################################

__author__ = 'raw-data'

def decrypt_string(enc_str):
    mapping = (enc_str, ''.join([chr(ord(char) - 1) for char in enc_str]))
    return(mapping[1])

def get_string(addr):
    return GetString(addr)

def get_function_args(addr):
    # walk backward from the call site until "mov eax, <imm>" is found
    while True:
        addr = idc.PrevHead(addr)
        if addr == BADADDR:
            # ran out of instructions without finding the pattern
            return None
        if GetMnem(addr) == "mov" and "eax" in GetOpnd(addr, 0):
            return GetOperandValue(addr, 1)

def extract_encrypted_str(xrefs):
    for addr in xrefs:
        ref = get_function_args((addr.frm))
        if ref is None:
            continue
        enc_str = get_string(ref)
        dec_str = decrypt_string(enc_str)
        # add comments
        MakeComm(addr.frm, dec_str)
        MakeComm(ref, dec_str)
        # print results to console (encrypted string first, then decrypted)
        print("Address: %-40s Enc string: %-40s Dec string: %-40s" % \
              (addr.frm, enc_str, dec_str))

def run():
    xrefs = XrefsTo(0x004026b0, flags=0)
    extract_encrypted_str(xrefs)

run()

If you compare both versions of the script, you will see that, unsurprisingly, there are not many differences apart from the tool-specific API calls.
Only the backward-scanning function get_function_args was implemented slightly differently. I am sure there are more elegant ways to get the same result on the Ghidra side, but as a first try I think it is not too bad, and it did the trick!
Sample signatures
On the one hand, I am not going to reinvent the wheel, so here you can find @James_inthe_box’s #snort / #suricata and #yara signatures.
On the other hand, if you want to track the Artra Downloader v1 string decryption function (you will miss v2 and v3), I got decent results with the following:

rule memory_win_trojan_downloader_artra_v1
{
    meta:
        author = "raw-data"
        tlp = "WHITE"
        version = "1.0"
        created = "2019-03-26"
        modified = "2019-03-26"
        description = "Detects Artra string decryption routine"
        reference1 = "https://twitter.com/malwrhunterteam/status/1075454863008382976"
        reference2 = "https://gist.github.com/raw-data/14915eca4e5e2963a9056f935442358d"
        reference3 = "https://unit42.paloaltonetworks.com/multiple-artradownloader-variants-used-by-bitter-to-target-pakistan/"
        sha256_sample1 = "523a17f6892c2558ac4765959df4af938e56a94fa6ed39636b8b7315def3a1b4"
        sha256_sample2 = "ef0cb0a1a29bcdf2b36622f72734aec8d38326fc8f7270f78bd956e706a5fd57"

    strings:
        $hex1 = { 8a 08 40 84 c9 75 ?? 2b c2 8b f0 8d 46 01 50 e8 27 04 00 00
                  83 c4 04 33 c9 85 f6 7e ?? 55 8b c8 }
        $hex2 = { 8a 14 0f fe ca 88 11 41 83 ed 01 75 ?? 5d 5f c6 04 06 00 }

    condition:
        any of ($hex*)
}
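
To get a feel for what the `??` wildcards in the hex strings do, here is a rough Python analogue built on the standard `re` module. It is not how YARA works internally and not a replacement for it: `hex_pattern_to_regex` is a hypothetical helper, it only handles plain bytes and `??`, and the test buffer is made up (two padding bytes followed by the `$hex2` bytes with `f2` in the wildcard slot).

```python
import re


def hex_pattern_to_regex(hex_string):
    # "??" matches any single byte; every other token is a literal byte
    parts = []
    for token in hex_string.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that "." also matches the 0x0a byte
    return re.compile(b"".join(parts), re.DOTALL)


HEX2 = "8a 14 0f fe ca 88 11 41 83 ed 01 75 ?? 5d 5f c6 04 06 00"
pattern = hex_pattern_to_regex(HEX2)

# made-up buffer: padding + the $hex2 bytes with 0xf2 in the wildcard slot
blob = b"\x90\x90" + bytes.fromhex(HEX2.replace("??", "f2"))
print(pattern.search(blob) is not None)  # True
```

The wildcard sits right after the `75` byte, so the pattern keeps matching whatever value the sample has in that position.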