Analysis of a Trojan downloader
Introduction
This time I wanted to analyse an obfuscated and/or encrypted malware sample. I chose a random sample from malwr.com and luckily it was exactly what I was looking for (well, almost…).
The malware is an MS Word document, which means the attack vector is probably email.
Before I begin, I want to say that if you can’t read the text in the screenshots, because it’s too small, open them in a new tab.
OK, let’s begin.
Triage analysis
Strings
The first thing to do when analysing malware is to check the strings. Looking at the screenshots below, you can see strings like "Public Declare Function..."
, or "NtWriteVirtualMemory"
which means it probably uses VBA script (as expected), and also makes use of low level native API functions for writing and allocating memory.
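The same first-pass check can be scripted. Below is a minimal sketch, assuming only the two markers quoted above; the helper names `extract_strings` and `suspicious_hits` are made up for illustration, and the extraction covers plain ASCII only (no UTF-16).

```python
import re

# Markers taken from the strings quoted above; extend the tuple as needed.
SUSPICIOUS = (
    b"Declare Function",
    b"NtWriteVirtualMemory",
)

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return re.findall(pattern, data)

def suspicious_hits(data: bytes) -> list:
    """Return the markers that appear inside any extracted string."""
    strings = extract_strings(data)
    return [m for m in SUSPICIOUS if any(m in s for s in strings)]
```

Calling `suspicious_hits(open(path, "rb").read())` on a document gives a quick yes/no answer before moving on to heavier tools.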
I used olevba to further analyze the document.
olevba -d 846fe7d28d9134a06a3de32d7a102e481824cca8155549c889fb6809aedcbc2c.doc
You can see the results from olevba below. Basically it confirmed the suspicion that the document has VBA macros. On the first screenshot you can see a summary of the analysis.
It also has a large encoded string, which is probably a file or a very long shellcode.
On these screenshots you can see part of the VBA script, which uses Document_Open()
function to automatically start the script when the document is opened (this works only if the user enables macros).
Virustotal
To make the analysis easier and gain some additional information, it’s good to check the results from online malware analysis services like virustotal, malwr or hybrid-analysis. Many AV solutions classify it as Trojan/Downloader.
I also took the chance to make a little experiment. First I searched for the malware by hash. You can compare with the hash from malwr to verify that it’s the same sample. The last time it was analysed was 30.08.2017 with 34 detections.
Virustotal also finds the VBA code and detects the code page as Cyrillic.
I rescanned the file, and the number of AV solutions that detect the malware, at the time I’m writing this, is now 38.
Then, I changed only the modification timestamp of the document (added a title, saved, then removed the title), effectively also changing the hash.
And now only 19 AV solutions successfully detect it. This goes to show how ineffective many AV programs are. With a simple modification the malware author can cut the detection rate in half!
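Why such a trivial edit defeats hash-based detection can be shown with a minimal sketch. The two byte strings are placeholders, not the real document; any single-byte difference in the input produces a completely different SHA-256 digest.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()

# Placeholder contents: only the "modified" field differs.
original = b"placeholder document bytes, modified=2017-08-30"
modified = b"placeholder document bytes, modified=2017-09-21"

print(sha256_hex(original))
print(sha256_hex(modified))
print(sha256_hex(original) == sha256_hex(modified))  # False
```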
Below is the full list of AV programs that successfully detect it after the timestamp modification. I’m actually surprised that ESET and Bitdefender are not on the list.
Sandbox
The sandbox analysis at malwr.com is shown below. You can see the original filename and the hashes.
The malware connects to several domains and IP addresses. It probably uses api.ipfy.org
and checkip.dyndns.org
to find the public IP address of the infected machine. The rest are likely C2 domains.
It also spawns several processes:
Sends 18 HTTP requests.
Screenshot of the opened document.
VM detonation
To gather more information, I also ran it in my VM (although it won’t be any different from the results at malwr.com
).
On my VM it creates only one process - svchost.exe
. You’ll see later why.
Checking the strings of svchost.exe
, with Process Hacker, shows interesting domains. Some of them (the Russian ones) weren’t shown in the malwr.com
analysis.
The trace from Process Monitor doesn’t show anything I don’t know already. The malware starts a new svchost.exe
process and the new process tries to connect to some IP addresses.
API monitor shows that the Word process allocates memory with NtAllocateVirtualMemory
and RWX permissions, then writes 5883 bytes
with NtWriteVirtualMemory
and after that calls CreateTimerQueueTimer
which can execute code, and one of its arguments is an address that points inside the previously written memory.
One of the things svchost.exe
probably does is process enumeration. You can see that it iterates through all processes.
TcpLogView logs only one connection.
With Wireshark you can see why. One of the Command and Control domains doesn’t exist anymore; the other two resolve successfully, but the servers are down. This means I won’t be able to analyse the other modules of the malware, only the dropper.
Static analysis (MS Word document)
The VBA script is heavily obfuscated, so I’ll go directly to dynamic analysis. I thank the IT gods, that the VBA script editor has a debugger.
Dynamic analysis (MS Word document)
The VBA script loads some functions from several DLLs. The only one that can start executing other code is CreateTimerQueueTimer
which you saw earlier in the output from API monitor. I could stop the execution right before calling it and dump the memory contents that are going to be executed, but I need to know where the buffer starts and how big it is.
On the screenshot below, between the lines of code are the lyrics of the song Hurricane by Luke Combs written as comments.
The function Document_Open()
is automatically executed when the document is opened (if the macros are enabled). This function calls another one called abraham()
.
I renamed Document_Open()
to Disabled_Document_Open()
, to prevent the automatic execution every time I open the document.
Stepping through the code with the debugger, I found where the large string that olevba showed is loaded.
The Right
function removes the 4 leading spaces.
The next line decodes the string to binary format. I added a function to convert the bytes of the decoded string to hex and print it, then used a hex editor attached to the process to find the location and contents of the buffer holding the decoded string.
Note: My function omits leading zeros in the hex output (08
is printed as 8
)… my knowledge of VBA is poor.
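For comparison, a zero-padded hex dump is a one-liner in Python. This is only a sketch of the formatting step (the helper in the post was written in VBA); the function name `to_hex` is made up for illustration.

```python
def to_hex(data: bytes) -> str:
    """Format every byte as two lowercase hex digits, keeping leading zeros."""
    return " ".join(format(b, "02x") for b in data)

print(to_hex(b"\x08\xff\x00"))  # 08 ff 00
```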
I don’t know if this is the final transformation of the buffer, so I won’t dump it yet. I’ll have to step all the way until CreateTimerQueueTimer
is called.
The buffer that holds the decoded bytes is passed to the function arch
. Before continuing the analysis of arch
I’ll first analyse the functions that it uses.
The function birmingham
is an alias for NtWriteVirtualMemory
.
birmingham (NtWriteVirtualMemory)
is called from policeman
. If you follow the arguments, you can see that the first one (kola
) is a pointer to the address where data is going to be written. The second argument (haft
) is a pointer to a buffer that contains the data to be written and the third (restrengthen
) is the number of bytes to write. So policeman
is just a wrapper for NtWriteVirtualMemory.
Now let’s return to arch
. arch
accepts our decoded bytes as an argument. First it calls policeman
to store a pointer (4 bytes in size) to the argument (the buffer) in the variable militarized
.
Below you can see that militarized
(accusation
is a pointer to it) holds an address, which points to the buffer.
The address is reversed because of the endianness.
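To read such a pointer from a hex dump, the four bytes have to be interpreted as a little-endian 32-bit integer. A minimal sketch follows; the byte values are hypothetical, not the ones from the screenshot.

```python
import struct

def pointer_from_dump(raw4: bytes) -> int:
    """Interpret 4 bytes, as they appear in memory, as a little-endian pointer."""
    return struct.unpack("<I", raw4)[0]

# Hypothetical bytes as a hex editor would show them: 78 56 34 12
print(hex(pointer_from_dump(bytes.fromhex("78563412"))))  # 0x12345678
```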
Then, arch
uses NtAllocateVirtualMemory
to allocate 9593 bytes
with Read, Write and Execute permissions. The bowing
variable stores the pointer to that memory.
Again policeman (NtWriteVirtualMemory)
is called and 5883 bytes
from the buffer are written to the newly allocated memory.
Finally arch
returns a pointer to the executable memory that now holds the bytes of the decoded string.
Below you can see that arch
indeed returns a pointer to memory that holds the buffer, and stores it in the variable humbler
.
A few lines later it calls the function windzors
, which takes 3
arguments, one of which is a pointer to memory inside the buffer, at an offset of 0x1090
bytes from the beginning.
windzors
calls quartertone
which is an alias for CreateTimerQueueTimer
. MSDN tells us that
CreateTimerQueueTimer
- “Creates a timer-queue timer.” and “When the timer expires, the callback function is called.”.
The third argument is a pointer to the callback function, and it is the same one that points inside the buffer with the decoded bytes.
What’s left is to dump 5883 bytes
from the beginning of the buffer (the whole buffer). For this purpose I use the HxD hex editor: I attach it to the Word process, locate the buffer in memory, copy it and save it to a new file that I called shellcode.bin
.
So in summary, this stage of the malware decodes, injects and executes shellcode in its own process.
Static analysis (shellcode)
I open the shellcode.bin
in IDA and tell IDA to treat address 0x1090
as a function.
With its first few instructions, the shellcode locates the base address of the first loaded module (DLL) in the process, which is ntdll.dll
. Then it calls find_function
(you’ll see why I called it that way) with a 4 byte value as an argument.
Before I explain the purpose of find_function
, I’ll analyse the functions it uses. The first one is get_pointer_to_PE_signature
. It takes eax
as argument, which points to the base address of the DLL passed to find_function
and returns a pointer to the PE signature. The offset of the signature is not fixed; it is read from the e_lfanew field, which sits at a constant offset (0x3c
bytes) from the beginning of the file.
get_pointer_to_PE_signature
is called from get_export_table
. This function uses the pointer to the PE signature to find the address of the Export Table.
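The two helpers can be approximated in Python on a byte buffer instead of live memory. This is a sketch under two assumptions: the image is a 32-bit PE32 file (so the export directory entry sits 0x78 bytes after the PE signature), and offsets into the buffer stand in for the pointers the shellcode works with. The function names mirror the labels used in IDA above.

```python
import struct

def get_pointer_to_pe_signature(image: bytes) -> int:
    """Read e_lfanew (at 0x3c) and return the offset of the PE signature."""
    return struct.unpack_from("<I", image, 0x3C)[0]

def get_export_table(image: bytes) -> int:
    """Return the RVA of the export directory of a PE32 image."""
    pe = get_pointer_to_pe_signature(image)
    if image[pe:pe + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # 4-byte signature + 20-byte file header + 96 bytes into the
    # PE32 optional header = 0x78: first data directory entry (exports).
    export_rva, _export_size = struct.unpack_from("<II", image, pe + 0x78)
    return export_rva
```

In the shellcode the returned RVA is added to the DLL base address to get a usable pointer; here it is left as an RVA.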
Now you can see find_function
below. It iterates through the functions of the DLL, calculates a value (hash) based on their name, and compares it to the 4-byte value that was passed as an argument. If the values match, a pointer to that function is returned.
On the screenshot below is the hashing function.
All functions that are used by the shellcode are hashed and dynamically resolved with find_function
.
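The lookup logic of find_function can be modelled in a few lines. This is a sketch, not the shellcode itself: the export table is replaced by a plain dictionary of name to address (a hypothetical stand-in), and the hash is the same shift-and-XOR routine that the decoding script in this post uses.

```python
def name_hash(name: bytes) -> int:
    """32-bit hash of an export name, as reversed from the shellcode."""
    eax = 0
    for byte in name:
        esi = eax >> 0x18
        eax = (eax << 7) & 0xFFFFFFFF
        esi |= eax
        # Sign-extend the byte to 32 bits, like movsx does.
        ch = (byte | 0xFFFFFF00) if (byte & 0x80) else byte
        eax = ch ^ esi
    return eax

def find_function(exports: dict, wanted: int):
    """Return the address whose name hashes to wanted, or None."""
    for name, address in exports.items():
        if name_hash(name.encode("ascii")) == wanted:
            return address
    return None
```

Storing only hashes means the function names never appear in the shellcode as strings, which is why they have to be recovered by brute force.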
I wrote a simple Python script to decode all the hashes in the shellcode.
# 'DLLstrings.txt' is generated with "strings -a *.dll"
# from the system directory,
# which is SysWow64 on a 64-bit system or System32 on a 32-bit system.
with open('DLLstrings.txt', 'r', encoding='latin-1') as f:
    names = f.read().split('\n')

def hash_name(s):
    # s is a bytearray holding the name of an exported function
    eax = 0
    for i in range(len(s)):
        esi = eax
        eax = eax << 7
        eax = 0xffffffff & eax
        esi = esi >> 0x18
        esi = eax | esi
        # sign-extend the current byte to 32 bits
        if (0x80 & s[i]):
            eax = 0xffffff00 | s[i]
        else:
            eax = s[i]
        eax = eax ^ esi
    return eax

input_hash = input("Enter hash value: ").lower()
for function_name in names:
    hashed_name = hex(hash_name(bytearray(function_name, 'latin-1')))
    if hashed_name.find(input_hash) != -1:
        print('Success! The function is:\n')
        print(function_name)
        break
Example output:
LdrLoadDll
is used to load other libraries.
Some of the functions it loads are typical for the process injection technique called process hollowing
,
whose steps are:
1) Start a new and legitimate process in suspended state
.
2) Save the context of the remote process with GetThreadContext
3) Unmap the memory of the remote process starting from the base address with UnmapViewOfSection
4) Allocate memory with RWX permission in the remote process, replacing the unmapped memory.
5) Write the malicious code in the remote process at the allocated memory.
6) Set the context to the one that was saved earlier.
7) Resume execution with ResumeThread
.
After these steps the code of the legitimate process is replaced with a malicious one, but the context is preserved and it will continue to look like a legitimate process (doing some bad things, though).
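Once the hashes are decoded, the resolved names can be checked against this pattern automatically. The sketch below is a triage heuristic only: the API set is limited to the functions named in this post, the helper name is made up, and a match is a hint rather than proof of process hollowing.

```python
# Functions named in this post as part of the hollowing sequence.
HOLLOWING_APIS = {
    "GetThreadContext",
    "UnmapViewOfSection",
    "SetThreadContext",
    "ResumeThread",
}

def looks_like_hollowing(resolved_names) -> bool:
    """True if every API in HOLLOWING_APIS is among the resolved names.

    Nt/Zw prefixes are stripped so NtUnmapViewOfSection also counts.
    """
    normalised = {
        n[2:] if n.startswith(("Nt", "Zw")) else n for n in resolved_names
    }
    return HOLLOWING_APIS <= normalised
```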
The screenshots below show that the malware performs exactly the steps of process hollowing. I didn’t show it, but the shellcode decodes part of its memory and loads it into a buffer that is going to be injected into a remote process.
The process to be used for injection is… svchost.exe
(surprise, surprise).
The base address of the remote process is 0x400000
.
The memory to allocate in svchost.exe
is SizeOfImage
bytes (this value is taken from the PE headers of the buffer holding the already decoded malicious code, which appears to be a PE executable). The allocation starts from the base address of the remote process.
After the PE headers are written, the shellcode loops through the sections of the malicious code and writes them at the appropriate addresses in svchost.exe
.
And finally the now malicious svchost.exe
resumes execution.
Dumping the memory
To dump the injected code, I have to break right before it executes (before ResumeThread
). I use x64dbg for debugging and attach it to the MS Word process. Because I disabled the automatic execution of the VBA script, the malware won’t start until I manually execute the script.
Set a breakpoint at SetThreadContext
function. It’s unlikely that MS Word uses this function, so I’m sure the only place where a breakpoint will be hit is in the shellcode.
I run the VBA macro and the breakpoint is hit immediately.
With Process Hacker you can see that svchost.exe
is still in a suspended state (it’s highlighted in gray). I also use it to dump the memory region at 0x400000
, where the malicious code resides.
The sections of an executable file sit at different offsets from the beginning of the file depending on whether it is loaded in memory or stored on disk. To be able to run the dumped code, I have to convert it from the memory layout back to the on-disk layout, using the tool pe_unmapper.
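What this conversion involves can be sketched in Python. This is a simplified illustration, not how pe_unmapper is actually implemented: it only copies each section from its virtual address in the dump to its raw file offset, and it ignores relocations, the image base and any header fix-ups.

```python
import struct

def unmap_pe(dump: bytes) -> bytes:
    """Convert a memory-layout PE dump back to its on-disk layout."""
    e_lfanew = struct.unpack_from("<I", dump, 0x3C)[0]
    if dump[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    num_sections = struct.unpack_from("<H", dump, e_lfanew + 6)[0]
    opt_size = struct.unpack_from("<H", dump, e_lfanew + 20)[0]
    opt = e_lfanew + 24                      # start of the optional header
    size_of_headers = struct.unpack_from("<I", dump, opt + 60)[0]
    table = opt + opt_size                   # start of the section table

    out = bytearray(dump[:size_of_headers])  # headers are copied unchanged
    for i in range(num_sections):
        entry = table + 40 * i               # each section header is 40 bytes
        vaddr, raw_size, raw_ptr = struct.unpack_from("<III", dump, entry + 12)
        data = dump[vaddr:vaddr + raw_size]  # section as mapped in memory
        end = raw_ptr + len(data)
        if len(out) < end:
            out.extend(bytes(end - len(out)))
        out[raw_ptr:end] = data              # section at its file offset
    return bytes(out)
```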
And now to load it in IDA :)
To my surprise it has very few functions. Maybe there is yet another stage?
Static analysis (svchost.exe)
Below you can see where the last call in the start
function leads. These instructions look like gibberish. My bet is that this code is encrypted or packed.
After I reversed the functions, my suspicion was confirmed. It gets a pointer to its own base address with get_pointer_to_MZ_signature
, loads different libraries and functions (similar to the way the shellcode did, but without the use of hashes) and then decrypts the memory to which the last call jumps.
The memory is decrypted with 0x59
as key.
Dump decrypted svchost.exe
To dump the fully decrypted binary, I’ll again use a debugger. If you can’t see the screenshots well, open them in a new tab.
I set the permissions of the .text section to RWX, so the code can modify (decrypt) itself.
There is a check right before the decryption routine that fails and I don’t know why, but I manually bypass it, by changing the value of the Zero Flag.
When I reach the last call in the start
function, the code should be fully decrypted and I can use Process Hacker again to dump the memory.
Unmap the file.
Aaaaand now it looks better. As you can see there are many functions now.
The stages of the malware until now can be summarised in the following steps:
1) The word document decodes a large shellcode
2) Then injects and executes the shellcode in its own process
3) The shellcode decodes a buffer that is a malicious PE executable
4) Injects the malicious code in a remote process (svchost.exe
) via process hollowing
5) The code of the new process is almost entirely encrypted, so it decrypts itself.
Static and Dynamic analysis (decrypted svchost.exe)
The call graph looks really big and it’s going to take me a lot of time to reverse the whole binary. That’s why I’ll only analyse parts of it, like those used for networking stuff.
Below you can see the imported functions. There are no surprises here, considering that we already knew that it connects to remote hosts, downloads files and executes them.
Some strings that I missed in the beginning of the analysis are HTTP Request headers and two format strings.
The main function is an endless loop.
At the beginning of the loop, the first thing this stage of the malware does is to communicate with the C2 servers.
This function collects information such as:
- OS Version
- MAC address
- Volume Serial Number of the C: drive
- Public IP address (by using
api.ipfy.org)
- Hostname and the domain
MAC address and the volume serial number are used to uniquely identify the machine.
The hostname and the domain are retrieved with the WinAPI function LookupAccountSid
, which
“accepts a security identifier (SID) as input. It retrieves the name of the account for this SID and the name of the first domain on which this SID is found.”. The SID is taken from the explorer.exe
process, and to find explorer.exe
the malware iterates through the running processes
(do you remember the output of API monitor? This is what I thought was process enumeration).
Then it decrypts an RC4-encrypted string that holds the malware build version and the list of C2 domains, separated by the pipe | symbol.
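The decryption step can be reproduced outside the debugger with a standard RC4 implementation. In this sketch the key and the exact layout of the decrypted string are unknown to me, so `split_config` only splits on the pipe symbol and makes no claim about which field is the build version; both function names are made up for illustration.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4; encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:                         # keystream generation + XOR
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def split_config(plaintext: bytes) -> list:
    """Split a decrypted config on '|' and drop empty fields."""
    text = plaintext.rstrip(b"\x00").decode("ascii", errors="replace")
    return [field for field in text.split("|") if field]
```

With the real key and the encrypted blob dumped from memory, `split_config(rc4(key, blob))` would list the fields of the configuration.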
The malware tries to connect to the first C2 domain and, if successful, sends the collected information in an HTTP POST request. If the connection fails, it tries the next server in the list.
Because the C2 servers are down (for this build at least) I spoofed the DNS response to point to my machine.
You can see all the information it sends in the body of the HTTP POST request.
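When the DNS response is spoofed, the captured body can be broken into fields with the standard library. A minimal sketch follows: the field names are the ones used in the Snort rule later in this post, while the sample values are invented placeholders.

```python
from urllib.parse import parse_qs

EXPECTED_FIELDS = ("GUID", "BUILD", "INFO", "IP", "TYPE", "WIN")

def parse_beacon(body: str) -> dict:
    """Split a form-encoded POST body into a field -> value dictionary."""
    parsed = parse_qs(body, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}

def is_beacon(body: str) -> bool:
    """True if the body carries every field the malware is expected to send."""
    fields = parse_beacon(body)
    return all(name in fields for name in EXPECTED_FIELDS)

# Placeholder values, not taken from a real capture.
sample = "GUID=1234567890&BUILD=1&INFO=HOST @ DOMAIN&IP=203.0.113.7&TYPE=1&WIN=6.1"
print(parse_beacon(sample)["IP"])  # 203.0.113.7
```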
It also expects an answer (a command), which I think is encoded. I haven’t reversed that part, because it’s harder when I don’t know what the response should look like.
Anyway, after the command is decoded, it enters a switch statement with several cases. Depending on the command it can:
- Download a file (in memory) and execute/inject it via process hollowing (again using
svchost.exe
)
- Download a DLL (in memory), load it, and call some function from it or start a new thread.
- Download a file to the
%TEMP%
directory and execute it.
Yara Rule
The encoded shellcode in the Word document is stored in a tab that is part of a form, and it starts with 4 spaces. This format is unique and for some reason I don’t think it’ll change across versions. The shellcode is encoded as one long continuous string (7000+ characters); such strings are rare, and even rarer when embedded in a tab. That’s why I think this is a good feature for detecting this malware, combined of course with the function names NtWriteVirtualMemory
, NtAllocateVirtualMemory
and CreateTimerQueueTimer
which should be very rare in a legitimate word document.
rule trojan_downloader
{
    meta:
        description = "Detects MS Office document with embedded VBA trojan dropper"
        author = "Iliya Dafchev idafchev [4t] mail [dot] bg"
        date = "2017-09-21"
    strings:
        $ole_file_signature = { D0 CF 11 E0 A1 B1 1A E1 }
        $function1 = "CreateTimerQueueTimer"
        $function2 = "NtWriteVirtualMemory"
        $function3 = "NtAllocateVirtualMemory"
        $vba_project = "VBA_PROJECT" wide
        // match the encoded shellcode, inserted in a Tab
        // format: Tab<number> <size[4k-10k]> 0x00 0x80 <four_spaces> <at_least_15_printable_characters>
        $encoded_shellcode = /Tab\d[\x00-\xff][\x0f-\x27]\x00\x80\x20{4}[\x21-\x7e]{15}/
    condition:
        $ole_file_signature at 0 and all of ($function1, $function2, $function3, $vba_project) and $encoded_shellcode in (100000..filesize) and filesize > 100KB and filesize < 1MB
}
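The regular expression for the encoded shellcode can be tried out in Python before running YARA. In this sketch the pattern is copied from the rule; the helper `build_tab` is hypothetical and assumes, following the rule's comment, that the two bytes after the tab number are a little-endian 16-bit size.

```python
import re
import struct

# Same pattern as $encoded_shellcode in the rule, as a bytes regex.
ENCODED_SHELLCODE = re.compile(
    rb"Tab\d[\x00-\xff][\x0f-\x27]\x00\x80\x20{4}[\x21-\x7e]{15}"
)

def build_tab(size: int, payload: bytes) -> bytes:
    """Build a synthetic 'Tab' record for testing the pattern (assumed layout)."""
    return b"Tab1" + struct.pack("<H", size) + b"\x00\x80" + b" " * 4 + payload

# A 7000-byte size field has the high byte 0x1b, inside [0x0f-0x27].
print(bool(ENCODED_SHELLCODE.search(build_tab(7000, b"A" * 20))))  # True
# A 100-byte size field has the high byte 0x00, outside the range.
print(bool(ENCODED_SHELLCODE.search(build_tab(100, b"A" * 20))))   # False
```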
Snort rule
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"Trojan installed on internal network!"; content:"/ls5/forum.php"; nocase; pcre:"/setedranty\.com|attotperat\.ru|robtetoftwas\.ru/i"; pcre:"/GUID=\d+&BUILD=\d+&INFO=\N+&IP=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}&TYPE=\d&WIN=\N+/i"; sid:1000001; rev:1;)
Indicators of Compromise
The dropper doesn’t write anything to disk (unless instructed by the attackers), so besides the hashes there are no host-based indicators.