Introduction


This time I wanted to analyse a piece of obfuscated and/or encrypted malware. I chose a random sample from malwr.com and luckily it was exactly what I was looking for (well, almost…).

The malware is a MS Word document, which means the attack vector is probably email.

Before I begin, I want to say that if you can’t read the text in the screenshots, because it’s too small, open them in a new tab.

OK, let’s begin.

Triage analysis


Strings

The first thing to do when analysing malware is to check the strings. In the screenshots below you can see strings like "Public Declare Function..." and "NtWriteVirtualMemory", which suggest that the document uses VBA macros (as expected) and calls low-level native API functions for allocating and writing memory.

malware_00
malware_01
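
For readers without a strings utility at hand, this triage step can be roughly approximated in Python. This is only a sketch: the function name extract_strings, the minimum length of 6 and the sample bytes are my own assumptions, and it only looks for ASCII strings.

```python
import re

def extract_strings(data: bytes, min_len: int = 6):
    """Return printable ASCII runs of at least min_len characters."""
    pattern = rb'[\x20-\x7e]{%d,}' % min_len
    return [m.decode('ascii') for m in re.findall(pattern, data)]

# Synthetic sample bytes, not taken from the real document
sample = b'\x00\x01Public Declare Function\x00\xffNtWriteVirtualMemory\x02ab\x00'
for s in extract_strings(sample):
    print(s)
```

On a real document you would read the file in binary mode and pass its contents to the function.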

I used olevba to further analyze the document.

olevba -d 846fe7d28d9134a06a3de32d7a102e481824cca8155549c889fb6809aedcbc2c.doc

You can see the results from olevba below. Basically it confirmed the suspicion that the document has VBA macros. On the first screenshot you can see a summary of the analysis.

malware_02

It also has a large encoded string, which is probably a file or a very long shellcode.

malware_03

In these screenshots you can see part of the VBA script. It uses the Document_Open() function to start the script automatically when the document is opened (this works only if the user enables macros).

malware_04
malware_05

Virustotal

To make the analysis easier and gain some additional information, it’s good to check the results from online malware analysis services like virustotal, malwr or hybrid-analysis. Many AV solutions classify the sample as Trojan/Downloader.

I also took the chance to run a little experiment. First I searched for the malware by hash. You can compare it with the hash from malwr to verify that it’s the same sample. The last time it was analysed was 30.08.2017, with 34 detections.

malware_07a

Virustotal also finds the VBA code and detects the code page as Cyrillic.

malware_07c

I rescanned the file, and the number of AV solutions that detect the malware, at the time I’m writing this, is now 38.

malware_07d

Then I changed only the modification timestamp of the document (added a title, saved, then removed the title), which effectively also changes the hash.

malware_07w
malware_07x

And now only 19 AV solutions successfully detect it. This goes to show how ineffective many AV programs are. With a simple modification the malware author can cut the detection rate in half!

malware_07y
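
The effect of such a small edit on the file hash is easy to reproduce. The sketch below uses made-up bytes as a stand-in for the document, so only the principle carries over: changing a single byte produces a completely different SHA-256 value.

```python
import hashlib

# Made-up stand-in bytes, not the real sample
original = b'\xd0\xcf\x11\xe0' + b'document body' + b'\x01'
modified = b'\xd0\xcf\x11\xe0' + b'document body' + b'\x02'   # one byte differs

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(modified).hexdigest()
print(h1)
print(h2)
print('same hash:', h1 == h2)
```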

Below is the full list of AV programs that successfully detect it after the timestamp modification. I’m actually surprised that ESET and Bitdefender are not on the list.

malware_07z

Sandbox

The sandbox analysis at malwr.com is shown below. You can see the original filename and the hashes.

malware_06

The malware connects to several domains and IP addresses. It probably uses api.ipfy.org and checkip.dyndns.org to find the public IP address of the infected machine. The rest are likely C2 domains.

malware_09

It also spawns several processes:

malware_10

It sends 18 HTTP requests.

malware_11

Below is a screenshot of the opened document.

malware_08

VM detonation

To gather more information, I also ran it in my VM (although it won’t be any different from the results at malwr.com).

malware_12

On my VM it creates only one process: svchost.exe. You’ll see later why. Checking the strings of svchost.exe with Process Hacker shows interesting domains. Some of them (the Russian ones) weren’t shown in the malwr.com analysis.

malware_13
malware_14
malware_15
malware_16

The trace from Process Monitor doesn’t show anything I don’t know already. The malware starts a new svchost.exe process and the new process tries to connect to some IP addresses.

malware_17
malware_18
malware_19

API Monitor shows that the Word process allocates memory with RWX permissions using NtAllocateVirtualMemory and then writes 5883 bytes to it with NtWriteVirtualMemory. After that it calls CreateTimerQueueTimer, which can execute code: one of its arguments is an address that points inside the previously written memory.

malware_20

One of the things svchost.exe probably does is process enumeration. You can see that it iterates through all processes.

malware_21
malware_22

TcpLogView logs only one connection.

malware_25

With Wireshark you can see why. One of the Command and Control domains doesn’t exist anymore. The other two resolve successfully, but the servers are down. This means I won’t be able to analyse the other modules of the malware, only the dropper.

malware_26
malware_27
malware_28

Static analysis (MS Word document)


The VBA script is heavily obfuscated, so I’ll go directly to dynamic analysis. I thank the IT gods that the VBA script editor has a debugger.

Dynamic analysis (MS Word document)


The VBA script loads some functions from several DLLs. The only one that can start code execution is CreateTimerQueueTimer, which you saw earlier in the output from API Monitor. I could stop the execution right before it is called and dump the memory contents that are going to be executed, but first I need to know where the buffer starts and how big it is.

In the screenshot below, the lyrics of the song Hurricane by Luke Combs appear as comments between the lines of code.

malware_29

The function Document_Open() is automatically executed when the document is opened (if the macros are enabled). This function calls another one called abraham().

I renamed Document_Open() to Disabled_Document_Open(), to prevent the automatic execution every time I open the document.

malware_30

Stepping through the code with the debugger, I found where the large string, that olevba showed, is loaded.

malware_31

The Right function removes the 4 leading spaces.

The next line decodes the string to binary format. I added a function to convert the bytes of the decoded string to hex and print it, then used a hex editor attached to the process to find the location and contents of the buffer holding the decoded string.

Note: My function omits leading zeros in the hex output (08 is printed as 8)… my knowledge of VBA is poor.
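
For comparison, here is how the same kind of dump can be printed with the leading zeros kept. This is a Python sketch rather than the VBA helper used in the debugger, and the name to_hex is just an illustrative choice.

```python
def to_hex(data: bytes) -> str:
    """Format every byte as exactly two hex digits, keeping leading zeros."""
    return ' '.join('{:02X}'.format(b) for b in data)

# Arbitrary example bytes
print(to_hex(bytes([0x08, 0xE8, 0x00, 0x10])))
```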

I don’t know if this is the final transformation of the buffer, so I won’t dump it yet. I’ll have to go all the way until CreateTimerQueueTimer is called.

malware_32

The buffer that holds the decoded bytes is passed to the function arch. Before continuing the analysis of arch, I’ll first analyse the functions that it uses.

malware_33

The function birmingham is an alias for NtWriteVirtualMemory.

malware_34

birmingham (NtWriteVirtualMemory) is called from policeman. If you follow the arguments, you can see that the first one (kola) is a pointer to the address where the data is going to be written. The second argument (haft) is a pointer to a buffer that contains the data to be written, and the third (restrengthen) is the number of bytes to write. So policeman is just a wrapper for NtWriteVirtualMemory.

malware_35
malware_36

Now let’s return to arch. arch accepts our decoded bytes as an argument. First it calls policeman to store a pointer (4 bytes in size) to the argument (the buffer) in the variable militarized.

malware_37

Below you can see that militarized (accusation is a pointer to it) holds an address, which points to the buffer.

malware_38

The address is reversed because of the endianness.

malware_39
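
To illustrate the byte order, here is a small sketch with a hypothetical address (the real value is only visible in the screenshots). On a little-endian machine the least significant byte is stored first, so a hex dump shows the bytes of the pointer in reverse.

```python
import struct

# Hypothetical bytes as they would appear in a hex dump (lowest address first)
raw = bytes([0x78, 0x56, 0x34, 0x12])

(address,) = struct.unpack('<I', raw)   # '<I' = little-endian unsigned 32-bit
print(hex(address))
```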

Then arch uses NtAllocateVirtualMemory to allocate 9593 bytes with Read, Write and Execute permissions. The bowing variable stores the pointer to that memory.

malware_40
malware_41

Again policeman (NtWriteVirtualMemory) is called and 5883 bytes from the buffer are written to the newly allocated memory.

Finally arch returns a pointer to the executable memory that now holds the bytes of the decoded string.

malware_42

Below you can see that arch indeed returns a pointer to memory that holds the buffer, and stores it in the variable humbler.

malware_43

A few lines later it calls the function windzors, which takes 3 arguments. One of them is a pointer to a location inside the buffer, at an offset of 0x1090 bytes from the beginning.

malware_44

windzors calls quartertone, which is an alias for CreateTimerQueueTimer. According to MSDN, CreateTimerQueueTimer “Creates a timer-queue timer.” and “When the timer expires, the callback function is called.”

The third argument is a pointer to the callback function, and it is the same pointer that points inside the buffer with the decoded bytes.

malware_45

What’s left is to dump 5883 bytes from the beginning of the buffer (the whole buffer). For this purpose I use the HxD hex editor: I attach it to the Word process, locate the memory of the buffer, copy it and save it to a new file that I called shellcode.bin.

malware_46

So in summary, this stage of the malware decodes, injects and executes shellcode in its own process.

Static analysis (shellcode)


I open the shellcode.bin in IDA and tell IDA to treat address 0x1090 as a function.

malware_47

With its first few instructions, the shellcode locates the base address of the first loaded module (DLL) in the process, which is ntdll.dll. Then it calls find_function (you’ll see why I called it that way) with a 4 byte value as an argument.

malware_48

Before I explain the purpose of find_function, I’ll analyse the functions it uses. The first one is get_pointer_to_PE_signature. It takes eax as an argument, which holds the base address of the DLL passed to find_function, and returns a pointer to the PE signature. The offset of the signature is stored at a constant location (0x3c bytes from the beginning of the file).

malware_52

get_pointer_to_PE_signature is called from get_export_table. This function uses the pointer to the PE signature to find the address of the Export Table.

malware_51
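
The same header walk can be sketched in Python on a synthetic buffer. The function names mirror the ones I gave in IDA, but the code is my own illustration and not a translation of the shellcode. It assumes a 32-bit PE image, where the export directory entry sits 0x78 bytes after the PE signature, and all values in the test buffer are made up.

```python
import struct

def get_pointer_to_pe_signature(image: bytes) -> int:
    """Read e_lfanew (4 bytes stored at 0x3c) and return the offset of the PE signature."""
    (e_lfanew,) = struct.unpack_from('<I', image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b'PE\x00\x00':
        raise ValueError('PE signature not found')
    return e_lfanew

def get_export_table(image: bytes):
    """Return (RVA, size) of the export directory of a 32-bit PE image."""
    pe = get_pointer_to_pe_signature(image)
    # 4 (signature) + 20 (file header) + 96 (optional header up to the data directories) = 0x78
    return struct.unpack_from('<II', image, pe + 0x78)

# Synthetic 32-bit header with made-up values, just enough to exercise the code
image = bytearray(0x200)
image[0:2] = b'MZ'
struct.pack_into('<I', image, 0x3C, 0x80)                    # e_lfanew
image[0x80:0x84] = b'PE\x00\x00'
struct.pack_into('<II', image, 0x80 + 0x78, 0x2000, 0x150)   # export directory RVA and size

print(get_export_table(bytes(image)))
```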

Now you can see find_function below. It iterates through the functions exported by the DLL, calculates a value (hash) based on each name, and compares it to the 4 byte value that was passed as an argument. If the values match, a pointer to that function is returned.

malware_49
malware_50

The screenshot below shows the hashing function.

malware_53

All functions that are used by the shellcode are hashed and dynamically resolved with find_function.

I wrote a simple Python script to decode all the hashes in the shellcode.

# 'DLLstrings.txt' is generated with "strings -a *.dll"
# from the system directory,
# which is SysWOW64 on a 64-bit system or System32 on a 32-bit system.
# The script targets Python 3.

# Read the candidate names as raw bytes, one per line
with open('DLLstrings.txt', 'rb') as f:
	names = f.read().splitlines()

def hash_name(s):
	# Re-implementation of the hashing loop from the shellcode; s is a bytes object
	eax = 0
	for byte in s:
		esi = eax
		eax = eax << 7
		eax = 0xffffffff & eax
		esi = esi >> 0x18
		esi = eax | esi
		# Sign-extend the current character to 32 bits
		if 0x80 & byte:
			eax = 0xffffff00 | byte
		else:
			eax = byte
		eax = eax ^ esi
	return eax

# The hash is entered in hex, with or without the 0x prefix
input_hash = int(input("Enter hash value: "), 16)

for function_name in names:
	if hash_name(function_name) == input_hash:
		print('Success! The function is:\n')
		print(function_name.decode('ascii', 'replace'))
		break

Example output:

malware_57

LdrLoadDll is used to load other libraries.

malware_54

Some of the functions it loads are typical for the process injection technique called process hollowing, whose steps are:

1) Start a new and legitimate process in a suspended state.
2) Save the context of the remote process with GetThreadContext.
3) Unmap the memory of the remote process, starting from the base address, with UnmapViewOfSection.
4) Allocate memory with RWX permissions in the remote process, replacing the unmapped memory.
5) Write the malicious code to the allocated memory in the remote process.
6) Set the context to the one that was saved earlier.
7) Resume execution with ResumeThread.

After these steps the code of the legitimate process is replaced with a malicious one, but the context is preserved and it will continue to look like a legitimate process (doing some bad things, though).

malware_55
malware_56

The screenshots below show that the malware performs exactly the steps of process hollowing. I didn’t show it, but the shellcode decodes part of its memory and loads it in a buffer that’s going to be injected into a remote process.

The process to be used for injection is… svchost.exe (surprise, surprise).

The base address of the remote process is 0x400000.

malware_58

The memory to allocate in svchost.exe is SizeOfImage bytes. This value is taken from the PE headers of the buffer that holds the already decoded malicious code, which appears to be a PE executable. The allocation starts from the base address of the remote process.

malware_59

After the PE headers are written, the shellcode loops through the sections of the malicious code and writes them at the appropriate addresses in svchost.exe.

malware_60

And finally the now malicious svchost.exe resumes execution.

malware_61

Dumping the memory

To dump the injected code, I have to break right before it executes (before ResumeThread). I use x64dbg for debugging and attach it to the MS Word process. Because I disabled the automatic execution of the VBA script, the malware won’t start until I manually execute the script.

malware_62

I set a breakpoint on the SetThreadContext function. It’s unlikely that MS Word uses this function, so I’m sure the only place where the breakpoint will be hit is in the shellcode.

malware_63

I run the VBA macro and the breakpoint is hit immediately.

malware_64

With Process Hacker you can see that svchost.exe is still in a suspended state (it’s highlighted in gray). I also use it to dump the memory region at 0x400000, where the malicious code resides.

malware_65
malware_66

The sections of an executable file are located at different offsets from the beginning of the image, depending on whether it’s loaded in memory or stored on disk. To be able to run the dumped code, I have to convert it back to the on-disk layout (unmap it) using the tool pe_unmapper.

malware_67
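
The idea behind unmapping can be sketched in a few lines of Python. This is not how pe_unmapper is implemented (a real tool handles many more details); it only shows the core copy from the in-memory layout back to the on-disk layout. The function name, the tuple format and all the values are assumptions made for the example.

```python
def unmap_image(mapped: bytes, size_of_headers: int, sections):
    """Rebuild the on-disk layout from a memory dump.

    sections: list of (virtual_address, raw_pointer, raw_size) tuples,
    i.e. values that would normally come from the PE section table.
    """
    file_size = max([size_of_headers] + [ptr + size for _, ptr, size in sections])
    out = bytearray(file_size)
    # The headers sit at offset 0 in both layouts
    out[:size_of_headers] = mapped[:size_of_headers]
    for virtual_address, raw_pointer, raw_size in sections:
        chunk = mapped[virtual_address:virtual_address + raw_size]
        out[raw_pointer:raw_pointer + len(chunk)] = chunk
    return bytes(out)

# Toy "memory dump": 0x10 bytes of headers, one section mapped at 0x40
mapped = b'H' * 0x10 + b'\x00' * 0x30 + b'AAAA' + b'\x00' * 0x3C
on_disk = unmap_image(mapped, 0x10, [(0x40, 0x10, 4)])
print(on_disk)
```

In the toy example the section data moves from offset 0x40 in memory to offset 0x10 in the rebuilt file, directly after the headers.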

And now to load it in IDA :)

To my surprise it has very few functions. Maybe there is yet another stage?

malware_68

Static analysis (svchost.exe)


Below you can see where the last call in the start function leads. These instructions look like gibberish. My bet is that this code is encrypted or packed.

malware_69

After I reversed the functions, my suspicion was confirmed. The code gets a pointer to its own base address with get_pointer_to_MZ_signature, loads different libraries and functions (similar to the way the shellcode did, but without the use of hashes) and then decrypts the memory to which the last call jumps.

malware_70

The memory is decrypted with 0x59 as key.

malware_70a
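
As an illustration only: if the routine is a plain single-byte XOR (this is an assumption for the sketch, the exact operation is only visible in the screenshot), the decryption boils down to the following. The input bytes are made up.

```python
def xor_decrypt(data: bytes, key: int = 0x59) -> bytes:
    """Assumed operation: XOR every byte with the one-byte key."""
    return bytes(b ^ key for b in data)

# Made-up bytes; XOR is its own inverse, so the same call encrypts and decrypts
encrypted = xor_decrypt(b'\x55\x8b\xec')
print(encrypted.hex())
print(xor_decrypt(encrypted).hex())
```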

Dump decrypted svchost.exe

To dump the fully decrypted binary, I’ll again use a debugger. If you can’t see the screenshots well, open them in a new tab.

malware_71

I set the permissions of the .text section to RWX, so the code can modify (decrypt) itself.

malware_72

There is a check right before the decryption routine that fails and I don’t know why, but I manually bypass it, by changing the value of the Zero Flag.

malware_73
malware_74

When I reach the last call in the start function, the code should be fully decrypted and I can use Process Hacker again to dump the memory.

malware_75

Unmap the file.

malware_76

Aaaaand now it looks better. As you can see there are many functions now.

malware_77

The stages of the malware until now can be summarised in the following steps:

1) The word document decodes a large shellcode
2) Then injects and executes the shellcode in its own process
3) The shellcode decodes a buffer that is a malicious PE executable
4) Injects the malicious code in a remote process (svchost.exe) via process hollowing
5) The code of the new process is almost entirely encrypted, so it decrypts itself.

Static and Dynamic analysis (decrypted svchost.exe)


The call graph looks really big and it’s going to take me a lot of time to reverse the whole binary. That’s why I’ll only analyse parts of it, like those used for networking stuff.

malware_78

Below you can see the imported functions. There are no surprises here, considering that we already knew that it connects to remote hosts, downloads files and executes them.

malware_81
malware_82

Some strings that I missed in the beginning of the analysis are HTTP Request headers and two format strings.

malware_79
malware_80

The main function is an endless loop.

malware_83

At the beginning of the loop, the first thing this stage of the malware does is to communicate with the C2 servers.

malware_84

This function collects information such as:

  • OS Version
  • MAC address
  • Volume Serial Number of the C: drive
  • Public IP address (by using api.ipfy.org)
  • Hostname and the domain

MAC address and the volume serial number are used to uniquely identify the machine.

The hostname and the domain are retrieved with the WinAPI function LookupAccountSid, which “accepts a security identifier (SID) as input. It retrieves the name of the account for this SID and the name of the first domain on which this SID is found.”. The SID is taken from the explorer.exe process, and to find explorer.exe the malware iterates through the running processes (do you remember the output of API monitor? This is what I thought was process enumeration).

malware_90

Then it decrypts an RC4-encrypted string that holds the malware build version and the list of C2 domains, separated by the pipe | symbol.
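
A minimal sketch of that step in Python is shown here. The RC4 routine itself is the standard algorithm, but the key, the URLs and the exact field order of the configuration string are hypothetical, since the real values are only visible in the debugger.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):                       # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                          # keystream generation
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

# Hypothetical key and configuration layout, for illustration only
key = b'example-key'
blob = rc4(key, b'BUILD1|http://c2-one.example/gate|http://c2-two.example/gate')

fields = rc4(key, blob).decode('ascii').split('|')
build, c2_list = fields[0], fields[1:]
print(build, c2_list)
```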

The malware tries to connect to the first C2 domain and, if successful, sends the collected information in an HTTP POST request. If the connection fails, it tries the next server in the list.

malware_86
malware_85
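
The failover logic is simple enough to sketch. The helper below is my own illustration, not reversed code: the function names are invented, the network call is replaced by a stand-in, and the host names are placeholders.

```python
def post_to_first_reachable(servers, send):
    """Try each server in order; return (server, response) for the first that answers."""
    for server in servers:
        try:
            return server, send(server)
        except OSError:          # connection refused, DNS failure, timeout, ...
            continue
    return None, None

# Stand-in for the real network call; the host names are placeholders
def fake_send(server):
    if server != 'c2-two.example':
        raise ConnectionError('unreachable: ' + server)
    return 'response from ' + server

servers = ['c2-one.example', 'c2-two.example', 'c2-three.example']
print(post_to_first_reachable(servers, fake_send))
```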

Because the C2 servers are down (for this build at least) I spoofed the DNS response to point to my machine.

malware_88
malware_89

You can see all the information it sends in the body of the HTTP POST request.

malware_87
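
Since the body is an ordinary form-encoded string, a fake C2 server could split it into fields with the standard library. In this sketch only the field names follow the request format used by this sample; all the values are made up.

```python
from urllib.parse import parse_qs

# Made-up values; only the field names follow the request format of the sample
body = 'GUID=1234567890&BUILD=1234&INFO=PC01&IP=203.0.113.7&TYPE=1&WIN=6.1'

fields = {name: values[0] for name, values in parse_qs(body).items()}
for name, value in fields.items():
    print(name, '=', value)
```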

It also expects an answer (a command), which I think is encoded. I haven’t reversed that part, because it’s harder when I don’t know what the response should look like.

Anyway, after the command is decoded, it enters a switch statement with several cases. Depending on the command it can:

  • Download a file (in memory) and execute/inject it via process hollowing (again using svchost.exe)
  • Download a DLL (in memory), load it, and call some function from it or start a new thread.
  • Download a file to the %TEMP% directory and execute it.

malware_91

Yara Rule


The encoded shellcode in the Word document is stored in a tab that is part of a form, and it starts with 4 spaces. This format is unique and for some reason I don’t think it’ll change across versions. The shellcode is encoded as one long continuous string (7000+ characters). Such strings are rare, and even rarer when embedded in a tab. That’s why I think it’s a good feature for detecting this malware, combined of course with the function names NtWriteVirtualMemory, NtAllocateVirtualMemory and CreateTimerQueueTimer, which should be very rare in a legitimate Word document.

malware_92

rule trojan_downloader
{
	meta:
		description = "Detects MS Office document with embedded VBA trojan dropper"
		author = "Iliya Dafchev idafchev [4t] mail [dot] bg"
		date = "2017-09-21"

	strings:
		$ole_file_signature = { D0 CF 11 E0 A1 B1 1A E1 }

		$function1 = "CreateTimerQueueTimer"
		$function2 = "NtWriteVirtualMemory"
		$function3 = "NtAllocateVirtualMemory"

		$vba_project = "VBA_PROJECT" wide

		// match the encoded shellcode, inserted in a Tab
		// format: Tab<number> <size[4k-10k]> 0x00 0x80 <four_spaces> <at_least_15_printable_characters>
		$encoded_shellcode = /Tab\d[\x00-\xff][\x0f-\x27]\x00\x80\x20{4}[\x21-\x7e]{15}/

	condition:
		$ole_file_signature at 0 and all of ($function1, $function2, $function3, $vba_project) and $encoded_shellcode in (100000..filesize) and filesize > 100KB and filesize < 1MB
}

malware_93
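
The $encoded_shellcode pattern can also be sanity-checked outside of YARA. The sketch below runs the same regular expression with Python’s re module against synthetic bytes. The sample data is made up, and I’m assuming the two bytes after the tab number are a little-endian size, as the comment in the rule suggests.

```python
import re

# Same pattern as $encoded_shellcode, written as a Python bytes regex
ENCODED_SHELLCODE = re.compile(
    rb'Tab\d[\x00-\xff][\x0f-\x27]\x00\x80\x20{4}[\x21-\x7e]{15}'
)

# Synthetic sample: "Tab1", size 7000 (0x1b58) in little-endian order,
# then 00 80, four spaces and a run of printable characters
sample = b'\x00Tab1' + (7000).to_bytes(2, 'little') + b'\x00\x80' + b' ' * 4 + b'A' * 40

print(bool(ENCODED_SHELLCODE.search(sample)))
print(bool(ENCODED_SHELLCODE.search(b'Tab1 just some text')))
```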

Snort rule


alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"Trojan installed on internal network!"; content:"/ls5/forum.php"; nocase; pcre:"/setedranty\.com|attotperat\.ru|robtetoftwas\.ru/i"; pcre:"/GUID=\d+&BUILD=\d+&INFO=\N+&IP=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}&TYPE=\d&WIN=\N+/i"; sid:1000001; rev:1;)

Indicators of Compromise


The dropper doesn’t write anything to disk (unless instructed by the hackers), so besides the hashes there are no other indicators.