Introduction
In a world full of threats that indiscriminately target every bit and byte of our society, it is crucial to have decent intelligence and respond accordingly. These threats often rely on specialized tools, called malicious software or malware, for purposes ranging from cybercrime to espionage and destruction. In this cat and mouse game, VirusTotal, which was created in 2004, has become a primary source of malware intelligence, and it provides a wealth of information. As the platform has matured, it has gained advanced capabilities that analysts use to fill their knowledge gaps.
The other nice-to-have on the defender's side is YARA, a tool primarily used in malware research and detection. It offers a rule-based approach for describing malware families based on textual or binary patterns. VirusTotal's Hunting module makes it possible to run YARA rules against its huge dataset.
vti-cosplay is a workaround for analysts who lack a VirusTotal Enterprise license. It parses the YARA rule, maps each atomic entry to VirusTotal API endpoints, and merges the individual results. In effect, it mimics a YARA scan on the VirusTotal platform.
r00tten@vti-cosplay VTI-Cosplay % python3 vti-cosplay.py -h
,(#*
,(#*.
*********(##* ,**********.
.%%#////////*, .,///////(%#,
.%%* *%#,
.%%* *%#,
.%%* *%#/,,,,,,
,(%%/. ,(((((((((.
./#%%%%%%#*
*#%%%%(,
/((((((((*. ,(*.
,,*,*,*#%/. .*(*.
.(%/. ./%/.
.(%/. ./%/.
.(%#///////*. .*/////////#%/.
**////////*. .#%#/////////,.
.##/
.##/
,,.
██╗ ██╗████████╗██╗ ██████╗ ██████╗ ███████╗██████╗ ██╗ █████╗ ██╗ ██╗
██║ ██║╚══██╔══╝██║ ██╔════╝██╔═══██╗██╔════╝██╔══██╗██║ ██╔══██╗╚██╗ ██╔╝
██║ ██║ ██║ ██║ █████╗ ██║ ██║ ██║███████╗██████╔╝██║ ███████║ ╚████╔╝
╚██╗ ██╔╝ ██║ ██║ ╚════╝ ██║ ██║ ██║╚════██║██╔═══╝ ██║ ██╔══██║ ╚██╔╝
╚████╔╝ ██║ ██║ ╚██████╗╚██████╔╝███████║██║ ███████╗██║ ██║ ██║
╚═══╝ ╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚══════╝╚═╝ ╚══════╝╚═╝ ╚═╝ ╚═╝
usage: VTI-Cosplay [-h] -y YARA_FILE [-k API_KEY] [-l LIMIT] [-a ACTION]
[--livehunt] [-f] [-v] [-i I_DONT_TRUST_YOU]
optional arguments:
-h, --help show this help message and exit
-y YARA_FILE, --yara-file YARA_FILE
YARA file
-k API_KEY, --api-key API_KEY
Virustotal API key
-l LIMIT, --limit LIMIT
Limit total matched sample count
-a ACTION, --action ACTION
Action module to trigger for matched samples
--livehunt Create scheduled task for the YARA file provided. When
a new sample is out there it prints and stores
-f, --fast Fast scan by reducing the data that is transferred
-v, --verbose Verbose output
-i I_DONT_TRUST_YOU, --i-dont-trust-you I_DONT_TRUST_YOU
At the end, it downloads matched files and does YARA
scan against them
Working Principle
VirusTotal's Content Search (VTGrep) capability provides pattern search across its database. A YARA rule, on the other hand, is a combination of patterns and conditions on them. Therefore, a YARA rule can, to a certain extent, be mapped to a handful of Content Search queries. vti-cosplay interprets a rule, runs the corresponding queries, and then combines their results according to the rule's condition.
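As a rough illustration of that mapping, the sketch below turns a single YARA text string into one content-search term. It is not vti-cosplay's actual code: the function name `to_content_query` is hypothetical, and it assumes that plain strings are searched as `content:"..."` while wide (UTF-16LE) strings and strings with awkward characters are searched as a hex pattern `content:{...}`.

```python
def to_content_query(value: str, wide: bool = False) -> str:
    """Turn one YARA text string into a single content-search term.

    Assumptions (not taken from vti-cosplay itself):
      - plain strings are searched as content:"..."
      - wide strings are encoded as UTF-16LE and searched as hex
      - strings containing quotes or backslashes fall back to hex,
        which sidesteps any escaping rules of the query language
    """
    if wide:
        return "content:{" + value.encode("utf-16-le").hex() + "}"
    if '"' in value or "\\" in value:
        return "content:{" + value.encode("ascii").hex() + "}"
    return f'content:"{value}"'


if __name__ == "__main__":
    # Two of the strings from the Stuxnet rule shown later in this document.
    print(to_content_query("guava.pdb"))
    print(to_content_query("MRxCls.sys", wide=True))
```

Modifiers such as `fullword` or `nocase` have no direct counterpart in this sketch; how they are handled depends on what the search backend supports.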
According to the YARA documentation, these are the keywords that can appear in a rule.
all | and | any | ascii | at | base64 | base64wide | condition |
contains | endswith | entrypoint | false | filesize | for | fullword | global |
import | icontains | iendswith | in | include | int16 | int16be | int32 |
int32be | int8 | int8be | istartswith | matches | meta | nocase | not |
of | or | private | rule | startswith | strings | them | true |
uint16 | uint16be | uint32 | uint32be | uint8 | uint8be | wide | xor |
Covering every possible YARA rule has never been the objective of vti-cosplay, and the VirusTotal API would not allow it anyway. The primary concern is the handiest keywords and the ones that were needed during development.
To use less quota, there is an optimization step in the pipeline. It tries to combine queries to reduce the total number of requests that are necessary. Separate queries can be joined with the binary operators OR and AND, so if the rule contains such joints, they are merged into a single query.
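A minimal sketch of this idea, assuming the search backend accepts `AND`, `OR`, and parentheses inside one query: a small condition tree is rendered into a single query string, so it costs one request instead of one per leaf. The tree representation and the names `render_query` and `leaf_count` are hypothetical and only serve the illustration.

```python
from typing import List, Tuple, Union

# A node is either a plain query string (leaf) or (operator, [children]).
Node = Union[str, Tuple[str, List["Node"]]]


def render_query(node: Node) -> str:
    """Render a condition tree as one query string joined with AND / OR."""
    if isinstance(node, str):
        return node
    operator, children = node
    if operator not in ("and", "or"):
        raise ValueError(f"unsupported operator: {operator}")
    joined = f" {operator.upper()} ".join(render_query(child) for child in children)
    return f"({joined})"


def leaf_count(node: Node) -> int:
    """Number of requests needed if every leaf were searched separately."""
    if isinstance(node, str):
        return 1
    return sum(leaf_count(child) for child in node[1])


if __name__ == "__main__":
    tree = ("or", ['content:"a"', ("and", ["size:80KB-", 'content:"b"'])])
    print(leaf_count(tree), "separate requests become 1:")
    print(render_query(tree))
```

The `size:80KB-` term is used here as an assumed equivalent of `filesize < 80KB`; check the search modifier documentation before relying on it.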
To honor the limit set by the -l, --limit command line parameter, it makes requests iteratively until enough matches have been collected.
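The loop can be sketched as cursor-based pagination. In this sketch `fetch_page` is a hypothetical callable that takes a cursor (or `None` for the first page) and returns the hashes on that page plus the next cursor; in practice it would wrap a call to the VirusTotal search API, which is deliberately left out so the example runs offline.

```python
from typing import Callable, List, Optional, Tuple

# (hashes on this page, cursor for the next page or None when exhausted)
Page = Tuple[List[str], Optional[str]]


def collect_until_limit(
    fetch_page: Callable[[Optional[str]], Page], limit: int
) -> List[str]:
    """Request page after page until `limit` matches are collected."""
    matches: List[str] = []
    cursor: Optional[str] = None
    while len(matches) < limit:
        hashes, cursor = fetch_page(cursor)
        matches.extend(hashes)
        # Stop when the backend has nothing more to offer.
        if cursor is None or not hashes:
            break
    return matches[:limit]


if __name__ == "__main__":
    # Fake backend with three pages, standing in for real API calls.
    pages = {
        None: (["a", "b"], "c1"),
        "c1": (["c", "d"], "c2"),
        "c2": (["e"], None),
    }
    print(collect_until_limit(lambda cursor: pages[cursor], limit=3))
```

Stopping as soon as the limit is reached matters for quota: with a limit of 3, the fake backend above is only asked for two of its three pages.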
At the end of the interpretation, the resulting VirusTotal queries are executed. This makes it possible to write hybrid rules: rules whose condition section contains plain VT queries. In this way, the range of the hunting process can be broadened.
rule Stuxnet_Malware_4
{
meta:
description = "Stuxnet Sample - file 0d8c2bcb575378f6a88d17b5f6ce70e794a264cdc8556c8e812f0b5f9c709198"
author = "Florian Roth"
reference = "Internal Research"
date = "2016-07-09"
hash1 = "0d8c2bcb575378f6a88d17b5f6ce70e794a264cdc8556c8e812f0b5f9c709198"
hash2 = "1635ec04f069ccc8331d01fdf31132a4bc8f6fd3830ac94739df95ee093c555c"
strings:
$x1 = "\\objfre_w2k_x86\\i386\\guava.pdb" ascii
$x2 = "MRxCls.sys" fullword wide
$x3 = "MRXNET.Sys" fullword wide
condition:
"similar-to:0d8c2bcb575378f6a88d17b5f6ce70e794a264cdc8556c8e812f0b5f9c709198"
or
(filesize < 80KB and 1 of them )
or
( all of them )
}
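To make the evaluation step concrete, here is a hedged sketch of how the condition of the rule above could be combined from per-query result sets: `or` becomes a set union, `and` an intersection, `1 of them` the union of the string matches, and `all of them` their intersection. The function names and the sample hash labels are made up for illustration, and a real run would likely merge some of these queries before sending them.

```python
from typing import Dict, Iterable, Set


def any_of(result_sets: Iterable[Set[str]]) -> Set[str]:
    """'1 of them': samples matching at least one string."""
    return set().union(*result_sets)


def all_of(result_sets: Iterable[Set[str]]) -> Set[str]:
    """'all of them': samples matching every string."""
    sets = [set(s) for s in result_sets]
    return set.intersection(*sets) if sets else set()


def evaluate_stuxnet_rule(results: Dict[str, Set[str]]) -> Set[str]:
    """similar-to OR (filesize < 80KB AND 1 of them) OR (all of them)."""
    strings = [results["x1"], results["x2"], results["x3"]]
    return (
        results["similar"]
        | (results["small"] & any_of(strings))
        | all_of(strings)
    )


if __name__ == "__main__":
    # Hypothetical result sets, keyed by the query they came from.
    results = {
        "similar": {"h1"},
        "small": {"h2", "h3", "h9"},
        "x1": {"h2", "h4"},
        "x2": {"h4", "h5"},
        "x3": {"h4"},
    }
    print(sorted(evaluate_stuxnet_rule(results)))
```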
The YARA scan functionality depends on vti-cosplay's interpretation and evaluation of the rule. If this working principle does not give enough confidence in the results, the -i, --i-dont-trust-you option might be helpful. With this parameter, vti-cosplay downloads each sample that it detects and runs an actual YARA scan against it.
vti-cosplay's capabilities can be extended with action modules, which apply further processing to the matched samples. The goal can range from sending a Slack message or adding a VT comment to downloading the samples and running complex algorithms against them.
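The general shape of such an extension point might look like the sketch below. This is not vti-cosplay's real action interface: the registry, the `action` decorator, the `run_action` helper, and the sample dictionary keys are all hypothetical, chosen only to show how a name given on the command line could select a function that is then applied to every matched sample.

```python
from typing import Callable, Dict, List

# Hypothetical registry: action name -> function applied to one matched sample.
ACTIONS: Dict[str, Callable[[dict], str]] = {}


def action(name: str):
    """Decorator that registers a function under an action name."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        ACTIONS[name] = func
        return func
    return register


@action("notify-text")
def notify_text(sample: dict) -> str:
    """Build the text a notification (e.g. a chat message) could carry."""
    return f"New match for {sample['rule']}: {sample['sha256']}"


def run_action(name: str, samples: List[dict]) -> List[str]:
    """Apply the selected action to every matched sample."""
    if name not in ACTIONS:
        raise KeyError(f"unknown action module: {name}")
    return [ACTIONS[name](sample) for sample in samples]


if __name__ == "__main__":
    matched = [{"rule": "Stuxnet_Malware_4", "sha256": "0d8c2bcb..."}]
    for line in run_action("notify-text", matched):
        print(line)
```

Keeping each action as a plain function of one sample makes it easy to swap a lightweight notification for a heavier step such as unpacking or IOC extraction.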
Conclusion
Taking advantage of vti-cosplay's capabilities, I tracked the Emotet and Remcos malware families for months. I built a pipeline on AWS in which vti-cosplay ran on a schedule on a Linux machine. On each iteration, IOCs were extracted from the hunted samples and added as comments on VirusTotal.
[The Remcos malware family uses a PNG file embedded inside it to continue its attack vector](https://r00tten.com/in-depth-analysis-attack-vector-triggered-by-risk/). I created a PNG unpacking action module for Remcos and left the extracted details as comments on VirusTotal.
Emotet's attack vector starts with phishing Microsoft Office documents (this is the scope of my YARA rule). A few steps later, it executes an obfuscated PowerShell script to download and trigger the next stage of the vector. Again, with the Emotet parsing action module, I extracted the PowerShell scripts and remote server addresses and left those details as comments on VirusTotal.
After accomplishing my goal and using it as a base to track two heavily active malware families, I can't say anything but:
Shortage brings innovation.