Understanding malware classification

Understanding this isn’t required for the CompTIA Security+ exam, so we’re going beyond the scope of the exam for a few minutes. I wanted to include this explanation in my course because I believe it is helpful, practical knowledge that you can apply both in your personal life and on the job. If you only want to learn the bare minimum to pass the exam, feel free to skip this article. But if you’re interested, let’s dive in.

Let’s talk about understanding malware classification.

Whenever an anti-virus or anti-malware tool detects something potentially malicious, it labels the detection using a specific naming format to help you or other professionals know exactly what kind of threat you are dealing with. The format might look a little bit like this:

Type:Platform/Family.Variant!Suffixes

It won’t always look exactly like this, because as you’ll remember, not every company or tool uses the same standards, but in general, they all look pretty similar.

This format is referred to as the Computer Antivirus Research Organization (CARO) malware naming scheme, and it’s been widely adopted, even if with slight variations.
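To make the layout concrete, here is a minimal Python sketch that splits a label of the form Type:Platform/Family.Variant!Suffixes into its parts. The pattern, the function name parse_label, and the sample labels are my own illustrations, not part of any vendor’s tooling, and labels that use a different layout simply won’t match.

```python
import re
from typing import Optional

# Pattern for the illustrative layout Type:Platform/Family.Variant!Suffixes.
# Variant and suffixes are optional; real vendor labels often deviate from this.
CARO_STYLE = re.compile(
    r"^(?P<type>[^:]+):"
    r"(?P<platform>[^/]+)/"
    r"(?P<family>[^.!]+)"
    r"(?:\.(?P<variant>[^!]+))?"
    r"(?:!(?P<suffixes>.+))?$"
)

def parse_label(label: str) -> Optional[dict]:
    """Split a label into named parts, or return None if it does not fit."""
    match = CARO_STYLE.match(label.strip())
    return match.groupdict() if match else None

# Hypothetical labels, made up for illustration only.
print(parse_label("Trojan:Win32/Example.A!lnk"))
print(parse_label("Trojan.Win32.Example.a"))  # different layout, so None
```

Returning None for labels that don’t fit is a deliberate choice here: since vendors vary the format, it is safer to say “I can’t parse this” than to guess which piece is which.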

Here are two real examples:

HEUR:Backdoor.MSIL.Broide.gen

Trojan.MSIL.Broide.m!c

Both labels describe the same malware sample from a single scan, as reported by two different tools. As you can see, they’re talking about the same malware; they just label it a tad differently.

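The first label is more specific, saying that it’s a Backdoor Trojan, while the second just says Trojan. The HEUR: prefix on the first label is not a malware type; it typically indicates that the detection came from heuristic analysis.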
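One rough way to see that the two labels overlap is to break each one into tokens and look for the tokens they share. This is only a sketch: the helper names tokens and shared_tokens are mine, and splitting on these separators is an assumption that happens to suit these two labels.

```python
import re
from typing import List

def tokens(label: str) -> List[str]:
    """Split a detection label on the separators seen in the two examples."""
    return [t for t in re.split(r"[:./!]", label) if t]

def shared_tokens(a: str, b: str) -> List[str]:
    """Return tokens that appear in both labels, compared case-insensitively."""
    b_lower = {t.lower() for t in tokens(b)}
    return [t for t in tokens(a) if t.lower() in b_lower]

first = "HEUR:Backdoor.MSIL.Broide.gen"
second = "Trojan.MSIL.Broide.m!c"

print(tokens(first))   # ['HEUR', 'Backdoor', 'MSIL', 'Broide', 'gen']
print(tokens(second))  # ['Trojan', 'MSIL', 'Broide', 'm', 'c']
print(shared_tokens(first, second))  # ['MSIL', 'Broide']
```

The shared tokens are the platform (MSIL) and the family (Broide), which is what tells us both tools are describing the same threat even though the type and variant parts differ.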

Type

The type of malware, which is the first classification you’ll see in results, provides an initial overview of the threat you are dealing with. Is it a trojan? Is it a virus? Is it a worm, a backdoor, or ransomware?

This label helps define that.

Platform

Next, we have the platform classification. This describes the operating system that the threat is designed to run on, such as Windows, Linux, Android, or macOS.

It can also be used to describe programming languages, macros, or file types.

In our prior example, MSIL stands for Microsoft Intermediate Language, the intermediate code that .NET programs are compiled to.
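If you wanted to turn platform tokens into something more readable, a small lookup table is enough. The sketch below is purely illustrative: apart from MSIL, the tokens are examples of the kind vendors use, their exact spellings differ from tool to tool, and the names PLATFORM_HINTS and describe_platform are my own.

```python
# Illustrative mapping only; token spellings differ between vendors.
PLATFORM_HINTS = {
    "MSIL": "Microsoft Intermediate Language (.NET)",
    "Win32": "32-bit Windows",
    "Win64": "64-bit Windows",
    "Linux": "Linux",
    "AndroidOS": "Android",
}

def describe_platform(token: str) -> str:
    """Return a readable description for a platform token, if known."""
    lookup = {key.lower(): value for key, value in PLATFORM_HINTS.items()}
    return lookup.get(token.lower(), f"unknown platform token: {token}")

print(describe_platform("msil"))     # Microsoft Intermediate Language (.NET)
print(describe_platform("Solaris"))  # unknown platform token: Solaris
```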

Family

The family classification tends to be all over the place, but it’s used to try and differentiate between malware that shares a common code base.

For example, if you’ve ever heard the terms “WannaCry,” “Stuxnet,” or even “CryptoLocker,” these names do not represent types of malware — they are family names.

Stuxnet is considered a worm because of what it was designed to do. It’s not the first worm to ever have existed, but it was developed in such a way that made it unique. When it was identified and researched after having been used in the wild, researchers decided to give it a family name of Stuxnet.

Variant

We also have variants. Following along with our Stuxnet example, if someone were to grab the original implementation of Stuxnet and make a few modifications, meaning that it essentially shared the same codebase but with a few differences, then that would become a variant of Stuxnet.

Without getting even further into the weeds, family and variant information can be different when talking about detection versus identification of malware…

For example, detection software will typically use incremental letters and numbers or even hashes to set the variant, while malware identifiers may use actual names.

Stars, Flame, and Nitro Zeus are names that have been described as related to, or possible variants of, Stuxnet. But if software were to detect those variants, it would probably use incremental letters, numbers, or hashes to uniquely describe them with more specificity.
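To give a feel for what “incremental letters or hashes” could look like, here is a small Python sketch. It is not how any particular vendor assigns variants; the function names and the idea of using a short SHA-256 prefix are assumptions made for illustration.

```python
import hashlib

def variant_letters(index: int) -> str:
    """Map 0, 1, 2, ... to A, B, ..., Z, AA, AB, ... (spreadsheet-style)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = ""
    index += 1  # switch to 1-based so that 26 maps to "Z" and 27 to "AA"
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def hash_variant(sample: bytes, length: int = 8) -> str:
    """Use a short prefix of the SHA-256 digest as a variant identifier."""
    return hashlib.sha256(sample).hexdigest()[:length]

# "Example" is a hypothetical family name.
print(f"Example.{variant_letters(0)}")   # Example.A
print(f"Example.{variant_letters(26)}")  # Example.AA
print(f"Example.{hash_variant(b'sample bytes go here')}")
```

Letters are easy for people to read and compare, while a hash ties the variant to the exact bytes of one sample, which is why you may see either style depending on the tool.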

Suffixes

Finally, we have suffixes. Suffixes are used to provide additional details about a specific threat. For example, suffixes can denote how a specific malware threat is packaged or delivered. Is it compressed? Does it rely on a particular file format? A suffix of !lnk, for instance, indicates that the threat uses a Windows shortcut (.lnk) file. Shortcut files can be crafted to launch PowerShell commands, and those commands could be used to download the malware itself while evading detection.

Or, as another example, a suffix might be .DLL, which represents malware that uses DLLs. DLL stands for Dynamic Link Library, a type of file containing shared code that Windows applications load to add or reuse functionality. If you were to browse through the applications that you use on a daily basis, you would likely see a large number of DLL files stored on your computer.

Malware can use the DLL format to get transported and then executed, and in those cases, detection software may add the .DLL suffix to represent that.
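As a rough sketch, you could pull the trailing tokens off a label and look up the ones you recognize. The two entries below come from the examples in this section and are nowhere near a complete list; the labels, SUFFIX_NOTES, and suffix_notes are hypothetical names for illustration, and the code assumes the Type:Platform/Family.Variant!Suffixes layout.

```python
import re
from typing import List

# Meanings taken from the two examples in this section; not an official list.
SUFFIX_NOTES = {
    "lnk": "uses a Windows shortcut (.lnk) file",
    "dll": "uses a Dynamic Link Library (DLL) component",
}

def suffix_notes(label: str) -> List[str]:
    """Return notes for known suffix tokens that follow the family name.

    Assumes the family name is the first token after the last '/'.
    """
    tail = label.rsplit("/", 1)[-1]
    # Drop the family name; what remains are variant and suffix tokens.
    parts = [p for p in re.split(r"[.!]", tail) if p][1:]
    return [SUFFIX_NOTES[p.lower()] for p in parts if p.lower() in SUFFIX_NOTES]

# Hypothetical labels, made up for illustration only.
print(suffix_notes("Trojan:Win32/Example.A!lnk"))
print(suffix_notes("Trojan:Win32/Example.B.DLL"))
print(suffix_notes("Trojan:Win32/Example.C"))  # no known suffix, so []
```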

Conclusion

Keep in mind that different vendors will sometimes change how they use or display this type of information. Unfortunately, there doesn’t seem to be any updated convention that everyone uses and sticks to, and it often feels like a pretty big mess. It certainly doesn’t help that the media amplifies the problem by commonly misusing terms or classifications, which then confuses everybody else.

All that to say — it’s entirely possible that you’ll come across different formats over time. With that said, hopefully, this gave you a general sense of how to read and understand this type of information, especially as we move along and discuss malware in more detail.

While you don’t have to understand this for the exam, I thought it would be a good addition to the course because it is very useful knowledge to have.

Reference Material

More on malware naming: https://docs.microsoft.com/en-us/windows/security/threat-protection/intelligence/malware-naming
Malware family, variants, and signatures examples: https://analyze.intezer.com/analyses/e601743b-7d74-4b90-9914-74a1b1de088b
Example malware detection used in this article: https://www.virustotal.com/gui/file/c58c8305284b7002bc4edfa8e311ee59cad74ee61aae3011e0420379409abfa6
