Incident response process
There’s been a security incident in your organization. What do you do? Panic? No ;)! You follow an incident response process.
In this article, we’re going to talk about this incident response process and the 6 steps that you need to remember for the CompTIA Security+ exam.
In fact, understanding the order of the process is important for the exam, so you need to not only remember the name of each step in the process, but you also need to remember the order of those steps.
Those steps are:
- Preparation
- Identification
- Containment
- Eradication
- Recovery
- Lessons learned
This is based on the guidelines published by NIST which you can view here: https://nvlpubs.nist.gov/nistpubs/specialpublications/nist.sp.800-61r2.pdf
With that said, the terminology of the steps changes a bit depending on the source. The steps we just listed use terminology directly from the CompTIA Security+ exam objectives, so those terms are the ones we’re going to use in this course.
Preparation
The first step in the incident response process is preparation.
We not only want to make sure our organization has an internal process in place to respond to any sort of incident, but we also want to take preventative steps by ensuring that our networks, systems, and applications are properly secured.
Preparing to handle incidents
Let’s start by talking about preparing to handle incidents.
One of the most important aspects of responding to an incident is ensuring proper communication and coordination.
If you don’t have information about who you should call for what, and how you can contact them, then you won’t be able to effectively respond.
At this stage, you want to gather information such as:
- Contact information – which team member within and outside of the organization needs to be involved, and how can they be reached? Do you have multiple emails and phone numbers for each person in case one system of communication becomes unavailable?
- What is the escalation procedure? – depending on the severity of the incident, you would want to know the proper escalation procedure
- Which issue-tracking system should you use? – having a centralized issue tracking system that the incident response team can use is important, and everyone should be on the same page as to what that system is and how to properly use it
- Where and how can you access needed tools? – as an example, which tool(s) are you supposed to use for digital forensics? How can you access those tools, especially if they require a paid license?
- Where and how can you store incident data? – as you collect digital forensics, you will need to have a safe place to store that data, and you’ll also need to understand the policies of how to handle that type of data
These are just some examples of questions you need to answer and resources you need to gather as part of the preparation phase.
The better prepared you are, the more effective you will be at handling incidents.
Preventing incidents
The other aspect of preparation includes preventing incidents in the first place.
This is where everything we’ve learned about defense come into play. We’ll want to:
- Harden hosts
- Implement network security
- Perform risk assessments
- Provide user awareness and training
- Etc…
And this can all be considered to be part of our incident response preparations.
Identification
The next step in the process is identification. Identification is about determining whether you’ve been breached or not.
To do this, we need to answer multiple questions, like:
- Who or what reported this event?
- When did the event happen?
- How was it discovered?
- What is the scope of this incident?
- Does it affect business operations?
Answering these questions can turn out to be quite difficult because we may have a lot of data coming from many different sources such as our host and network-based IDPSs, our SIEM, our log analyzers, our third-party monitoring services, etc… [See page 27 of the NIST guidelines for more examples]. We may get flooded with so many alerts that we don’t know where to look or where to find valuable information.
We may also be dealing with alerts that are precursors, meaning that they’re providing a sign that an incident may occur in the future but hasn’t occurred yet. This is different from an indicator that is a sign an incident may have occurred, or may be occurring at that exact moment.
Whoever is responding to a potential incident has to work with a team to figure all of this out. This requires having a clearly defined process and documentation of each step taken.
As we start to answer these sorts of questions, we may come to realize that the report is not an actual incident, and so it can be de-escalated.
On the flip side, if the incident is real, then we’ll need to start diving deeper by evaluating the attack vectors.
The reason we want to narrow down the attack vectors is because it can completely change how we respond to the incident. For example, is the attack vector an attack executed against our web application? Is it executed via an email attachment? Or was malware delivered via an external USB flash drive?
Our response to the incident will vary greatly depending on the answer, and answering some of these initial questions as well as bringing in the right people to handle the incident will lead to initial analysis of the incident, which is very important to do before jumping into containment. We’ll explain why in a moment.
Containment
Once we’ve run through the identification phase and we’ve determined that there is a legitimate incident that’s either occurred, is actively occurring, or is likely to occur in the near future, we need to move on to the containment phase.
A natural response to an incident is to take immediate action by getting rid of the threat from your devices and networks. After all, you want to prevent them from accessing sensitive data.
The problem is that you can’t really do that, because that reaction could turn out to hurt your incident response more than it could help. As an example, it could erase valuable data that your analysts could use to better understand the attack and it could erase or modify evidence that is required for legal proceedings.
Instead, a more appropriate step to take is to contain the threat so that it can’t continue spreading or doing more damage. Containment buys us more time to figure out what else needs to happen and which steps we need to take, while still protecting the rest of our data and networks.
To properly contain a threat, we have to decide what to do:
- Should we shut down the system entirely?
- Can we just contain the threat within that system by sandboxing it?
- Should we disconnect the threat from the main network and air gap it?
What we do depends on the attack vector, the data we have available regarding the threat, whether the system is providing critical services, etc…
It also depends on legality, which means that the legal team should be involved very early on in the process once an incident has been confirmed. If an attacker has been able to access sensitive data, that immediately exposes the organization to legal liability, and you could take steps that further increase legal liability if you’re not careful.
In addition to containing the threat, we also need to take steps to gather evidence. Once our teams go in and start eradicating the threat, it will inevitably make modifications to evidence which can cause many problems.
We’d want to collect information such as:
- MAC addresses, IP addresses, the affected systems and their identifiers, etc…
- Information about who reported or first found the incident
- Dates and time of when the incident was discovered and any other events related to the incident
- Logs and relevant files
- Etc…
This will not only be useful information for the incident response team, but it may also be required for legal proceedings that come after the incident.
As a quick note, it’s generally not a good ideal to jump straight into containment as soon as you discover an incident without some initial analysis, because it can sometimes cause more damage.
Say, for example, that someone has infected one of your devices or cloud instances with malware. That malware is periodically sending a signal to a C&C server for status reports, and maybe to exfiltrate small amounts of data. This is what initially triggered a response, and so your team wants to immediately jump in and cut off access to the C&C server to prevent any more signals from reaching out. Except the malware is designed to automatically encrypt all volumes attached to the device/instance the moment it loses connectivity to the C&C server. Oops, you just ransomwared yourself.
Eradication
After we’ve contained the threat, we can move on to eradicate it.
Eradication is the step in which we eliminate the components related to the incident, such as:
- Malware installed
- Closing down the vulnerabilities that caused the incident in the first place
- By Patching, implementing new firewall rules, etc…
The eradication phase is concerned with eliminating the root cause of the breach and the artifacts related to that breach.
Recovery
Right after and/or during the eradication phase, we will start the recovery phase.
Recovery is the process of restoring our systems back to being fully functional, but, of course, without still having the vulnerabilities that enabled the breach.
This is when we’ll want to:
- Restore from trusted backups
- Ensure that the backups were not infected or compromised with backdoors
- Patch and update the systems
- Implement specific logging and monitoring to look for additional compromises
- Thoroughly test those systems to make sure the vulnerabilities don’t still exist
- Etc…
Depending on the scope of the incident, the recovery and eradication phases may be performed at the same time, and/or they make take weeks or months to perform.
Lessons Learned
After we’ve successfully eradicated and recovered from the incident, we can move on to the lessons learned phase, also known as the post-incident activity, or the post-mortem.
This is all about learning and improving. Because an incident occurred, that means there was an issue somewhere that allowed the incident to happen. What can we learn from it, and how can we make sure it doesn’t happen again?
If a different incident occurs in the future, how can we better respond that time? What did we do well and what didn’t we do well? Etc…
That’s what the lessons learned phase is all about.
This phase should involve reports, documentation, and meetings with the teams involved in the incident process and anyone else who might benefit. Those meetings should conclude with action items that can be implemented to improve the organization’s readiness and ability to respond to incidents in the future.
Get ready for the exam
As you conclude this article, make sure that you understand what happens during each phase of the incident response process, and make sure you memorize the order in which they take place so that you can be ready for the exam.
Responses