As the technology landscape grows and evolves, organizations constantly seek to better identify emerging vulnerabilities and enable scalable, predictable threat detection and response. The cybersecurity industry meets this demand with hundreds of tools, services, and data feeds designed to alert on possible risks, leading to an avalanche of notifications that often require human analysis and manual action.
Processing the never-ending queue of alerts dominates the days of many cybersecurity professionals, whether in a SIEM, ticketing system, chat channel, or email inbox. However, teams who invest in making this process more efficient, effective, and rewarding see enormous benefits in risk reduction and wellbeing.
Alert fatigue occurs when security professionals become overwhelmed by the volume and repetitive nature of the alert queue, losing the ability to distinguish alerts that represent actual issues (true positives) from everything else. Anyone with an email inbox full of unread spam or unread emergency alert text messages knows the feeling. Unfortunately, security tool vendors’ incentives can nudge them to be loud and wrong rather than silent and right.
In cybersecurity, there’s a tension between not wanting to miss real threats and not wanting to create noise. To avoid false negatives, tools and services can bias towards false positives by firing an alert when no issue is present. No one wants to see an expensive security investment producing nothing, so when in doubt, the safe bet is to fire and ask forgiveness. Unfortunately, this frequently pushes the burden to the security teams to keep the signal-to-noise ratio healthy. Security detection tools, threat intelligence feeds, and notification services also benefit from our bias to ignore incorrect predictions and celebrate correct ones.
We’re quick to forgive and explain away their misses, yet just as quick to give them credit when they’re right, especially if we’ve invested a lot of time and money into implementing the solution.
Alert fatigue matters because many security tools still require human intervention to produce an impact. Automated blocking can interfere with legitimate processes, so many tools default to notifying the security or IT team for the last mile of intervention. It’s safer to take credit for finding dangerous things and avoid the blame for breaking things in production.
Fatigue makes it more likely the last crucial step won’t occur: the system won’t be quarantined, the connection won’t be blocked, the fraudulent transaction will go through, and then all the previous investment will be for nothing. Advances in artificial intelligence and automation have improved the situation somewhat. However, organizations still like to have someone they trust (and can hold accountable) when making essential technical decisions.
Unsurprisingly, alert fatigue is also a key factor in team turnover: it’s difficult to retain talented professionals who feel engaged and impactful when they face an unrelenting deluge of noise.
Alert fatigue is not insurmountable. Business and security leaders can leverage the efforts of others solving similar challenges and apply a few principles to stem the tide:
Ignorance is not bliss. Raise the profile of alert fatigue with business and security stakeholders:
Illustrate the impact on risk: fatigue means missed alerts; missed alerts mean more incidents
Illustrate the cost of turnover: fatigue leads to burnout and exhaustion, which means higher costs for recruiting and retention
Illustrate the similarity to other processes: producing quality alerts and reducing noise is akin to quality control on the factory line
Understand the causes of alert fatigue in quantitative terms by measuring false positives. For example, track how many alerts were not associated with real incidents and your costs in time, effort, money, storage, etc., related to pursuing benign activity.
Costs include staff hours, business loss, and direct expenditures spent responding to these alerts, including initial triage. Consider these costs in the context of the security tools and services that produced them to understand the true cost of ownership. This also lets you measure cost reduction for process improvements, filtering, and tuning.
It’s also important to measure raw alert volume, often normalized as events per analyst hour (EPAH), to see whether your team is overwhelmed. These metrics help illustrate the impact of changes to staffing, process, security content, and automation.
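The metrics above can be sketched in a few lines of code. This is a minimal illustration assuming hypothetical weekly figures; the function names, alert counts, triage times, and hourly rate are all made up for the example, not benchmarks.

```python
# Sketch of two alert-fatigue metrics: events per analyst hour (EPAH)
# and the direct triage cost of false positives. All inputs are
# hypothetical, illustrative numbers.

def events_per_analyst_hour(alert_count: int, analysts: int, hours_per_analyst: float) -> float:
    """EPAH: total alert volume divided by total analyst hours worked."""
    return alert_count / (analysts * hours_per_analyst)

def false_positive_cost(fp_count: int, minutes_per_triage: float, hourly_rate: float) -> float:
    """Direct cost of triaging alerts that turned out to be benign."""
    return fp_count * (minutes_per_triage / 60) * hourly_rate

weekly_alerts = 4200       # alerts fired this week (hypothetical)
false_positives = 3900     # alerts with no real incident behind them

# 4 analysts working 40-hour weeks = 160 analyst hours
epah = events_per_analyst_hour(weekly_alerts, analysts=4, hours_per_analyst=40)
cost = false_positive_cost(false_positives, minutes_per_triage=6, hourly_rate=55)

print(f"EPAH: {epah:.2f}")
print(f"False positive rate: {false_positives / weekly_alerts:.0%}")
print(f"Weekly triage cost of noise: ${cost:,.0f}")
```

Tracked week over week, these two numbers make the case for tuning work in terms leadership understands: fewer false positives translate directly into recovered analyst hours and dollars.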
Tackling alert fatigue requires a relentless focus on creating and refining high-quality security content: signatures, correlation rules, and behavioral detections. This work is often reductively called tuning. Relying on the rules that come out of the box leaves security teams at the mercy of the vendor’s incentives. Finding the courage to disable rules that are unlikely to produce actionable signals has benefits: teams that embrace Marie Kondo’s “life-changing magic of tidying up” in their alert queues don’t regret it.
One approach to customized content is to build on MITRE’s outstanding ATT&CK taxonomy of adversary behaviors. Rather than trying to cover every technique, build a manageable list by removing techniques that don’t apply to your platforms and prioritizing based on risk. Ask: which techniques are most likely, most dangerous, or both? ATT&CK itself contains data on which behaviors are used by the most malware and threat groups. (Built from ATT&CK v9.0 data available at https://github.com/mitre/cti/blob/master/enterprise-attack/enterprise-attack.json)
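One way to mine that usage data is to count the STIX “uses” relationships in MITRE’s enterprise-attack.json, which link groups and malware to attack-pattern objects. The sketch below runs against a tiny hypothetical bundle inlined for illustration; in practice you would load the full JSON from the MITRE CTI repository, and the sample IDs and names here are stand-ins.

```python
# Sketch: rank ATT&CK techniques by how many groups/malware use them,
# by counting STIX "uses" relationships targeting attack-pattern objects.
# The bundle below is a tiny hypothetical sample, not real ATT&CK data.
from collections import Counter

bundle = {
    "objects": [
        {"type": "attack-pattern", "id": "attack-pattern--t1059",
         "name": "Command and Scripting Interpreter"},
        {"type": "attack-pattern", "id": "attack-pattern--t1027",
         "name": "Obfuscated Files or Information"},
        {"type": "relationship", "relationship_type": "uses",
         "source_ref": "intrusion-set--g1", "target_ref": "attack-pattern--t1059"},
        {"type": "relationship", "relationship_type": "uses",
         "source_ref": "malware--m1", "target_ref": "attack-pattern--t1059"},
        {"type": "relationship", "relationship_type": "uses",
         "source_ref": "malware--m1", "target_ref": "attack-pattern--t1027"},
    ]
}

# Map technique IDs to human-readable names.
names = {o["id"]: o["name"] for o in bundle["objects"] if o["type"] == "attack-pattern"}

# Count "uses" relationships from groups (intrusion-set) and malware to techniques.
usage = Counter(
    r["target_ref"]
    for r in bundle["objects"]
    if r["type"] == "relationship"
    and r.get("relationship_type") == "uses"
    and r["target_ref"] in names
    and r["source_ref"].split("--")[0] in ("intrusion-set", "malware")
)

for ref, count in usage.most_common():
    print(f"{names[ref]}: used by {count} groups/malware")
```

Sorting techniques this way gives a rough prevalence ranking to start from; you would then prune techniques that don’t apply to your platforms before building detections for what remains.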
Building up from common, high-risk, high-impact behavioral detections helps avoid having to tune down from thousands of hyper-specific rules, signatures, and indicators (and their attendant false positives).
The best-intentioned vendors still respond to incentives. Bring alert fatigue and false positive rates into discussions with vendors at the time of procurement or subscription renewal. Voting with your wallet is essential to long-term change. Coalition is a technology company, and we're well aware of the pain points of alert fatigue. We've worked to streamline our alerting and set our policyholders up for success.
At Coalition, we are proactive about helping security teams manage their attack surface and IT infrastructure. Every alert sent by our Coalition Control platform helps reduce the risk of an incident at one of our policyholders, which benefits everyone involved. Because we monitor public-facing corporate infrastructure for changes and new vulnerabilities, our alerts come at a frequency and cadence that is both useful and manageable by security operators. It’s all a part of our mission to keep organizations aware of their risks and safe from threats. Start monitoring your own organization’s attack surface with a free account here.