One of the more depressing pieces of information from Verizon's 2014 Data Breach Investigations Report is the fact that, over the past five years, the gap between when a data breach occurs and when it is discovered has been growing. Yes, that's right: despite investing in countless security tools to detect security threats, we're actually getting worse at the job. This is largely because we lack data mining capabilities that can address the unique requirements of information security.
For information security practitioners, the message from regulatory bodies, best practices standards, and business stakeholders has been the same for a long time: capture as much data as you can. Of course, capturing data and using data mining techniques to get actual value out of that data are two different things entirely.
SIEM Tools
Historically, the "collect data" mandate has been interpreted as deploying security information and event management (SIEM) tools, or at least centralized log collection. While SIEM and log management can be great tools that add value to security operations, they do have some limitations.
For one, they're focused on event data only. While SIEM vendors may argue that they collect everything as opposed to just events, the reality is that these tools still treat everything as an event and try to cram unstructured data into event taxonomies. A second problem with SIEM is that it's still heavily focused on detecting abnormalities based on defined events; you still need to know what it is you're looking for before you can establish an alert on it.
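To make that second limitation concrete, here is a minimal Python sketch of a rule defined in advance. It is not any vendor's rule syntax; the event fields, the `failed_login_alerts` function, and the threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical, simplified event records; the field names are assumptions.
events = [
    {"source": "vpn", "action": "login_failed", "user": "alice"},
    {"source": "vpn", "action": "login_failed", "user": "alice"},
    {"source": "vpn", "action": "login_failed", "user": "alice"},
    {"source": "db", "action": "bulk_export", "user": "bob"},
]


def failed_login_alerts(events, threshold=3):
    """Return users whose failed-login count reaches a predefined threshold."""
    counts = Counter(
        e["user"] for e in events if e["action"] == "login_failed"
    )
    return sorted(user for user, n in counts.items() if n >= threshold)


alerts = failed_login_alerts(events)
print(alerts)
```

The rule catches the repeated failed logins because someone thought to write it. The bulk export goes unnoticed, not because it is harmless, but because nobody defined an alert for it.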
BI Tools
Another approach to solving the security problem—one that has been used far less than SIEM—is business intelligence (BI) tools. For years, BI solutions ruled the roost for gaining intelligence out of large data sets. Typically, these were deployed not for information security purposes, but for business operations; finance, supply chain analysis, and even human resources have long relied on BI tools to provide insight into the "biggest picture" of their business.
Unfortunately, there are drawbacks that prevent BI solutions from being deployed heavily for data mining within information security. One problem is simply that, historically, BI solutions used their own programming and analytics languages, which meant dedicated BI personnel were needed to run them.
Today, these solutions have largely morphed into big data analytics platforms, and this has several security implications. First, it means that we can collect any type of security data, even unstructured data, which can be loaded into repositories and searched without the need for heavy indexing or other functions that require constraining data types. Another advantage of big data solutions is that many analytics platforms have removed the need for dedicated BI personnel; simple queries can be created, often with zero programming, and can return results incredibly fast. Big data represents the current direction of security analytics and data mining, and I don't think we've yet scratched the surface of the benefits it can provide to security operations.
Necessary Solution Qualities
While big data platforms can be used for improved information security data mining, there are several properties unique to the security discipline. Any solution—whether SIEM, BI, big data, or something else down the road—needs to be able to take these special qualifiers into account:
- Speed, Speed and Speed. Unlike a traditional business case (say, identifying long-term buying patterns among specific demographics), most information security use cases need data now if they're going to be useful. For example, detecting an APT or other directed attack requires correlation and identification of unusual activity in near real-time if the attack is going to be stopped. Otherwise, while the data is still useful after the fact, it becomes a forensic effort rather than a preventative one. Similarly, data mining to support other use cases, such as fraud detection, requires information to be as fresh and as close to real-time as possible.
- Data from Everywhere. Really advanced security functions, such as discovering new attacks, require visibility into not only multiple types of assets—perimeter network control devices (e.g., firewalls), perimeter security data (e.g., IDS/IPS), servers, databases, and applications—but multiple levels of data as well. Events, system state changes such as altered configurations, network traffic, and more can all be required to get value out of data mining for information security.
- Discovering and Alerting on Abnormalities, not just Signatures. This has been the holy grail of security monitoring for some time, and it's a problem that still hasn't been fully solved. Signature-based detection is a useful part of security in depth, but it's not a solution for discovery of real-time threats, zero-day activities, and carefully planned and orchestrated attacks that utilize layers of technology infrastructure. Without this capability, all the speed in the world won't help us get off the square we've been stuck on for quite a few years now.
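
As a rough illustration of the last point, the sketch below flags activity that deviates from a host's own baseline instead of matching a known signature. It is a deliberately simple z-score check in Python, not a description of how any particular product works; the host names, counts, `find_anomalies` function, and threshold are all made up for the example.

```python
from statistics import mean, stdev


def find_anomalies(history, current, z_threshold=3.0):
    """Flag hosts whose current count is far from their own baseline.

    history maps a host to past per-interval event counts (at least two
    values each); current maps a host to the latest count.
    """
    flagged = []
    for host, counts in history.items():
        mu = mean(counts)
        sigma = stdev(counts)
        latest = current.get(host, 0)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as a deviation.
            if latest != mu:
                flagged.append(host)
            continue
        z = (latest - mu) / sigma
        if abs(z) >= z_threshold:
            flagged.append(host)
    return sorted(flagged)


# Hypothetical outbound-connection counts per hour.
history = {
    "web01": [100, 110, 95, 105, 90],
    "db01": [20, 22, 19, 21, 18],
}
current = {"web01": 104, "db01": 75}

print(find_anomalies(history, current))
```

Here the database server stands out even though 75 connections would be unremarkable for the web server, and no signature for the activity was ever written. Real deployments need far richer baselines (time of day, seasonality, peer groups), which is part of why the problem remains unsolved.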
While there's still no silver bullet for information security—and of course, there never really will be—we're getting to the point where data mining technologies can help us find new security problems we didn't know we had, and reduce the time it takes us to detect them. We'll see where the convergence of security and big data, or the next evolution of data intelligence after that, can take us.