Data is crucial to security. But finding a cost-effective, efficient way to store the data generated by modern security stacks is proving challenging for companies. In search of a technology that scales as data volumes grow and offers a predictable storage model, security professionals are looking into data lakes.

Two reasons are driving interest in security data lakes, said Omar Khawaja, Field CISO at Databricks, who recently spoke at Hunters Con, Hunters’ virtual conference. The first is the influx of security data. From EDRs to authentication tools to logs from cloud service providers, data plays a fundamental role in security operations. But SIEMs – technology created before the big data era – weren’t designed to handle the large volumes of data used by security professionals.


The legacy approaches to SIEM were used to manage data “when it was 10x smaller; those no longer work nearly as well when data is growing 10, 20, 30 percent,” Khawaja said, adding that “for a typical SOC, it’s not unusual for them to double, triple, quadruple [their data volumes] every two or three years.”

The second is a change in infrastructure that SIEMs weren’t designed to support. Data centers have been replaced with multiple cloud platforms and hybrid environments, with work split between a cloud platform and an on-premise system. “SIEMs were optimized for [on-premise] environments, but as more workloads move out of data centers and to the cloud or multiple clouds or the cloud with something still happening on-premise, you need a solution that’s optimized for this situation,” Khawaja said.

The failed equation of more data equalling greater storage costs

Handling this rise in data requires a storage platform that scales as data volumes increase, he said. Equally important is a pricing model that’s affordable. Charging by the amount of data in an environment will likely prove cost-prohibitive. If increasing data collection by 10 percent also requires a 10 percent increase in the cost of running a SOC, “that equation fails miserably,” Khawaja said.

Large organizations have already encountered the limits of traditional SIEMs and transitioned to security data lakes, Khawaja said, noting that, to his knowledge, half of the world’s 15 largest banks are using a security data lake. One of them is HSBC, which, according to a Databricks blog, expanded its data retention and enabled its threat hunters to perform 3x more hunts while lowering the total cost of ownership by leveraging the Databricks Lakehouse platform.

As data volumes rise, other companies will find themselves in a similar situation with their SIEMs, he added. “Everyone’s headed in the same direction. Ten years ago the amount of data a Fortune 100 company collected, that’s the same amount of data a Fortune 1000 company is likely collecting.”

The limitations of SIEM led ChargePoint, which operates electric vehicle charging stations in the U.S. and Europe, to switch to a data lake. With its previous SIEM, ChargePoint found that it had little control over the storage and retention of its data. Because the security data was stored by the SIEM vendor, taking back ownership of that data was a challenge, often involving high costs and red tape.

“Some organizations have a good data retention or data disposal policy, but most don't and they end up retaining data forever, including customer data,” said Rohan Singla, ChargePoint’s Director of Cybersecurity and Privacy. “This is a big pain point from both a cost perspective and a management perspective.”

Spicing up a vanilla data lake with partners

While data lakes may not offer all the capabilities found in the technologies they’re replacing, data lake providers partner with other companies to fill these gaps.

For SIEM replacement, Databricks has partnered with Hunters. Using Partner Connect, Databricks customers can set up a security data lake in just a few clicks using the Hunters SOC Platform.

“Hunters are experts on the business processes for cybersecurity, just like there are other partners focused on the business cases for financial services. This hypothetical of leveraging the vanilla data lake to get price, performance, efficiency, quality of scale would have remained hypothetical if it weren’t for the value that organizations like Hunters provide,” Khawaja said.