Under the Hood Series: Hunters' Detection Engine
by Yuval Izchakov, Senior Software Engineer, and Dvir Sayag, Cyber Research Content Lead
Detection Engine Background
Welcome to the third post in our “Open XDR - Under the Hood” series. This post covers a key pillar of Open XDR: a self-driven detection engine that combines out-of-the-box (OOTB) detections with the native alerts of different security tools.
Early and accurate detection of threats continues to be a challenge for security teams, hindered by an unfavorable signal-to-noise ratio, false-positive alerts, and a lack of visibility, all of which contribute to attacks still being missed. Drawing on multiple security domains such as EDR, Cloud, and Identity helps improve detection, but it also requires in-house expertise to support both domain-specific detection and complex, cross-domain attack detection while avoiding detection silos.
Hunters’ open Extended Detection and Response (XDR) solution applies an additional detection layer that extracts threat signals and alerts from noisy, existing security data and automatically maps them onto MITRE ATT&CK techniques across attack surfaces. The mapping happens on the fly, and the Hunters platform lets organizations see what threat coverage they get from their data sources and which detection capabilities each data source provides, mapped onto specific TTPs.
Detection Types
Before we deep-dive into the detection pipeline, it’s important to understand the different detection types presented in the Hunters portal.
- Native Detections - The set of detections that comes with the security tool itself. For example, Amazon GuardDuty includes its own alerts that indicate potentially malicious activity. Hunters’ Detection Engine surfaces these detections, aligns them to an easy-to-understand schema, splits them into subcategories, and keeps the underlying logs and telemetry available for further investigation.
- OOTB Detections - The set of detections researched and implemented by Hunters’ research and threat hunting teams, based largely on the MITRE ATT&CK framework and on the teams’ expertise across the various attack surfaces. For each security tool a customer uses, the Hunters team adds a corresponding set of OOTB detections, distinct from the tool’s native detections.
- IOC-Based Detections - The set of detections the team adds periodically in response to major cyber events such as breaches, newly disclosed vulnerabilities, and specific malware campaigns. These detections sweep customers’ data for the specific IOCs (a conceptual sketch appears after this list).
- Custom Detections - Users can also add their own detection rules, which join the platform’s list of detections. These rules typically target specific data sources that are key assets for the customer.
'Custom Detections' in the Hunters portal
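To make the IOC-based category concrete: as the pipeline section below explains, detections run as queries in Flink’s SQL dialect, so an IOC sweep conceptually reduces to joining telemetry against a curated indicator list. The sketch below is a hypothetical illustration, not an actual Hunters rule; the table names, column names, and the tableEnv handle (a Flink TableEnvironment, set up later in this post) are all assumptions.

```java
// Hypothetical IOC sweep, for illustration only: join DNS telemetry
// against a curated indicator table and surface any matches.
// Table and column names are assumptions, not Hunters' actual schema.
tableEnv.executeSql(
    "SELECT e.event_time, e.agent_hostname, e.remote_domain, i.campaign " +
    "FROM dns_events AS e " +
    "JOIN threat_intel_iocs AS i " +
    "  ON e.remote_domain = i.ioc_value " +
    "WHERE i.ioc_type = 'domain'"
).print();
```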
Detection Engine in Action
To understand what Hunters’ detection engine does and how it works, let’s look at an example of an Attack Story from the Hunters portal.
The breach below was observed through both native CrowdStrike detections and Hunters’ OOTB detections, showcasing a web shell deployment and lateral movement. Each row indicates a single alert created by the detection engine, broken down by who / what / where filters.
An Attack Story in the Hunters portal
In this case, a CrowdStrike native detection raised a threat signal identifying reconnaissance activity and the deployment of a web shell. Following that, Hunters’ web shell detection identified the execution of rundll32 by w3wp.exe, an activity that is a strong indicator of a web shell, lowering the chances of a false-positive alert.
Next, Hunters’ detection flagged the execution of cvtres.exe by w3wp.exe (an uncommon child process) as suspicious activity.
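As a rough illustration of the logic just described (not Hunters’ actual rule), such a parent/child process check can be expressed in Flink’s SQL dialect, which, as the next section explains, is how the pipeline runs detections. The table name, column names, and the tableEnv handle are assumptions:

```java
// Sketch: flag IIS worker processes (w3wp.exe) spawning children that
// are unusual for a web server, such as rundll32.exe or cvtres.exe.
// Names are illustrative; thresholds and tuning are omitted.
tableEnv.executeSql(
    "SELECT event_time, agent_hostname, parent_process_name, " +
    "       process_name, command_line " +
    "FROM edr_process_events " +
    "WHERE LOWER(parent_process_name) = 'w3wp.exe' " +
    "  AND LOWER(process_name) IN ('rundll32.exe', 'cvtres.exe')"
).print();
```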
The CrowdStrike native detections and Hunters’ detections are automatically correlated and presented in the portal, where our “Attack Story” feature showcases the full incident. This gives analysts a clear picture of the attack and enables them to act accordingly, lowering the chances of missing it.
How Does It Work?
In our last “Under the Hood” blog post, we explained how Hunters’ flexible ingestion provides many OOTB solutions for data ingestion while maintaining a level of flexibility that can be tailored to specific, complex use cases. Once data is parsed, cleaned, and semi-structured in the data lake, it’s time to put it to good use. But how do we do that?
When we (Hunters’ Detection Engineering team) initially characterized our detection engine, we had to evaluate which streaming platform would best fit our engine. We set the following requirements:
- Familiar DSL: Express detections in a language that is widespread and commonly used in the analytics domain
- Real-time: Once data is processed in our data lake, the platform must make it immediately available to our engine for computation
- Scalable: We’re going to scan terabytes of data at a time, so the platform needs to support scaling
- Stateful computations: Complex analytical queries need complex windowing functions (tumbling, sliding, session), which require state management
- Cross-table correlations: Joins over multiple tables, to make sure we are able to correlate events properly
To answer all of these requirements, we looked into several stream processing platforms and analyzed the pros and cons of each. After carefully weighing our options, we settled on Apache Flink, a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
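To ground the discussion, here is a minimal sketch of how a Flink detection job could be bootstrapped with the DataStream and Table APIs. This is an illustrative skeleton, not Hunters’ production code:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class DetectionJobSketch {

    public static void main(String[] args) {
        // The streaming runtime: parallelism, state, and checkpointing live here.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // The table environment lets detections be expressed in Flink's SQL
        // dialect, satisfying the "familiar DSL" requirement above.
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Source tables, detection queries, and sinks are registered against
        // tableEnv; the sections below sketch each of those steps. SQL
        // statements submitted via tableEnv.executeSql(...) run on their own,
        // so no explicit env.execute() call is needed for pure-SQL pipelines.
    }
}
```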
Let’s take a bird’s-eye view of our detection pipeline:
Detection Pipeline
Our detection pipeline’s starting point is normalized data in Snowflake, on top of which we can now build our detections.
When a Flink detection runs, it executes Snowflake queries and unloads the query results to Amazon S3 (object storage). We ship data from Snowflake to S3 so that Flink can read the result data in parallel and stream it into the pipeline as fast as it can.
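Snowflake can unload query results to S3 with its COPY INTO &lt;location&gt; command; on the Flink side, the unloaded files can then be exposed as a source table that Flink reads in parallel. Continuing the sketch above, with an illustrative path, schema, and format (not Hunters’ actual table layout):

```java
// Hypothetical source table over the S3 bucket that holds the unloaded
// Snowflake query results; path, schema, and format are illustrative.
// The watermark enables event-time windowing further down the pipeline.
tableEnv.executeSql(
    "CREATE TABLE edr_process_events (" +
    "  event_time TIMESTAMP(3)," +
    "  agent_hostname STRING," +
    "  parent_process_name STRING," +
    "  process_name STRING," +
    "  command_line STRING," +
    "  WATERMARK FOR event_time AS event_time - INTERVAL '1' MINUTE" +
    ") WITH (" +
    "  'connector' = 'filesystem'," +
    "  'path' = 's3://detections-input/edr_process_events/'," +
    "  'format' = 'parquet'" +
    ")"
);
```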
After data is loaded into Flink, the magic of detection starts happening. Code written in Flink’s SQL dialect is compiled into Java code at runtime and executed on the fly.
This code then searches, aggregates, filters, and potentially windows the incoming event data in search of patterns. Each detection has its own set of operations that it performs on the data.
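Continuing the sketch, here is what a stateful, windowed operation might look like: counting suspicious child processes per host in ten-minute tumbling windows and emitting only hosts that cross a threshold. The threshold, table, and column names are illustrative:

```java
// Sketch of a windowed, stateful detection: aggregate per host over
// 10-minute tumbling windows and keep only bursts of activity.
tableEnv.executeSql(
    "SELECT agent_hostname, " +
    "       TUMBLE_START(event_time, INTERVAL '10' MINUTE) AS window_start, " +
    "       COUNT(*) AS suspicious_children " +
    "FROM edr_process_events " +
    "WHERE LOWER(parent_process_name) = 'w3wp.exe' " +
    "GROUP BY agent_hostname, TUMBLE(event_time, INTERVAL '10' MINUTE) " +
    "HAVING COUNT(*) >= 3"
).print();
```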
To make detections resilient to failure (of any type), we snapshot each detection’s state and store it, once again, in Amazon S3, so that we can recover from errors when needed (checkpoints).
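In Flink terms, this is standard checkpointing configured on the execution environment. A minimal sketch, continuing the skeleton above with an illustrative interval and bucket:

```java
// Snapshot each job's state every 60 seconds and store the checkpoints
// in S3 (bucket name is illustrative), so a failed detection can resume
// from its last consistent state instead of reprocessing from scratch.
env.enableCheckpointing(60_000L);
env.getCheckpointConfig().setCheckpointStorage("s3://detection-checkpoints/");
```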
Once the detection engine finds an interesting pattern in the data, it generates a threat signal that we call a ‘lead.’
The lead is the bread and butter of the rest of the data pipeline, which later performs operations on top of it (enrichment, feature extraction, scoring, etc.). Once ready, all leads are shipped to downstream systems using Apache Kafka (an open-source distributed event streaming platform), which lets us connect the leads to other systems in real time.
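Wiring leads into Kafka can be done from the same SQL layer. A minimal sketch, reusing the windowed detection from earlier; the topic, brokers, and schema are illustrative, not Hunters’ actual configuration:

```java
// Hypothetical Kafka-backed sink table for leads.
tableEnv.executeSql(
    "CREATE TABLE leads (" +
    "  agent_hostname STRING," +
    "  window_start TIMESTAMP(3)," +
    "  suspicious_children BIGINT" +
    ") WITH (" +
    "  'connector' = 'kafka'," +
    "  'topic' = 'detection-leads'," +
    "  'properties.bootstrap.servers' = 'kafka:9092'," +
    "  'format' = 'json'" +
    ")"
);

// Publish the windowed detection's output as leads in real time.
tableEnv.executeSql(
    "INSERT INTO leads " +
    "SELECT agent_hostname, " +
    "       TUMBLE_START(event_time, INTERVAL '10' MINUTE), " +
    "       COUNT(*) " +
    "FROM edr_process_events " +
    "WHERE LOWER(parent_process_name) = 'w3wp.exe' " +
    "GROUP BY agent_hostname, TUMBLE(event_time, INTERVAL '10' MINUTE) " +
    "HAVING COUNT(*) >= 3"
);
```

Because each tumbling window fires exactly once, the aggregation produces append-only results, which is why it can be written to a plain Kafka topic without update or retraction handling.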
Conclusion
A well-engineered detection engine that orchestrates the pipeline properly is a must for a detection-based product: it is what allows the product to reduce false-positive threat signals and shorten the response time to real events.
With Hunters XDR, customers get OOTB detections researched and implemented by our experts, the native detections of their security products, IOC-based detections, and the ability to create custom detections through the portal.
Read our previous blog post in the Open XDR - Under the Hood series, Hunters’ Open XDR Approach to Flexible Data Ingestion, and stay tuned for the next one, which will cover Hunters’ Graph Correlation.