Endpoint Security (EDR)
Understanding EDR Architecture: From Lightweight Agents to Telemetry Collection
Explore how EDR agents capture system-level telemetry, including process executions and registry changes, to provide full visibility into endpoint activity.
The Evolution of Endpoint Observability
Traditional security tools historically focused on static identification. They checked files against a list of known bad signatures before execution occurred. This approach proved insufficient as adversaries began using legitimate tools and in-memory techniques to bypass disk-based scanning.
Endpoint Detection and Response (EDR) represents a shift from static snapshots to continuous observability. Instead of asking if a file is malicious based on its hash, EDR asks what the system is doing right now. It functions like a flight recorder for a workstation or server, capturing every significant event in real time.
The primary goal of an EDR agent is to provide a comprehensive audit trail of system activity. This telemetry allows security engineers to reconstruct the timeline of an attack long after the initial breach occurred. By capturing process starts, network connections, and configuration changes, the agent provides the context needed for behavioral analysis.
For developers, understanding EDR is about understanding system-level telemetry. It is the process of turning granular OS events into actionable data structures. This visibility ensures that even if an attacker uses a trusted system utility, their anomalous behavior will stand out against the baseline of normal operations.
From Signatures to Behavioral Telemetry
Signature-based detection relies on the attacker making a mistake by reusing code that has already been identified. Modern threats frequently use polymorphic code or fileless techniques that never touch the physical disk. This makes traditional antivirus blind to the most dangerous types of lateral movement and data exfiltration.
Behavioral telemetry focuses on the sequence of actions rather than the identity of the actor. For example, a web browser process should not suddenly spawn a command-line interface and begin scanning the local network. EDR agents detect these deviations by monitoring the relationships between processes and their resource usage.
This shift necessitates a massive increase in data collection. An EDR agent must ingest thousands of events per second without degrading the user experience. Success depends on efficient filtering and the ability to distinguish between developer activity and malicious intent.
The Flight Recorder Mental Model
Think of the EDR agent as a high-fidelity logging system for the entire operating system. While application logs tell you what your code did, EDR telemetry tells you what the OS did on behalf of your code. It captures the interaction between the software layer and the underlying hardware and configuration.
A flight recorder does not just save the crash data; it saves the preceding hours of flight parameters. Similarly, EDR keeps a rolling buffer of telemetry that allows investigators to see the reconnaissance phase of an attack. This historical perspective is vital for identifying the root cause of a security incident.
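The rolling buffer idea can be sketched with a bounded deque. This is a minimal illustration under stated assumptions, not how any particular agent stores telemetry: the `RollingTelemetryBuffer` class and the event fields are hypothetical names chosen for the example.

```python
from collections import deque


class RollingTelemetryBuffer:
    """Keep only the most recent `capacity` events (hypothetical sketch)."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest item automatically when full.
        self._events = deque(maxlen=capacity)

    def record(self, event):
        self._events.append(event)

    def since(self, timestamp):
        """Return buffered events at or after `timestamp`, oldest first."""
        return [e for e in self._events if e["timestamp"] >= timestamp]


buffer = RollingTelemetryBuffer(capacity=3)
for ts in range(5):
    buffer.record({"timestamp": ts, "event_type": "PROCESS_START"})

# Only the three newest events survive; the two oldest were evicted.
recent = buffer.since(3)
```

A fixed capacity keeps memory use predictable, at the cost of losing older history once the buffer wraps.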
The Architecture of EDR Telemetry Collection
Capturing telemetry requires deep integration with the operating system kernel. Modern EDR agents typically use a combination of kernel-mode drivers and user-mode services to gather data. This dual-layered approach ensures that the agent can see events as they happen while maintaining the stability of the host system.
In the Windows ecosystem, agents often leverage Event Tracing for Windows (ETW). ETW is a high-performance tracing facility built into the kernel that allows the OS to report events from various components. EDR agents subscribe to these providers to receive updates on process creation, network activity, and file system changes.
On Linux, agents might use the extended Berkeley Packet Filter (eBPF) or the Audit subsystem. eBPF allows the agent to run sandboxed programs within the kernel to monitor syscalls with low overhead. This provides a safe and efficient way to observe system behavior without modifying the kernel source code or loading a custom kernel module.
The efficacy of an EDR solution is not measured by the volume of data it collects, but by the precision of its hooks and the context it attaches to every event.
The agent must act as a transparent intermediary. It should not interfere with the normal execution of developer tools or production workloads. High-quality agents are designed to drop telemetry events under extreme load rather than causing a system hang or a blue screen of death.
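One way to picture the "drop rather than hang" policy is a bounded queue whose producer never blocks. The sketch below is an assumption about how such a policy could look in user-mode code; `LossyEventQueue` is a hypothetical name, and real agents implement this in their own collection pipeline.

```python
import queue


class LossyEventQueue:
    """Bounded queue that drops new events instead of blocking the producer."""

    def __init__(self, max_size):
        self._queue = queue.Queue(maxsize=max_size)
        self.dropped = 0

    def submit(self, event):
        """Enqueue without blocking; count the event as dropped if full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self):
        """Remove and return everything currently queued."""
        events = []
        while True:
            try:
                events.append(self._queue.get_nowait())
            except queue.Empty:
                return events


events = LossyEventQueue(max_size=2)
results = [events.submit({"seq": i}) for i in range(4)]
```

Keeping a `dropped` counter matters: it lets the backend know that a gap in the timeline reflects load shedding rather than an absence of activity.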
Kernel Drivers and Minifilters
File system minifilters are specialized drivers that sit in the I/O stack. They allow the EDR agent to intercept every file open, read, write, and delete operation before it reaches the disk. This is how agents identify a ransomware process encrypting files in real time.
Kernel callbacks are another critical mechanism. A driver can register notification routines so that the OS informs the EDR agent whenever a process or thread is created, including a thread created inside another process. This ensures that the agent is aware of every new execution context on the machine immediately.
Direct kernel integration carries risks. A bug in a kernel-mode driver can crash the entire system, leading to downtime. As a result, many modern security vendors are moving as much logic as possible into user-mode while keeping only the essential data collection hooks in the kernel.
User-Mode Hooks and Performance Considerations
User-mode hooking involves injecting code into running processes to monitor API calls. For example, an agent might hook into the Windows API to see when an application attempts to allocate executable memory. While powerful, this technique can be bypassed by sophisticated malware that issues system calls directly and never passes through the hooked user-mode functions.
Performance is the most significant constraint for EDR developers. Each hook adds a small amount of latency to system calls. If an agent monitors too many events or performs complex analysis on the host, it can make the system feel sluggish and unresponsive to the end user.
Analyzing Process Executions and Parent-Child Links
Process telemetry is the backbone of endpoint security. Every action on a computer is performed by a process, and tracking the lifecycle of these processes is essential. An EDR agent records the process ID, the user context, the full command line, and the cryptographic hash of the executable.
Understanding the lineage of a process provides the necessary context for detection. For example, seeing a python.exe process is normal for a developer. However, if that process was spawned by Microsoft Word (winword.exe) after a document was opened, it is highly suspicious and indicates a potential macro-based exploit.
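A lineage check of this kind can be sketched as a simple lookup. The process lists below are illustrative assumptions rather than a vetted rule set, and `is_suspicious_lineage` is a hypothetical helper.

```python
# Hypothetical rule table: parents that rarely have a legitimate reason
# to launch interpreters or shells.
SUSPICIOUS_PARENTS = {"winword.exe", "excel.exe", "outlook.exe"}
SCRIPT_HOSTS = {"cmd.exe", "powershell.exe", "python.exe", "wscript.exe"}


def is_suspicious_lineage(parent_name, child_name):
    """Flag a script host or shell launched by an office application."""
    return (
        parent_name.lower() in SUSPICIOUS_PARENTS
        and child_name.lower() in SCRIPT_HOSTS
    )
```

The same child process is judged differently depending on its parent: `is_suspicious_lineage("explorer.exe", "python.exe")` is false, while the Word-spawned case is flagged.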
The agent captures the exact command-line arguments used during execution. This reveals the intent behind the process. A PowerShell instance running a standard script is different from one running an encoded command that downloads a payload from a remote server.
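As a rough illustration of inspecting command-line intent, the sketch below looks for a PowerShell `-EncodedCommand` argument (or an abbreviation of it) and decodes the Base64, UTF-16LE payload. The function name is hypothetical, and the whitespace-based tokenization is a simplifying assumption that ignores quoting.

```python
import base64
import binascii


def extract_encoded_command(command_line):
    """Return the decoded script if an -EncodedCommand argument is present."""
    tokens = command_line.split()
    for flag, value in zip(tokens, tokens[1:]):
        if not flag.startswith(("-", "/")):
            continue
        name = flag.lstrip("-/").lower()
        # PowerShell accepts prefixes of the parameter name, plus the -ec alias.
        if name and ("encodedcommand".startswith(name) or name == "ec"):
            try:
                raw = base64.b64decode(value, validate=True)
                return raw.decode("utf-16-le")
            except (binascii.Error, UnicodeDecodeError):
                return None
    return None


# Build a sample command line the same way PowerShell expects it.
payload = base64.b64encode("Write-Output 'hi'".encode("utf-16-le")).decode("ascii")
decoded = extract_encoded_command(f"powershell.exe -NoProfile -enc {payload}")
```

Decoding the argument turns an opaque blob into text that detection rules and analysts can actually read.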

    {
      "event_type": "PROCESS_START",
      "timestamp": "2024-03-01T14:20:01.450Z",
      "process_id": 4412,
      "parent_process_id": 1024,
      "executable_path": "C:\\Windows\\System32\\cmd.exe",
      "command_line": "cmd.exe /c powershell.exe -ExecutionPolicy Bypass -File C:\\Temp\\setup.ps1",
      "user": "CORP\\jdoe",
      "integrity_level": "High",
      "hashes": {
        "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
      }
    }

The telemetry must also include the 'integrity level' or privilege level of the process. An attacker gaining administrative rights is a critical event. By comparing the privileges of the parent and child processes, the EDR can identify privilege escalation attempts as they happen.
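The parent and child comparison can be sketched by ranking the standard Windows integrity levels. The `is_privilege_escalation` helper is hypothetical and reuses the `integrity_level` field from the sample event.

```python
# Windows integrity levels, ordered from lowest to highest.
INTEGRITY_ORDER = ["Untrusted", "Low", "Medium", "High", "System"]


def is_privilege_escalation(parent_event, child_event):
    """Return True when the child runs at a higher level than its parent."""
    parent_rank = INTEGRITY_ORDER.index(parent_event["integrity_level"])
    child_rank = INTEGRITY_ORDER.index(child_event["integrity_level"])
    return child_rank > parent_rank
```

Legitimate elevation, such as an approved UAC prompt, also produces this pattern, so a practical rule would likely combine this signal with other context before raising an alert.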
Reconstructing Execution Chains
Security analysts use process trees to visualize the flow of an attack. By linking parent and child process IDs, they can trace an incident back to the original entry point. This might reveal that a single phishing email led to a browser exploit, which then spawned a shell.
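Tracing an incident back to its entry point amounts to following parent links. The sketch below assumes events shaped like the `PROCESS_START` sample, with `process_id` and `parent_process_id` fields; the `trace_ancestry` function and the sample data are hypothetical.

```python
def trace_ancestry(events, process_id):
    """Walk parent links from `process_id` back to the earliest known ancestor."""
    by_pid = {e["process_id"]: e for e in events}
    chain = []
    seen = set()  # guards against cycles caused by bad or reused IDs
    current = by_pid.get(process_id)
    while current is not None and current["process_id"] not in seen:
        seen.add(current["process_id"])
        chain.append(current)
        current = by_pid.get(current["parent_process_id"])
    return list(reversed(chain))


events = [
    {"process_id": 100, "parent_process_id": 1, "executable_path": "outlook.exe"},
    {"process_id": 200, "parent_process_id": 100, "executable_path": "chrome.exe"},
    {"process_id": 300, "parent_process_id": 200, "executable_path": "cmd.exe"},
]
chain = [e["executable_path"] for e in trace_ancestry(events, 300)]
```

Operating systems reuse process IDs over time, so a production implementation would likely key on the process ID combined with its start time rather than on the ID alone.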
Many modern EDR tools build on this with what some vendors call causality chains. These link not just processes, but also the files they created and the network connections they opened. This holistic view makes it harder for attackers to hide their tracks by jumping between different processes.
Developers should be aware that their build scripts often look like malicious activity to a naive EDR. Frequent process spawning, compiler executions, and temporary file creation are common patterns in software development. Fine-tuning EDR involves teaching the system to recognize these legitimate development workflows.
Monitoring Registry Changes and Persistence
The Windows Registry is a frequent target for attackers looking to maintain persistence on a system. By modifying specific keys, malware can ensure that it starts automatically every time the computer reboots. EDR agents monitor these 'Auto-start Extensibility Points' (ASEPs) with high scrutiny.
Registry telemetry includes the specific key being modified, the value being written, and the process performing the modification. This allows the system to detect when a non-standard process attempts to write to sensitive areas like the Run or RunOnce keys. Such behavior is a hallmark of persistent malware.
Monitoring is not limited to just startup keys. Attackers often modify system configurations to disable security features or change network settings. EDR provides a log of these configuration drifts, allowing administrators to revert unauthorized changes and identify the source of the compromise.

    # Simple logic to identify non-standard persistence
    PERSISTENCE_KEYS = (
        "software\\microsoft\\windows\\currentversion\\run",
        "software\\microsoft\\windows\\currentversion\\runonce",
    )
    AUTHORIZED_PROCESSES = {"msiexec.exe", "trusted_updater.exe"}


    def is_suspicious_registry_change(event):
        # Registry paths are case-insensitive and usually carry a hive prefix
        # (for example HKCU\...), so normalize and compare on the key suffix.
        key_path = event["key_path"].lower().rstrip("\\")
        if not key_path.endswith(PERSISTENCE_KEYS):
            return False
        # Flag if the process is not an authorized installer or system service
        return event["process_name"].lower() not in AUTHORIZED_PROCESSES

Detecting Malicious Persistence Mechanisms
Persistence is the difference between a temporary nuisance and a long-term data breach. Attackers use a variety of techniques to stay hidden, from scheduled tasks to WMI event subscriptions. EDR agents must monitor all of these specialized system interfaces to prevent attackers from gaining a permanent foothold.
WMI monitoring is particularly important because it allows attackers to run code without an actual executable file on disk. By creating a permanent WMI event subscription, malware can execute logic in response to system events. EDR captures these subscription creations and alerts on their unusual parameters.
File integrity monitoring (FIM) complements registry tracking. It alerts when critical system binaries or configuration files are altered. By combining file and registry telemetry, EDR creates a high-fidelity picture of the system's state and any unauthorized attempts to change it.
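At its simplest, file integrity monitoring compares current file hashes against a stored baseline. The following is a minimal sketch under that assumption; the function names are hypothetical, and real products typically rely on file system events rather than periodic rehashing.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large binaries are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Map each existing path to its current SHA-256 digest."""
    return {str(p): sha256_of(p) for p in paths if Path(p).is_file()}


def changed_files(baseline, current):
    """Return paths that were modified, added, or removed since the baseline."""
    all_paths = baseline.keys() | current.keys()
    return sorted(p for p in all_paths if baseline.get(p) != current.get(p))
```

A typical flow would take a `snapshot` of critical paths at a known-good point, take another later, and pass both to `changed_files` to list what drifted.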
Engineering Trade-offs and Best Practices
Deploying an EDR agent involves balancing security requirements against operational constraints. Every piece of telemetry collected consumes CPU, memory, and network bandwidth. Engineering teams must decide what data is essential and what can be safely ignored to maintain performance.
Filtering at the source is a common optimization strategy. Instead of sending every single file-read event to the cloud, the agent might only report writes to executable directories. This reduction in volume saves on storage costs and speeds up the time it takes to detect an actual threat.
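Source-side filtering can be sketched as a predicate applied before an event leaves the endpoint. The watched directories, the `operation` and `path` fields, and the `should_forward` name are all assumptions made for illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical policy: only writes under these directories are forwarded.
WATCHED_DIRECTORIES = [
    PureWindowsPath("C:/Windows/System32"),
    PureWindowsPath("C:/Program Files"),
]


def should_forward(event):
    """Forward file writes under watched directories; drop everything else."""
    if event["operation"] != "WRITE":
        return False
    path = PureWindowsPath(event["path"])
    # PureWindowsPath comparisons are case-insensitive, matching Windows.
    return any(directory in path.parents for directory in WATCHED_DIRECTORIES)
```

Comparing parsed path components instead of raw string prefixes avoids false matches such as `C:\Program Files Backup` being treated as `C:\Program Files`.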
The trade-off between local and remote analysis is another key consideration. Local analysis allows for immediate prevention, such as killing a process before it can encrypt files. However, complex behavioral analysis often requires the massive computing power and global context available only in a cloud environment.
- Data Volume: High-fidelity telemetry can generate gigabytes of data per endpoint daily.
- False Positives: Overly sensitive rules can block legitimate developer tools and disrupt workflows.
- Latency: Kernel hooks and inspection engines can introduce measurable delays in system operations.
- Privacy: Monitoring process arguments and file names can inadvertently capture sensitive user data.
Managing Telemetry Volume and Noise
Noise reduction is a primary focus for EDR administrators. In a typical development environment, tools like Git, compilers, and linters generate thousands of benign events. Creating exclusions for these trusted paths is necessary to prevent alert fatigue among security analysts.
Smart sampling and aggregation can help manage data load. Instead of reporting every network packet, the agent might report a summary of connections to a specific IP address over a period of time. This maintains visibility into the connection without the overhead of per-packet logging.
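The aggregation described here can be sketched as bucketing connection events by time window and remote address. The event fields (`timestamp` in seconds, `remote_ip`, `bytes_sent`) and the `summarize_connections` function are hypothetical.

```python
from collections import defaultdict


def summarize_connections(events, window_seconds):
    """Group connection events by (window start, remote IP) and total them."""
    summary = defaultdict(lambda: {"connections": 0, "bytes_sent": 0})
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["timestamp"] - event["timestamp"] % window_seconds
        bucket = summary[(window_start, event["remote_ip"])]
        bucket["connections"] += 1
        bucket["bytes_sent"] += event["bytes_sent"]
    return dict(summary)


events = [
    {"timestamp": 0, "remote_ip": "203.0.113.7", "bytes_sent": 100},
    {"timestamp": 30, "remote_ip": "203.0.113.7", "bytes_sent": 250},
    {"timestamp": 65, "remote_ip": "203.0.113.7", "bytes_sent": 40},
]
summary = summarize_connections(events, window_seconds=60)
```

Three raw events collapse into two summary records here, one per 60-second window, while the destination and total volume remain visible.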
Ultimately, the goal of EDR is to empower engineers with the data they need to defend their systems. By understanding how these agents work and the telemetry they provide, developers can build more secure applications and respond more effectively when an incident occurs.
