January 27, 2026 · 10 min read

The Federated Truth: Why Data Lakes Are Failing Investigations

This article argues that traditional security architectures based on data centralization (Data Lakes and SIEMs) are failing to meet the needs of modern investigations due to prohibitive storage costs, data ingestion lags, and incomplete visibility. The author identifies a "SecOps Last Mile" problem, where analysts lose critical time switching between disconnected consoles to access data that was never ingested into the central repository. The proposed solution is a Federated Data Model, such as Command Zero, which queries data directly where it resides (EDR, Identity Providers, etc.) via APIs rather than moving it. This approach eliminates ingestion delays, provides access to 100% of real-time data, and reduces infrastructure costs. By leveraging AI to normalize these distributed queries, the federated model allows analysts to investigate threats in seconds rather than hours, shifting the focus from data management to rapid threat resolution.

Eric Hulse
Director of Security Research

During my years defending networks across military, government, and commercial environments, I’ve watched security teams invest millions in centralizing their data and still struggle with the same SecOps bottleneck. The promise was always the same: “Get all your data in one place, and investigations become simple.” The reality? By the time your data makes it into that lake, the breach is already three days old, and you’re waiting for an analyst to understand what was ingested, assuming the data was ingested at all.

Here’s what actually happens: Most organizations only ingest alerts and some change logs into their SIEM to minimize licensing/storage cost. The full stateful data, the detailed context that makes SecOps investigations possible, never makes it in. There’s no parser for it, no budget to build one, and no storage allocation to keep it. So when a SOC analyst needs to investigate, they don’t have the data they really need. They have breadcrumbs pointing to where data might exist, scattered across a dozen product consoles.

I watched this play out recently with a senior analyst at a large enterprise. He was investigating an internal incident. He started in the SIEM, found an alert, and saw some correlated events, but he needed more context. What followed was the security operations version of credential hell:

  • Log into the EDR console to check endpoint telemetry.
  • Log into the identity provider to verify authentication patterns.
  • Log into the proxy to see network connections.

It took forty minutes to find the queries he needed and create the ones that didn’t exist. Forty minutes of context switching between tools while trying to maintain his mental model of the investigation. And all of that assumes the analyst has access to, and working knowledge of, each of these separate tools.

Then we ran the same investigation using Command Zero. He entered the username, described what he needed to know, and the platform orchestrated queries across all those same systems simultaneously. When the results came back, he looked at me and said something I’ll never forget: “It took me longer to grab my MFA tokens for those three other products and find my password in the password manager than your platform took to run the entire investigation.”

That’s not a tool problem. That’s not an analyst skill gap. That’s a fundamental architectural mismatch between how we’ve built our security infrastructure and the operational realities of 2026 threat landscapes.

The Mathematical Reality of Centralization

The data centralization approach made sense fifteen years ago when most organizations were generating gigabytes per day. But the math has changed dramatically. A mid-sized enterprise now generates terabytes of security-relevant data daily. Endpoint telemetry, cloud service logs, network flows, identity events, email metadata: the volume compounds with every SaaS application you adopt and every remote workforce expansion.

The standard response to alert fatigue and growing security alert volume has been to build bigger data lakes and more powerful SIEMs. But this creates three compounding problems that mathematics, not marketing, determines.

  1. Most security-relevant data never gets ingested at all. Organizations make triage decisions based on cost and complexity: “We’ll ingest alerts from the EDR, but not the raw telemetry. We’ll get authentication successes and failures from the IdP, but not the full session context. We’ll capture email gateway blocks, but not the metadata that shows communication patterns.” The result is that when analysts need to investigate, the data they need is still sitting in the source system, inaccessible without logging into another console and learning another query language.
  2. Ingestion lag creates investigation blind spots for the data that does make it in. Moving data from where it’s generated into a central repository takes time. For high-velocity data sources like EDR or network flow logs, you’re looking at ingestion windows measured in hours, not minutes. During my time operating SOCs, I’ve watched analysts try to investigate active threats while staring at data that’s three to six hours old. The gap between “what happened” and “what we can see” becomes a tactical disadvantage that threat actors exploit relentlessly.
  3. Storage costs scale non-linearly with data volume. It’s not just the infrastructure, though that’s substantial. It’s the licensing models that charge per GB ingested, the increased compute needed to query larger datasets, and the retention requirements that force you to keep years of parsed, normalized data. What starts as a $200,000 annual SIEM license becomes a $2 million enterprise-wide data lake project, and you’re still only capturing 60% of the data sources you need for thorough investigations.

The dirty secret of data centralization is that it optimizes for vendor revenue, not investigation outcomes. You pay to ingest data, pay to store it, and pay to query it, and you still end up logging into product consoles when you need answers the SIEM can’t provide.

The SecOps Last Mile Problem

Even when centralization works technically, it fails operationally at what I call the “SecOps last mile,” the final gap between having data and getting answers. This is the bottleneck that kills investigation velocity.

You’ve spent millions centralizing your data. Your SIEM ingests terabytes daily. Your data lake contains three years of parsed events. But when an analyst needs to answer “Did this user’s credentials get used anywhere else in the environment?” they’re still writing custom queries, waiting for search results, manually correlating across multiple data models, and then logging into three other consoles to get the context that was never ingested in the first place.

The promise was that centralization would make investigations fast. The reality is that centralization just moved the bottleneck from data access to a combination of data interpretation, console sprawl, and authentication gymnastics. That analyst I mentioned earlier? His 40 minutes of query crafting and console hopping is the norm, not the exception. And that’s for a senior analyst who knows the tools. Junior analysts spend even longer, or give up and escalate.

The fundamental issue is that centralization optimizes for data storage, not investigation workflow. Your data lake answers the question “Where is all our data?” but not “What does this investigation need right now?” And it certainly doesn’t answer “How do I get to the 40% of relevant data that was never ingested?”

I’ve watched skilled analysts spend an hour piecing together investigation context from four different systems, each with its own authentication flow, query language, and data model. The investigation stalls not because the analyst lacks skill, but because the architecture creates friction at every step. Open a new console. Find your password. Get your MFA token. Remember how their query syntax works. Wait for results. Context switch to the next tool. Repeat.

The Federated Data Model: Query Where Data Lives

The architectural shift that’s transforming investigation velocity is deceptively simple: stop moving data around, and start querying it where it already exists. This is federated (direct-to-data) search, and it represents a fundamental rethinking of security operations architecture.

Instead of building pipelines to ingest, parse, normalize, and store data from every security tool (an effort that inevitably fails to capture everything you need), you build intelligent query orchestration that can ask each tool directly for what you need, when you need it. Your EDR has an API. Your identity provider has an API. Your email gateway has an API. Command Zero’s architecture recognizes that the fastest path to investigation answers isn’t through a centralized data warehouse. It’s through direct, real-time queries to the systems that already have the data in native format.

What we find in practice is that federated search eliminates the three mathematical problems that kill centralized approaches. First, all your data becomes accessible, not just the subset you could afford to ingest. When an analyst needs endpoint process execution details, Command Zero queries the EDR directly, regardless of whether that telemetry was ever sent to the SIEM. When they need email communication patterns, the platform queries the email gateway, even if only block events were centralized. Every security tool becomes part of the investigation surface.

Second, there’s no ingestion lag. You’re querying current data, not data that was current six hours ago. When an analyst searches for authentication events, they’re seeing what happened in the last five minutes, not the last five hours. The investigation shows what’s happening now, not what was happening when the last ingestion batch ran.

Third, storage costs become largely irrelevant. You’re not storing duplicate copies of data that already exists in your security tools. The EDR vendor is already storing endpoint telemetry. The identity provider is already maintaining authentication logs. You don’t need to pay to store that data twice. Your infrastructure costs shift from “How do we store all this data?” to “How do we efficiently query it?”

And critically, credential hell vanishes. Instead of an analyst logging into six different consoles with six different authentication flows, Command Zero’s integrations handle that orchestration in the background. The analyst describes their investigation question once, and the platform coordinates queries across every relevant system simultaneously. That senior analyst’s comment about MFA tokens and password managers? That friction disappears entirely.

How AI Enables Federated Architecture

The missing piece that makes federated search practical for security operations is AI-driven data normalization. Traditional federation approaches failed because they required analysts to understand every tool’s query language and data model. You’d need to know how to query the EDR API, then separately query the identity provider API, then manually correlate the results. This distributed the complexity across every analyst instead of centralizing it in infrastructure.

Modern AI changes this equation completely. Command Zero uses AI SOC agents and autonomous investigations to understand the investigation question in natural language, determine which data sources need to be queried, translate that question into each tool’s native API call, retrieve the results, and normalize everything into a unified investigation context, all in seconds. The analyst asks “Show me everywhere this credential was used in the last 48 hours,” and the system orchestrates queries across EDR, IAM, proxy, and email systems simultaneously.

This is where the federated data access model becomes operationally superior to centralization. Instead of waiting hours for data to be ingested and parsed, or discovering the data was never ingested at all, you are getting real-time results directly from authoritative sources. The AI normalization happens at query time, not ingestion time, which means you’re always working with current data in its most accurate form.
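To make "normalization at query time" concrete, here is a minimal Python sketch. It is not Command Zero's implementation and it does not use AI: hand-written mapping functions stand in for the normalization step, and every field name (`user_name`, `ts`, `actor`, `published`) is a hypothetical example rather than any vendor's real schema. The point is only that differently shaped records can be mapped into one timeline when the results come back, with nothing parsed or stored in advance.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class Event:
    """Common shape that every source is mapped into at query time."""
    source: str
    user: str
    action: str
    timestamp: datetime


def from_edr(raw: dict[str, Any]) -> Event:
    # Hypothetical EDR record: epoch seconds, account under "user_name".
    return Event(
        source="edr",
        user=raw["user_name"].lower(),
        action=raw["event_type"],
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def from_idp(raw: dict[str, Any]) -> Event:
    # Hypothetical IdP record: ISO 8601 string with offset, account under "actor".
    return Event(
        source="idp",
        user=raw["actor"].lower(),
        action=raw["outcome"],
        timestamp=datetime.fromisoformat(raw["published"]),
    )


NORMALIZERS: dict[str, Callable[[dict[str, Any]], Event]] = {
    "edr": from_edr,
    "idp": from_idp,
}


def normalize(results: dict[str, list[dict[str, Any]]]) -> list[Event]:
    """Map raw per-source results into one timeline, oldest event first."""
    events = [NORMALIZERS[src](row) for src, rows in results.items() for row in rows]
    return sorted(events, key=lambda e: e.timestamp)
```

In this sketch, adding a source means adding one mapping function; in the model the article describes, that translation is what the AI layer is said to take over.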

The practical advantage shows up in investigation timelines. In customer engagements, we’ve seen federated search reduce time-to-answer from hours to minutes for questions that span multiple data sources. That analyst’s 40-minute investigation becomes a 90-second investigation. And unlike his manual process, the federated approach doesn’t require him to remember six different query syntaxes or manage six different authentication sessions.

Architectural Focus: APIs as Investigation Infrastructure

The technical foundation of federated search is treating security tools’ APIs as investigation infrastructure rather than integration endpoints. This requires a shift in how security operations teams think about their tool ecosystem.

Traditional architecture views each security tool as a data source that needs to be connected to a central platform. The focus is on building pipelines: EDR → Parser → SIEM, IAM → Parser → Data Lake. You’re constantly managing data movement and transformation and inevitably making decisions about what data to exclude because ingestion is too expensive or complex. Command Zero’s architecture views each security tool as a query endpoint that can answer specific investigation questions directly. The focus shifts to query orchestration: What questions can this tool answer? How do we ask it efficiently? How do we combine results with other tools?
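One way to picture "each tool as a query endpoint" is as a small interface that every tool integration satisfies. The Python sketch below is illustrative only: `QueryEndpoint`, `InMemoryEndpoint`, `ask`, and the question names are hypothetical, and the endpoints are backed by in-memory data rather than real API calls. It shows the first orchestration question from the paragraph above, "What questions can this tool answer?", expressed as code.

```python
from typing import Protocol


class QueryEndpoint(Protocol):
    """What the orchestrator needs from any tool: a name and two methods."""

    name: str

    def can_answer(self, question: str) -> bool: ...

    def query(self, question: str, subject: str) -> list[dict]: ...


class InMemoryEndpoint:
    """Stand-in for a real tool API, backed by a dict instead of HTTP calls."""

    def __init__(self, name: str, records: dict[str, list[dict]]) -> None:
        self.name = name
        self._records = records  # question type -> records this tool holds

    def can_answer(self, question: str) -> bool:
        return question in self._records

    def query(self, question: str, subject: str) -> list[dict]:
        return [r for r in self._records[question] if r["user"] == subject]


def ask(question: str, subject: str, endpoints: list[QueryEndpoint]) -> dict[str, list[dict]]:
    """Send the question only to the endpoints that say they can answer it."""
    return {e.name: e.query(question, subject) for e in endpoints if e.can_answer(question)}


edr = InMemoryEndpoint("edr", {
    "process_activity": [{"user": "jdoe", "process": "powershell.exe"}],
})
idp = InMemoryEndpoint("idp", {
    "authentication": [
        {"user": "jdoe", "result": "success"},
        {"user": "asmith", "result": "failure"},
    ],
})

answers = ask("authentication", "jdoe", [edr, idp])
```

Here only the identity provider stand-in is queried, because it is the only endpoint that declares it can answer an authentication question. Nothing is copied into a central store beforehand.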

This architectural difference cascades through everything about how investigations work. Instead of pre-ingesting data “just in case” you need it and then discovering you didn’t ingest the right data after all, you query data when you actually need it. Instead of analysts learning six different authentication flows and query languages, they describe their investigation needs in natural language and let the system handle the technical execution.

The practical implementation means that when Command Zero orchestrates an investigation, it’s making direct API calls to your existing security tools. When you need authentication data, it queries your identity provider’s API. When you need endpoint telemetry, it queries your EDR’s API. When you need email metadata, it queries your email gateway’s API. All of this happens in parallel, and the results come back normalized and correlated without any data ever being duplicated, stored centrally, or excluded from investigation scope due to ingestion limitations.
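The parallel fan-out described above can be sketched with Python's standard `asyncio` module. This is a simplified illustration under stated assumptions, not the platform's actual code: the three connector functions are hypothetical stand-ins that sleep instead of calling a real API, and the email gateway stand-in fails on purpose to show that one unreachable tool does not have to sink the whole investigation.

```python
import asyncio
from typing import Awaitable, Callable, Union

# A connector takes a username and returns that tool's records (hypothetical shape).
Connector = Callable[[str], Awaitable[list[dict]]]


async def query_identity_provider(username: str) -> list[dict]:
    await asyncio.sleep(0.01)  # stands in for an HTTPS call to the IdP API
    return [{"source": "idp", "user": username, "event": "login_success"}]


async def query_edr(username: str) -> list[dict]:
    await asyncio.sleep(0.01)  # stands in for an HTTPS call to the EDR API
    return [{"source": "edr", "user": username, "event": "process_start"}]


async def query_email_gateway(username: str) -> list[dict]:
    # Simulated outage, to show per-source failure handling.
    raise ConnectionError("email gateway did not respond")


async def investigate(
    username: str,
    connectors: dict[str, Connector],
    timeout_s: float = 5.0,
) -> dict[str, Union[list[dict], Exception]]:
    """Query every connector concurrently; record failures per source."""

    async def run(name: str, connector: Connector):
        try:
            return name, await asyncio.wait_for(connector(username), timeout=timeout_s)
        except Exception as exc:  # a failed or slow source becomes a recorded error
            return name, exc

    pairs = await asyncio.gather(*(run(n, c) for n, c in connectors.items()))
    return dict(pairs)


CONNECTORS: dict[str, Connector] = {
    "idp": query_identity_provider,
    "edr": query_edr,
    "email": query_email_gateway,
}

results = asyncio.run(investigate("jdoe", CONNECTORS))
```

Because the connectors run concurrently, total wall-clock time is bounded by the slowest source (or the timeout) rather than the sum of all of them, which is the property the serial console-hopping workflow lacks.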

Breaking the SecOps Bottleneck

The operational transformation that federated architecture enables is the elimination of security bottlenecks at the investigation layer. In centralized architectures, bottlenecks occur at data ingestion (what can we afford to bring in?), parsing (can we build the connector?), storage (how long can we keep it?), and querying (how fast can we search it?). Federated search collapses all of these bottlenecks into a single query orchestration layer that happens in real-time, and includes all your data, not just the fraction that made it into the lake.

What this means for tier-2 and tier-3 analysts is investigation velocity that matches threat actor speed. Instead of spending hours gathering data before they can start analyzing (40 minutes finding queries, another 20 minutes logging into consoles, another 30 minutes waiting for results), they get comprehensive investigation results in seconds. The cognitive load shifts from “How do I access the data I need?” to “What does this data tell me about the threat?”

During customer implementations, we consistently see this velocity improvement manifest in three ways. First, investigation time-to-completion drops by 60-70% because data gathering, which traditionally consumes most of investigation time, becomes near-instantaneous. Second, investigation accuracy improves because analysts are working with current data from all relevant sources rather than historical snapshots of the fraction that was ingested. Third, investigation thoroughness increases because the friction cost of querying additional data sources drops to near zero. An analyst can follow an investigation thread into a system that was never connected to the SIEM without breaking stride.

The last-mile problem—the gap between having data and getting answers—disappears when you’re not fighting with data access friction, authentication flows, or query language variations. Analysts can follow investigation threads wherever they lead because querying a new data source is as simple as describing what they need to know. The security bottlenecks that traditionally slow down investigations become architectural non-issues.

The Economics of Federation vs. Centralization

The cost equation for federated search is fundamentally different from centralized data lakes. In traditional architecture, your costs scale with data volume: more endpoints generating more logs means higher ingestion costs, storage costs, and licensing costs. You’re essentially paying for the privilege of making duplicate copies of a subset of data that already exists in your security tools, and then paying again when you need to access the data that didn’t make the cut.

In federated architecture, your costs scale with investigation activity, not data volume. You’re not paying to ingest and store terabytes of data you might never query. You’re paying for the orchestration intelligence that can query the data you actually need when you need it: all the data, not just what fit in the ingestion budget. The economic model shifts from “infrastructure to store everything” to “intelligence to find anything.”

This creates a different scaling curve. Adding new data sources in centralized architecture requires parser development, increased storage, and higher licensing costs, which is why so many sources never get connected. Adding new data sources in federated architecture requires API integration, which is typically hours of engineering work rather than weeks. The marginal cost of expanding your investigation scope drops dramatically, which means you can actually achieve the comprehensive visibility that centralization promised but never delivered.
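The two scaling behaviors can be illustrated with a deliberately simplified cost model in Python. Every number below is a made-up placeholder, not real vendor pricing, and the per-investigation price is an assumption used only to represent "cost scales with investigation activity." The model is linear and covers ingestion charges only, so it leaves out the storage, compute, and retention costs mentioned earlier.

```python
def centralized_annual_cost(gb_per_day: float, price_per_gb: float) -> float:
    """Per-GB ingestion pricing: cost follows data volume."""
    return gb_per_day * 365 * price_per_gb


def federated_annual_cost(investigations_per_year: int, price_per_investigation: float) -> float:
    """Activity-based pricing: cost follows how often analysts investigate."""
    return investigations_per_year * price_per_investigation


# Hypothetical placeholder values, chosen only to show the shape of each curve.
PRICE_PER_GB = 0.50
PRICE_PER_INVESTIGATION = 20.0
INVESTIGATIONS_PER_YEAR = 5_000

for gb_per_day in (1_000, 2_000, 4_000):
    centralized = centralized_annual_cost(gb_per_day, PRICE_PER_GB)
    federated = federated_annual_cost(INVESTIGATIONS_PER_YEAR, PRICE_PER_INVESTIGATION)
    print(f"{gb_per_day:>5} GB/day  centralized=${centralized:>10,.0f}  federated=${federated:>10,.0f}")
```

Under these assumptions, doubling daily data volume doubles the centralized figure and leaves the federated figure unchanged. The absolute values mean nothing; only the difference in what drives each cost is the point.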

For security operations teams under budget pressure, which is essentially all of them in 2026, this economic shift is transformative. You can expand your investigation capabilities without exponentially expanding your infrastructure costs. You can investigate across data sources you could never afford to ingest into a SIEM. The constraint stops being “Can we afford to centralize this data?” and becomes “Does this tool have an API we can query?” And the answer is almost always yes.

Operational Realities and Cultural Shifts

The transition from centralization to federation requires acknowledging some hard truths about how security operations actually work. The centralization dream was built on the assumption that if you could just get all your data in one place, everything would become simple. What we’ve learned through decades of SIEM implementations is that centralization just moves complexity around. It doesn’t eliminate it. And in practice, centralization doesn’t even get all your data in one place. It gets some of your data in one place, and leaves the rest scattered across the tools where it originated.

Federated search acknowledges the reality that your data already exists in multiple authoritative systems, and that’s actually fine. The goal isn’t to consolidate everything into a single pane of glass. It’s to make querying across multiple systems as effortless as querying one. This is the practical middle ground between centralization’s promise and its operational reality.

I’ve worked with security teams who resist this shift because it challenges the mental model they’ve operated under for years. “If we’re not centralizing data, how do we ensure we have visibility?” The answer is that federated search provides better visibility because you’re seeing current data from all systems rather than historical copies of the fraction you could afford to ingest. You’re seeing what’s happening now across your entire security ecosystem, not what happened six hours ago in the tools you connected to your SIEM.

The cultural shift is moving from “We need to own all the data” to “We need to query all the data.” This requires trust in API reliability, which modern security tools increasingly provide. It requires confidence in AI normalization, which has matured significantly in the last two years. Most importantly, it requires accepting that the fastest path to investigation answers isn’t through the data warehouse you built. It’s through the systems that already have the answers, waiting to be asked.

The Future of Investigation Architecture

As we continue to refine this federated approach, the strategic implications become clearer. Security operations architecture is evolving from platforms that store subsets of data to platforms that orchestrate investigation intelligence across all data. The value isn’t in the data you’ve centralized. It’s in the questions you can answer, regardless of where the data lives.

This shift parallels broader trends in how organizations handle data generally. The data warehouse approach is giving way to data mesh architectures that recognize different systems as authoritative sources for different domains. Security operations is following this pattern, and federated search is the operational manifestation of this architectural evolution.

For security teams planning their infrastructure for 2026 and beyond, the question isn’t “How do we build a bigger data lake?” It’s “How do we query across all the data sources we already have?” The teams that embrace federated architecture gain investigation velocity advantages that compound over time. Every hour saved on data gathering is an hour gained for actual analysis. Every bottleneck eliminated is an investigation that completes faster. Every authentication flow removed is friction that no longer slows your analysts down.

The federated truth is that your data doesn’t need to be in one place to be useful. It just needs to be query-able when you need it. Command Zero’s architecture proves this approach works at enterprise scale, transforming how security operations teams handle the SecOps last mile from investigation question to investigation answer. The future of security investigations isn’t in moving data faster or ingesting more of it. It’s in querying all of it where it already lives, without making your analysts become authentication specialists and query language polyglots in the process.

