Last year at BSides San Francisco, I watched Anthropic's security team, Jackie Bow and Peter Sanford, present their internal AI-powered SOC. The numbers were striking: a 90% reduction in investigation time, from 40 minutes down to 3 minutes. Claude was handling security investigations with sophisticated context assembly, pulling data from multiple sources, and producing analyst-grade outputs.
It was genuinely impressive work. It was also a stark reminder of a question every security leader needs to ask: Can your organization sustain this kind of engineering effort over time? Because what I saw wasn't just a clever implementation; it was a security team with deep AI expertise, direct access to model developers, and the organizational commitment to iterate through production challenges. They weren't bolting AI onto existing processes. They were architecting investigation workflows specifically designed for AI execution.
Most security teams are trying to replicate this magic with a Python script, a LangChain wrapper, and one overworked engineer who's passionate about AI.
The Numbers Don't Lie
The current generation of AI frameworks has created what I call the "prototype illusion". You can build something impressive in days. Splunk's research team built DECEIVE, an AI-powered SSH honeypot, in roughly three days. It worked, it generated summaries, and it looked production-ready.
But here's what the statistics actually show about the gap between demo and production:
- Gartner: 30% of generative AI projects will be abandoned after proof-of-concept by the end of 2025.
- MIT Study: Only 5% of custom enterprise AI tools reach production.
- S&P Global: The abandonment rate surged from 17% to 42% year-over-year.
- Overall: 46% of projects are scrapped between proof-of-concept and broad adoption.
That's a 95% failure rate for custom AI implementations. Let that sink in.
What Actually Breaks: The Technical Reality
What we're seeing across the industry is a consistent pattern: rapid prototype success followed by production failure. During my 24 years in security operations, I've learned that production failures rarely happen for the reasons teams expect. The hard problem isn't getting the AI to work; it's getting it to work reliably, consistently, and sustainably in production environments.
The Context Assembly Challenge
Your prototype works great with carefully curated log examples. Production gives you malformed JSON, incomplete syslog entries, proprietary vendor formats, and telemetry gaps. When investigating a credential attack, you might need authentication logs from Azure AD, endpoint data from CrowdStrike, network traffic from your firewall, and privilege escalation indicators from your SIEM.
That's potentially millions of log entries. How do you select the relevant information from those millions to give the AI the right context for accurate analysis? Most teams approach this by tuning their vector databases and hoping the right information gets retrieved. What they're missing is that context selection requires sophisticated engineering work that doesn't show up in prototypes.
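To make the challenge concrete, here's a deliberately simplified sketch of budgeted context selection: score events against the investigation's indicators, spend a fixed context budget on the highest-scoring ones, then present them chronologically. Everything here (the `LogEvent` shape, the keyword-overlap scoring, the character budget) is an illustrative assumption, not how any particular product does it; real systems layer embeddings, entity resolution, and source-specific parsers on top of this idea.

```python
from dataclasses import dataclass

@dataclass
class LogEvent:
    source: str       # e.g. "azure_ad", "crowdstrike", "firewall" (hypothetical labels)
    timestamp: float  # epoch seconds
    text: str

def score(event: LogEvent, indicators: set, pivot_time: float) -> float:
    """Toy relevance score: indicator overlap, weighted by recency to the alert."""
    hits = sum(1 for ind in indicators if ind in event.text)
    recency = 1.0 / (1.0 + abs(event.timestamp - pivot_time) / 3600)
    return hits * recency

def assemble_context(events, indicators, pivot_time, budget_chars=4000):
    """Greedy selection: highest-scoring events first, until the budget is spent."""
    ranked = sorted(events, key=lambda e: score(e, indicators, pivot_time), reverse=True)
    selected, used = [], 0
    for e in ranked:
        if score(e, indicators, pivot_time) == 0:
            break  # nothing relevant remains in the ranking
        if used + len(e.text) > budget_chars:
            continue  # skip events that would blow the budget
        selected.append(e)
        used += len(e.text)
    # Present chronologically so the model sees a coherent timeline
    return sorted(selected, key=lambda e: e.timestamp)
```

Even this toy version surfaces the real engineering questions: how you score relevance per source, what the budget should be, and how you keep the timeline coherent once you start dropping events.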
The Reliability Verification Problem
You can't test AI investigation systems the way you test traditional software. AI systems are probabilistic: the same investigation might produce slightly different analysis across runs. That demands verification infrastructure which validates that conclusions converge consistently across variations in phrasing and evidence presentation. Most prototypes skip this entirely, then discover in production that "the AI works differently than it did in testing".
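A minimal version of that verification idea looks like the harness below: run the same investigation under several paraphrased prompts and check whether the verdicts agree above a threshold. The `investigate` callable, the verdict strings, and the 0.8 threshold are all placeholder assumptions standing in for whatever LLM call and output schema a real pipeline uses.

```python
from collections import Counter

def verify_convergence(investigate, alert, paraphrases, threshold=0.8):
    """Run one investigation under rephrased prompts and measure verdict agreement.

    `investigate(alert, prompt)` is a stand-in for the model call in your
    pipeline; it is assumed to return a short verdict string.
    """
    verdicts = [investigate(alert, prompt) for prompt in paraphrases]
    top_verdict, count = Counter(verdicts).most_common(1)[0]
    agreement = count / len(verdicts)
    return {
        "verdict": top_verdict,
        "agreement": agreement,
        "converged": agreement >= threshold,  # below threshold: route to a human
    }
```

In practice you would also compare the supporting evidence the model cites, not just the final label, but even this crude check catches the "works differently than in testing" failure mode before an analyst trusts the output.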
The Cost Scaling Reality
Most prototypes never test what happens when you scale from 100 alerts a day to 3,000. There is anecdotal evidence (noted by HPE) of an oil and gas company seeing their AI operational costs balloon from $1.2 million to $12 million annually because they hadn't architected for production scale. In production SOC environments, costs can scale unpredictably without proper architectural planning.
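The back-of-envelope math is worth doing before deployment, not after. The sketch below uses entirely illustrative figures (token counts and per-token pricing are assumptions, not quotes), and the linear model it implements actually understates reality, since retries, growing context windows, and multi-step agent loops all scale worse than linearly.

```python
def monthly_cost(alerts_per_day, tokens_per_investigation, usd_per_1k_tokens, days=30):
    """Back-of-envelope LLM spend projection. All inputs are illustrative
    assumptions; plug in your own measured rates."""
    total_tokens = alerts_per_day * tokens_per_investigation * days
    return total_tokens / 1000 * usd_per_1k_tokens

# Prototype scale: 100 alerts/day, ~20k tokens per multi-step investigation
proto = monthly_cost(100, 20_000, 0.01)
# Enterprise scale: 3,000 alerts/day, same per-investigation footprint
prod = monthly_cost(3_000, 20_000, 0.01)
```

Under these assumed rates the jump from 100 to 3,000 alerts per day is a straight 30x multiplier on spend, and that's before any of the nonlinear effects kick in.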
The Organizational Reality Nobody Talks About
The failure mode that kills more AI SOC projects than any technical challenge is the "hero developer" problem. Most projects rely on one or two smart individuals who understand both security operations and AI engineering. They built the prototype, understand the prompts, and keep the system running.
Then that person gets recruited away or burns out. The project becomes an orphan. Small issues compound, and the system that was delivering 90% time savings six months ago is now producing inconsistent results. Eventually, someone makes the call to shut it down and go back to manual processes.
The Strategic Resource Planning Question
Can you sustain and evolve this over time? The AI landscape moves fast. Ask yourself:
- Organizational Continuity: What happens when your AI-savvy security engineer leaves?
- Sustained Engineering Capacity: Who handles ongoing maintenance as models and APIs evolve?
- Threat Landscape Pace: Who encodes new investigation patterns as attack techniques change?
For most security teams, the honest answer is: "We don't have a plan for this". That's not a strategy; that's hope.
The Build vs. Buy Calculation
The right questions are about expertise and continuity:
Expertise accumulation: Does our organization have the capacity to maintain expertise in context engineering and verification infrastructure?
Strategic focus: Is becoming expert at AI infrastructure engineering the best use of our limited engineering capacity?
For most organizations, honest answers point toward investing in platforms that have already solved these infrastructure challenges.
What Success Actually Looks Like
The 2025 State of AI in the SOC survey found organizations processing 960 alerts per day on average, with large enterprises handling 3,000+. Meanwhile, CrowdStrike data shows that average eCrime breakout times (the time it takes for an initial compromise to escalate to lateral movement) have reached just 48 minutes.
In that environment, you can't afford to be debugging why your custom AI system degraded. You need investigation velocity that works reliably today and six months from now. Success is achieving sustainable production deployment that survives organizational changes and scales with operational demands.
In our next post, I'll share how Command Zero approaches these production challenges with an architecture that enables reliable AI-augmented investigation without requiring you to become an AI engineering expert.