November 18, 2025
7 min read

Anthropic's GTG-1002 Disclosure: When AI Becomes a Cyber Weapon of Mass Destruction, Investigation Capabilities Must Scale

When Chinese state-sponsored group GTG-1002 weaponized AI to attack thirty organizations simultaneously—with AI handling 80-90% of tactical operations—it exposed a critical gap in cybersecurity: offensive automation has scaled dramatically while defensive investigation remains human-paced. This blog examines how AI-augmented security investigations address the fundamental mismatch between AI-driven attack scale and traditional incident response capabilities. Command Zero's approach leverages LLM advancements to transform security investigations through question-driven frameworks that execute across multiple data sources simultaneously. Rather than replacing analysts, AI augmentation eliminates mechanical query work, enabling security teams to investigate thirty incidents with the same thoroughness as one. As threat actors increasingly weaponize AI for cyberattacks, defenders need investigation tools that match offensive automation's scale and speed. Learn how AI-augmented investigation helps SOC teams respond to sophisticated threats at machine speed while maintaining human expertise where it matters most—strategic analysis and decision-making.

Eric Hulse
Director of Security Research

When Anthropic disclosed that Chinese state-sponsored group GTG-1002 had weaponized Claude Code to execute cyberattacks against roughly thirty organizations—with AI handling 80-90% of the tactical work—the security community rightly focused on the offensive implications. Here's a sophisticated threat actor using LLMs to perform reconnaissance, develop exploits, and exfiltrate data at a speed and scale that would have been impossible for human operators just months ago.

But there's another critical question buried in that disclosure: When a threat actor compromises thirty organizations simultaneously using AI-driven automation, how do you investigate all of those incidents at human speed?

You don't. And that's exactly why we built Command Zero the way we did.

The Investigation Crisis No One Talks About

During my years defending networks in the Air Force, I watched the same pattern repeat endlessly: sophisticated attacks would happen, evidence would scatter across multiple systems, and investigators would spend days—sometimes weeks—manually stitching together what happened. We'd query endpoint logs (if they even existed), pivot to firewall data, check identity providers, dig through email logs, and somehow try to maintain context across all of it while incidents were actively unfolding.

The constraint was never analyst skill. It was the fundamental mismatch between the scale of modern attacks and the linear, human-paced investigation process.

Now imagine investigating the GTG-1002 campaign. Thirty compromised organizations. Each one potentially has unique infrastructure, different security tools, varied log retention policies. The threat actor used Claude to fire thousands of requests, often multiple per second, performing reconnaissance, testing exploits, harvesting credentials, and categorizing exfiltrated data. All simultaneously.

Traditional investigation approaches—even with skilled tier-2 and tier-3 analysts—simply cannot match that scale. You're asking human operators to chase an adversary that moved at machine speed across dozens of environments, leaving evidence trails in multiple disparate data sources at each target.

This is where the fundamental disconnect lies: Offensive automation has scaled dramatically, but defensive investigation has remained stubbornly human-paced.

AI-Augmented Investigation: Not Opportunism, But Core Vision

When we started building Command Zero, AI-augmented investigation wasn't a feature we bolted on after seeing what ChatGPT could do. It was the foundational architecture we designed from the beginning, because the investigation bottleneck has always been about cognitive load and tool fragmentation, not analyst expertise.

The vision was straightforward: Security teams shouldn't waste cognitive capacity remembering query syntax, translating questions between different data sources, or manually correlating evidence across systems. They should focus on what humans do best—strategic thinking, pattern recognition, and decision-making under uncertainty.

We knew that effective investigation required three things:

1. Rapid querying across heterogeneous data sources: Analysts need to ask questions that span EDR, SIEM, identity providers, cloud logs, and email systems without mentally switching between KQL, SPL, SQL, and whatever proprietary query language each vendor invented.

2. Persistent context across the investigation lifecycle: When you're three hours into investigating a Business Email Compromise and discover the attacker pivoted from one mailbox to fifteen others, you can't afford to lose track of what you've already confirmed. Context must carry forward—not in an Excel spreadsheet with 47 tabs, but in a system that understands the investigation as a coherent whole.

3. Knowledge capture that doesn't require manual documentation: The expertise that tier-3 analysts develop through years of investigation shouldn't exist only in their heads—or in that "Investigation Playbook.xlsx" file that everyone knows exists but nobody can find the latest version of. Investigation methodologies need to be codified, shared, and continuously refined.

These requirements pointed toward a system that could understand investigation intent, translate it into the appropriate technical queries, maintain state across the entire process, and learn from each investigation. That's AI augmentation. Not replacing analysts, but giving them leverage they've never had before.
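As a rough illustration of what these three requirements could look like together, here is a minimal sketch in Python. Everything in it is hypothetical — the `Question` and `SourceQuery` classes, the query templates, and the field names are invented for this example and are not Command Zero's actual data model. The shape is the point: one investigation intent maps to native-syntax queries for each data source, and findings persist with the question as investigation context rather than in a spreadsheet.

```python
from dataclasses import dataclass, field

@dataclass
class SourceQuery:
    """One data source's native-language form of an investigation question."""
    source: str    # e.g. "identity", "sharepoint", "edr"
    template: str  # query template in that source's native syntax

@dataclass
class Question:
    """A codified investigation question: one intent, many native queries."""
    intent: str
    queries: list[SourceQuery]
    findings: list[dict] = field(default_factory=list)  # persistent context

    def bind(self, **params) -> list[tuple[str, str]]:
        """Render each source's query with the concrete parameters."""
        return [(q.source, q.template.format(**params)) for q in self.queries]

q = Question(
    intent="Did this account access sensitive SharePoint sites?",
    queries=[
        SourceQuery("identity", "SigninLogs | where UserPrincipalName == '{user}'"),
        SourceQuery("sharepoint", "AuditLogs | where Actor == '{user}' and Sensitivity == 'High'"),
    ],
)
for source, query in q.bind(user="jdoe@example.com"):
    print(source, query)
```

The same bound question can run against one account or thirty; the methodology lives in the object, not in any single analyst's head.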

The LLM Progression That Changed Everything

Here's the honest truth: When we started building Command Zero, the LLM capabilities we needed didn't quite exist yet. Early models could handle straightforward query translation, but they struggled with the complex, multi-step reasoning that real investigations require.

Then the progression accelerated dramatically.

Over the past two years, we've watched LLM capabilities evolve in ways that directly unlocked investigation use cases:

Multi-step reasoning capabilities improved from "complete this task" to "develop and execute a multi-stage investigation plan." When an analyst asks "Did this user access any sensitive files after their password was reset?", modern LLMs can decompose that into authentication log queries, file access checks, sensitivity classification lookups, and timeline correlation—then execute all of it while maintaining logical consistency.
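That kind of decomposition can be pictured as a small dependency-ordered plan. The sketch below is purely illustrative — the source names, step descriptions, and `execution_order` helper are invented for this example — but it captures the shape of the problem: sub-queries that depend on earlier results must run after them, and the order has to stay logically consistent end to end.

```python
# Hypothetical decomposition of one analyst question into ordered sub-steps.
# Each step names a data source and the earlier steps whose results it needs.
PLAN = [
    {"step": 1, "source": "identity",   "ask": "find password-reset time for {user}",        "needs": []},
    {"step": 2, "source": "file_audit", "ask": "list files {user} accessed after the reset", "needs": [1]},
    {"step": 3, "source": "dlp",        "ask": "classify sensitivity of those files",        "needs": [2]},
    {"step": 4, "source": "timeline",   "ask": "correlate access times against the reset",   "needs": [1, 2, 3]},
]

def execution_order(plan):
    """Order steps so every dependency runs before the step that needs it."""
    done, order = set(), []
    while len(order) < len(plan):
        for item in plan:
            if item["step"] not in done and all(d in done for d in item["needs"]):
                order.append(item["step"])
                done.add(item["step"])
    return order

print(execution_order(PLAN))
```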

Tool use and function calling matured from experimental features to reliable capabilities. The AI can now determine which data source to query, construct the appropriate query in that source's native language, interpret the results, and decide whether follow-up queries are needed. This is the difference between a query assistant and an investigation partner.
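The difference between a query assistant and an investigation partner comes down to a loop: pick a tool, run it, interpret the results, decide whether a follow-up hop is needed. Here is a toy version of that loop, with stubbed tools and a trivial stand-in for the model's selection logic — none of this reflects any vendor's real implementation, only the control flow.

```python
# Stubbed "tools": each returns canned results instead of querying a real system.
TOOLS = {
    "edr_search":  lambda q: [{"host": "ws-113", "proc": "rundll32.exe"}],
    "siem_search": lambda q: [{"event": "4625", "count": 37}],
    "idp_search":  lambda q: [],
}

def pick_tool(question: str) -> str:
    """Trivial stand-in for the model's tool-selection step."""
    if "process" in question:
        return "edr_search"
    if "login" in question or "auth" in question:
        return "siem_search"
    return "idp_search"

def investigate(question: str, max_hops: int = 3) -> list:
    """Query, interpret, and decide whether a follow-up hop is warranted."""
    evidence, q = [], question
    for _ in range(max_hops):
        results = TOOLS[pick_tool(q)](q)
        evidence.extend(results)
        if not results:                      # nothing new: thread exhausted
            break
        q = f"follow up on {results[0]}"     # next hop derived from findings
    return evidence

print(investigate("which process made the suspicious connection?"))
```

In a real system the selection and follow-up decisions are made by the model via function calling; the loop structure is what turns one-shot query translation into an investigation.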

Domain-specific performance improved dramatically through better training data and fine-tuning approaches. LLMs now understand security concepts, recognize investigation patterns, and generate queries that reflect how experienced analysts actually think about problems—not just how documentation says they should.

Each of these advancements didn't just make Command Zero better. They enabled capabilities that were previously impossible. The vision was always there; the technology finally caught up to make it real.

Augmentation, Not Replacement: Understanding the Human-AI Partnership

Let me be absolutely clear about something: Command Zero is not building AI to replace security analysts. That was never the goal, and it never will be.

Here's why: The most valuable thing an experienced security analyst brings to an investigation isn't their ability to write KQL queries or remember which SIEM field corresponds to which EDR attribute. It's their intuition about what's weird, their pattern recognition across incidents, their understanding of business context, and their judgment about what matters.

These are distinctly human capabilities. An AI can query faster, maintain perfect context, and never forget investigation methodologies. But it can't look at an access pattern and think "That's technically allowed, but it feels wrong given what I know about this user's role." It can't weigh the political implications of escalating a potential insider threat case. It can't make the judgment call about whether to take production systems offline during an active compromise.

What AI can do—what it should do—is eliminate the tedious, repetitive, cognitively demanding work that gets in the way of analysts exercising these human capabilities.

Consider the typical investigation workflow before AI augmentation. The analyst:

1. Recognizes something suspicious (human expertise)
2. Figures out which system logged the suspicious activity (mental overhead)
3. Remembers or looks up the query syntax for that system (context switching)
4. Writes the query, makes syntax errors, refines it (frustration)
5. Interprets results and decides what to check next (human expertise)
6. Repeats steps 2-5 across multiple systems (cognitive exhaustion)
7. Tries to remember everything discovered so far (cognitive limit)
8. Manually documents findings (time sink)

Steps 2, 3, 4, 6, 7, and 8 are where AI creates leverage. They're necessary for investigation, but they don't require human judgment—they just consume the mental capacity and time that analysts need for steps 1 and 5, where their expertise actually matters.

This is what augmentation means in practice: AI handles the mechanical work of investigation so analysts can focus on the analytical work that only humans can do.

The analyst still drives the investigation. They still make decisions about what's suspicious, what requires escalation, and how to respond. But now they're working at the speed of thought rather than the speed of query syntax recall.

The Defensive Investigation Advantage in the AI Era

The GTG-1002 disclosure makes something clear: We've entered an era where sophisticated threat actors will use AI to attack at scale and speed. Anthropic reports that at the peak of the campaign, the AI was making thousands of requests, often multiple per second—an attack speed that human hackers simply cannot match.

This creates an asymmetry problem: If attackers are using AI to operate at machine speed across multiple targets simultaneously, defenders must be able to investigate at similar scale and speed.

This is where the architecture choices we made from the beginning become critically important.

Command Zero's Custom Questions aren't just convenient wrappers around queries—they're investigation frameworks that can execute across any number of targets in parallel. When we codify how to investigate "Did this compromised account access any sensitive SharePoint sites?", that question can run against one compromised account or thirty compromised accounts simultaneously. Same investigation methodology, machine-scale execution.
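The fan-out itself is simple to picture: the same question logic runs once per target, concurrently. Below is a minimal sketch using Python's standard thread pool, with `run_question` as a hypothetical stand-in for executing one Custom Question against one account's data sources — the account names and result fields are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_question(account: str) -> dict:
    # A real implementation would query EDR, identity, and SharePoint logs
    # for this account; here it returns a placeholder finding.
    return {"account": account, "sensitive_sites_accessed": 0}

# Thirty compromised accounts, or one — the code path is identical.
accounts = [f"user{i:02d}@victim.example" for i in range(30)]

with ThreadPoolExecutor(max_workers=10) as pool:
    findings = list(pool.map(run_question, accounts))

print(len(findings))
```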

Consider how you'd investigate if your organization was one of the GTG-1002 victims:

Traditional approach: Your incident response team discovers suspicious activity. They start querying the EDR for the initial indicator. They find something concerning, so they pivot to SIEM logs. They notice authentication anomalies, so they check Azure AD. They see suspicious mailbox access, so they dig into Exchange logs. Each pivot requires switching tools, remembering different query syntaxes, and manually tracking what they've learned. They're building context in an Excel spreadsheet while trying to answer the fundamental questions: What accounts were compromised? What systems did the attacker touch? What data was accessed? What persistence mechanisms exist? Timeline: days or weeks to piece together the full scope.

AI-augmented approach: Your team asks Command Zero the key questions: What credentials were compromised? What systems did this account access? What emails were sent from the compromised mailbox? What files were downloaded? These questions—encoded as Custom Questions—execute across EDR, SIEM, identity providers, email logs, and cloud platforms simultaneously. Results correlate automatically. You're not switching between tools; you're following the investigation thread. Timeline: hours to understand full scope, with complete evidence already correlated.

The difference isn't just speed. It's thoroughness. When you're asking humans to manually investigate thirty incidents, they'll take shortcuts. They'll check the obvious indicators, confirm the compromise, and move on to the next one. They have to—there isn't time for a comprehensive investigation at scale.

When AI handles the mechanical query work across all targets simultaneously, you can be just as thorough investigating thirty incidents as you would be investigating one. Every organization gets the same depth of analysis. Every edge case gets checked. Every correlation happens automatically.

This is the investigation advantage that AI creates: matching the scale and speed of AI-enabled attacks with AI-augmented defensive response.

The Question-Driven Investigation Framework

What makes this approach effective isn't just that AI can write queries faster. It's the methodology we've built around how investigations actually work.

Experienced analysts don't investigate by writing random queries. They follow mental frameworks: Check this, then check that, then if you find X, check Y. It's structured thinking developed over years of incident response.

Command Zero's Custom Questions codify these frameworks. They turn the investigation approach that exists in senior analysts' heads into executable, repeatable methodologies that any tier-2+ analyst can leverage.

When a tier-3 analyst develops a particularly effective way to investigate OAuth application abuse, that doesn't stay locked in their head or buried in a case note. It becomes a Custom Question that the entire team can use. The methodology is preserved, shared, and continuously refined based on new cases.

This creates a compounding advantage over time. Each investigation that teaches the team something new improves the investigation framework for every future case. The organization's collective investigation expertise grows rather than remaining scattered across individual analysts' memories.

And when threat actors evolve their tactics—as they inevitably will—the response isn't "figure out new queries." It's "update the relevant Custom Questions based on what we learned." The methodology adapts, and every analyst immediately benefits from that adaptation.

What the GTG-1002 Campaign Teaches Us About What's Next

Anthropic's disclosure of GTG-1002 is significant not because it's the first time threat actors used AI (it almost certainly isn't), but because it's the first time we have documented evidence of AI handling the majority of an actual cyberattack campaign's tactical work.

The report notes that GTG-1002 jailbroke Claude by decomposing malicious tasks into small, seemingly innocent operations and role-playing as a legitimate security firm. The AI occasionally hallucinated credentials or misidentified publicly available information as sensitive data—limitations that might seem reassuring until you realize they're temporary technical constraints, not fundamental barriers.

Every model release brings capabilities that were impossible six months prior. The investigation capabilities we've built into Command Zero have improved dramatically as LLMs evolved; there's no reason to think offensive capabilities will stagnate while defensive capabilities advance.

What this means for security operations is straightforward: AI-enabled attacks will become more sophisticated, more scalable, and more automated. Defensive investigation must evolve to match.

The good news is that AI provides advantages to defenders that don't exist on the offensive side. Attackers need to jailbreak models, decompose tasks to bypass safeguards, and work around safety limitations. Defenders building AI-augmented investigation tools don't face these constraints—we can tune systems specifically for investigation workflows, optimize for the questions analysts actually need answered, and integrate directly with the security tools that provide ground truth.

More importantly, defenders have something attackers don't: investigation experience encoded across thousands of actual incident response cases. Every Business Email Compromise investigation, every ransomware response, every insider threat case has taught security teams something about what questions matter and what patterns indicate real threats versus false positives.

This accumulated expertise, when codified into AI-augmented investigation frameworks, creates a defensive advantage that's hard to replicate from the offensive side. Attackers can use AI to scale their operations, but they can't easily capture the institutional knowledge that experienced defenders have built over decades of responding to actual incidents.

The Operational Reality Check

There's a tendency in our industry to treat AI as either a silver bullet that solves everything or a dangerous overreach that we should avoid. Neither perspective matches operational reality.

Back when I was doing tier-3 operations defending networks, I would have killed for a system that could maintain perfect context across a complex investigation, instantly translate my questions into the correct query language for whatever system I needed to check, and automatically correlate findings across multiple data sources. Not because I needed AI to replace my analytical capabilities, but because I was wasting huge amounts of time and cognitive capacity on mechanical work that got in the way of actual analysis.

The skill ceiling for security analysts isn't going to drop because AI can write queries. If anything, it rises. The analysts who understand how to effectively partner with AI—how to structure investigation questions, how to validate AI-generated results, how to maintain strategic context while the system handles tactical execution—those analysts become dramatically more effective than they were before.

This is what I mean by augmentation: AI raises the operational ceiling by removing the mechanical constraints that previously limited how much an analyst could accomplish.

A tier-2 analyst who previously spent 60% of their time wrestling with query syntax and tool switching can now spend 60% of their time on actual analytical work. A tier-3 analyst who could thoroughly investigate five incidents per week can now handle fifteen, with the same depth of analysis. An incident response team that needed three days to piece together what happened during a compromise can now do it in three hours.

These aren't theoretical improvements. They're the operational advantages we see teams achieve when investigation tooling actually matches how investigations work.

Command Zero's Position in the AI Security Landscape

The cybersecurity AI landscape is fragmenting into distinct approaches:

Some companies are building AI for offensive security—like Xbow's focus on autonomous pentesting to help defenders understand what AI-enabled attacks look like. That's valuable work. Organizations need to test their defenses against the tactics that sophisticated threat actors will use.

Other companies are building AI for threat detection—systems that analyze telemetry streams, identify anomalies, and generate alerts. This addresses the "find the needle in the haystack" problem at scale.

Command Zero operates in a different space: AI-augmented investigation for when you're already responding to an active incident or suspected compromise.

This is the difference between "What happened?" and "Is something happening?" We're not replacing SIEM alerting or EDR detection. We're giving incident responders and security analysts the leverage they need when those systems (or a concerned user, or threat intelligence, or any other source) indicate that something requires investigation.

When GTG-1002 compromises an organization, that compromise will eventually generate some kind of signal—unusual authentication patterns, suspicious file access, abnormal data transfers, something. At that point, someone needs to investigate: What account was compromised? How did they gain access? What did they do once inside? What data was accessed or exfiltrated? What persistence mechanisms exist? What other accounts or systems are affected?

These are investigation questions that require querying multiple systems, correlating findings, maintaining context, and making analytical judgments about what matters. This is where AI augmentation creates decisive advantages for defenders.

Looking Forward: The Analysis Arms Race

Anthropic's disclosure mentions that they're "continually working on new methods of investigating and detecting large-scale, distributed attacks like this one." That's the right response. The GTG-1002 campaign won't be the last time sophisticated threat actors weaponize AI for cyberattacks. It might not even be the most sophisticated example—it's just the first one that's been publicly documented.

What happens next is predictable: Attackers will continue to evolve their use of AI, finding better jailbreak techniques, more sophisticated task decomposition strategies, and ways to work around model limitations. Defensive AI will evolve in parallel, with better detection capabilities, more sophisticated investigation frameworks, and improved analyst augmentation.

This is the arms race we're in. The question isn't whether AI will be used in cybersecurity—it already is, on both sides. The question is whether defensive AI can evolve fast enough to match the pace of offensive AI evolution.

Command Zero's approach gives us confidence that the answer is yes, for a specific reason: We're building on top of investigation methodologies that already work. We're not inventing new ways to investigate incidents. We're taking the investigation approaches that experienced analysts have developed over decades and giving them the scale, speed, and consistency that AI enables.

As LLMs continue to improve—and they will—the investigation capabilities we can offer will improve proportionally. Better context handling means more complex investigations without losing thread. Better reasoning means more sophisticated analytical assistance. Better integration capabilities mean fewer manual steps in the investigation workflow.

But the core insight remains constant: Effective investigation requires human expertise partnered with machine scale. Analysts drive the investigation. AI handles the mechanical execution. Together, they can respond to AI-enabled threats at the scale and speed those threats demand.

This Is the Moment Command Zero Is Built For

When we set out to build Command Zero, we knew that investigation would eventually need to scale beyond what human analysts could handle manually. We didn't know exactly when that inflection point would arrive, but we knew the direction the industry was heading.

The GTG-1002 disclosure suggests that moment has arrived. Sophisticated threat actors are using AI to conduct attacks across dozens of targets simultaneously, at speeds human operators cannot match. The investigation capability gap—the mismatch between attack scale and defensive response capacity—is now a critical operational problem for security teams.

This is the moment Command Zero was built for: When the question isn't "Can we investigate this incident?" but rather "Can we investigate thirty incidents simultaneously with the same thoroughness we'd apply to one?"

The answer, with AI-augmented investigation, is yes. The investigation frameworks are ready. The technology has caught up to the vision. And as offensive AI capabilities continue to evolve, defensive investigation capabilities will evolve to match.

Because fundamentally, this isn't about AI replacing human analysts or automating away security expertise. It's about giving skilled defenders the leverage they need to operate at the scale the threat landscape now demands. It's about ensuring that when threat actors weaponize AI for attacks, defenders have AI-augmented capabilities that let them investigate, respond, and harden systems effectively.

That's always been the plan. Now it's operational reality.

Want to see how AI-augmented investigation works in practice? Book a demo with our team to see how Command Zero's question-driven investigation framework helps security teams respond to incidents at scale.

