This blog post is the first in a series of short articles discussing goals, challenges and approaches to implementing AI as an enabling technology in Command Zero.
In my previous blog post (Transforming Cyber Investigations: The Power of Asking the Right Questions), I mentioned our approach to implementing Large Language Models (LLMs) in Command Zero. Our general philosophy towards AI is simple: we use LLMs to augment the capabilities of our platform. We structure our content (Questions, Facets, Metadata, Prompts, Answers, Relationships) to improve the quality of the models' responses. As with developing any production-ready application, LLMs bring their own set of unique implementation challenges.
For us, these challenges can be categorized as accuracy, latency, scalability and cost, all of which have an impact on the user experience.
Context and intent are all you need
As an industry, we are integrating LLMs into products and processes, and it has been clear from the start that the quality and accuracy of any AI-driven insight is directly proportional to the relevance of the contextual information we feed into those systems. Collecting enough context and intent during investigations to ensure meaningful outcomes and responses is an ongoing challenge.
Context in cyber investigations goes beyond the availability of data. It's the entire ecosystem in which an incident occurs: the organization's structure, its technology stack, historical patterns and investigations, and the global threat landscape. Intent, on the other hand, relates to the purpose behind each question, the actions of the attacker, the direction of the investigation and the goals and priorities of the security team.
Challenges in capturing context and intent
1. Fragmented Data: The relevant context is often scattered across multiple data sources, making it difficult to build a cohesive picture of what may have occurred.
2. Fluid Context: The context of an investigation can change rapidly, requiring constant updates, new questions and reassessment of outcomes.
3. Implied Knowledge: Much of the context in an organization exists as implicit knowledge in the minds of the security team, making it challenging to extract and make available to the LLM.
4. Ambiguous Intent: User queries or investigation paths may have multiple possible intents, further complicating the quality of the LLM’s response.
5. Contextual Relevance: Determining which questions and data are most relevant to a specific investigation is complex, even for experienced analysts.
6. Scalability of Context Collection: As the volume and variety of data grow, collecting and processing contextual information at scale become increasingly challenging.
Some techniques we use to improve contextual understanding and outcomes include Retrieval Augmented Generation (RAG), Semantic Search and entity extraction for embeddings, enrichment and question prioritization. I'll cover some of these in follow-up posts. Another technique is leveraging NLP-generated question metadata.
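To make one of these concrete, below is a minimal sketch of semantic search over a question catalog: embed the candidate questions and the current investigation context, then rank the questions by similarity. This illustrates the general idea rather than our production implementation; it assumes the open-source sentence-transformers library, and the model name, catalog entries and context summary are made up for the example.

```python
# Minimal sketch: semantic search over a question catalog.
# Assumes the open-source sentence-transformers package; the model name,
# questions and investigation summary below are illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

question_catalog = [
    "What modifications have been made to service principals in the M365 tenant?",
    "What previously deleted Azure Applications have been restored in the M365 tenant?",
    "What inbox rules were recently created for the mailbox in question?",
]

# Current investigation context, e.g. a running summary of findings so far.
investigation_context = "Suspicious OAuth consent granted to an unknown application"

# Embed the catalog and the context, then rank questions by cosine similarity
# (vectors are normalized, so a dot product is the cosine similarity).
question_vecs = model.encode(question_catalog, normalize_embeddings=True)
context_vec = model.encode([investigation_context], normalize_embeddings=True)[0]
scores = question_vecs @ context_vec

for score, question in sorted(zip(scores, question_catalog), reverse=True):
    print(f"{score:.2f}  {question}")
```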
Enhancing investigations with contextual questions for intent, aka "What questions should I ask?"
It's a simple question, but building a system that allows for the accurate, repeatable presentation of questions to ask is a little more complicated. Our process for question creation includes both manual and automated steps in our generation pipeline. The final step is the creation of additional metadata for each question we ask. This enriched metadata, created through Natural Language Processing (NLP) techniques, serves as a bridge between the raw data we produce and the meaningful insights we extract from it. This approach significantly enhances our ability to capture context and intent, leading to more effective decision-making during investigations.
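As a rough illustration of what enriched question metadata can look like, the sketch below attaches an NLP-generated intent, extracted entities, data sources and tags to a question. The field names and values are hypothetical and simplified; they are not our actual schema.

```python
# Rough illustration of NLP-enriched question metadata.
# Field names and values are hypothetical, not Command Zero's actual schema.
from dataclasses import dataclass, field

@dataclass
class QuestionMetadata:
    question: str                 # the question as presented to the analyst
    intent: str                   # NLP-generated summary of why the question matters
    entities: list[str] = field(default_factory=list)      # extracted resource types
    data_sources: list[str] = field(default_factory=list)  # where the answer comes from
    tags: list[str] = field(default_factory=list)          # scenario hints (e.g. BEC)

example = QuestionMetadata(
    question="What modifications have been made to service principals in the M365 tenant?",
    intent=("Explain the importance of monitoring modifications to service principals "
            "in the Microsoft 365 environment and how this can help in identifying "
            "potential security threats."),
    entities=["service principal", "M365 tenant"],
    data_sources=["Microsoft 365 audit log"],
    tags=["BEC", "persistence"],
)
```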
The questions we create, and the context and intent we generate alongside them, are designed to give the models the input they need to make better decisions and propose additional actions.
In-context learning is a feature of LLMs that allows the model to use the context provided to perform a domain-specific task without fine-tuning the base model. A key part of this is the domain-specific text and phrasing used: crafting the system, assistant and user prompts, the template definitions and the contextual data we provide. Each use case we address requires changes to each of these, and the use of NLP-enriched question metadata is part of this process.
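To ground this, here is a simplified sketch of how investigation context and NLP-enriched question metadata might be assembled into system and user prompts for an OpenAI-style chat model. The template text and message structure are illustrative only, not our production prompt templates.

```python
# Simplified sketch of assembling prompts for in-context learning.
# The template text and message structure (OpenAI-style chat messages) are
# illustrative; they are not Command Zero's production prompt templates.

SYSTEM_TEMPLATE = (
    "You are assisting a cyber investigation. Use only the provided context. "
    "Recommend which questions to ask next and explain why."
)

def build_messages(investigation_summary: str, candidate_questions: list[dict]) -> list[dict]:
    """Combine investigation context and NLP-enriched question metadata
    into chat messages for the model."""
    question_block = "\n".join(
        f"- Question: {q['question']}\n  Intent: {q['intent']}"
        for q in candidate_questions
    )
    user_prompt = (
        f"Investigation so far:\n{investigation_summary}\n\n"
        f"Candidate questions and their intent:\n{question_block}\n\n"
        "Which questions should the analyst ask next, and in what order?"
    )
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE},
        {"role": "user", "content": user_prompt},
    ]
```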
The examples below show a few of our questions and the intent created as one part of the NLP process.
Question:
What modifications have been made to service principals in the Microsoft 365 (M365) tenant?
Intent: The intent is to explain the importance of monitoring modifications to service principals in the Microsoft 365 environment and how this can help in identifying potential security threats.
Question:
What previously deleted Azure Applications have been restored in the Microsoft 365 (M365) tenant?
Intent: The intent is to provide guidance on how to identify and evaluate newly added Azure applications during a cybersecurity investigation in a Microsoft 365 tenant.
Question:
What application role assignments have been made to service principals in the Microsoft 365 (M365) tenant's Azure Active Directory (AAD)?
Intent: The intent is to explain the role of service principals and application role assignments in Azure Active Directory, and how reviewing these can help in a cybersecurity investigation.
By controlling question creation, right down to how we phrase each question, we're able to:
- Prioritize and order the questions based on the current context of the investigation. For example, in the early stages the system might determine, based on relevant results and potential leads, that the incident may be a Business Email Compromise (BEC), an insider threat or something else. This helps to narrow down the investigation's focus.
- Provide investigation flexibility and "What If" scenarios, giving analysts the freedom to explore different scenarios and hypotheses (go down that rabbit hole!) quickly and with little additional cost. The platform can then generate relevant questions and potential outcomes based on these hypothetical situations, helping analysts to consider multiple angles during an investigation.
- Create resource-specific questions focused on the entities or resources involved in the investigation. For example, if an IP address is flagged, the platform will ask: "What is known about the geolocation and reputation of this IP address?" For a Microsoft User Principal Name (UPN), it will correlate that identifier with the user's email address, role, groups and assignments, GitHub and Okta activity and more (a simple sketch of this entity-to-question mapping follows below).
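As a minimal sketch of the resource-specific questioning described in the last point, flagged entities can be mapped to question templates. The entity types, templates and example IP address below are hypothetical, not our production question catalog.

```python
# Minimal sketch: mapping flagged entities to resource-specific questions.
# Entity types, question templates and the example IP are hypothetical.
QUESTION_TEMPLATES = {
    "ip_address": [
        "What is known about the geolocation and reputation of {value}?",
        "What sign-in activity originated from {value}?",
    ],
    "upn": [
        "What roles, groups and assignments are associated with {value}?",
        "What recent GitHub and Okta activity is associated with {value}?",
    ],
}

def questions_for_entity(entity_type: str, value: str) -> list[str]:
    """Return resource-specific questions for a flagged entity."""
    return [t.format(value=value) for t in QUESTION_TEMPLATES.get(entity_type, [])]

# Example: an IP address flagged during the investigation.
for q in questions_for_entity("ip_address", "203.0.113.42"):
    print(q)
```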
Without sufficient context or a clear grasp of intent, it's challenging to infer appropriate next steps in an investigation, and expecting a general-purpose LLM to do so is unreasonable. With each piece of relevant context and every clarification of intent, we can refine and direct the analysis, leading to more accurate and actionable investigation outcomes.
Conclusion
Hopefully this provides some insight into how we think about the content we produce and how we structure, enrich and use it with the help of LLMs. For now, our focus for AI implementation is on capturing, processing and utilizing context and clear intent. We are continually refining our approach to implementing AI in cyber investigations, leveraging techniques like RAG, Semantic Routing, intelligent discovery processes and dynamic contextual questioning. Through this, we can create a platform that 'understands' the nuance and context of each investigation, leading to more accurate, actionable and impactful outcomes for everyone.