About Sumo Logic
Sumo Logic is a cloud-based analytics platform that helps organizations collect, analyze, and manage log data from applications and networks. It provides real-time insight into security, operations, and business intelligence, and helps automate troubleshooting. More than 2,000 organizations worldwide rely on Sumo Logic for powerful real-time analytics and insights that resolve the hardest questions facing their cloud-native applications.
Sumo Logic’s Challenge
A long-established player, Sumo Logic recognized the step change in product and service improvements made possible by GenAI and has been investing heavily in innovation across its many product offerings.
“Customers have accepted AI as a real innovator, so the time is now for differentiating and disrupting the market,” said Tej Redkar, Chief Product Officer.
Sumo Logic hypothesized that GenAI could improve the automation of rule creation for security information and event management (SIEM) – and, more specifically, address a core challenge in the logging and observability space: identifying a root cause from logs. An initial project aimed to automap structured log data to the Elastic Common Schema (ECS). The end goal of this initial proof of concept (POC) engagement was to catapult innovation that reduces mean time-to-resolution (MTTR) for Sumo Logic’s customers.
Why Tribe AI?
Sumo Logic was introduced to Tribe AI by the private equity firm Francisco Partners, which had recently acquired the company. After hearing the Tribe AI team lead a conversation on applying GenAI to innovate in observability and security, the Sumo Logic team knew that Tribe AI was the right partner for the engagement.
Developing the Phase 1 Use Case
A Proof of Concept for Auto Mapping Log Data
The engagement with Sumo Logic started with a four-week POC in which Tribe AI performed extensive research and testing to determine whether a large language model (LLM) could automap structured log data to the Elastic Common Schema (ECS) format to improve observability. The work successfully demonstrated that an LLM can indeed handle this task.

However, the Tribe AI team didn’t stop there. They also tested using an LLM to interpret unstructured log data, and showed that this works with Anthropic’s Claude.
Unstructured log data parsing tasks demonstrated with Claude:
1. Claude parses unstructured logs into ECS format
2. Claude identifies the log type, then parses it
3. Claude correctly explains what's happening in the log data
4. Claude identifies incidents from looking at logs (analysis)
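As an illustration of the first two tasks, the sketch below builds a prompt asking an LLM to identify a log line's type and parse it into ECS-format JSON. The prompt wording, the ECS field subset, and the sample log line are illustrative assumptions, not Sumo Logic's or Tribe AI's actual prompts.

```python
# Hypothetical sketch: ask an LLM to type and parse one unstructured log
# line into ECS JSON. Prompt text and field list are illustrative only.

ECS_FIELDS = ["@timestamp", "log.level", "host.name", "process.name", "message"]

def build_ecs_parse_prompt(raw_log: str) -> str:
    """Build a prompt asking the model to emit ECS-format JSON for one log line."""
    return (
        "Identify the type of the following log line, then parse it into a "
        "JSON object using only these ECS fields: "
        + ", ".join(ECS_FIELDS)
        + "\n\nLog line:\n"
        + raw_log
        + "\n\nRespond with a single JSON object."
    )

raw = "Jan 12 06:25:43 web01 sshd[4321]: Failed password for root from 10.0.0.5"
prompt = build_ecs_parse_prompt(raw)
print(prompt)
```

The prompt string would then be sent to Claude via the model API; the response is a structured ECS document that downstream tooling can index and query.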

Phase 1 Proposed Solution
During the POC engagement, the Tribe AI team proposed a solution, executed as the series of steps below, to automap structured log data.
1. Read the log file from a JSON file along with its corresponding CSV ground truth, a two-column file containing the correct mapping (key → ECS key)
2. Generate descriptions from the logs to support better step-by-step reasoning
3. Generate the mapping, with the field sets and keys detailed in the prompt
4. Apply the mapping, filter out hallucinations, and evaluate against the ground truth when it is provided
5. Present a summary (counts and accuracy)
Developing the Phase 2 Use Case
Unlocking the Future of ‘Generative Context Engine’ with GenAI
Due to the success of the teams’ working relationship during the POC engagement, Sumo Logic decided to begin a second phase of the project. The focus of phase two was the interpretation of unstructured log data in the event of an incident – for example, a security threat or an infrastructure outage – with the goal of reducing mean time-to-resolution when incidents arise. Customers presently spend millions on observability tools yet remain limited by archaic, time-consuming processes when discovering the root cause of any given incident.

Current State of Logging & Observability:
Organizations face massive volumes of log data generated by applications, infrastructure, and services. These logs provide invaluable insights, but the sheer volume and complexity can make extracting meaningful information a daunting task.
Complexity of Logs:
Logs often come in unstructured formats from various sources—servers, applications, security systems, etc.—making it hard to identify patterns or link events. Traditional systems rely on predefined schemas, but they struggle with dynamic, unstructured data.
The Root Cause Process:
When an issue arises, teams need to sift through these logs, identify patterns, and link logs across different systems to form a 'trace' of the event. Traces provide a clearer picture of what went wrong. This end-to-end process can take hours or even days—especially when the logs are unstructured or missing critical information.
Tracing Instrumentation Challenges:
Implementing tracing systems can be time-consuming, requiring expert-level knowledge to set up and interpret. While tracing is helpful for deep diagnostics, many organizations haven’t yet implemented it or lack the resources to manage it effectively.
Business Opportunity:
Sumo Logic recently moved to a value-based customer licensing model with free data ingest. By removing price as a barrier to entry, Sumo Logic saw a large increase in new customers. The company wants to cut through the complexity and demonstrate the platform’s ability to pinpoint root causes quickly and in real time, for both new customers and existing customers who have not yet implemented tracing.
The teams hypothesized that they could leverage LLMs over logs to provide a more dynamic and efficient way of interpreting customer log data.
By providing an LLM with unstructured log data in natural language, the model can interpret it and respond – again, in natural language – to uncover the root cause of an incident far more quickly.
Phase two spanned three months and ultimately proved the teams’ hypothesis valid: mean time-to-resolution was reduced from hours or days to less than one minute. The team calls this functionality the ‘Generative Context Engine’, and it has created a tremendous amount of excitement in the industry.
Phase 2 Proposed Solution
Tribe AI harnessed the power of Anthropic’s Claude 3.5 Sonnet LLM to extract meaningful insights from customer log data. Customers can skip the process of instrumenting traces to identify root causes, as Claude 3.5 automatically analyzes logs and directly pinpoints the root cause of incidents. This functionality, called the ‘Generative Context Engine’, eliminates the need for predefined trace identifiers and accelerates the troubleshooting process. Claude’s capabilities had improved significantly since the original POC, so the frequency of ‘hallucinations’ noted in phase 1 was greatly reduced in phase 2.
How it works:
- Log Compression: Using proprietary approaches, logs are deduplicated and then sampled to retain representation across all services in the data, while maximizing the number of error messages that fit in the context window alongside the prompt.
- Log Summarization: Sonnet 3.5 summarizes the resulting thousands of logs, allowing analysts to extract key insights without manually sorting through endless data.
- Service Map View: Using Sonnet 3.5, an overview map is generated showing how services are connected, highlighting services that are exhibiting problems.
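The compression step can be sketched roughly as below. Sumo Logic's actual approach is proprietary; this toy version simply deduplicates repeated messages, guarantees every service stays represented, and prefers error lines under a character budget that stands in for the model's context window.

```python
# Illustrative log-compression sketch only (not the proprietary method):
# dedupe, keep one line per service with errors preferred, then fill the
# remaining budget with more error lines.
from collections import OrderedDict

def compress_logs(logs, budget_chars=200):
    """logs: list of (service, level, message) tuples."""
    # Deduplicate: keep the first occurrence of each (service, message) pair.
    seen = OrderedDict()
    for svc, level, msg in logs:
        seen.setdefault((svc, msg), (svc, level, msg))
    deduped = list(seen.values())

    # Guarantee one line per service, preferring errors within each service.
    by_service = {}
    for entry in deduped:
        by_service.setdefault(entry[0], []).append(entry)
    picked = []
    for svc, entries in by_service.items():
        entries.sort(key=lambda e: e[1] != "ERROR")  # errors sort first
        picked.append(entries[0])

    # Fill the remaining budget with additional error lines.
    used = sum(len(e[2]) for e in picked)
    for entry in deduped:
        if entry not in picked and entry[1] == "ERROR" \
                and used + len(entry[2]) <= budget_chars:
            picked.append(entry)
            used += len(entry[2])
    return picked

logs = [
    ("checkout", "ERROR", "payment gateway timeout"),
    ("checkout", "ERROR", "payment gateway timeout"),  # duplicate, dropped
    ("cart", "INFO", "item added"),
    ("cart", "ERROR", "redis connection refused"),
]
sample = compress_logs(logs)
```

Here the duplicate checkout error is collapsed and the cart service is represented by its error line rather than its routine INFO line, so the compressed sample maximizes diagnostic signal per token, in the spirit of the step described above.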
Tech Stack Details
A full-stack cloud-based application working alongside Sumo Logic’s existing environment:
- Cloud: Amazon Web Services (AWS) – Bedrock, S3, CloudWatch, EKS
- Large language model: Anthropic Claude 3.5 Sonnet
- Languages used: Python
Model Testing with Sample Data
Because Sumo Logic data wasn’t available at the start of the engagement, Tribe AI formulated a plan to demo and test the model using sample data. Data was generated with the open-source OpenTelemetry Astronomy demo app, hosted on minikube on Tribe team members’ laptops and configured to send log and trace data into Sumo Logic. The Tribe AI team then fed that data through Sonnet 3.5 for analysis and interpretation. The app was first tested with multiple transactions of log data, then deliberately over-scaled with a load generator to 1,000 simulated users until it crashed, letting the teams test the model’s accuracy in interpreting the incident and its root cause. Later, another open-source tool, Chaos Mesh, was leveraged to generate test data representing specific scenarios, such as network outages and cascading failures.
Sumo Logic’s Experience Working with Tribe
“Partnering with Tribe AI – and leveraging their complementary GenAI skill set – was critical to the success of this project,” said Tej Redkar, Chief Product Officer.
Redkar’s team at Sumo Logic has experience building and using traditional AI algorithms, but without the Tribe AI team’s first-hand experience with LLMs, delivering the first-of-its-kind ‘Generative Context Engine’ would not have gone so smoothly. Redkar credits Tribe AI for rapidly assembling a custom team of experts who understood the use case and contributed to the engagement’s positive outcomes.
“We had an ambitious scope and needed a really novel GenAI application to achieve our goal, which required a very high level of expertise in LLMs and prompt engineering,” said Tej Redkar, Chief Product Officer.
Tribe Team Members
Kuba & Alex: ML Engineers
Kash: AI Engineer
Sam: Technical & Product Lead + ML/AI Engineer
Orges: Engagement Manager
Impact
For Sumo Logic, the biggest impact achieved during the engagement was in reducing mean time-to-resolution (MTTR), a critical metric in their space. What used to take hours or days can now be achieved in less than a minute. Additionally, the cost per root cause is predicted to be around $0.50, compared to the hours or days of a full-time engineer’s time it would otherwise require.
This has also empowered their non-expert users to easily troubleshoot issues without relying on specialized staff or complicated instrumentation, broadening adoption and democratizing the use of log data across teams in the organization.
A demo of the ‘Generative Context Engine’ was showcased at AWS re:Invent in December 2024 and was overwhelmingly well received. Coupled with the announcement of Sumo Logic’s general availability release of its copilot (called ‘Mo’), the demo signaled to the market that Sumo Logic is a category-defining leader in the observability space.
The Future
Tribe AI and Sumo Logic know that they are just scratching the surface of what’s possible with GenAI, leaning on Anthropic and AWS as partners. Looking ahead, the teams believe the solution can do more than analyze unstructured logs and suggest fixes. The hope is that one day it could predict outages before they happen or automatically resolve issues on its own. GenAI is on track to completely transform how security and performance monitoring is done, not just at Sumo Logic, but across the industry.
The teams are currently working to scale up the Phase 2 solution to full functional parity through a beta test that incorporates real customer data.
The post-beta solution will be productized, with a General Availability launch planned by the end of Q1 2025.