TL;DR: A situation like this is pretty much standard in the software industry. Do you want to go full DIY, or something in between (Chronosphere, for example), where you still have to do the heavy lifting of extracting intelligence from your observability data? Or do you want to focus on getting ahead in the AI race and pay someone like Datadog, so your own engineers can focus on getting you to your actual goal?
While we can’t tell which company switched to Chronosphere, or which vendor it moved away from, we can use this as a thought experiment. For the following discussion, let’s just pretend that “LLM Company XY” churned from Datadog and switched to Chronosphere. I would assume that LLM Company XY decided to switch primarily to lower costs. The 9-figure deal with Chronosphere is a multi-year deal. Since LLM Company XY is a large model provider, I estimate that it would have spent up to $30M per year with Datadog before switching. Here is the reasoning: OpenAI, Datadog’s biggest AI native customer, brings in about 80-90% of Datadog’s AI native revenue, with an estimated spend of $300M per year, which leaves about $30M (~10% of ~$300M) for LLM Company XY.
Note: if LLM Company XY did churn from Datadog, it was almost certainly not OpenAI. First, they just renewed their contract with Datadog; second, they collaborate with Datadog; and third, they are way too deeply integrated with Datadog to be able to just pull the plug.
So we are talking about a potential ~$7.5M impact per quarter for Datadog. That is certainly significant, even though it is only ~0.8% of last quarter’s sales of $953M. Given the timing of the potential churn, I wouldn’t be surprised if Datadog already knew, or at least suspected, this when they reported on Feb. 10, which could explain some of the “more conservative” revenue guide. (Which, by the way, is also “excluding our largest customer”, i.e. OpenAI, and yet it is a higher YoY percentage guide than the one they gave in their initial FY25 guide a year ago.)
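To make the back-of-the-envelope math above explicit, here is a minimal sketch. The inputs are just my estimates from this post ($30M annual spend, $953M quarterly sales), not reported figures for any specific customer, and the function names are made up for illustration.

```python
# Back-of-the-envelope churn impact, using the estimates from this post.
# All inputs are assumptions, not disclosed numbers.

def quarterly_impact_musd(annual_spend_musd: float) -> float:
    """Spread an annual spend evenly over four quarters."""
    return annual_spend_musd / 4


def share_of_revenue_pct(impact_musd: float, quarterly_revenue_musd: float) -> float:
    """Express the impact as a percentage of one quarter's revenue."""
    return impact_musd / quarterly_revenue_musd * 100


annual_spend = 30.0        # estimated annual spend of LLM Company XY, in $M
quarterly_revenue = 953.0  # Datadog's sales last quarter, in $M

impact = quarterly_impact_musd(annual_spend)
share = share_of_revenue_pct(impact, quarterly_revenue)
print(f"Impact per quarter: ${impact:.1f}M ({share:.2f}% of quarterly sales)")
```

Change `annual_spend` to stress-test the conclusion: even if the real number were double my estimate, the hit would stay below 2% of quarterly sales.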
The more interesting question is why LLM Company XY moved from Datadog to Chronosphere (again, assuming this is actually what happened). And what does that mean for Datadog’s other current or potential future customers, AI native or otherwise?
The easiest explanation is that LLM Company XY leaned heavily into OpenTelemetry (OTel, an open source technology for gathering telemetry). Because they instrumented their code using an open standard rather than the Datadog SDK, they could simply flip a switch: instead of sending their data to Datadog’s servers, they pointed it toward Chronosphere. OpenAI (and many other AI native customers), however, are deeply integrated with Datadog’s specific features, like their Agents, AI App Monitoring and GPU-specific dashboards. Moving would mean rebuilding thousands of dashboards and custom alerts from scratch, not to mention losing the value of Datadog’s proprietary AI Agents. LLM Company XY’s problem could have been that during large-scale training runs, like those for a new LLM version, the sheer volume of telemetry created a huge bill, whereas with Chronosphere’s solution they could keep only the error metrics. Datadog’s pricing model, which charges per metric ingested, made this level of cost-cutting much harder to achieve (more on that at the end). So why don’t OpenAI and other AI natives just leave Datadog for the same reason? And why would new potential customers choose Datadog over Chronosphere?
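To illustrate what “flip a switch” could look like, here is a simplified sketch in plain Python. `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard OpenTelemetry environment variable for the export target, and `http://localhost:4318` is the default OTLP/HTTP endpoint; everything else (the function names, the `vendor-a` and `vendor-b` URLs) is hypothetical, and this is not how any specific company is actually set up. A real migration would also involve credentials, collector configuration, dashboards and alerts.

```python
import os

# Standard OpenTelemetry env var that tells the OTLP exporter where to send data.
OTLP_ENDPOINT_VAR = "OTEL_EXPORTER_OTLP_ENDPOINT"
DEFAULT_ENDPOINT = "http://localhost:4318"  # default OTLP/HTTP endpoint


def resolve_endpoint(env=None):
    """Pick the export target from configuration, not from application code."""
    env = os.environ if env is None else env
    return env.get(OTLP_ENDPOINT_VAR, DEFAULT_ENDPOINT)


def build_metric(name, value):
    """Application-side instrumentation: identical no matter which backend is used."""
    return {"name": name, "value": value}


def export(metric, env=None):
    """Pretend export: pairs the unchanged metric with the configured endpoint."""
    return {"endpoint": resolve_endpoint(env), "payload": metric}


# Hypothetical vendor endpoints, for illustration only.
vendor_a = {OTLP_ENDPOINT_VAR: "https://otlp.vendor-a.example"}
vendor_b = {OTLP_ENDPOINT_VAR: "https://otlp.vendor-b.example"}

metric = build_metric("gpu.utilization", 0.93)
print(export(metric, vendor_a))
print(export(metric, vendor_b))
```

The point is that `build_metric` never changes. Only the configuration does, which is what makes a vendor switch so much cheaper for a company that instrumented with an open standard.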
It’s simple: if you have, or want to hire, a team of engineers dedicated to getting insights from your open source telemetry, and you are OK with not getting the quality of observability and insights that Datadog can offer, then you can use OTel (which handles telemetry gathering but, critically, not the interpretation of that data) and basically do observability all by yourself.
But wait a minute, why would you then even need Chronosphere at all? Again, simple: Chronosphere has technology that allows their customers to drop the vast majority of their data volume before it needs to be stored, so you aren’t paying to store “garbage” data you’ll never look at.
So, the bottom line is that this situation is pretty much standard in the software industry. Do you want to go full DIY, or something in between (Chronosphere, for example), where you still have to do the heavy lifting of extracting intelligence from your observability data? Or do you want to focus on getting ahead in the AI race and pay someone like Datadog, so your own engineers can focus on getting you to your actual goal?
OK, there is one last piece to this puzzle that irked me: does Datadog have something of an innovator’s dilemma? What Chronosphere offers is great; why wouldn’t you want to throw away useless data and save money on storage? But if Datadog offered this service, it would hurt its own revenue. And if it doesn’t, it risks someone else coming along and offering it, together with tools that extract intelligence from the remaining data.
Well, it turns out that Datadog launched their own “Observability Pipelines” to compete directly with Chronosphere’s Control Plane. It’s an at-the-edge worker that sits in your cluster and allows you to filter, sample, and redact data before it hits Datadog’s cloud. While this technically reduces their ingestion revenue, Datadog charges a separate fee for the pipeline itself. They are essentially saying: “We’d rather you pay us a smaller, predictable fee to manage your data reduction than have you pay Chronosphere to do it.” Even if a customer uses Datadog’s Observability Pipelines to drop 80% of their raw metrics, they still need Datadog’s Bits AI SRE Agent to investigate the remaining 20%. Datadog is betting that if they give you the tools to save money on storage, you’ll spend those savings on their more expensive AI Agent tools.
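For readers who want a feel for what “filter, sample, and redact” means in practice, here is a toy sketch in plain Python. It is not Datadog’s or Chronosphere’s actual implementation or configuration format; the event shape, the `reduce_events` function and the rules (drop debug events, keep every Nth info event, mask email addresses) are assumptions chosen purely for illustration.

```python
import re

# Simple pattern for email-like strings; real redaction rules would be stricter.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def reduce_events(events, sample_every=10):
    """Filter, sample, and redact events before they are shipped to a backend."""
    seen_info = 0
    for event in events:
        level = event.get("level")
        if level == "debug":
            continue  # filter: debug noise is dropped entirely
        if level == "info":
            seen_info += 1
            if (seen_info - 1) % sample_every != 0:
                continue  # sample: keep only every Nth info event
        # redact: mask email addresses in whatever survives
        message = EMAIL.sub("[redacted]", event.get("message", ""))
        yield {**event, "message": message}


events = (
    [{"level": "info", "message": f"step {i} ok"} for i in range(20)]
    + [{"level": "debug", "message": "cache miss"} for _ in range(5)]
    + [{"level": "error", "message": "job failed for ops@example.com"} for _ in range(2)]
)

kept = list(reduce_events(events))
print(f"kept {len(kept)} of {len(events)} events")
```

In this toy run most of the volume never leaves the cluster, while every error event is kept (with the email address masked). That is the basic trade both vendors are selling: less data stored, but the data that matters still available for investigation.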
I hope this analysis helps to put a bit more color into how Datadog operates!
-Ben