The demo was perfect. The agent routed the invoice, updated SAP, replied to the customer, and closed the case in under a minute. Three weeks into production, the same agent was sending polite non-answers to half the inbox, escalating the rest, and quietly making the operations team slower than before the project started.

That gap between the demo and the deployment is the single most consistent pattern in enterprise agentic AI today. And in June 2025, Gartner gave it a number.

The Gartner 40% prediction, in plain language

Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027. The reasons cited by Senior Director Analyst Anushree Verma are rising costs, unclear business value, and inadequate risk controls. The same research introduced the term "agent washing" to describe vendors who relabel chatbots, RPA scripts, and single-step assistants as autonomous agents. Gartner estimates that of the thousands of self-described agentic AI vendors in the market, roughly 130 actually meet the definition.

The headline is the 40% number. The more useful story is why the 40% fails and what the surviving 60% are doing differently. Across Tekst's own deployments and the published research from Neo4j, Foundation Capital, and MIT NANDA, the pattern converges on one missing layer. Agentic AI projects that cross the chasm from pilot to production share a context graph. The ones that stall almost never do.

This is the case for treating the context graph as a precondition, not a feature, of the agentic process automation category.

Why AI agents fail in enterprise deployments, and why it is not the model

The instinct when an agent underperforms is to upgrade the model. Swap GPT for Claude, or Claude for something fine-tuned. It almost never fixes the problem, because the problem is not reasoning capacity. It is reasoning grounded in nothing.

Integration is not the real problem. Context is.

The first wave of enterprise AI failures blamed integration. Teams assumed that if the agent could read SAP, Salesforce, Outlook, and the ticketing system, it would work. So they built the pipes. The agent could now technically see everything. It still could not act intelligently, because reading a field is not the same as understanding what that field means inside a process. A "hold" status in SAP means something different when it comes from finance than when it comes from logistics, and something different again when it comes from a regulatory review. Raw access does not teach that. Integration-first thinking does not fix this. Context does.

Rules are not enough. Agents need decision traces.

Foundation Capital's December 2025 analysis made the point sharply. The enterprises getting agentic AI into production are not writing more rules. They are capturing decision traces: a structured record of which decision was made, by whom, under what inputs, with which exceptions, and what the outcome was. Rules describe what should happen. Decision traces describe what actually happened, and why. An agent grounded in decision traces can handle the messy middle that rules cannot anticipate.
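In structural terms, a decision trace can be as simple as one record per decision. A minimal sketch, with illustrative field names rather than any vendor's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionTrace:
    """One structured record: what was decided, by whom,
    on which inputs, with which exceptions, and how it turned out."""
    decision: str                  # e.g. "hold_invoice"
    actor: str                     # human or agent identifier
    inputs: dict                   # the fields the decision was based on
    exceptions: list = field(default_factory=list)
    outcome: str = "pending"       # filled in once the case closes
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

# An agent grounded in traces can look up how similar inputs were
# handled before, instead of relying on hand-written rules alone.
trace = DecisionTrace(
    decision="hold_invoice",
    actor="finance.reviewer.17",
    inputs={"invoice_total": 18_400, "po_match": False},
    exceptions=["missing_po"],
)
trace.outcome = "released_after_po_correction"
```

The point is the shape, not the fields: rules encode intent once, while a stream of records like this encodes what actually happened, exception by exception.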

Generic AI has no operational memory of how work actually gets done.

MIT NANDA's State of AI in Business 2025 report found that 95% of enterprise GenAI pilots deliver zero return on investment. The core barrier, the researchers concluded, was not model quality. It was learning. Most GenAI systems do not retain feedback, adapt to context, or improve over time. That gap closes when the agent has access to a persistent, structured memory of how the business operates. Neo4j's research team calls this the difference between the Event Clock, which records what happened in sequence, and the State Clock, which records what the business currently looks like. An agent without both clocks is guessing. An agent with both is reasoning.
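The two-clock framing can be shown with a toy model: an append-only event log (the Event Clock) and a current-state view folded out of it (the State Clock). A sketch with invented event names:

```python
events = []   # Event Clock: what happened, in order
state = {}    # State Clock: what the business looks like now

def record(case_id, event_type, detail):
    """Append to the event log and fold the event into current state."""
    events.append({"case": case_id, "type": event_type, "detail": detail})
    state.setdefault(case_id, {})[event_type] = detail

record("INV-1042", "status", "hold")
record("INV-1042", "hold_source", "finance")
record("INV-1042", "status", "released")

# The Event Clock keeps the full sequence, including the earlier hold;
# the State Clock keeps only the latest view of each case.
assert len(events) == 3
assert state["INV-1042"]["status"] == "released"
```

An agent reading only `state` cannot explain how a case got there; an agent reading only `events` has to replay history on every query. Keeping both is what turns guessing into reasoning.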

What a context graph for AI agents actually does

A context graph is a structured, continuously updated representation of how a business operates. It connects systems, processes, decisions, exceptions, and outcomes into one coherent model. When an agent is grounded in a context graph, every action it takes is traceable back to real operational precedent.

In Tekst's architecture, the context graph is the Unified Process Model that sits beneath every automation. It is what the Universal Tracing layer writes to, and what enterprise process intelligence reads from when it maps how work actually flows. The agent does not query a static knowledge base. It queries a living model of the operation, updated every time a case moves through the system. That is what a context graph for AI agents contributes that no model upgrade can replicate.
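In spirit, an agent query against a context graph walks relationships rather than reading isolated fields. A minimal in-memory sketch; a real deployment would use a graph database, and these node and edge names are invented for illustration:

```python
# Edges as (source_node, relation, target_node) triples.
graph = {
    ("case:4711", "concerns", "account:acme"),
    ("account:acme", "has_open", "exception:erp-late-po"),
    ("exception:erp-late-po", "resolved_by", "decision:expedite-shipment"),
}

def neighbors(node, relation):
    """Follow one relation type outward from a node."""
    return [t for (s, r, t) in graph if s == node and r == relation]

# Traverse from an incoming case to operational precedent:
# case -> account -> open exception -> how it was resolved before.
account = neighbors("case:4711", "concerns")[0]
for exc in neighbors(account, "has_open"):
    precedent = neighbors(exc, "resolved_by")
    print(exc, "-> precedent:", precedent)
```

Two hops are enough to surface a prior resolution that no single system's record contains, which is what "traceable back to real operational precedent" means mechanically.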

The four failure modes of agentic AI without a context graph

Across publicly documented cases of canceled agentic AI projects, and the patterns we see inside Tekst deployments, four failure modes keep showing up. All four trace back to the same missing layer.

Failure mode 1. The agent cannot see across systems.

The agent is technically connected to SAP, Salesforce, Outlook, and the document management system. It still treats each one as a silo. A customer email arrives asking about a delayed shipment. The agent reads the email, checks the CRM, finds the account, and replies with a generic status update. It never checks whether there is an open ERP exception, a logistics ticket, or a compliance hold on the same account. The result is a reply that is technically correct and operationally wrong. Process intelligence starts in the inbox precisely because this is where the cross-system signal is richest and the context graph gets its best operational data.
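The fix for this failure mode is procedural: before drafting a reply, sweep every connected system for open signals on the same account. A hedged sketch in which the system connectors and their return shapes are hypothetical stubs:

```python
def gather_account_context(account_id, systems):
    """Collect open holds and exceptions across systems before replying.
    `systems` maps a system name to a callable returning open issues."""
    context = {}
    for name, fetch_open_issues in systems.items():
        issues = fetch_open_issues(account_id)
        if issues:
            context[name] = issues
    return context

# Stub connectors stand in for real ERP / logistics / compliance clients.
systems = {
    "erp": lambda acct: ["exception: shipment delayed at customs"],
    "logistics": lambda acct: [],
    "compliance": lambda acct: ["hold: export review pending"],
}

ctx = gather_account_context("acme", systems)
# A reply drafted with `ctx` in the prompt is grounded in every silo,
# not just the CRM record the email happened to match.
```

The generic status update in the anecdote above is exactly what an agent produces when `ctx` is empty because nobody wired the sweep.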

Failure mode 2. The agent cannot explain its own decisions.

Audit, risk, and compliance teams block promotion to production because the agent cannot produce a defensible record of why it chose action A over action B. Without decision traces stored in a context graph, the agent's reasoning is a black box. Even when the outcome is correct, the lack of auditability is a hard stop in regulated industries. This is one of the most common reasons a successful pilot never scales.

Failure mode 3. The agent breaks on edge cases and unstructured input.

Enterprise work is mostly edge cases. A PO number arrives in the subject line instead of the body. A customer replies in German to a ticket opened in English. An attachment contains a scanned invoice with handwritten annotations. The agent handles the clean 60% of cases and escalates the rest, which is exactly the 40% that was driving the operational cost. The business case collapses. Agents running on top of a context graph that has already mapped similar edge cases in shared inbox operations do not break. They reason by analogy to prior cases, because the prior cases are in the graph.
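Mechanically, reasoning by analogy is nearest-neighbor retrieval over prior cases in the graph. A toy sketch using feature-set overlap as the similarity measure; production systems would use embeddings, and these case features are invented:

```python
def similarity(a, b):
    """Jaccard overlap between two cases' feature sets."""
    return len(a & b) / len(a | b)

prior_cases = {
    "case-88": {"lang:de", "channel:email", "doc:invoice"},
    "case-91": {"lang:en", "channel:email", "doc:scan",
                "annotation:handwritten"},
}

# A new edge case: a German email carrying a scanned document.
new_case = {"lang:de", "channel:email", "doc:scan"}

# Rank precedent by overlap; the closest prior case tells the agent
# how a human handled something similar, instead of escalating blindly.
best = max(prior_cases, key=lambda k: similarity(prior_cases[k], new_case))
```

The retrieval is only as good as the case history behind it, which is why a graph that has already absorbed months of inbox operations handles the edge cases that a freshly deployed agent cannot.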

Failure mode 4. The agent cannot compound learning across runs.

Each conversation starts from zero. The agent never gets smarter because there is no structured memory for it to get smarter against. Six months after go-live, the agent is the same agent. The team is not. They have learned dozens of new patterns, exceptions, and shortcuts. The gap between human learning and agent stagnation widens until someone pulls the plug. A context graph closes that gap by capturing the team's decisions as they happen, so the agent has something to learn from on the next run.

What the surviving 60% will have in common

The enterprises that will still be running their agentic AI projects at the end of 2027 are not the ones with the biggest model budget. They are the ones that treated the context graph as foundational infrastructure before they deployed a single agent.

Becton Dickinson went live in three weeks, handling more than 100,000 topics across multiple languages, without the months of NLP training that other approaches demand. The team kept working the way they already did. Leadership got a PowerBI-connected view of email traffic, response times, and topics that had been invisible before. The project worked because the agent had a context graph underneath it, grounding every action in real operational precedent, real process logic, and real decision history.

The pattern is consistent. When the context graph is in place first, the agent scales. When it is not, the agent demos well, stalls in pilot, and shows up in the 40%.

Final thought: agents are only as good as the context they inherit

The next twenty months will separate the agentic AI projects that scale from the ones that get quietly shut down. The dividing line is not going to be the model, the prompt engineering, or the integration stack. It is going to be whether there is a context graph underneath the agent.

Tekst treats the context graph as the precondition, not the last step. Every deployment starts by mapping how work actually flows, writing that into a Unified Process Model, and only then activating the agents on top of it. That is how Tekst works in three steps, and it is why the projects keep running long after the demo week is over.

If you are evaluating agentic AI for enterprise operations, stop leading with "which model." Lead with "what is the context graph we are building this on." Get that right, and you are already out of the 40%.
