Going Agent Experience First: What We Built, What We Broke, What We Learned
For thirty years, “user” meant a human being. Every design principle, every interface, every experience framework assumed a person sitting at a screen.
That’s over.
In the last year, AI agents went from a curiosity to an operating reality. We first heard the term Agent Experience (AX) from Netlify’s Matt Biilmann. Salesforce declared a new era of experience design. Bessemer Venture Partners (BVP) made AX their number-one AI developer law. HubSpot’s Dharmesh Shah recently wrote about “Agentic User Experience” and argued that wrapping your API in an MCP (Model Context Protocol) server isn’t enough: agents need thoughtfully designed workflows, not just exposed endpoints. The first AX job postings showed up.
They’re all right about the direction. But the conversation so far has been dominated by one question: can agents access your product?
We’ve been asking a different question. And it took us somewhere we didn’t expect.
The Decision That Changed How We Build
A few weeks ago, we made a decision at Monte Carlo that felt uncomfortable: for every new product capability, we would build for AI agents first and humans second.
Not “also for agents.” First.
The reasoning started with something in our own telemetry we couldn’t ignore. Roughly 25 customer accounts and 130 individual users were connecting to Monte Carlo through AI coding agents (Claude Code, Cursor, others). No marketing. No documentation. No promotion. They found us on their own.
And the usage pattern was consistent across all of them. The number-one action agents performed was checking whether a specific data table was healthy before doing anything else.
Agents instinctively need a trust layer. They were already treating Monte Carlo as one.
So we made it official. New development sequence: MCP tools and skills first. Validate with customers already working through agents. Build the human user interface last.
What Agent First Actually Looks Like
I want to be clear: agent-first doesn’t mean killing the dashboard. It means changing the order you build things in. That order turns out to matter more than we expected.
Start with the obvious. When you build for agents first, you build for how people actually work right now. According to IBM, 99% of enterprise developers are exploring or already building AI agents. Salesforce projects a billion agents in service by the end of 2026. Gartner estimates 40% of enterprise applications will integrate task-specific agents by the end of 2026 as well. Our customers aren’t waiting for us to build an agent strategy. They’re already working through agents.
But there’s a less obvious benefit. Building agent-first enforces architectural discipline you wouldn’t get otherwise. Capabilities built for agents have clean interfaces, structured inputs and outputs, and no assumptions about a human in the loop. When you later wrap those in a UI, you’re building on solid foundations. Go the other direction and build the screen first, and you end up with capabilities tightly coupled to UI flows that agents can’t use. Then you rebuild from scratch.
“We used to design the screen first and figure out the API later. Now we design the MCP tool first, and the screen becomes a thin layer on top. It sounds like a small change, but it completely reordered how we think about product.” — Santiago Aguiar, Engineering, Monte Carlo
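To make that ordering concrete, here is a minimal Python sketch of one capability defined tool-first, with the human view as a thin layer on top. Everything in it is hypothetical: `check_table_health`, the request and result types, and the in-memory `_FAKE_STATE` stand in for a real backend and are not Monte Carlo’s actual MCP tool schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


# Hypothetical tool contract: structured input and output,
# with no assumption that a human will read the result.
@dataclass(frozen=True)
class TableHealthRequest:
    table: str  # fully qualified name, e.g. "analytics.orders"


@dataclass(frozen=True)
class TableHealthResult:
    table: str
    healthy: bool
    open_incidents: int
    reasons: list[str]


# Invented in-memory state standing in for a real monitoring backend.
_FAKE_STATE = {
    "analytics.orders": {"open_incidents": 1, "reasons": ["freshness: last update 26h ago"]},
    "analytics.customers": {"open_incidents": 0, "reasons": []},
}


def check_table_health(req: TableHealthRequest) -> TableHealthResult:
    """Capability layer: what an agent-facing tool would call."""
    state = _FAKE_STATE.get(req.table)
    if state is None:
        return TableHealthResult(req.table, False, 0, ["unknown table"])
    return TableHealthResult(
        table=req.table,
        healthy=state["open_incidents"] == 0,
        open_incidents=state["open_incidents"],
        reasons=list(state["reasons"]),
    )


def to_tool_output(result: TableHealthResult) -> str:
    """Agent-facing serialization: plain JSON, no presentation."""
    return json.dumps(asdict(result))


def render_for_human(result: TableHealthResult) -> str:
    """Thin UI layer: formats the same result and never calls the backend."""
    status = "Healthy" if result.healthy else "Needs attention"
    detail = "; ".join(result.reasons) or "no issues"
    return f"{result.table}: {status} ({detail})"
```

For `analytics.orders`, an agent receives a JSON object with `"healthy": false` and the freshness reason, while a human sees `analytics.orders: Needs attention (freshness: last update 26h ago)`. Both come from the same capability; only the formatting differs, so the screen cannot drift from what the agent sees.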
And here’s the thing that surprised us most: building agent-first forces you to discover your actual value. Strip away the dashboard, the navigation, the visual interface, and ask “what does an agent actually need from us?” The answer is clarifying. Sometimes uncomfortably so.
What Agents Actually Need
Here’s the question that kept us up at night: in a world where agents can connect directly to Snowflake, dbt, and Airflow via their own MCP servers, what unique value does Monte Carlo provide? What stops customers from bypassing us entirely?
We expected the answer to be about data quality metrics. It wasn’t. Or rather, that was only part of it.
The real answer was institutional memory.
Monte Carlo stores something no other system in the data stack has: the history of how the data behaved, and what went wrong. Not just the current state of a table, but the full record of past trends, incidents, resolution patterns, triage history, ownership context, and blast radius analysis. Raw infrastructure tools are stateless. They show you what’s true right now. Monte Carlo stores what was true yesterday, what broke last Tuesday, who fixed it, and how.
For an agent trying to debug a data issue that’s happened before, that history is the most valuable context it can get. And it compounds over time. The longer a customer uses Monte Carlo, the richer the institutional memory agents can draw on.
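As an illustration of why that history is useful to an agent, here is a minimal sketch of looking up how the same symptom on the same table was fixed before. It assumes a hypothetical `Incident` record and an in-memory list in place of a real incident store; the sample records are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; a real store carries far more context."""
    table: str
    symptom: str  # e.g. "freshness", "volume", "schema"
    opened: date
    resolved_by: str
    resolution: str


# Invented sample history, for illustration only.
SAMPLE_HISTORY = [
    Incident("analytics.orders", "freshness", date(2025, 3, 4), "alice", "Re-ran failed Airflow DAG"),
    Incident("analytics.orders", "volume", date(2025, 5, 1), "carol", "Backfilled missing partition"),
    Incident("analytics.orders", "freshness", date(2025, 6, 1), "bob", "Reverted upstream dbt model change"),
    Incident("analytics.customers", "freshness", date(2025, 7, 1), "dave", "Rotated expired credentials"),
]


def prior_resolutions(history, table, symptom, limit=3):
    """Most recent past fixes for the same table and symptom, newest first."""
    matches = [i for i in history if i.table == table and i.symptom == symptom]
    matches.sort(key=lambda i: i.opened, reverse=True)
    return [(i.opened.isoformat(), i.resolved_by, i.resolution) for i in matches[:limit]]
```

A stateless infrastructure tool has nothing to put in `history`; the value of the lookup grows with every incident that gets recorded.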
“Only 1 in 10 companies have successfully scaled their AI agents, not because of model shortcomings, but because they lack data architectures that deliver trusted context.” — MIT Technology Review, 2026
The second thing we learned was about cross-stack correlation. Individual infrastructure MCPs have a narrow view of their own system. Snowflake knows about queries. dbt knows about models. Airflow knows about DAG runs. None of them can answer the question agents actually ask: “Why is this table stale?”
Monte Carlo can. Because we sit across the entire stack, we can trace the full chain: this table is stale because an Airflow DAG failed, the DAG failed because of a dbt model change, and the failure has impacted four downstream dashboards. No single-tool MCP can reason across that chain.
“I was using Claude Code to investigate why a dashboard was showing stale numbers. I had Snowflake and dbt MCPs connected, but they each only saw their own piece. When I connected Monte Carlo, the agent traced the whole chain in one step. The broken DAG, the impacted model, the downstream tables. It was the difference between seeing a symptom and understanding the disease.”
— Senior Data Engineer, Enterprise Retail Customer
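Once lineage from every system lives in one place, that kind of trace is a graph traversal. Below is a minimal sketch, assuming a hypothetical lineage map whose node names (`dbt:stg_orders`, `airflow:orders_dag`, and so on) and statuses are invented: walk upstream from the stale table to collect anything unhealthy, nearest first, and walk downstream to collect what is affected. It is not Monte Carlo’s lineage implementation; the point is that no single system’s MCP holds all of `NODES` and `EDGES`.

```python
from collections import deque

# Hypothetical cross-stack lineage. Edges point upstream -> downstream.
# Each status is what that node's own system would report in isolation.
NODES = {
    "dbt:stg_orders": "changed",
    "airflow:orders_dag": "failed",
    "snowflake:orders": "stale",
    "bi:revenue_dashboard": "ok",
    "bi:ops_dashboard": "ok",
}
EDGES = {
    "dbt:stg_orders": ["airflow:orders_dag"],
    "airflow:orders_dag": ["snowflake:orders"],
    "snowflake:orders": ["bi:revenue_dashboard", "bi:ops_dashboard"],
}


def reverse_edges(edges):
    """Build a downstream -> upstream adjacency map."""
    parents = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)
    return parents


def bfs(start, adjacency):
    """Nodes reachable from start, ordered by distance (start excluded)."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def explain_stale(table, nodes, edges):
    """Return (unhealthy upstream nodes, nearest first; downstream impact)."""
    upstream = bfs(table, reverse_edges(edges))
    causes = [n for n in upstream if nodes.get(n, "ok") != "ok"]
    return causes, bfs(table, edges)
```

For `snowflake:orders`, the causes come back as the failed DAG followed by the dbt change behind it, and the impact list is the two dashboards.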
Third: trust as a pre-flight check. Agents are starting to make changes to data pipelines. Not just reading, but writing. Someone needs to validate whether those changes are safe before they execute. The agent asks: “If I modify this schema, what breaks downstream?” Only a system with full lineage and monitoring coverage can answer that.
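At its simplest, such a pre-flight check compares the proposed schema against the columns each downstream consumer reads. This sketch assumes column-level lineage is already available as a list of hypothetical `Consumer` records; the names and types are invented, and a production check would need much richer lineage than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    """Hypothetical downstream consumer, as column-level lineage might describe it."""
    name: str
    reads: dict  # column name -> type the consumer expects


def preflight_schema_change(proposed, consumers):
    """Return {consumer name: [problems]} for every consumer the proposed schema breaks.

    proposed: dict of column name -> type after the change.
    An empty result means no known consumer breaks.
    """
    breaks = {}
    for consumer in consumers:
        problems = []
        for col, expected in consumer.reads.items():
            if col not in proposed:
                problems.append(f"missing column {col}")
            elif proposed[col] != expected:
                problems.append(f"{col}: {expected} -> {proposed[col]}")
        if problems:
            breaks[consumer.name] = problems
    return breaks
```

An agent would call this before executing a migration and stop or escalate whenever the result is non-empty.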
The Gap in the AX Conversation
Here’s what struck me about the current AX conversation. It’s almost entirely about access: clean APIs, MCP servers, authentication flows, tool schemas, documentation. Even the most thoughtful frameworks (like Dharmesh Shah’s argument that agents need higher-level workflow abstractions, not just raw API endpoints) are still focused on how agents interact with your product. That work is essential. Biilmann’s framework of Access, Context, Tools, and Orchestration is the right foundation.
But it assumes something every data leader knows isn’t true: that the data on the other side of that API is reliable.
The research backs this up. Informatica’s CDO Insights 2026 survey found that half of data leaders cite data quality and retrieval as their top challenge for agentic AI, even as adoption accelerates. Precisely’s annual study revealed that 88% of leaders say they’re confident in their data readiness, yet 43% simultaneously call it their biggest barrier. That’s not a contradiction. That’s a blind spot. And it’s dangerous when agents act autonomously.
The BARC Data, BI and Analytics Trend Monitor 2026 was blunt: data quality management reclaimed the number-one position among all respondents. For AI and AI agents, high data quality is critical to avoid hallucinations, bias, and faulty recommendations.
McKinsey reinforced the point in their recent agentic AI piece: organizations need to move from periodic data cleanups to continuous, real-time data quality monitoring. Quarterly data audits don’t work when agents are consuming data every second.
The AX movement has given us the plumbing. What’s missing is a way to verify what flows through the pipes.
When a human user hits bad data, it’s frustrating. You see an error, you submit a support ticket, someone fixes it. The failure is visible.
When an agent hits bad data, nobody knows. The agent doesn’t complain about stale data. It doesn’t bounce. It doesn’t leave a 1-star review. It reasons over whatever it’s given with full confidence. Bad data in, confidently wrong answer out. At scale. By the time a human notices, the damage has compounded: wrong decisions informed by wrong analyses built on wrong data.
Trust Is the Missing Layer
Biilmann defined four pillars of Agent Experience: Access, Context, Tools, and Orchestration. I’d add a fifth: Trust.
| AX Pillar | Question It Answers | Who’s Building For It | Status |
|---|---|---|---|
| Access | Can the agent reach your product? | Netlify, Salesforce, MongoDB | Table stakes |
| Context | Does the agent understand your product? | dbt (skills), Snowflake (docs) | Emerging |
| Tools | Can the agent take action at the right level of abstraction? | MCP ecosystem, LangChain | Moving fast |
| Orchestration | Can agents coordinate complex workflows? | Airflow, LangGraph, CrewAI | Early |
| Trust | Can the agent verify the data is reliable? | This is the gap. | ⚠️ Missing |
Trust sits underneath all the other layers. An agent with perfect API access to a warehouse full of stale, duplicated, or schema-broken data will produce wrong answers with perfect confidence. You can optimize every other layer and still fail at the one that matters most.
Think about it this way: it doesn’t matter how elegant your agent workflows are, how cleanly you’ve collapsed five API calls into one composable action, if the data underneath is broken. The interface layer and the trust layer are different problems. The industry is focused on the first. Almost nobody is building for the second.
What does Trust require in practice? The things we’ve spent five years building at Monte Carlo: freshness monitoring so agents know data is current. Volume anomaly detection so agents know data is complete. Schema change tracking so agents know the structure hasn’t shifted beneath them. Lineage so agents can trace data from source to consumption. Incident history so agents can learn from what’s broken before.
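Three of those signals can be sketched in a few lines each. This is a minimal illustration under stated assumptions: freshness is a simple max-age threshold (the 24-hour default is arbitrary), volume anomaly detection is a plain z-score against recent daily row counts (a stand-in technique, not Monte Carlo’s actual models), and schema drift is a diff against a last known-good schema. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def is_fresh(last_updated, max_age, now):
    """Freshness: was the table updated within the allowed window?"""
    return now - last_updated <= max_age


def volume_is_normal(history, today, z_threshold=3.0):
    """Volume: is today's row count within z_threshold std devs of recent history?"""
    if len(history) < 2:
        return True  # too little history to judge; treat as normal
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today == mu
    return abs(today - mu) / sigma <= z_threshold


def schema_drift(expected, observed):
    """Schema: columns added, removed, or retyped versus the last known-good schema."""
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }


def trust_report(last_updated, row_history, rows_today, expected_schema,
                 observed_schema, max_age=timedelta(hours=24), now=None):
    """Combine the three checks into one answer an agent can act on."""
    now = now or datetime.now(timezone.utc)
    reasons = []
    if not is_fresh(last_updated, max_age, now):
        reasons.append("stale")
    if not volume_is_normal(row_history, rows_today):
        reasons.append("volume anomaly")
    drift = schema_drift(expected_schema, observed_schema)
    if drift["removed"] or drift["retyped"]:
        reasons.append("breaking schema change")  # added columns are not treated as breaking
    return {"trusted": not reasons, "reasons": reasons, "schema_drift": drift}
```

Added columns are reported but not treated as breaking, on the assumption that existing consumers can ignore them; removed or retyped columns fail the check.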
The Question Every Company Will Face
Going agent-first forced us to ask something I think every infrastructure company will eventually confront: in a world where agents can access everything, query anything, and reason across any API, what is your product actually for?
If your product’s primary value is aggregating and visualizing data, agents don’t need the aggregation (they can do it) and they don’t need the visualization. If your value is executing queries, agents can call the warehouse directly. If your value is building pipelines, agents can write the code themselves.
The companies that win in an agent-first world are the ones whose value lies in knowledge that agents can’t reconstruct from raw APIs. Institutional memory. Cross-system correlation. Historical context. Trust verification. Things that take time to accumulate and can’t be replicated by a smart prompt.
“Before we connected Monte Carlo’s MCP server, our agents kept hitting the same data issues we’d already resolved months ago. They had no memory of past incidents. Once we gave them Monte Carlo, it was like giving them the institutional knowledge of our entire data team.” — Data Platform Lead, Global Travel Company*
None of these insights would have surfaced if we’d kept building UI-first. The agent-first path forced the question, and the question revealed the answer.
Where This Goes
The AX movement is right about the big picture. Agents are users. Their experience matters. Designing for agents is becoming as important as designing for humans.
But the conversation needs to go deeper than access and tooling. The hardest problem in agent experience isn’t getting agents into your product. It’s making sure what they find there is worth trusting.
We spent 30 years making software trustworthy for humans. Buttons that do what they say. Error states that explain what went wrong. Consistent interfaces that build confidence.
Now we need the same for agents. Not better APIs. Not more tools. Trust in the data itself.
Some companies will redesign their APIs. Others will redesign their data contracts. The ones that win will do both.
We’re early in this, and we’re going to get things wrong. I’ll share what we learn as it happens.
Our promise: we will show you the product.