How Pilot Flying J Scales Production AI with Monte Carlo
Most people wouldn’t picture a travel center as a bastion of AI innovation.
But for Pilot Flying J, which operates hundreds of truck stops and travel centers across North America, AI has become a critical resource for efficient, real-time operations—from determining fuel routes to delivering on in-store experiences.
Over the last five years, Pilot has been systematically embedding machine learning across its organization. However, that innovation journey accelerated when the company shifted its focus away from traditional machine learning to embrace production-grade AI systems.
The driving force?
Reliable data and AI systems built on the foundation of data + AI observability.
Running toward AI—without losing trust
Fast, timely operations are at the core of Pilot’s business model, and that doesn’t happen without plenty of reliable data to support it: mileage, connections, products, quality, logistics, and more.
In the early days, Pilot teams would access their business-critical data through a series of slow, manual processes. But where internal stakeholders saw bottlenecks, Machine Learning Senior Manager Travis Lawrence saw an opportunity to leverage new AI capabilities.
Rather than layering AI onto existing processes in pursuit of generalized efficiency, Lawrence took a more deliberate approach: treat AI as part of the software stack and use it to solve specific, known problems that deliver meaningful (and measurable) value.
For his team, that meant putting AI to work for three critical use cases: democratizing access to internal datasets, unlocking unstructured data, and automating inefficient workflows, all with reliability at the core.
Building a text-to-SQL chatbot with AI-ready data
Lawrence’s team started by building a chatbot powered by text-to-SQL that lets the sales team self-serve insights in real time. Using natural language, teams can request data like fuel pricing and deal information that would typically require a dashboard or cross-team coordination.
Questions like, “How many Arby’s are there along I-40 at Pilots?” can be answered instantly, directly from the data lake.
But a chatbot without reliable data is just a faster way to get the wrong insights.
With data observability at the foundation, Lawrence’s system is designed with reliability from the start. It’s scoped to only the datasets required for the task, grounded with schema context and examples, evaluated for both syntactic and semantic correctness, and given multiple retries to self-correct.
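The validate-and-retry loop described above can be sketched roughly as follows. This is a minimal illustration, not Pilot's implementation: the `locations` schema, the `generate` callable (a stand-in for the model call), and the use of SQLite's `EXPLAIN QUERY PLAN` as the validity check are all assumptions. The check covers syntax and schema references only; semantic correctness would still need a separate evaluation against known-good answers.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical schema for illustration only; not Pilot's actual tables.
SCHEMA = """
CREATE TABLE locations (
    id INTEGER PRIMARY KEY,
    highway TEXT,
    restaurant TEXT
);
"""

# Assumed signature: generate(question, schema, feedback) -> SQL string.
Generator = Callable[[str, str, Optional[str]], str]


def check_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can plan the query, else the error message.

    EXPLAIN QUERY PLAN compiles the statement without running it, so it
    catches syntax errors and unknown tables or columns.
    """
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)


def text_to_sql(question: str, generate: Generator,
                max_attempts: int = 3) -> Optional[str]:
    """Ask the generator for SQL, feeding errors back until it validates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    feedback: Optional[str] = None
    try:
        for _ in range(max_attempts):
            sql = generate(question, SCHEMA, feedback)
            error = check_sql(conn, sql)
            if error is None:
                return sql
            feedback = error  # give the model a chance to self-correct
        return None  # caller decides how to handle an unanswered question
    finally:
        conn.close()
```

Passing the schema on every call keeps the generator scoped to the tables it is allowed to query, and returning `None` after the final attempt avoids handing an unvalidated query to the user.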
Just as importantly, the team closely monitors tool usage, i.e., whether agents are calling the right tools in the right way. “Tool usage is our primary target for reliability,” Lawrence noted, “because incorrect actions are far more disruptive than incorrect answers.”
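One simple way to check whether agents are calling the right tools in the right way is to audit each recorded call against a registry of allowed tools and their required arguments. The sketch below assumes a hypothetical registry and `ToolCall` record; the tool names are invented for illustration and do not come from Pilot's system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToolCall:
    """One tool invocation recorded from an agent run (hypothetical shape)."""
    name: str
    args: Dict[str, object] = field(default_factory=dict)


# Hypothetical registry: tool name -> argument names that must be present.
ALLOWED_TOOLS: Dict[str, Set[str]] = {
    "run_sql": {"query"},
    "lookup_deal": {"customer_id"},
}


def audit_tool_calls(calls: List[ToolCall]) -> List[str]:
    """Return a description of every call that breaks the registry rules."""
    problems: List[str] = []
    for i, call in enumerate(calls):
        required = ALLOWED_TOOLS.get(call.name)
        if required is None:
            problems.append(f"call {i}: unknown tool '{call.name}'")
            continue
        missing = required - set(call.args)
        if missing:
            problems.append(f"call {i}: {call.name} missing {sorted(missing)}")
    return problems
```

An empty result means every call used a registered tool with its required arguments. It says nothing about whether the tool was the right choice for the question, which would need labeled examples to evaluate.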
The result? A system that performs reliably in production, generating correct SQL 95% of the time with 100% syntactically valid queries, which makes it one of Pilot’s most impactful AI applications to date.
Turning unstructured data into better customer experiences
While Pilot had strong foundations for structured data, a growing corpus of actionable insights was still hidden within unstructured formats: training manuals, process documentation, contracts, and often blurry or incomplete photos of receipts and timesheets. Discovering the right information at the right time was proving to be a serious bottleneck for the Pilot team.
“Trying to find that information can be difficult, especially when you’re on a phone with a customer,” Lawrence said.
By applying semantic modeling across their RAG system and measuring retrieval quality using classic recall@K metrics, Pilot was able to unlock fast, conversational access to unstructured data sources to help frontline teams answer questions faster and deliver improved experiences for their customers.
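For readers unfamiliar with the metric, recall@K is the fraction of the documents known to be relevant to a query that appear among the top K retrieved results. A minimal sketch, assuming documents are identified by string IDs and that a labeled set of relevant IDs exists for each query:

```python
from typing import List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents found in the top-k retrieved results.

    `retrieved` is ordered best-first; `relevant` is the labeled ground truth.
    """
    if not relevant:
        raise ValueError("recall is undefined without any relevant documents")
    top_k = set(retrieved[:k])  # de-duplicate so a repeat is not counted twice
    hits = len(top_k & relevant)
    return hits / len(relevant)
```

For example, if three documents are relevant and only one of them appears in the top two results, recall@2 is 1/3. Averaging the score over a set of test queries gives a single number to track as the retrieval system changes.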
Automating documentation workflows end-to-end
Pilot is also using AI to automate document-heavy workflows like customer sign-ups and intelligent document processing. What once took weeks can now be completed in less than a day.
Beyond extracting information, the team is carefully expanding into safely updating systems of record with new information gathered during conversations and document processing—bringing AI closer to the core of operational workflows while maintaining appropriate safeguards.
How input/output observability speeds adoption
Even as these initiatives were delivering clear efficiency gains, buy-in across the organization wasn’t automatic—especially as use cases moved closer to revenue-critical datasets.
What changed the conversation? Observability.
By extending the data + AI observability principles Pilot already relied on for structured data to its AI systems, the team made AI and agent behavior visible end to end. Which tools agents use, what data the model pulls, and how it behaves in production are all visible in the same pane that monitors the inputs powering it.
This shared visibility across inputs, operating models, and outputs proved essential for building trust with engineering, security, and business teams, by allowing Pilot to show—not just tell—that AI systems were operating as intended.
And Lawrence has brought the business even closer to AI reliability by opening up direct feedback loops as well. Simple feedback mechanisms, like thumbs up or thumbs down, now give the business a direct voice in shaping agent behavior. Those signals feed into prompt improvements via few-shot examples that are tested in QA and deployed quickly when results improve.
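One plausible shape for this feedback loop is sketched below: interactions that received a thumbs up become candidate few-shot examples, which are then prepended to the prompt. The `Feedback` record, the selection rule (newest approved answer per question), and the prompt layout are all assumptions made for illustration; the article does not describe how Pilot selects or formats its examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feedback:
    """One rated interaction (hypothetical record shape)."""
    question: str
    answer: str
    thumbs_up: bool


def select_few_shot(log: List[Feedback], limit: int = 3) -> List[Feedback]:
    """Pick the newest thumbs-up interactions, one per distinct question."""
    seen = set()
    examples: List[Feedback] = []
    for item in reversed(log):  # the log is assumed to be oldest-first
        if len(examples) >= limit:
            break
        if item.thumbs_up and item.question not in seen:
            seen.add(item.question)
            examples.append(item)
    return examples


def build_prompt(instructions: str, examples: List[Feedback],
                 question: str) -> str:
    """Assemble instructions, few-shot examples, and the new question."""
    parts = [instructions]
    for ex in examples:
        parts.append(f"Q: {ex.question}\nA: {ex.answer}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

A candidate prompt built this way would still go through QA against a fixed evaluation set before deployment, so that a new example is kept only when measured results improve.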
“The business started to trust them more because they can actually see it themselves,” Lawrence explained.
Many agent actions still require human approval, particularly when updating structured systems. Pilot also benchmarks AI performance against historical human accuracy, typically 80–85% for manual data entry. If an agent performs better than humans, the team knows it’s improving outcomes rather than introducing new risk.
More reliable agents, more available resources
Because Lawrence’s team now works closely with the business, impact can be measured in outcomes, not just output: time saved, faster deal cycles, and improved customer experiences all demonstrate the material value the team has delivered to the organization.
Sales reps stay engaged with customers instead of leaving conversations to search for data. Automated document workflows dramatically reduce onboarding time. And data is easier to access and operationalize.
As Lawrence puts it, success comes down to “time saved [and] additional deals closed.”
An agentic future built on Agent Observability
Looking ahead, Pilot is focused on continuing to democratize AI safely. The team is building a secure, self-service agent-building platform and a governed catalog of authenticated tools so that teams can eventually deploy simplified workflows all on their own.
As adoption continues to expand, all production AI will be built on a foundation of Agent Observability to monitor prompts, context, completions, errors, metrics, and more.
For Travis Lawrence and the Pilot data + AI team, this is just the beginning of a long and exciting AI journey. Buckle up.