
Engineering
When building AI chat is actually hard (how and why we built our agents)
Anh-Tho Chuong • 6 min read
Feb 26

We shipped our first AI features in late 2025, long after many companies that started building RAG chatbots in early 2023. That sounds late, but I don't think it is. It's a product of how our category constrains us, how we think about choosing what (not) to build, and how what looks easy is often hard (and vice versa).
Let's start with the first part:
I'm very deliberate about what we build. I've seen enough features rushed to market to jump on a trend (remember NFTs?) and decided we wouldn't fall into that trap.
Just for context, we split our AI features into three distinct assistants, two of which are currently live.
Many in-product chatbots aren't very useful. They do the same thing as GPT/Claude with web search (read documentation and answer the question), but without the UX.
This is why I asked myself a question more product leaders should ask: Do we need to be building this? Or can users get the same value elsewhere?
This is especially important because you don't just build once and let it run. Once you factor in opportunity cost, ongoing maintenance and token costs, building the wrong feature can be far more expensive than the wasted engineering time alone.
But Lago has proprietary data: usage events, revenue, customers, entities… the kind of data you'd be very worried about if ChatGPT could surface it with web search. We realized our customers couldn't automate billing workflows with AI unless we built it.
But for this to be truly useful, we wanted to build true agents, not just a chatbot that returns hopefully-correct data you could've found in two clicks.
This made a big difference because building an agent that operates APIs is harder than building a chatbot, especially in billing. It requires permission systems, confirmation flows, audit logs and safeguards.
Our product is the financial backbone of our customers. It directly touches accounting, compliance and security, which means we can't operate the way many startups do: fixing edge cases only once someone complains, or ignoring known bugs that don't occur often enough.
We need to get things just right, not directionally correct. This extends to our AI agents. You don't want a billing agent accidentally refunding your biggest customer.
This is why we waited:
So let's dive into why it was so hard:
When people say AI chat is easy to build, they're talking about systems like this: A generic document chatbot.

It extracts data relevant to the prompt and uses it to enrich the response. But this isn't what we built.
The system behind our billing assistant is a three-layer stack:

This is where (as you can tell) things got challenging. We had to:
And these are just the obvious things! Add to this the fact that everything is higher-stakes in billing.
For example, we needed to ensure the agents respected RBAC (role-based access control). You don't want your finance intern with "view only" access handing out discounts via chat. So our AI agents need to check permissions before every action. We also needed to ensure customers who used multiple entities got the correct results. There were a variety of these “little” things we had to get right.
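As a rough sketch of what a per-action permission check can look like (the role names, tool names and permission table here are made up for illustration; they are not Lago's actual schema):

```python
# Illustrative RBAC gate: every tool call is checked against the caller's
# role at execution time, not just once at session start. Roles, tools and
# the permission table are hypothetical examples.

READ_ONLY_TOOLS = {"get_invoice", "list_customers"}
WRITE_TOOLS = {"apply_coupon", "void_invoice"}

ROLE_PERMISSIONS = {
    "viewer": READ_ONLY_TOOLS,                 # e.g. the "view only" intern
    "admin": READ_ONLY_TOOLS | WRITE_TOOLS,
}

def execute_tool(role: str, tool: str) -> str:
    """Refuse the call outright if the role lacks the permission."""
    if tool not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"role '{role}' may not call '{tool}'")
    return f"executed {tool}"
```

The same gate would also need to scope results by entity, so multi-entity customers only ever see their own data.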
But the biggest one was AI's biggest issue:
In most products, an AI hallucination is an inconvenience. In billing, it's a financial incident that harms trust and loses money. An agent that can void invoices, retry payments and apply discounts needs to get it right.
That's why we treat hallucination prevention in layers.
First, our Mistral agent operates under a detailed system prompt that constrains it to the tools we've explicitly defined. It can't try, adapt and retry API calls the way an OpenClaw instance might. This means it requires slightly more user hand-holding, but it also minimizes the risk of catastrophic results.
Second, we built guardrails before any consequential operation (create, update, delete, void, retry, refresh). For these, the agent must show the user a preview of exactly what it's about to do and wait for an explicit "yes." There's no "always allow," and you can't turn this off.
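A minimal sketch of that confirm-before-write loop (the operation verbs match the list above, but the function shape and names are assumptions, not Lago's implementation):

```python
# Hedged sketch: consequential operations must show a preview and receive
# an explicit "yes" before running. There is deliberately no "always allow"
# flag anywhere in this flow.

CONSEQUENTIAL_VERBS = {"create", "update", "delete", "void", "retry", "refresh"}

def run_tool(tool: str, target: str, confirm) -> str:
    """`confirm` shows a preview to the user and returns True only on an
    explicit yes; anything else cancels the operation."""
    verb = tool.split("_")[0]
    if verb in CONSEQUENTIAL_VERBS:
        preview = f"About to run {tool} on {target}. Proceed?"
        if not confirm(preview):
            return "cancelled"
    return f"{tool} applied to {target}"
```

Read-only tools pass straight through; anything destructive stops at the preview unless the user explicitly approves.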
We've also intentionally not built some tools. Organization management, API key settings, webhook setup and similar things can only be done manually.
The prompt also went through many, many iterations. We ran thousands of queries to stress-test it and find gaps. Some versions were too permissive (they allowed things they shouldn't) or too long (they ran out of context too quickly).
One of our biggest learnings here is that even though AI is extremely powerful, you might not want to let users do whatever they want.
One of the weirder decisions we made was building three AI-powered chat interfaces, not one. We made this choice for a few reasons:
Lago is cross-functional: it's used by engineers, product/growth, finance and ops people. Depending on who's using it, the output they want is different. Finance wants data, or to find a specific invoice. Product/growth cares more about shipping their new pricing experiment quickly.
This is why selecting the right assistant already influences the outputs.

Imagine a product leader asks "what if we raised all prices by 20%?" If they're talking to the pricing assistant, they get strategic advice. If they're talking to a general-purpose agent with billing access, it might interpret this as an instruction to raise prices and execute the change.
By separating assistants, we create guardrails. The billing assistant can execute actions but only within billing. The finance assistant can query but can't modify. The pricing assistant only advises.
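That separation can be expressed as a simple capability table per assistant. Only the three assistant roles come from the post; the tool lists below are illustrative assumptions:

```python
# Illustrative capability scoping: each assistant only ever sees the tools
# it is allowed to use, so a misread prompt can't cross a boundary.
# Tool names are hypothetical.

ASSISTANT_TOOLS = {
    # Can execute actions, but only within billing.
    "billing": {"get_invoice", "void_invoice", "retry_payment", "apply_coupon"},
    # Can query, but never modify.
    "finance": {"get_invoice", "query_revenue", "list_customers"},
    # Advises only; no API tools at all.
    "pricing": set(),
}

def can_call(assistant: str, tool: str) -> bool:
    """A tool outside the assistant's scope simply doesn't exist for it."""
    return tool in ASSISTANT_TOOLS.get(assistant, set())
```

With this shape, the "raise all prices by 20%" question is safe by construction: the pricing assistant has no tool that could execute the change.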
It's easy to talk about what went well and how we solved things, but a lot went wrong along the way. Let me walk through a few examples.
The initial motivation wasn't a customer request, so scoping was hard. Billing is infrastructure that's often bought against a checklist: customers evaluate billing systems in large part on the exact features you support. That means many of the features we build come directly from customers and prospects, which means we already know the spec.
Because we didn't start from a specific workflow, the scope kept expanding. We began with a handful of invoice tools, then kept adding. Today we have 52 tools. That's a lot.
Every tool you add makes the agent harder to control, requiring more precise instructions.
Looking back, I'd start by working with customers to see which workflows take the longest, build tools for those, and soft-launch them to those design partners.
Prompt engineering was its own project. Since our team is very engineer-driven, we expected the technical part to be the most difficult. But writing the system prompt was hard. Early versions had security gaps where the agent would sometimes execute actions without waiting for confirmation. Other versions were so detailed they burned through the context window before the conversation got anywhere useful.
None of these mistakes were fatal. But they cost us time and focus, and I think being honest about them is more useful than pretending the process was smooth.