- Python: lago-agent-sdk on PyPI.
- JavaScript / TypeScript: @getlago/agent-sdk on npm.
How it works
- Wraps your existing LLM client in place. Your application code does not change.
- Extracts usage from each response into a normalized shape (CanonicalUsage).
- Buffers events in memory and flushes them in batches to Lago’s /events/batch endpoint.
- Survives provider and Lago outages with exponential backoff and a bounded buffer.
- p99 wrap overhead under 5 ms. Your LLM call is never blocked on Lago.
- Never breaks your LLM call. Instrumentation errors are caught, logged, and optionally forwarded to your observability stack.
Supported providers
| Provider | Access |
|---|---|
| AWS Bedrock | Converse (sync + stream) |
| AWS Bedrock | InvokeModel (sync + stream), 7 model families |
| Mistral | native SDK (chat.complete + chat.stream) |
| OpenAI | native SDK |
| Anthropic | native SDK |
| Google Gemini | native SDK |
Quickstart
Initialize and wrap your LLM client
Pass your Lago API key and a default external_subscription_id, then wrap your provider client. The returned object is a drop-in replacement.
Make LLM calls normally
The wrapped client preserves the original signature, return shape, and exceptions. No call-site changes required.
Flush events on shutdown
Events flush automatically in the background. Call flush() explicitly at process exit (FastAPI shutdown hook, Express server close, AWS Lambda extension, etc.) so in-flight events are not lost.
Register billable metrics in Lago
Before events count toward charges, register matching billable metrics in your Lago tenant. The SDK ships with default metric codes (see Captured token dimensions below). Register each one as a sum_agg metric.
Follow Create a billable metric to set them up, then attach charges to them in your plan. For a full example, see the per-token pricing template.
Mistral quickstart
The same wrap() call works with the native Mistral SDK.
Bill in tokens or in dollars
The SDK can bill two ways. By default it sends token counts, and you turn those into money with your Lago plans. Switch to price mode and the SDK sends the dollar cost of each call instead: it looks up the price of the model, multiplies by the tokens used, applies your markup, and emits one cost event.
Where prices come from. The SDK reads public price lists that Lago maintains: OpenRouter for native OpenAI, Anthropic, Mistral, and Gemini clients, and the AWS Bedrock public price list for Bedrock. There are no API keys and no price file for you to maintain. Prices refresh in the background about once an hour, so your LLM call is never slowed down waiting on a price.
Add your margin. Set a markup to resell LLM access at a profit. A markup of 1.2 means the customer pays your cost plus 20%. Your cost, plus your margin, is what gets billed.
Python: extra_lago={"mode": "price", "markup": 1.5}. TypeScript: lago: { mode: "price", markup: 1.5 } (for Bedrock, attach it as the command’s __lago).
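The price-mode arithmetic described above can be sketched as follows. This is an illustration only, not the SDK's implementation: the function name, the per-million-token price unit, the example prices, and the breakdown keys are all assumptions made for the example.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def call_cost_usd(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok, markup):
    """Hypothetical helper: cost of one call = tokens x price, then x markup.

    Prices are assumed to be USD per million tokens and are passed as
    strings so Decimal keeps them exact.
    """
    base = (Decimal(input_tokens) * Decimal(input_price_per_mtok)
            + Decimal(output_tokens) * Decimal(output_price_per_mtok)) / MILLION
    billed = base * Decimal(markup)
    # Key names below are illustrative, not the SDK's event properties.
    return {"cost_before_markup": base, "markup": Decimal(markup), "value_usd": billed}


# 1,200 input tokens at $3/M and 350 output tokens at $15/M, resold with a 1.2 markup.
breakdown = call_cost_usd(1200, 350, "3", "15", "1.2")
print(breakdown["cost_before_markup"], breakdown["value_usd"])
```

With these made-up prices the cost before markup is $0.00885 and the billed value is $0.01062, which is the single number a dynamic charge on llm_cost would sum per call.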
Lago setup. In price mode the SDK emits one event per call with the metric code llm_cost. Register a sum billable metric named llm_cost and attach a dynamic charge to it. Lago adds up the per-call cost into a single fee. The event also carries a full breakdown in its properties: the USD value, the cost before markup, the markup applied, and the price source.
Captured token dimensions
The SDK normalizes every provider response into a 10-field CanonicalUsage object and emits one event per non-zero field. The default metric codes match Lago’s conventions. Override them in the config if your tenant already uses different names.
| Canonical field | Default Lago metric code | What it represents |
|---|---|---|
| input | llm_input_tokens | Prompt tokens sent to the model |
| output | llm_output_tokens | Completion tokens generated |
| cache_read | llm_cached_input_tokens | Prompt tokens served from cache |
| cache_write | llm_cache_creation_tokens | Prompt tokens written to cache |
| cache_write_5m | llm_cache_write_5m_tokens | Cache write with 5-minute TTL |
| cache_write_1h | llm_cache_write_1h_tokens | Cache write with 1-hour TTL |
| reasoning | llm_reasoning_tokens | Reasoning / thinking tokens (when surfaced) |
| tool_calls | llm_tool_calls | Number of tool / function invocations |
| image_input | llm_image_input_tokens | Image input tokens |
| audio_input | llm_audio_input_tokens | Audio input tokens |
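To make "one event per non-zero field" concrete, here is a minimal sketch. It is not the SDK's code: the dictionary below is a local stand-in copied from the table above, build_events is a hypothetical helper, and the "tokens" property name is an assumption. The top-level event keys follow Lago's usage event shape.

```python
import time
import uuid

# Local stand-in for the defaults listed in the table above.
DEFAULT_METRIC_CODES = {
    "input": "llm_input_tokens",
    "output": "llm_output_tokens",
    "cache_read": "llm_cached_input_tokens",
    "cache_write": "llm_cache_creation_tokens",
    "cache_write_5m": "llm_cache_write_5m_tokens",
    "cache_write_1h": "llm_cache_write_1h_tokens",
    "reasoning": "llm_reasoning_tokens",
    "tool_calls": "llm_tool_calls",
    "image_input": "llm_image_input_tokens",
    "audio_input": "llm_audio_input_tokens",
}


def build_events(usage, subscription_id, metric_codes=None):
    """Hypothetical helper: turn a usage dict into one event per non-zero field."""
    codes = {**DEFAULT_METRIC_CODES, **(metric_codes or {})}  # overrides win
    events = []
    for field, code in codes.items():
        value = usage.get(field, 0)
        if not value:
            continue  # zero or missing fields produce no event
        events.append({
            "transaction_id": str(uuid.uuid4()),
            "external_subscription_id": subscription_id,
            "code": code,
            "timestamp": int(time.time()),
            "properties": {"tokens": value},  # property name is an assumption
        })
    return events


usage = {"input": 120, "output": 45, "cache_read": 0, "tool_calls": 2}
print([e["code"] for e in build_events(usage, "sub_123")])
```

Passing metric_codes={"input": "my_input_tokens"} in this sketch swaps only that one code and leaves the other defaults in place, which mirrors the override behavior the config describes.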
Provider coverage
Which fields each adapter populates:
| Field | Bedrock | Mistral native |
|---|---|---|
| input | ✓ | ✓ |
| output | ✓ | ✓ |
| cache_read | ✓ (Anthropic on Bedrock) | ✓ (when cache hits) |
| cache_write | ✓ (Anthropic on Bedrock) | ✗ |
| cache_write_5m / cache_write_1h | ✓ (Anthropic InvokeModel) | ✗ |
| reasoning | folded into output | folded into output |
| tool_calls | ✓ | ✓ |
| image_input / audio_input | ✗ | ✗ |
Reasoning, image, and audio fields are populated by the native OpenAI, Anthropic, and Gemini adapters.
Multi-tenant: pick a subscription per call
Lago needs to know which customer to bill for each LLM call. The SDK resolves the external_subscription_id in this priority order:
- Per-call override: highest precedence, attached to the individual request.
- Context-bound: set once per request handler. Propagates safely across async boundaries.
- Default at init: fallback if nothing else is set.
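The priority order above can be sketched with Python's contextvars module, which is one common way to carry a per-request value across async boundaries. Whether the SDK uses contextvars internally is an assumption, and subscription_context and resolve_subscription_id are hypothetical names for this example.

```python
import contextvars
from contextlib import contextmanager

# Holds the context-bound subscription for the current request, if any.
_current_subscription = contextvars.ContextVar("current_subscription", default=None)


@contextmanager
def subscription_context(subscription_id):
    """Bind a subscription id for the duration of a request handler."""
    token = _current_subscription.set(subscription_id)
    try:
        yield
    finally:
        _current_subscription.reset(token)  # restore the previous binding


def resolve_subscription_id(per_call=None, default=None):
    """Return the first id that is set: per-call, then context, then default."""
    for candidate in (per_call, _current_subscription.get(), default):
        if candidate is not None:
            return candidate
    return None  # what the real SDK does when nothing is set is not shown here


with subscription_context("sub_tenant_a"):
    print(resolve_subscription_id(default="sub_default"))                      # context wins
    print(resolve_subscription_id(per_call="sub_vip", default="sub_default"))  # per-call wins
print(resolve_subscription_id(default="sub_default"))                          # falls back
```

Because asyncio tasks copy the current context when they are created, a value bound this way stays attached to the request that set it even when handlers run concurrently.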
Configuration reference
Both SDKs expose the same configuration surface with idiomatic naming.
| Python (LagoConfig) | TypeScript (LagoConfig) | Default | Purpose |
|---|---|---|---|
| api_key | apiKey | (required) | Your Lago API key. |
| api_url | apiUrl | https://api.getlago.com/api/v1 | Override for the EU region or self-hosted instances. |
| default_subscription_id | defaultSubscriptionId | None / null | Fallback external_subscription_id when none is set per call or context. |
| metric_codes | metricCodes | DEFAULT_METRIC_CODES | Map canonical field to your billable metric code. |
| pricing_mode | pricingMode | "tokens" | "tokens" sends token counts. "price" sends the computed dollar cost. |
| markup | markup | 1.0 | Cost multiplier applied in price mode. 1.2 adds a 20% margin. |
| cost_metric_code | costMetricCode | llm_cost | Billable metric code for the per-call cost event in price mode. |
| pricing_ttl_seconds | pricingTtlMs | 3600 / 3_600_000 | How long fetched prices are cached before a background refresh. |
| bedrock_default_region | bedrockDefaultRegion | None / null | Region used to look up Bedrock prices when the call does not carry one. |
| flush_interval_seconds | flushIntervalMs | 1.0 / 1000 | How often the background worker flushes the buffer. |
| max_batch_size | maxBatchSize | 100 | Max events per request to /events/batch. |
| max_buffer_size | maxBufferSize | 10_000 | In-memory cap. Oldest events drop with a warning when exceeded. |
| request_timeout_seconds | requestTimeoutMs | 10.0 / 10_000 | HTTP timeout per batch request. |
| max_retry_seconds | maxRetryMs | 60.0 / 60_000 | Upper bound on exponential backoff between retries. |
| on_error | onError | None / undefined | Callback for instrumentation failures. Wire it to Sentry, Datadog, etc. |
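To show how the buffering and retry settings interact, here is a rough sketch of a bounded buffer that drops its oldest events, hands out batches, and computes capped exponential backoff delays. BoundedBuffer and backoff_delays are hypothetical names, and the 1-second base delay is an assumption. Only the defaults for max_buffer_size, max_batch_size, and max_retry_seconds come from the table.

```python
from collections import deque


class BoundedBuffer:
    """In-memory event queue that evicts the oldest event once it is full."""

    def __init__(self, max_buffer_size=10_000):
        self._events = deque(maxlen=max_buffer_size)
        self.dropped = 0  # a real implementation would also log a warning

    def add(self, event):
        if len(self._events) == self._events.maxlen:
            self.dropped += 1  # appending to a full deque discards the oldest item
        self._events.append(event)

    def take_batch(self, max_batch_size=100):
        """Remove and return up to max_batch_size events, oldest first."""
        batch = []
        while self._events and len(batch) < max_batch_size:
            batch.append(self._events.popleft())
        return batch


def backoff_delays(attempts, base_seconds=1.0, max_retry_seconds=60.0):
    """Delay before each retry: doubles every attempt, capped at the maximum."""
    return [min(max_retry_seconds, base_seconds * 2 ** i) for i in range(attempts)]


buf = BoundedBuffer(max_buffer_size=3)
for n in range(5):
    buf.add({"n": n})
print(buf.dropped, buf.take_batch(max_batch_size=2))
print(backoff_delays(8))
```

The cap matters during a long outage: without it the delay would keep doubling, so recovery after Lago comes back could lag by many minutes.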
Custom metric codes
If your Lago tenant already uses different metric codes, override them at init time through metric_codes / metricCodes.
Error handling
The SDK never breaks your LLM call. If instrumentation fails (adapter bug, Lago unreachable, network error), the SDK catches the error, logs a warning, and your call returns normally.
To forward instrumentation errors to your observability stack, set the on_error / onError hook.
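The guarantee described here can be sketched as a small wrapper. This is a model of the behavior, not the SDK's code: instrumented, record_usage, and the logger name are hypothetical, and only the idea of an on_error callback comes from the configuration reference.

```python
import logging

logger = logging.getLogger("lago_sketch")


def instrumented(llm_call, record_usage, on_error=None):
    """Wrap llm_call so that failures in record_usage never reach the caller."""

    def wrapper(*args, **kwargs):
        # Provider exceptions are not caught: they propagate exactly as before.
        response = llm_call(*args, **kwargs)
        try:
            record_usage(response)
        except Exception as exc:
            logger.warning("usage instrumentation failed: %s", exc)
            if on_error is not None:
                try:
                    on_error(exc)  # e.g. forward to Sentry or Datadog
                except Exception:
                    logger.warning("on_error hook raised", exc_info=True)
        return response  # same return shape as the unwrapped call

    return wrapper


def fake_llm(prompt):
    return {"text": prompt.upper()}


def broken_recorder(response):
    raise RuntimeError("Lago unreachable")


seen = []
call = instrumented(fake_llm, broken_recorder, on_error=seen.append)
print(call("hello"), seen)
```

The hook itself is also guarded in this sketch, so a bug in the error callback cannot turn into a failed LLM call either.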
Exception hierarchy
Both SDKs export the same error classes for callers that want to handle SDK errors explicitly:
- LagoSDKError: base class for every SDK-raised error.
- LagoApiError: non-2xx response from Lago.
- LagoConfigError: invalid configuration at init time.
- UnknownClientError: wrap() was called on a client the SDK does not recognize.
Verify the integration
Make a wrapped LLM call
Run one end-to-end request through the wrapped client, then call flush() explicitly to push the event immediately.
Check the event in Lago
In the Lago dashboard, open Developers → Events and confirm an event appears with the expected metric code and properties.
Confirm usage on the customer
Open the customer’s usage view and confirm the metric counter increased.