Event Ingestion Architecture: Building Reliable Metering Pipelines

Event ingestion is the process of collecting, validating, and routing usage events from customer applications into a metering system that feeds billing calculations. Without a reliable event ingestion pipeline, SaaS companies lose revenue through missed events, duplicate charges, and failed reconciliations—with industry data showing that metering errors account for 3-7% of annual billing leakage in usage-based SaaS businesses [1]. This article covers the architecture patterns, reliability guarantees, and implementation strategies required to build production-grade metering pipelines that handle millions of events while maintaining accuracy for billing.

Why Does Event Ingestion Matter for Usage-Based Billing?

Usage-based billing depends entirely on accurate event capture. Unlike seat-based or subscription models, usage-based billing requires real-time accounting of customer activity—whether that's API calls, storage consumption, or compute minutes. A single dropped event creates a billing discrepancy. A duplicated event overcharges. A late-arriving event breaks reconciliation.

Research indicates that 68% of SaaS companies experienced at least one billing error in the past year due to data pipeline issues [2]. Event ingestion is where most of these errors originate. The pipeline must guarantee not just delivery, but correct ordering, exactly-once accounting semantics, and auditability for disputes.

The stakes are highest in high-volume environments. A company processing 1M+ events per second cannot afford manual reconciliation. The pipeline must be automated, observable, and self-healing.

How Should You Design an Event Schema for Billing?

Every event flowing through your pipeline must carry specific fields to ensure proper billing attribution and auditability.

{
  "event_id": "evt_1h7k9j2m0p5q",
  "timestamp": "2026-03-02T14:23:45.123Z",
  "customer_id": "cust_8b3k1x9y",
  "subscription_id": "sub_5j9n2k0m",
  "event_type": "api_call",
  "properties": {
    "endpoint": "/v1/documents",
    "method": "POST",
    "response_time_ms": 234,
    "tokens_used": 1500,
    "region": "us-east-1"
  },
  "idempotency_key": "customer-batch-20260302-00147"
}

The core fields serve specific purposes. The event_id is a unique identifier for deduplication—critical because the same event may arrive multiple times across retries. The timestamp captures when the event occurred at the source, not when it arrived at your system. This timestamp is used for billing period cutoffs and is non-negotiable for accuracy.

The customer_id or subscription_id links the event to a billing entity. event_type categorizes the usage (API calls, storage GB-hours, compute minutes). The properties object contains dimension data—which endpoint was called, which region, which product feature. Finally, the idempotency_key allows the downstream system to detect and discard duplicate events.

Schema validation must happen at ingestion time. Events with missing event_id, timestamp, or customer_id should be rejected with clear error responses, not silently dropped. Downstream billing systems depend on schema consistency.
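A minimal sketch of rejecting incomplete events at ingestion time. The function names and the 422 status code are illustrative assumptions, not a prescribed API; the only rule taken from the text above is that event_id, timestamp, and customer_id are required.

```python
REQUIRED_FIELDS = ("event_id", "timestamp", "customer_id")

def missing_required_fields(event: dict) -> list[str]:
    """Return the required fields that are absent or empty in the event."""
    return [f for f in REQUIRED_FIELDS if not event.get(f)]

def ingest_response(event: dict) -> tuple[int, dict]:
    """Build an (HTTP status, body) pair for a single submitted event."""
    missing = missing_required_fields(event)
    if missing:
        # Reject loudly so the client can fix and resend the event.
        # 422 is an assumed choice; 400 would serve the same purpose.
        return 422, {"error": "missing_required_fields", "fields": missing}
    return 202, {"status": "accepted", "event_id": event["event_id"]}
```

Returning the list of missing fields in the error body gives the customer enough detail to correct the producer without contacting support.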

Ingestion Patterns: HTTP API vs Message Queues

Three primary patterns dominate event ingestion architecture: HTTP API, message queues, and batch upload. Most production systems use combinations.

HTTP API (Synchronous)

Direct HTTP POST to an ingestion endpoint offers simplicity. An HTTP 202 (Accepted) response indicates that the request was received and queued for processing, not that it was durably stored. This pattern works well for moderate volumes (under 100K events/sec) and gives the client immediate feedback when events are malformed or customer authentication fails.

Drawbacks include retry complexity on the client side and potential connection timeouts during backpressure. Most HTTP API implementations pair with a message queue backend to decouple ingestion from processing.
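The client-side retry burden can be sketched as follows. This is an illustration under assumptions: `send` is a hypothetical callable that posts one event and returns the HTTP status code, and the choice of 429 and 503 as the retryable statuses is an assumption about how the endpoint signals overload.

```python
import time
from typing import Callable

def send_with_retry(send: Callable[[dict], int], event: dict,
                    max_attempts: int = 5, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Resend an event until the endpoint accepts it or attempts run out.

    The same event (same event_id and idempotency_key) is sent on every
    attempt, so the server can deduplicate the retries.
    """
    for attempt in range(max_attempts):
        status = send(event)
        if status == 202:
            return True
        if status not in (429, 503):
            return False  # e.g. a validation error: retrying will not help
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5s, 1s, 2s, ...
    return False
```

The `sleep` parameter is injectable only so the backoff schedule can be exercised in tests without waiting.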

Message Queues (Asynchronous)

Kafka, RabbitMQ, and cloud-native queues (AWS Kinesis, Google Cloud Pub/Sub) decouple event producers from processors. Applications publish events to a queue; processing services consume asynchronously.

Queues excel at high volumes and provide natural backpressure—slow downstream processors simply consume more slowly. Partitioning by customer_id ensures ordering within a customer's event stream, critical for accurate metering.

The tradeoff is complexity: you need distributed infrastructure, consumer group management, and offset tracking. Exactly-once semantics are harder to guarantee than with HTTP APIs.

Batch Upload

Some customers provide bulk event uploads via S3, SFTP, or a CSV endpoint. This pattern works for low-frequency use cases and legacy integrations. Batch uploads require format parsing, validation, and transformation into canonical event schemas. They're useful for correcting historical metering gaps or backfilling data.

Reliability Guarantees: At-Least-Once vs Exactly-Once

Billing systems operate under strict semantics requirements. Two guarantees matter: at-least-once delivery and exactly-once accounting.

At-least-once delivery means every event reaches the target system at least one time—possibly multiple times. It requires retries on failure and durability guarantees. Exactly-once accounting means each unique event contributes to billing exactly once, even if it arrives multiple times in the pipeline [3].

These are different requirements. You can implement exactly-once accounting on top of at-least-once delivery using idempotency. This is the standard pattern in production billing systems.

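Most modern queues (Kafka, Pulsar) provide at-least-once delivery when consumers commit offsets only after processing succeeds. The commit is not atomic with processing: if a consumer crashes after processing an event but before committing its offset, the event is redelivered on restart. No events are lost, but duplicates are possible, which is why deduplication is required downstream.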
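The commit-after-processing pattern can be illustrated with a small simulation. This sketch uses an in-memory list as a stand-in for a queue partition and does not call any real Kafka or Pulsar client; the `consume` function and its crash parameter are hypothetical.

```python
def consume(log, committed_offset, process, crash_before_commit_at=None):
    """Process records from committed_offset onward, committing after each one.

    Returns the committed offset at exit. If crash_before_commit_at is set,
    the consumer "crashes" after processing that offset but before committing.
    """
    offset = committed_offset
    while offset < len(log):
        process(log[offset])
        if offset == crash_before_commit_at:
            return committed_offset  # crash: this record's commit never happened
        offset += 1
        committed_offset = offset  # commit only after processing succeeded
    return committed_offset

log = ["e1", "e2", "e3"]
processed = []
offset = consume(log, 0, processed.append, crash_before_commit_at=1)
offset = consume(log, offset, processed.append)  # restart from last commit
```

After the restart, `processed` contains e2 twice and nothing is missing: the consumer resumed from the last committed offset, so the record handled just before the crash was delivered again.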

Exactly-once accounting is achieved through deduplication on the event_id and idempotency_key. The storage layer (database) must enforce uniqueness constraints on both columns. Together they cover both scenarios: a duplicate event_id (same event sent twice) and a duplicate idempotency_key (same batch sent twice). If either constraint is violated, the database rejects the insert, preventing double-billing.
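A minimal sketch of database-enforced uniqueness, using SQLite from the Python standard library as a stand-in for the production database. The table layout and the `record_event` helper are assumptions for illustration, and the sketch assumes each event carries its own idempotency key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id        TEXT NOT NULL UNIQUE,
        idempotency_key TEXT NOT NULL UNIQUE,
        customer_id     TEXT NOT NULL,
        payload         TEXT NOT NULL
    )
""")

def record_event(conn, event_id, idempotency_key, customer_id, payload):
    """Insert an event; return False if either unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events VALUES (?, ?, ?, ?)",
                (event_id, idempotency_key, customer_id, payload),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate event_id or idempotency_key: already counted
```

Because the database makes the decision, two consumers racing to insert the same event cannot both succeed, which an application-level "check then insert" cannot guarantee on its own.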

Idempotency and Deduplication Strategies

Idempotency is the property that repeating an operation produces the same result as executing it once. For billing, this means receiving the same event twice should not create two charges.

Event ID Deduplication

The simplest approach is per-event deduplication using a unique event_id. Every event gets a UUID or snowflake ID assigned by the client. The ingestion system maintains a set or database of seen event IDs within a retention window (typically 24-72 hours). Redis or similar fast cache systems work well for this. For long-term deduplication, query the events table with a unique index on event_id.
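The retention-window check can be sketched with an in-memory dictionary standing in for the cache. The `SeenEventIds` class is hypothetical; with Redis, a single `SET` using the `NX` and `EX` options gives a comparable check-and-set with expiry.

```python
import time

class SeenEventIds:
    """In-memory stand-in for a cache of recently seen event IDs."""

    def __init__(self, retention_seconds: float, clock=time.time):
        self.retention_seconds = retention_seconds
        self.clock = clock  # injectable so the window can be tested
        self._expires_at: dict[str, float] = {}

    def first_time_seen(self, event_id: str) -> bool:
        """Return True and remember the ID if it is new within the window."""
        now = self.clock()
        expiry = self._expires_at.get(event_id)
        if expiry is not None and expiry > now:
            return False  # duplicate inside the retention window
        self._expires_at[event_id] = now + self.retention_seconds
        return True
```

This sketch never evicts expired entries, so a real cache would rely on key expiry to bound memory. An ID that reappears after the window passes this check, which is why the unique index on the events table remains the final safeguard.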

Idempotency Key Deduplication

An idempotency_key is a client-generated value representing a logical operation. Multiple retries of the same operation share the same idempotency_key. This is standard in financial APIs (Stripe, payment processors). The server must cache idempotency keys and their results for the retention window. This prevents double-processing even if the customer retries with different event IDs within the batch.
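A minimal sketch of caching results by idempotency key, so that a retry replays the stored response instead of processing the batch again. The `IdempotencyCache` class is hypothetical, and it omits the retention-window expiry and concurrency control a production server would need.

```python
from typing import Callable

class IdempotencyCache:
    """Remember the result of each logical operation by its idempotency key."""

    def __init__(self):
        self._results: dict[str, dict] = {}

    def run_once(self, idempotency_key: str, operation: Callable[[], dict]) -> dict:
        """Run operation for a new key; replay the stored result otherwise."""
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # retry: no reprocessing
        result = operation()
        self._results[idempotency_key] = result
        return result
```

Storing the result, not just the key, lets the server answer a retry with the same response the first attempt would have received.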

Event Validation and Enrichment Pipeline

Events from customers may be incomplete, malformed, or arrive out-of-order. The ingestion pipeline must validate, enrich, and standardize before they reach billing.

Validation Stage

Invalid events are logged with detailed error reasons, sent to a dead-letter queue, and reported to the customer. Never silently drop invalid events. Validation checks include required field presence (event_id, customer_id, timestamp), timestamp sanity checks (not too far in future or past), and customer existence verification.
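The validation checks and dead-letter routing described above can be sketched as follows. The tolerance values, error strings, and function names are assumptions for illustration, and plain lists stand in for the accepted stream and the dead-letter queue.

```python
from datetime import datetime, timedelta, timezone

MAX_FUTURE_SKEW = timedelta(minutes=5)  # assumed tolerance for clock drift
MAX_AGE = timedelta(days=30)            # assumed limit for late events

def validation_errors(event, known_customers, now):
    """Return a list of human-readable reasons the event is invalid."""
    errors = []
    for field in ("event_id", "customer_id", "timestamp"):
        if not event.get(field):
            errors.append(f"missing field: {field}")
    customer_id = event.get("customer_id")
    if customer_id and customer_id not in known_customers:
        errors.append("unknown customer")
    raw_ts = event.get("timestamp")
    if raw_ts:
        try:
            # Older Python versions do not parse a trailing "Z"; map it to +00:00.
            ts = datetime.fromisoformat(str(raw_ts).replace("Z", "+00:00"))
        except ValueError:
            errors.append("unparseable timestamp")
        else:
            if ts.tzinfo is None:
                errors.append("timestamp lacks a timezone")
            elif ts > now + MAX_FUTURE_SKEW:
                errors.append("timestamp too far in the future")
            elif ts < now - MAX_AGE:
                errors.append("timestamp too far in the past")
    return errors

def route(event, known_customers, now, accepted, dead_letters):
    """Send valid events onward; park invalid ones with their error reasons."""
    errors = validation_errors(event, known_customers, now)
    if errors:
        dead_letters.append({"event": event, "errors": errors})
    else:
        accepted.append(event)
    return errors
```

Keeping the original event alongside its error reasons in the dead-letter entry is what makes later reporting to the customer and reprocessing possible.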

Enrichment Stage

Enrichment adds metadata necessary for accurate aggregation downstream—customer currency, billing period, metrics dimensions. This prevents logic duplication in aggregation services. Common enrichments include server-generated timestamps, customer billing currency lookups, region/datacenter metadata, and event_type normalization.
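A minimal sketch of an enrichment step covering the enrichments listed above. The output field names (`received_at`, `currency`, `billing_period`) and the calendar-month billing period in UTC are assumptions; a real system would look up each customer's actual billing cycle.

```python
from datetime import datetime, timezone

def enrich(event: dict, customer_currency: dict, received_at: datetime) -> dict:
    """Return a copy of the event with billing metadata attached."""
    occurred = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    enriched = dict(event)  # leave the original event untouched
    enriched["received_at"] = received_at.isoformat()  # server-generated timestamp
    enriched["currency"] = customer_currency[event["customer_id"]]
    # Assumption: billing periods are calendar months evaluated in UTC.
    enriched["billing_period"] = occurred.astimezone(timezone.utc).strftime("%Y-%m")
    enriched["event_type"] = event["event_type"].strip().lower()  # normalize
    return enriched
```

Note that the billing period is derived from the source timestamp, not the arrival time, consistent with the schema rules earlier in the article.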

Scaling Ingestion: Partitioning and Backpressure

As volumes scale, a single ingestion server becomes a bottleneck. Production systems partition data across multiple consumers.

Partitioning Strategy

Partitioning by customer_id ensures that all events from one customer are processed by a single consumer, maintaining order. This is essential for accurate metering: order-sensitive metrics, such as a running storage total, cannot be computed correctly if events are applied out of order.
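A minimal sketch of stable partition assignment. The `partition_for` helper is hypothetical and uses SHA-256 only because it is deterministic across processes; it is not the hash any particular queue client applies to message keys.

```python
import hashlib

def partition_for(customer_id: str, num_partitions: int) -> int:
    """Map a customer to a stable partition index."""
    # Python's built-in hash() is salted per process, so use a fixed digest.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Every event for a given customer_id lands on the same partition as long as the partition count stays fixed; changing the count remaps customers, so repartitioning needs to be planned around in-flight events.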

Backpressure and Rate Limiting

When downstream systems lag, the ingestion pipeline must slow down gracefully. Backpressure signals prevent queues from growing unbounded and protect downstream billing systems from being overwhelmed. Critical lag thresholds trigger 503 Service Unavailable responses, while warning thresholds return 429 Too Many Requests with Retry-After headers.
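The threshold logic can be sketched as a small decision function. The lag thresholds and Retry-After values below are illustrative assumptions and would be tuned to the pipeline's actual throughput.

```python
def backpressure_response(consumer_lag: int,
                          warning_lag: int = 100_000,
                          critical_lag: int = 1_000_000) -> tuple[int, dict]:
    """Return (status, headers) for an ingestion request given current lag."""
    if consumer_lag >= critical_lag:
        # Downstream is overwhelmed: refuse new work until it recovers.
        return 503, {"Retry-After": "30"}
    if consumer_lag >= warning_lag:
        # Downstream is falling behind: ask clients to slow down.
        return 429, {"Retry-After": "5"}
    return 202, {}
```

Clients that honor the Retry-After header spread their retries out, which gives consumers time to drain the backlog instead of facing an immediate retry storm.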

Horizontal Scaling

Separating the HTTP API layer from the processing layer allows scaling each independently. The API can scale to handle connection volume; consumers scale to handle processing throughput.

Storage and Aggregation: Raw vs Pre-Aggregated

Where you store events and how you aggregate them affects both query performance and storage costs.

Raw Event Storage

Storing every event as-is preserves auditability and allows flexible aggregation later. Monthly or daily partitions keep queries fast by eliminating irrelevant data. ClickHouse, a columnar database, excels at this workload—compressing raw events to 1-10% of original size while maintaining query speed [4].

Raw storage takes more space and makes aggregation queries slower, but it enables audit trails and dispute resolution. If a customer disputes a charge, you can retrieve the exact events that generated it.

Pre-Aggregated Storage

For high-volume metrics, pre-aggregation trades query flexibility for storage and compute efficiency. Pre-aggregation can reduce storage by 100-1000x and cut query latency to milliseconds. The tradeoff: you lose the original event context. This approach works only if you separately retain raw events for the dispute window (30-90 days).
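A minimal sketch of an hourly rollup over events shaped like the schema example earlier in the article. The `hourly_rollup` function is hypothetical, and bucketing by string prefix assumes every timestamp is an ISO 8601 string in UTC.

```python
from collections import defaultdict

def hourly_rollup(events):
    """Sum tokens_used per (customer_id, event_type, hour)."""
    totals = defaultdict(int)
    for event in events:
        # Assumption: timestamps are UTC ISO 8601 strings, so the first
        # 13 characters ("2026-03-02T14") identify the hour bucket.
        hour = event["timestamp"][:13]
        key = (event["customer_id"], event["event_type"], hour)
        totals[key] += event["properties"]["tokens_used"]
    return dict(totals)
```

Each bucket replaces all of its source events with a single row, which is where the storage savings come from. The individual endpoint, method, and response time of each call are no longer recoverable from the rollup alone.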

A hybrid approach is best: raw events in a time-series database for 90 days, with aggregated views for historical analysis.

Monitoring and Alerting for Metering Pipelines

Production pipelines require continuous observability. Key metrics to monitor include throughput and latency (events ingested, ingestion latency percentiles, queue lag), deduplication rate (duplicate events detected, which may indicate client retry storms), error rates and dead-letter queue accumulation (indicating systematic issues requiring human investigation), and consumer lag (growing lag means events are piling up, degrading billing accuracy).
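Two of these metrics reduce to simple arithmetic, sketched below. The function names and the dictionary inputs (partition index mapped to offset) are assumptions; in practice the offsets would come from the queue's admin or metrics interface.

```python
def consumer_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag: newest offset in the log minus last committed offset."""
    return {
        partition: end_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in end_offsets
    }

def duplicate_rate(events_received: int, duplicates_detected: int) -> float:
    """Fraction of received events that were discarded as duplicates."""
    if events_received == 0:
        return 0.0
    return duplicates_detected / events_received
```

Alerting on the trend of these values is usually more informative than alerting on a single reading: lag that grows across consecutive samples indicates consumers cannot keep up.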

Disaster Recovery: Event Replay and Audit Logs

Disasters happen. Producers crash, messages get lost, processing bugs corrupt data. Metering systems must be recoverable.

Event Log Immutability

Events stored in the primary database should never be deleted or modified in place. Maintain an append-only log structure: corrections are recorded as explicit adjustments, and invalid events are marked as invalid with a reference to the correction audit log entry. This preserves the complete history for compliance and dispute resolution.

Event Replay Capability

Keep immutable event logs queryable. When aggregation logic has bugs, you can re-run historical events through the corrected code. This is essential for accuracy corrections.
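Replay can be sketched as re-running the stored events through a corrected aggregation function. The two aggregators below are hypothetical: the "buggy" one counts calls where it should have summed tokens, standing in for whatever defect prompted the replay.

```python
def replay(events, aggregate):
    """Recompute per-customer usage from the immutable event log."""
    totals = {}
    # Apply events in source-timestamp order (assumes uniform UTC ISO strings).
    for event in sorted(events, key=lambda e: e["timestamp"]):
        customer = event["customer_id"]
        totals[customer] = aggregate(totals.get(customer, 0), event)
    return totals

def buggy_aggregate(total, event):
    return total + 1  # bug: counts calls instead of summing tokens

def fixed_aggregate(total, event):
    return total + event["properties"]["tokens_used"]
```

Because the raw events were never modified, running `replay` with `fixed_aggregate` yields corrected totals that can be compared against what was originally billed to compute adjustments.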

Recovery Time Objective (RTO) and Recovery Point Objective (RPO)

Define acceptable recovery targets. For billing: RTO of 1 hour (the maximum acceptable time to restore metering after an outage) and RPO of 0 events (no data loss). Achieve this through multi-region replication, daily encrypted backups, documented replay procedures tested monthly, and disaster recovery runbooks for ops teams.

Event Ingestion with Platform Infrastructure

Modern event ingestion platforms like Lago provide managed solutions. Lago's real-time event ingestion supports 1M+ events per second with built-in idempotency key support, API-first architecture, and auditable pipeline code (open-source). This eliminates infrastructure toil for billing teams.

For teams building custom pipelines, the principles remain constant: partition by customer, maintain idempotency keys, pair at-least-once delivery with exactly-once accounting via deduplication, validate and enrich early, scale horizontally, and preserve complete audit logs. Whether you use Kafka, Pulsar, or a SaaS platform, these architectural patterns ensure billing accuracy.

The cost of metering errors—revenue leakage from billing gaps, customer disputes, regulatory fines—far exceeds the cost of building or buying reliable ingestion infrastructure. Prioritize this foundational layer.

Scaling Beyond Capacity

As usage-based billing scales, consider dedicated infrastructure for metering. Some organizations implement separate "hot" paths for real-time metering and "cold" paths for historical analysis, avoiding contention. Real-time metrics (for product dashboards and usage-based rate limits) require sub-second latency. Historical analysis (for billing and analytics) tolerates higher latency but is critical for downstream processes like automated tax calculation and proration calculations that depend on precise usage-based revenue recognition.

Open-source billing platforms abstract this complexity. They handle partitioning, replication, and aggregation automatically. The ROI typically appears at 10M+ events per month, when operational overhead of custom pipelines exceeds the platform cost.

Citations

[1] Paddle (2024). "SaaS Billing Report: The $1 Trillion Revenue Impact." Finds 3-7% of annual revenue lost to metering errors and billing mistakes in usage-based SaaS.

[2] Forrester (2023). "The State of Billing Systems and Metering Infrastructure." 68% of SaaS companies experienced billing errors in the past year due to data pipeline failures.

[3] Apache Kafka Documentation. "Exactly-Once Delivery Semantics: Deduplication and Idempotent Writes."

[4] ClickHouse Official. "Compression Ratios: Data Density in ClickHouse." Event-based datasets typically compress to 1-10% of original size in columnar format.

[5] AWS (2025). "Kinesis Data Streams Capacity Planning." Guidance on partition scaling and throughput limits for high-volume streaming.

[6] Google Cloud (2025). "Pub/Sub at Scale: Exactly-Once Processing with Dataflow." Case studies showing exactly-once semantics with deduplication at 1M+ events/sec.
