
Pricing & Monetization
B2B SaaS Subscription Management: Best Practices and Strategies
Anh-Tho Chuong • 5 min read
Mar 3 • 8 min read
Event ingestion is the process of collecting, validating, and routing usage events from customer applications into a metering system that feeds billing calculations. Without a reliable event ingestion pipeline, SaaS companies lose revenue through missed events, duplicate charges, and failed reconciliations. Industry data puts the loss from metering errors and billing mistakes at 3-7% of annual revenue in usage-based SaaS businesses [1]. This article covers the architecture patterns, reliability guarantees, and implementation strategies required to build production-grade metering pipelines that handle millions of events while maintaining accuracy for billing.
Usage-based billing depends entirely on accurate event capture. Unlike seat-based or subscription models, usage-based billing requires real-time accounting of customer activity—whether that's API calls, storage consumption, or compute minutes. A single dropped event creates a billing discrepancy. A duplicated event overcharges. A late-arriving event breaks reconciliation.
Research indicates that 68% of SaaS companies experienced at least one billing error in the past year due to data pipeline issues [2]. Event ingestion is where most of these errors originate. The pipeline must guarantee not just delivery, but correct ordering, exactly-once accounting semantics, and auditability for disputes.
The stakes are highest in high-volume environments. A company processing 1M+ events per second cannot afford manual reconciliation. The pipeline must be automated, observable, and self-healing.
Every event flowing through your pipeline must carry specific fields to ensure proper billing attribution and auditability.
{
  "event_id": "evt_1h7k9j2m0p5q",
  "timestamp": "2026-03-02T14:23:45.123Z",
  "customer_id": "cust_8b3k1x9y",
  "subscription_id": "sub_5j9n2k0m",
  "event_type": "api_call",
  "properties": {
    "endpoint": "/v1/documents",
    "method": "POST",
    "response_time_ms": 234,
    "tokens_used": 1500,
    "region": "us-east-1"
  },
  "idempotency_key": "customer-batch-20260302-00147"
}
The core fields serve specific purposes. The event_id is a unique identifier for deduplication—critical because the same event may arrive multiple times across retries. The timestamp captures when the event occurred at the source, not when it arrived at your system. This timestamp is used for billing period cutoffs and is non-negotiable for accuracy.
The customer_id or subscription_id links the event to a billing entity. event_type categorizes the usage (API calls, storage GB-hours, compute minutes). The properties object contains dimension data—which endpoint was called, which region, which product feature. Finally, the idempotency_key allows the downstream system to detect and discard duplicate events.
Schema validation must happen at ingestion time. Events with missing event_id, timestamp, or customer_id should be rejected with clear error responses, not silently dropped. Downstream billing systems depend on schema consistency.
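A minimal sketch of ingestion-time validation, assuming events arrive as parsed JSON dictionaries. The function names and the 422 status code for rejections are illustrative assumptions, not part of any specific platform's API.

```python
from typing import Any

# Fields the article names as mandatory at ingestion time.
REQUIRED_FIELDS = ("event_id", "timestamp", "customer_id")


def validate_event(event: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the event passed."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = event.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty required field: {field}")
    return errors


def ingest_response(event: dict[str, Any]) -> dict[str, Any]:
    """Reject invalid events with explicit reasons instead of dropping them."""
    errors = validate_event(event)
    if errors:
        # 422 is an assumed choice; any 4xx carrying the error list would do.
        return {"status": 422, "errors": errors}
    return {"status": 202, "event_id": event["event_id"]}
```

Returning every failed field at once, rather than stopping at the first, lets the customer fix a malformed producer in a single round trip.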
Three primary patterns dominate event ingestion architecture: HTTP API, message queues, and batch upload. Most production systems use combinations.
Direct HTTP POST to an ingestion endpoint offers simplicity and immediate feedback. The HTTP 202 (Accepted) response indicates the request was received and queued for processing, not that it was durably stored. This pattern works well for moderate volumes (under 100K events/sec), and the client learns right away when an event is malformed or customer authentication fails.
Drawbacks include retry complexity on the client side and potential connection timeouts during backpressure. Most HTTP API implementations pair with a message queue backend to decouple ingestion from processing.
Kafka, RabbitMQ, and cloud-native queues (AWS Kinesis, Google Cloud Pub/Sub) decouple event producers from processors. Applications publish events to a queue; processing services consume asynchronously.
Queues excel at high volumes and provide natural backpressure—slow downstream processors simply consume more slowly. Partitioning by customer_id ensures ordering within a customer's event stream, critical for accurate metering.
The tradeoff is complexity: you need distributed infrastructure, consumer group management, and offset tracking. Exactly-once semantics are harder to guarantee than with HTTP APIs.
Some customers provide bulk event uploads via S3, SFTP, or a CSV endpoint. This pattern works for low-frequency use cases and legacy integrations. Batch uploads require format parsing, validation, and transformation into canonical event schemas. They're useful for correcting historical metering gaps or backfilling data.
Billing systems operate under strict semantics requirements. Two guarantees matter: at-least-once delivery and exactly-once accounting.
At-least-once delivery means every event reaches the target system at least one time—possibly multiple times. It requires retries on failure and durability guarantees. Exactly-once accounting means each unique event contributes to billing exactly once, even if it arrives multiple times in the pipeline [3].
These are different requirements. You can implement exactly-once accounting on top of at-least-once delivery using idempotency. This is the standard pattern in production billing systems.
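Most modern queues (Kafka, Pulsar) provide at-least-once delivery when consumers commit offsets only after processing succeeds. The commit is not atomic with processing: if a consumer crashes after handling an event but before committing, the event is redelivered on restart. Nothing is lost, but duplicates are possible, which is why deduplication is required downstream.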
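The commit-after-processing behavior can be illustrated with a small in-memory simulation. This is a sketch only: the list stands in for a queue partition, and `run_consumer` and its crash parameter are hypothetical names, not a real client API.

```python
def run_consumer(log, start_offset, handle, crash_after_processing=None):
    """Process events from start_offset; return the last committed offset."""
    committed = start_offset
    for position in range(start_offset, len(log)):
        handle(log[position])  # the side effect happens first
        if position == crash_after_processing:
            return committed   # simulated crash: this commit never happened
        committed = position + 1  # commit only after successful processing
    return committed


log = ["evt_1", "evt_2", "evt_3"]
processed = []

# First run crashes after handling evt_2 but before committing it.
committed = run_consumer(log, 0, processed.append, crash_after_processing=1)

# The restart resumes from the last committed offset, so evt_2 is handled again.
committed = run_consumer(log, committed, processed.append)
```

After both runs, `processed` holds `evt_2` twice and no event is missing: delivery is at-least-once, and removing the duplicate is left to the accounting layer.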
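Exactly-once accounting is achieved through deduplication on event_id and idempotency_key, with the storage layer (database) enforcing uniqueness. Two unique constraints cover the two scenarios: a duplicate event_id (same event sent twice) and a duplicate idempotency_key (same batch sent twice). When one key covers a whole batch, as in the sample event above, its constraint belongs on the batch record rather than on individual event rows; otherwise the second event of a legitimate batch would be rejected. If either constraint is violated, the database rejects the insert, preventing double-billing.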
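A minimal sketch of database-enforced deduplication using Python's built-in sqlite3 module. The table layout is an assumption made for illustration: it treats idempotency_key as identifying a batch (one row per batch) and event_id as identifying a single event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches (idempotency_key TEXT PRIMARY KEY);
CREATE TABLE events (
    event_id        TEXT PRIMARY KEY,
    customer_id     TEXT NOT NULL,
    idempotency_key TEXT NOT NULL
);
""")


def store_batch(idempotency_key, events):
    """Insert a batch atomically; return how many events were newly stored."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Raises IntegrityError if this batch key was already recorded.
            conn.execute("INSERT INTO batches VALUES (?)", (idempotency_key,))
            stored = 0
            for event in events:
                # A repeated event_id is skipped rather than inserted twice.
                cursor = conn.execute(
                    "INSERT OR IGNORE INTO events VALUES (?, ?, ?)",
                    (event["event_id"], event["customer_id"], idempotency_key),
                )
                stored += cursor.rowcount
            return stored
    except sqlite3.IntegrityError:
        return 0  # whole batch is a retry; nothing is billed again
```

Because the constraint lives in the database rather than in application memory, the guarantee holds across restarts and across multiple ingestion workers writing concurrently.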
Idempotency is the property that repeating an operation produces the same result as executing it once. For billing, this means receiving the same event twice should not create two charges.
The simplest approach is per-event deduplication using a unique event_id. Every event gets a UUID or snowflake ID assigned by the client. The ingestion system maintains a set or database of seen event IDs within a retention window (typically 24-72 hours). Redis or similar fast cache systems work well for this. For long-term deduplication, query the events table with a unique index on event_id.
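The retention-window check can be sketched with an in-memory structure standing in for Redis or a similar cache. The class name and the injectable clock are assumptions for illustration; a production version would use a shared store with native key expiry.

```python
import time


class SeenEventIds:
    """Remembers event IDs for a fixed retention window."""

    def __init__(self, retention_seconds, clock=time.monotonic):
        self.retention_seconds = retention_seconds
        self.clock = clock
        self._first_seen = {}  # event_id -> time it was first recorded

    def is_duplicate(self, event_id):
        """Return True if event_id was seen within the window, else record it."""
        now = self.clock()
        # Linear-scan eviction keeps the sketch short; a real cache expires keys itself.
        expired = [eid for eid, seen in self._first_seen.items()
                   if now - seen > self.retention_seconds]
        for eid in expired:
            del self._first_seen[eid]
        if event_id in self._first_seen:
            return True
        self._first_seen[event_id] = now
        return False
```

An event that reappears after the window has passed is not caught here, which is why the unique index on the events table remains the long-term safeguard.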
An idempotency_key is a client-generated value representing a logical operation. Multiple retries of the same operation share the same idempotency_key. This is standard in financial APIs (Stripe, payment processors). The server must cache idempotency keys and their results for the retention window. This prevents double-processing even if the customer retries with different event IDs within the batch.
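A minimal sketch of caching results by idempotency key, assuming an in-process dictionary in place of a shared cache. `IdempotentProcessor` and its handler argument are hypothetical names, and expiry after the retention window is omitted for brevity.

```python
class IdempotentProcessor:
    """Runs the handler once per idempotency key and replays the stored result."""

    def __init__(self, handler):
        self.handler = handler
        self._results = {}  # idempotency_key -> result of the first execution

    def submit(self, idempotency_key, batch):
        if idempotency_key in self._results:
            # A retry: return the original outcome without reprocessing.
            return self._results[idempotency_key]
        result = self.handler(batch)
        self._results[idempotency_key] = result
        return result


handled_batches = []


def record_batch(batch):
    handled_batches.append(batch)
    return {"accepted": len(batch)}


processor = IdempotentProcessor(record_batch)
first = processor.submit("customer-batch-20260302-00147", ["evt_1", "evt_2"])
# The retry carries different event IDs but the same key, so it is not processed.
retry = processor.submit("customer-batch-20260302-00147", ["evt_9", "evt_10"])
```

Keying on the logical operation rather than on the payload is what makes the retry safe even when the client regenerates event IDs.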
Events from customers may be incomplete, malformed, or arrive out-of-order. The ingestion pipeline must validate, enrich, and standardize before they reach billing.
Invalid events are logged with detailed error reasons, sent to a dead-letter queue, and reported to the customer. Never silently drop invalid events. Validation checks include required field presence (event_id, customer_id, timestamp), timestamp sanity checks (not too far in future or past), and customer existence verification.
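The timestamp sanity check and dead-letter routing might look like the following sketch. The skew and age limits are assumed values chosen for illustration, and plain lists stand in for the accepted stream and the dead-letter queue.

```python
from datetime import datetime, timedelta, timezone

MAX_FUTURE_SKEW = timedelta(minutes=5)  # assumed tolerance for clock drift
MAX_AGE = timedelta(days=30)            # assumed limit for late-arriving events


def timestamp_error(raw_timestamp, now):
    """Return a reason string if the timestamp is unusable, else None."""
    try:
        occurred_at = datetime.fromisoformat(raw_timestamp.replace("Z", "+00:00"))
    except (AttributeError, ValueError):
        return "timestamp is not a valid ISO 8601 string"
    if occurred_at.tzinfo is None:
        return "timestamp has no timezone"
    if occurred_at > now + MAX_FUTURE_SKEW:
        return "timestamp is too far in the future"
    if occurred_at < now - MAX_AGE:
        return "timestamp is too far in the past"
    return None


def route(event, now, accepted, dead_letters):
    """Send valid events onward; keep invalid ones with the reason attached."""
    reason = timestamp_error(event.get("timestamp"), now)
    if reason is None:
        accepted.append(event)
    else:
        dead_letters.append({"event": event, "reason": reason})
```

Storing the reason next to the original payload means the dead-letter queue doubles as the error report sent back to the customer.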
Enrichment adds metadata necessary for accurate aggregation downstream—customer currency, billing period, metrics dimensions. This prevents logic duplication in aggregation services. Common enrichments include server-generated timestamps, customer billing currency lookups, region/datacenter metadata, and event_type normalization.
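A sketch of an enrichment step, under several stated assumptions: the currency table and alias map are hypothetical in-memory lookups, billing periods are calendar months, and source timestamps are UTC ISO 8601 strings.

```python
from datetime import datetime, timezone

# Hypothetical lookups; a real pipeline would read these from customer records.
CUSTOMER_CURRENCY = {"cust_8b3k1x9y": "USD"}
EVENT_TYPE_ALIASES = {"api-call": "api_call", "apicall": "api_call"}


def enrich(event, received_at):
    """Return a copy of the event with the metadata aggregation needs."""
    enriched = dict(event)  # shallow copy; the original event is left untouched
    normalized = event["event_type"].strip().lower()
    enriched["event_type"] = EVENT_TYPE_ALIASES.get(normalized, normalized)
    # Server-side arrival time, kept separate from the source timestamp.
    enriched["received_at"] = received_at.astimezone(timezone.utc).isoformat()
    enriched["currency"] = CUSTOMER_CURRENCY.get(event["customer_id"])
    # "YYYY-MM" prefix of the source timestamp (assumes calendar-month periods).
    enriched["billing_period"] = event["timestamp"][:7]
    return enriched
```

Keeping the source timestamp and adding `received_at` alongside it preserves the distinction the pipeline relies on: billing cutoffs use when the event occurred, while latency monitoring uses when it arrived.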
As volumes scale, a single ingestion server becomes a bottleneck. Production systems partition data across multiple consumers.
Partitioning by customer_id ensures that all events from one customer are processed by a single consumer, maintaining order. This is essential for accurate metering—you can't meter API calls correctly if they arrive out of order.
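Customer-based partitioning reduces to a stable hash of customer_id. The sketch below uses SHA-256 from the standard library as an assumed choice; queue clients ship their own partitioners, and `partition_for` is a hypothetical helper name.

```python
import hashlib


def partition_for(customer_id: str, partition_count: int) -> int:
    """Map a customer to a partition that stays the same across processes."""
    # Python's built-in hash() is salted per process, so it cannot be used here.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

Every event for the same customer lands on the same partition, so a single consumer sees that customer's events in the order they were published. Changing the partition count remaps customers, so it needs to be planned rather than done casually.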
When downstream systems lag, the ingestion pipeline must slow down gracefully. Backpressure signals prevent queues from growing unbounded and protect downstream billing systems from being overwhelmed. Critical lag thresholds trigger 503 Service Unavailable responses, while warning thresholds return 429 Too Many Requests with Retry-After headers.
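The threshold logic can be sketched as a pure function from consumer lag to an HTTP response. The threshold values and Retry-After durations are assumptions for illustration and would be tuned per deployment.

```python
WARNING_LAG = 100_000     # assumed: events waiting before clients are throttled
CRITICAL_LAG = 1_000_000  # assumed: events waiting before ingestion is refused


def admission_decision(consumer_lag: int):
    """Return (status_code, headers) for an incoming ingestion request."""
    if consumer_lag >= CRITICAL_LAG:
        # Retry-After is also permitted on 503 responses.
        return 503, {"Retry-After": "30"}
    if consumer_lag >= WARNING_LAG:
        return 429, {"Retry-After": "5"}
    return 202, {}
```

Clients that honor Retry-After spread their retries out instead of hammering an already lagging pipeline, which is what lets the queue drain.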
Separating the HTTP API layer from the processing layer allows scaling each independently. The API can scale to handle connection volume; consumers scale to handle processing throughput.
Where you store events and how you aggregate them affects both query performance and storage costs.
Storing every event as-is preserves auditability and allows flexible aggregation later. Monthly or daily partitions keep queries fast by eliminating irrelevant data. ClickHouse, a columnar database, excels at this workload—compressing raw events to 1-10% of original size while maintaining query speed [4].
Raw storage takes more space and makes aggregation queries slower, but it enables audit trails and dispute resolution. If a customer disputes a charge, you can retrieve the exact events that generated it.
For high-volume metrics, pre-aggregation trades query flexibility for storage and compute efficiency. It can reduce storage by 100-1000x and bring query latency down to milliseconds. The tradeoff: you lose the original event context. This approach works only if you separately retain raw events for the dispute window (30-90 days).
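A minimal sketch of pre-aggregation, rolling raw events up into hourly totals per customer and event type. It assumes UTC ISO 8601 timestamps and uses the `tokens_used` property from the sample event as the metric; `hourly_rollup` is a hypothetical name.

```python
from collections import defaultdict


def hourly_rollup(events, metric="tokens_used"):
    """Sum a metric per (customer_id, event_type, hour) bucket."""
    totals = defaultdict(int)
    for event in events:
        hour = event["timestamp"][:13]  # e.g. "2026-03-02T14"
        key = (event["customer_id"], event["event_type"], hour)
        # Events without the metric still create a bucket, with a zero contribution.
        totals[key] += event["properties"].get(metric, 0)
    return dict(totals)
```

Each bucket replaces however many raw events fell into that hour, which is where the storage savings come from. The per-event properties such as endpoint or region are gone from the rollup unless they are added to the bucket key.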
A hybrid approach is best: raw events in a time-series database for 90 days, with aggregated views for historical analysis.
Production pipelines require continuous observability. Key metrics to monitor include throughput and latency (events ingested, ingestion latency percentiles, queue lag), deduplication rate (duplicate events detected, which may indicate client retry storms), error rates and dead-letter queue accumulation (indicating systematic issues requiring human investigation), and consumer lag (growing lag means events are piling up, degrading billing accuracy).
Disasters happen. Producers crash, messages get lost, processing bugs corrupt data. Metering systems must be recoverable.
Events stored in the primary database should never be deleted, and never modified except through explicit corrective adjustments. Maintain an append-only log structure: an invalid event stays in place, marked as invalid with a reference to the correction audit log entry. This preserves the complete history for compliance and dispute resolution.
Keep immutable event logs queryable. When aggregation logic has bugs, you can re-run historical events through the corrected code. This is essential for accuracy corrections.
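Replay can be sketched as running the same immutable log through two versions of the aggregation logic and comparing the results. The bug shown here (counting calls instead of tokens) and all function names are hypothetical, chosen only to make the example concrete.

```python
def replay(event_log, units_for):
    """Recompute per-customer usage from the log with the given logic."""
    usage = {}
    for event in event_log:
        customer = event["customer_id"]
        usage[customer] = usage.get(customer, 0) + units_for(event)
    return usage


def buggy_units(event):
    return 1  # hypothetical bug: counted one unit per call


def corrected_units(event):
    return event["properties"]["tokens_used"]


def adjustments(event_log):
    """Difference between corrected and originally billed usage, per customer."""
    billed = replay(event_log, buggy_units)
    corrected = replay(event_log, corrected_units)
    return {c: corrected[c] - billed.get(c, 0) for c in corrected}
```

Because the log itself is never rewritten, the replay is repeatable: running it again yields the same corrected totals, and the adjustment can be audited against the original events.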
Define acceptable recovery targets. For billing: an RTO (recovery time objective) of 1 hour, meaning metering must be restored within an hour of an outage, and an RPO (recovery point objective) of 0 events, meaning no data loss. Achieve this through multi-region replication, daily encrypted backups, documented replay procedures tested monthly, and disaster recovery runbooks for ops teams.
Modern event ingestion platforms like Lago provide managed solutions. Lago's real-time event ingestion supports 1M+ events per second with built-in idempotency key support, API-first architecture, and auditable pipeline code (open-source). This eliminates infrastructure toil for billing teams.
For teams building custom pipelines, the principles remain constant: partition by customer, maintain idempotency keys, pair at-least-once delivery with exactly-once accounting via deduplication, validate and enrich early, scale horizontally, and preserve complete audit logs. Whether you use Kafka, Pulsar, or a SaaS platform, these architectural patterns ensure billing accuracy.
The cost of metering errors—revenue leakage from billing gaps, customer disputes, regulatory fines—far exceeds the cost of building or buying reliable ingestion infrastructure. Prioritize this foundational layer.
As usage-based billing scales, consider dedicated infrastructure for metering. Some organizations implement separate "hot" paths for real-time metering and "cold" paths for historical analysis, avoiding contention. Real-time metrics (for product dashboards and usage-based rate limits) require sub-second latency. Historical analysis (for billing and analytics) tolerates higher latency but is critical for downstream processes like automated tax calculation and proration calculations that depend on precise usage-based revenue recognition.
Open-source billing platforms abstract this complexity. They handle partitioning, replication, and aggregation automatically. The ROI typically appears at 10M+ events per month, when operational overhead of custom pipelines exceeds the platform cost.
[1] Paddle (2024). "SaaS Billing Report: The $1 Trillion Revenue Impact." Finds 3-7% of annual revenue lost to metering errors and billing mistakes in usage-based SaaS.
[2] Forrester (2023). "The State of Billing Systems and Metering Infrastructure." 68% of SaaS companies experienced billing errors in the past year due to data pipeline failures.
[3] Apache Kafka Documentation. "Exactly-Once Delivery Semantics: Deduplication and Idempotent Writes."
[4] ClickHouse Official. "Compression Ratios: Data Density in ClickHouse." Event-based datasets typically compress to 1-10% of original size in columnar format.
[5] AWS (2025). "Kinesis Data Streams Capacity Planning." Guidance on partition scaling and throughput limits for high-volume streaming.
[6] Google Cloud (2025). "Pub/Sub at Scale: Exactly-Once Processing with Dataflow." Case studies showing exactly-once semantics with deduplication at 1M+ events/sec.