
Pricing & Monetization
B2B SaaS Subscription Management: Best Practices and Strategies
Anh-Tho Chuong • 5 min read
Mar 3
/12 min read
A rating engine is the billing system component responsible for transforming raw usage events into billable amounts. Given that 65% of SaaS companies use consumption-based models according to recent industry surveys [1], rating engines have become fundamental infrastructure for scaling businesses. This article covers rating engine architecture, from event ingestion through billable amount calculation, including core patterns, real-time processing tradeoffs, and production considerations for high-scale systems.
In a modern billing pipeline, three distinct stages occur after usage event generation: rating, billing, and invoicing. A rating engine consumes raw usage events and applies pricing logic to produce billable amounts. The billing engine aggregates rated amounts, applies account-level rules (discounts, credits, taxes), and generates invoice line items. The invoicing engine formats those line items into customer-facing documents and manages payment collection.
This separation of concerns allows specialized optimization at each layer. A rating engine focuses purely on pricing logic—given an event and a price card, what amount is owed? A billing engine focuses on aggregation and financial rules. An invoicing engine focuses on customer communication and compliance. In practice, some platforms combine these, but architectural clarity requires understanding the distinct responsibilities. Rating precedes billing, which precedes invoicing, forming a unidirectional pipeline [2].
The rating engine must handle volume efficiently because usage events arrive continuously. For a platform processing 1M+ events per second, a naive implementation that recalculates pricing from scratch for every event becomes computationally infeasible. Smart rating engines pre-compute thresholds, cache price cards in memory, and use batch aggregation strategies to remain performant while maintaining billing accuracy.
The simplest rating pattern is flat rate: a fixed monthly charge regardless of usage. No event processing required—the rating engine emits a constant amount at month-end. The next layer of complexity is per-unit pricing, where each unit of usage triggers a fixed charge. A SaaS analytics platform might charge $0.05 per event. The rating engine receives 100,000 events in a month and emits $5,000 (100,000 × $0.05). This pattern requires event aggregation but no threshold logic.
Tiered pricing introduces complexity: prices change as usage accumulates. Two tiered models exist: graduated (inclusive) and volume. In graduated pricing, each tier applies only to its own usage band—the first 100 units cost $1 each, units 101-500 cost $0.80 each, units 501+ cost $0.60 each. In volume pricing, all units use the same tier price once a threshold is crossed—if you consume over 500 units, all units cost $0.60. Graduated pricing is more common in modern SaaS (AWS uses this model extensively) while volume pricing appears in utility billing.
To implement graduated pricing correctly, the rating engine must track cumulative usage and determine which tier each unit falls into. A correct implementation iterates over the tiers in order and assigns to each tier only the units that fall inside its band, which is what makes the boundary between tiers come out right. If usage is 150 units with tiers [0-100 @ $1, 100-500 @ $0.80], the first iteration assigns 100 units to tier 1 ($100), the second assigns 50 units to tier 2 ($40), totaling $140. Errors in this logic (off-by-one errors, incorrect tier boundary handling, or exclusive vs. inclusive range confusion) are common sources of billing disputes [3].
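The tier walk described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the `TIERS` layout (cumulative upper bound plus unit price, with `None` for the open-ended last tier) and the function name `rate_graduated` are assumptions made for the example, and the prices are the ones quoted earlier in this article.

```python
from decimal import Decimal

# Hypothetical tier table: (cumulative upper bound, unit price).
# None marks the open-ended final tier.
TIERS = [
    (100, Decimal("1.00")),
    (500, Decimal("0.80")),
    (None, Decimal("0.60")),
]

def rate_graduated(units, tiers):
    """Charge each unit at the price of the tier band it falls into."""
    total = Decimal("0")
    lower = 0  # cumulative units already priced by earlier tiers
    for upper, price in tiers:
        if units <= lower:
            break  # all usage has been assigned to a tier
        band_end = units if upper is None else min(units, upper)
        total += (band_end - lower) * price
        lower = band_end
    return total

print(rate_graduated(150, TIERS))  # 100 units at 1.00 plus 50 units at 0.80
```

Treating each upper bound as inclusive (unit 100 belongs to the first tier, unit 101 to the second) is one consistent convention; whichever convention is chosen, it should be stated in the price card and tested at every boundary.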
Commit-based pricing combines a fixed commitment (e.g., $10,000/month) with per-unit pricing for overages above the committed amount. The rating engine must track consumed usage, convert it to cost via standard pricing, compare against the commitment, and emit the maximum of commitment or actual cost. If a customer commits to $10,000 and consumes only $7,000 worth of usage, they owe $10,000. If they consume $15,000, they owe $15,000. This simple max() operation is essential for many enterprise pricing models.
Overage handling introduces a complication: what constitutes an "overage"? In some models, the base plan includes a certain unit allowance (e.g., 1,000 API calls/month included), and overage charges apply above that. In others, a customer pays a flat fee for a plan tier, then per-unit for any usage. The rating engine must track both included and consumed amounts to compute the overage correctly. This is where testing becomes critical—off-by-one errors between included amounts and overage thresholds cause systematic billing errors across an entire customer segment [4].
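Both rules, the commitment floor and the included allowance, reduce to a `max()` call. The sketch below shows each in isolation; the function names and the $0.01 overage price are hypothetical, chosen only to make the arithmetic visible.

```python
from decimal import Decimal

def rate_with_commit(usage_cost, commitment):
    """Customer owes the larger of the commitment and the actual usage cost."""
    return max(usage_cost, commitment)

def rate_overage(units, included_units, overage_price):
    """Charge only for units above the plan's included allowance."""
    overage_units = max(0, units - included_units)
    return overage_units * overage_price

# Commitment examples from the text: $10,000 commit, $7,000 or $15,000 consumed.
print(rate_with_commit(Decimal("7000"), Decimal("10000")))
print(rate_with_commit(Decimal("15000"), Decimal("10000")))

# Allowance example: 1,000 included API calls, hypothetical $0.01 per extra call.
print(rate_overage(1000, 1000, Decimal("0.01")))  # exactly at the allowance
print(rate_overage(1001, 1000, Decimal("0.01")))  # first billable call
```

The boundary cases worth pinning down in tests are usage exactly equal to the allowance and usage one unit above it.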
Implementation requires state tracking across the billing period. If events arrive out-of-order or with timestamps spanning multiple days, the rating engine must aggregate usage correctly regardless of temporal ordering. Distributed systems further complicate this: events may be processed across multiple machines, requiring either eventual consistency or a central authoritative aggregator.
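One way to make aggregation independent of arrival order is to key totals on the event's own timestamp rather than on processing time. The sketch below assumes calendar-month billing periods in UTC and events shaped as dictionaries with `id`, `customer_id`, `units`, and `timestamp` fields; it also skips events whose `id` has already been seen, a deduplication step that is an added assumption rather than something this article prescribes.

```python
from collections import defaultdict
from datetime import datetime, timezone

def aggregate_usage(events):
    """Sum units per (customer, year, month), independent of arrival order."""
    totals = defaultdict(int)
    seen_ids = set()
    for event in events:
        if event["id"] in seen_ids:  # redelivered event, already counted
            continue
        seen_ids.add(event["id"])
        ts = event["timestamp"].astimezone(timezone.utc)
        totals[(event["customer_id"], ts.year, ts.month)] += event["units"]
    return dict(totals)

# Hypothetical events, deliberately out of order and with one duplicate.
EVENTS = [
    {"id": "e2", "customer_id": "c1", "units": 5,
     "timestamp": datetime(2024, 3, 31, 23, 59, tzinfo=timezone.utc)},
    {"id": "e1", "customer_id": "c1", "units": 3,
     "timestamp": datetime(2024, 3, 1, tzinfo=timezone.utc)},
    {"id": "e3", "customer_id": "c1", "units": 7,
     "timestamp": datetime(2024, 4, 1, tzinfo=timezone.utc)},
    {"id": "e1", "customer_id": "c1", "units": 3,
     "timestamp": datetime(2024, 3, 1, tzinfo=timezone.utc)},
]

print(aggregate_usage(EVENTS))
```

Because addition is commutative, any permutation of the same events yields the same totals, which is the property a distributed aggregator needs to preserve.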
Volume pricing charges based on total quantity consumed. Unlike graduated pricing where rates vary by tier, volume pricing applies a single price point based on total usage. Consume 1,000 units and pay $0.50 each; consume 2,000 units and pay $0.40 each. All units use the volume-based price once thresholds are met. Volume pricing appears frequently in telecommunications and utility billing where per-unit costs decrease substantially at scale.
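As a rough sketch, volume pricing selects one unit price from total usage and applies it to every unit. The $0.50 and $0.40 prices come from the example above; the $0.60 price below 1,000 units is an assumption added so the table has a starting tier, and `rate_volume` is a hypothetical name.

```python
from decimal import Decimal

# (minimum total units, unit price), sorted by ascending minimum.
# The 0.60 entry is an assumed starting price, not taken from the text.
VOLUME_TIERS = [
    (0, Decimal("0.60")),
    (1000, Decimal("0.50")),
    (2000, Decimal("0.40")),
]

def rate_volume(units, tiers):
    """Apply the price of the highest tier reached to all units."""
    unit_price = tiers[0][1]
    for minimum, price in tiers:
        if units >= minimum:
            unit_price = price
    return units * unit_price

print(rate_volume(1000, VOLUME_TIERS))  # 1000 units at 0.50
print(rate_volume(2000, VOLUME_TIERS))  # 2000 units at 0.40
```

One consequence of this model is a price cliff at each threshold: with the table above, 1,999 units cost $999.50 while 2,000 units cost $800.00, so total cost is not monotonic in usage under volume pricing.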
Package pricing bundles units into fixed lots. A cloud storage provider might charge $50 for 100 GB packages—you can't buy a partial package. If you consume 150 GB, you pay for 2 packages ($100). The rating engine rounds consumption up to the next package boundary, then multiplies by package price. This pattern appears in storage, bandwidth, and seat-based SaaS products where units align to discrete SKUs.
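The round-up step can be written with integer ceiling division, as in this small sketch (the function name `rate_package` is hypothetical; the $50 per 100 GB figures are the ones from the example above).

```python
from decimal import Decimal

def rate_package(units, package_size, package_price):
    """Round usage up to whole packages, then multiply by the package price."""
    packages = -(-units // package_size)  # ceiling division on integers
    return packages * package_price

print(rate_package(150, 100, Decimal("50")))  # 2 packages
```

Using integer arithmetic for the ceiling avoids the floating-point surprises that `math.ceil(units / package_size)` can produce for very large unit counts.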
Percentage pricing charges a percentage of transaction value rather than per-unit amounts. Payment processors, marketplaces, and financial services platforms use this heavily. The rating engine receives transaction data and computes amount × percentage. When the percentage is combined with per-transaction minimums or maximums (for example, a 2.9% rate with a $0.30 minimum fee, or a capped maximum fee), the rating engine needs min/max logic alongside the percentage calculation. Percentage pricing is particularly prevalent in payment processing, where industry data shows processors collectively handle trillions in annual volume [5].
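A minimal sketch of percentage rating with optional per-transaction bounds might look like the following. The 2.9% rate and $0.30 minimum echo the figures above; the $50 cap and the name `rate_percentage` are assumptions for illustration.

```python
from decimal import Decimal

def rate_percentage(amount, rate, min_fee=None, max_fee=None):
    """Fee is amount * rate, clamped to the optional minimum and maximum."""
    fee = amount * rate
    if min_fee is not None:
        fee = max(fee, min_fee)
    if max_fee is not None:
        fee = min(fee, max_fee)
    return fee

RATE = Decimal("0.029")
print(rate_percentage(Decimal("100"), RATE))                              # plain percentage
print(rate_percentage(Decimal("5"), RATE, min_fee=Decimal("0.30")))       # minimum applies
print(rate_percentage(Decimal("10000"), RATE, max_fee=Decimal("50")))     # cap applies
```

The result is left unrounded here on purpose; rounding to the minor currency unit is a separate decision made later in the pipeline.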
Real-time rating processes events and computes billable amounts immediately upon ingestion. This approach provides instant billing visibility and prevents overbilling surprises at month-end. However, real-time rating requires a performant rating engine and introduces computational overhead on the critical path. Batch rating aggregates events over a period (hourly, daily, or at month-end) and processes them in bulk. Batch rating reduces per-event overhead but delays billing visibility and may allow undetected billing errors to accumulate.
The tradeoff depends on billing requirements. Enterprise customers typically demand real-time usage tracking and monthly billing predictability—real-time rating is worth the engineering investment. Cost-sensitive segments may tolerate batch rating if it enables lower pricing. Many platforms adopt a hybrid approach: real-time aggregation for visibility but batch rating for actual invoice generation, allowing for periodic corrections and adjustments.
Real-time rating architectures often use event streaming platforms (Kafka, Pulsar) to ingest events, then route through a rating service that applies pricing logic and emits rated events to a data warehouse. Batch systems typically aggregate raw events in database tables during the billing period, then run a rating job that processes accumulated events in a single batch query, yielding invoice line items.
A practical real-time system architecture includes: (1) event ingestion service accepting usage events via HTTP or gRPC; (2) event validation layer verifying required fields and data types; (3) rating service accessing cached price cards and applying logic; (4) aggregation store (Redis, DuckDB) tracking cumulative usage for the billing period; (5) invoice generation service that queries the aggregation store and produces final invoice line items. Each layer can scale independently based on bottlenecks. Open-source billing platforms like Lago implement this pattern with support for 8 charge models (per-unit, graduated, package, percentage, volume, dynamic, custom, and progressive) and real-time event ingestion at 1M+ events per second, providing an auditable foundation that teams can inspect and extend.
Many SaaS products charge based on multiple dimensions simultaneously. A database platform might charge $0.10 per GB of storage and $0.05 per million read operations. An analytics platform might charge per event ingested, per query, and per dashboard. The rating engine must handle multi-dimensional pricing by tracking each meter independently, applying rates, and summing results.
Implementation requires a pricing model that enumerates all meters for a given customer's plan. For each meter, the rating engine looks up total usage from the aggregation layer, applies the corresponding rate, and accumulates the amount.
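The per-meter loop can be sketched as below, using the storage and read-operation rates from the database example. The meter names, the `PLAN_RATES` mapping, and the usage quantities are hypothetical; in a real system the usage totals would come from the aggregation layer.

```python
from decimal import Decimal

# Hypothetical plan: meter name -> price per unit of that meter.
PLAN_RATES = {
    "storage_gb": Decimal("0.10"),
    "read_ops_millions": Decimal("0.05"),
}

def rate_plan(usage, rates):
    """Rate each meter independently and return (line items, total)."""
    line_items = {
        meter: Decimal(usage.get(meter, 0)) * rate
        for meter, rate in rates.items()
    }
    return line_items, sum(line_items.values(), Decimal("0"))

items, total = rate_plan({"storage_gb": 500, "read_ops_millions": 200}, PLAN_RATES)
print(items)
print(total)
```

Keeping the per-meter line items alongside the total makes it easier to show customers which dimension drove their bill.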
Composite charges add another layer: charges that depend on multiple meters. A SaaS product might offer "free" queries up to 1,000 per million events ingested, then charge $1 per excess query. The rating engine must first compute both event count and query count, then apply conditional logic. These conditional charges require careful specification because edge cases abound—what if the customer ingests events but makes no queries? What if event ingestion is zero but queries are high? Testing composite charges thoroughly prevents billing errors in complex pricing models [4].
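The free-query example can be made concrete with a short sketch. One detail the pricing description leaves open is whether the allowance accrues pro rata or only per whole million events; the code below assumes pro rata accrual rounded down, and that choice, like the function name, is an assumption to be replaced by whatever the actual price card specifies.

```python
from decimal import Decimal

def rate_excess_queries(events_ingested, queries,
                        free_per_million=1000, price=Decimal("1")):
    """Charge for queries beyond the allowance earned by ingested events."""
    # Assumed rule: allowance accrues pro rata, rounded down to whole queries.
    free_queries = events_ingested * free_per_million // 1_000_000
    excess = max(0, queries - free_queries)
    return excess * price

print(rate_excess_queries(2_000_000, 2500))  # 2,000 free, 500 billable
print(rate_excess_queries(5_000_000, 0))     # events but no queries
print(rate_excess_queries(0, 10))            # queries but no events
```

The last two calls correspond to the edge cases raised above: ingestion with no queries produces no charge, and queries with zero ingestion are all billable because no allowance was earned.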
Billing at global scale introduces currency complexity. Different customer segments pay in different currencies, and exchange rate fluctuations occur continuously. The rating engine must know the billing currency for each customer and apply rates in that currency. For multi-currency pricing (a customer in EUR consuming from a USD-priced API), conversion must happen at a well-defined point in the pipeline—typically at rating time using either a fixed contracted rate or a real-time market rate.
Rounding is deceptively important. If a customer consumes 1 unit at $0.333333/unit, the billed amount is $0.33 under round-to-nearest modes or $0.34 under a round-up (ceiling) mode. Across thousands of customers and millions of events, rounding differences accumulate. Accounting frameworks (GAAP, IFRS) expect rounding to be applied consistently; common choices are "round half up" and "round half to even" (banker's rounding). The rating engine must implement the chosen rule consistently across all calculations and document it in the terms of service to prevent customer disputes.
A more subtle issue: at what decimal place does rounding occur? Some systems round individual line items to 2 decimal places, then sum (leading to precision loss if many items exist). Others sum first, then round the final amount. These produce different results. For compliance, define the rounding rule once (typically round final invoice total, not line items), implement it consistently, and test extensively. SaaS billing standards recommend rounding invoice totals to the minor currency unit (cents for USD) only, avoiding precision loss from early rounding [2].
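The difference between rounding each line item and rounding the total is easy to demonstrate with Python's `decimal` module. The three identical $0.335 line items are invented purely to make the effect visible.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

CENT = Decimal("0.01")
line_items = [Decimal("0.335")] * 3  # hypothetical unrounded rated amounts

# Round every line item to cents, then sum.
round_each = sum(x.quantize(CENT, rounding=ROUND_HALF_UP) for x in line_items)

# Sum at full precision, then round once.
round_total = sum(line_items).quantize(CENT, rounding=ROUND_HALF_UP)

# Same total rounded with banker's rounding (half to even).
round_total_even = sum(line_items).quantize(CENT, rounding=ROUND_HALF_EVEN)

print(round_each)        # 1.02
print(round_total)       # 1.01
print(round_total_even)  # 1.00
```

The same three amounts yield three different invoice totals depending on when and how rounding happens, which is why the rule needs to be fixed once and applied in exactly one place.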
At scale, rating engine performance becomes critical. Processing 1M+ events per second requires optimizations beyond naive per-event pricing lookup. Three optimization strategies dominate: (1) price card caching—load customer price cards into memory at the start of each billing period, update on price changes; (2) aggregation batching—accumulate events in memory and flush to persistent storage periodically, reducing database write load; (3) threshold pre-computation—for tiered pricing, pre-compute tier boundaries and store in memory to avoid repeated calculations.
Price card caching trades freshness for performance. If a customer's price card updates mid-month, should the change apply immediately or only to future events? Define this in your pricing model—most SaaS platforms apply changes to the next billing cycle only, simplifying the rating engine and preventing mid-cycle reconciliation issues. Once this rule is established, price cards become "frozen" during a billing cycle and can be cached safely.
Aggregation batching introduces a tradeoff between latency and throughput. If you accumulate events in-memory for 5 seconds before flushing, you process thousands in a single batch (high throughput) but delay visibility by ~5 seconds (higher latency). For most SaaS use cases, <5 second latency is acceptable. For real-time trading or risk management, per-event processing may be required despite lower throughput.
Threshold pre-computation prevents repeated arithmetic. For a customer on tiered pricing with 10 tiers, you might compute cumulative unit thresholds once and store them as an array [100, 600, 1500, ...] instead of recalculating on every event. When a new event arrives, binary search the array to find the applicable tier instantly.
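With the thresholds stored as a sorted array of cumulative upper bounds, the tier lookup is a single call to the standard library's `bisect_left`. The sketch below uses the first three thresholds from the array above and assumes each bound is inclusive, so the unit numbered 100 still belongs to the first tier.

```python
from bisect import bisect_left

# Cumulative inclusive upper bounds of tiers 0, 1, 2; tier 3 is open-ended.
THRESHOLDS = [100, 600, 1500]

def tier_index(unit_number, thresholds):
    """Return the index of the tier that the Nth cumulative unit falls into."""
    return bisect_left(thresholds, unit_number)

for n in (1, 100, 101, 600, 601, 1501):
    print(n, tier_index(n, THRESHOLDS))
```

If the price card treats bounds as exclusive instead, `bisect_right` gives the matching behavior; the important part is that the choice agrees with the rest of the rating logic.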
Testing rating engines requires rigor beyond standard unit tests. Three testing strategies are essential: (1) property-based testing—define mathematical properties that must hold (e.g., cost must be monotonic in usage), then generate random inputs and verify properties automatically; (2) golden file tests—curate realistic pricing scenarios with manually verified expected outputs, then regression test against them; (3) edge case catalogs—explicitly enumerate boundary conditions (zero usage, tier boundaries, rounding edge cases) and test each.
Property-based testing catches subtle bugs. One property that should hold for per-unit and graduated pricing: "if usage increases, cost never decreases." Generate random usage amounts, apply your rating logic, and verify the property holds. Frameworks like QuickCheck (Haskell), Hypothesis (Python), and jqwik (Java) automate this. Properties help catch logic errors: off-by-one errors in tier calculations, incorrect rounding, or boundary condition failures often violate obvious properties and are caught immediately.
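To show the idea without extra dependencies, the sketch below hand-rolls a monotonicity check with the standard library's `random` module; a framework such as Hypothesis would add input shrinking and smarter generation on top of the same property. The `rate_graduated` function and tier table are an illustrative stand-in for the engine under test, not a reference implementation.

```python
import random
from decimal import Decimal

# Illustrative graduated tiers: (cumulative upper bound, unit price).
TIERS = [(100, Decimal("1.00")), (500, Decimal("0.80")), (None, Decimal("0.60"))]

def rate_graduated(units, tiers):
    """Stand-in rating function under test."""
    total, lower = Decimal("0"), 0
    for upper, price in tiers:
        if units <= lower:
            break
        band_end = units if upper is None else min(units, upper)
        total += (band_end - lower) * price
        lower = band_end
    return total

def check_monotonic(trials=1000, seed=0):
    """Property: for usage a <= b, cost(a) <= cost(b)."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        a = rng.randint(0, 2000)
        b = rng.randint(a, 4000)
        assert rate_graduated(a, TIERS) <= rate_graduated(b, TIERS), (a, b)
    return True

print(check_monotonic())
```

A failing pair `(a, b)` is reported in the assertion message, which gives a concrete counterexample to add to the regression suite.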
Golden file tests prevent regression. Pick 20-50 realistic pricing scenarios (graduated tier + overage + percentage, package pricing + minimum charge, etc.), manually compute expected billing amounts in a spreadsheet or external tool, then store as test fixtures. When the rating engine changes, run it against all fixtures and catch unintended behavior changes. These tests are tedious to create but invaluable for production confidence [4].
Edge case catalogs ensure completeness. List every boundary: (a) zero usage; (b) usage exactly at tier boundaries; (c) usage just below and just above tier boundaries; (d) minimum charge thresholds; (e) percentage rounding; (f) multi-currency conversions; (g) overage edge cases. Test each explicitly. Many billing disputes arise from untested edge cases that appear benign during development but cause systematic errors in production.
Billing systems must be auditable—any customer dispute requires tracing an invoice line item back to the raw usage events that generated it. This requires careful system design. Each rated event should include pointers to its source (raw event ID, timestamp, customer ID), the pricing model applied, and the computed amount. When aggregating events into invoice line items, retain these pointers throughout the pipeline.
Practical auditability requires: (1) immutable event logs—append-only storage of raw usage events with monotonically increasing sequence numbers; (2) rating decision records—for each invoice line item, log the pricing model version, applicable tiers/rates, and computation inputs; (3) audit query support—allow querying events by customer/period and reconstructing invoice calculations. Some platforms implement this as a separate audit database; others store audit data alongside billing records.
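A rating decision record can be as simple as an immutable data class that carries its inputs with it. The sketch below assumes per-unit pricing and an in-memory event log keyed by event ID; every field name here is hypothetical and would be adapted to the actual schema.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class RatedLineItem:
    line_item_id: str
    customer_id: str
    pricing_model_version: str
    unit_price: Decimal
    source_event_ids: tuple  # pointers back to raw usage events
    units: int
    amount: Decimal

def verify_line_item(item, event_log):
    """Recompute the amount from source events and compare to the record."""
    units = sum(event_log[event_id]["units"] for event_id in item.source_event_ids)
    return units == item.units and units * item.unit_price == item.amount

# Hypothetical append-only event log and one rated line item.
EVENT_LOG = {"e1": {"units": 40}, "e2": {"units": 60}}
ITEM = RatedLineItem("li-1", "c1", "v3", Decimal("0.05"), ("e1", "e2"), 100, Decimal("5.00"))

print(verify_line_item(ITEM, EVENT_LOG))
```

Because the record stores the pricing model version and the source event IDs, a dispute can be investigated by replaying exactly the inputs that produced the charge.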
For customer disputes, reconstruct the calculation: retrieve the invoice line item ID, look up its source events, verify the pricing model used, and recalculate the amount. If a discrepancy appears, the audit trail reveals whether it stems from incorrect event ingestion, wrong pricing model version, or calculation error. Document this capability in your system—many SaaS companies add audit features only after a major billing incident costs them customer trust. Open-source billing infrastructure like Lago makes this auditing straightforward because the rating logic itself is transparent and inspectable, rather than operating as a black box.
Rated amounts feed directly into downstream systems—tax calculation engines, invoicing platforms, and revenue recognition systems. Rating errors propagate and compound at each stage. The rating output must include sufficient metadata: customer ID, billing period, currency, rated amount, applicable meter and rate, usage quantity. Multi-tenant platforms that use tax automation downstream must ensure rating accuracy to prevent cascading compliance issues.
Similarly, platforms that use revenue recognition rules depend on accurate rating to ensure compliant revenue booking. The rating engine is foundational—errors here are expensive to fix later. Separate rating from tax calculation: rate the usage, then apply taxes to the rated amount. This allows replacing either component independently and makes the system more testable.
Several mistakes plague rating engine implementations. (1) Insufficient testing of tier boundaries—off-by-one errors in tier calculations are common and difficult to detect in manual testing. Use property-based testing and explicit boundary testing. (2) Rounding at the wrong step—rounding per-line-item instead of per-invoice accumulates errors. Define the rounding rule once and apply only at the final step. (3) Ignoring currency and localization—assuming USD or ignoring exchange rate timing creates billing errors for global customers. Handle multi-currency explicitly from the start.
(4) Lack of auditability—if you can't trace an invoice back to raw events, you can't resolve disputes. Design auditability in from the start, not as an afterthought. (5) Real-time rating without batch validation—real-time systems are fast but can accumulate subtle errors. Run a monthly batch validation that re-rates all events and compares against real-time results. (6) Ignoring overage and commit edge cases—commits, minimum charges, and overage thresholds introduce many edge cases. Test them exhaustively.
Billing often involves mid-period changes: plan upgrades, downgrades, or cancellations. The rating engine must handle proration calculations that allocate charges across multiple price cards. If a customer upgrades mid-month, the portion of the month before upgrade uses the old rates, the portion after uses new rates. Proration requires dividing the billing period into segments, applying appropriate rates to each, and summing results.
Proration adds complexity but is essential for fair billing. Implement it carefully with tests that verify correctness across various upgrade/downgrade scenarios. Some platforms avoid proration complexity by pro-rating at the subscription level (charge for an entire cycle, then issue credits for unused time), while others build full proration into the rating engine. The architecture choice depends on simplicity vs. precision tradeoffs.
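For a recurring fee, segment-based proration comes down to weighting each price by the share of the period it was active. The sketch below assumes day-level granularity and a half-open period (start inclusive, end exclusive); usage-based charges would instead be rated per segment against the price card active at each event's timestamp. The function name and the $100 to $200 upgrade are invented for illustration.

```python
from datetime import date
from decimal import Decimal

def prorate(period_start, period_end, change_date, old_price, new_price):
    """Blend two recurring prices by the number of days each was active."""
    total_days = (period_end - period_start).days
    old_days = (change_date - period_start).days
    new_days = total_days - old_days
    return (old_price * old_days + new_price * new_days) / total_days

# Upgrade halfway through a 30-day period: 15 days at each price.
amount = prorate(date(2024, 4, 1), date(2024, 5, 1), date(2024, 4, 16),
                 Decimal("100"), Decimal("200"))
print(amount)
```

Months of different lengths produce different daily rates under this scheme, so tests should cover 28-, 30-, and 31-day periods as well as changes on the first and last day.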
The rating engine transforms raw usage events into billable amounts through systematic application of pricing logic. Strong rating engine architecture—with clear separation from billing and invoicing, support for multiple rating patterns, performance optimization, and rigorous testing—enables accurate, scalable, auditable billing at global scale. The most common errors (tier boundary off-by-ones, rounding at wrong steps, insufficient testing) are preventable with disciplined implementation and comprehensive test suites. For companies building billing systems, investing in a robust rating engine early pays dividends in customer satisfaction, operational simplicity, and reduced billing disputes. Understanding rating engine architecture is essential for anyone designing or implementing SaaS billing infrastructure.