
Token Bucket Algorithm

LightRate uses the token bucket algorithm, a widely used rate limiting technique that provides both smoothness and flexibility.

How It Works

A token bucket has two key parameters:

  • Refill Rate: How many tokens are added per second
  • Burst Rate: Maximum number of tokens the bucket can hold

Each request consumes one or more tokens. If tokens are available, the request succeeds. If not, the request is rate limited.
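The core mechanic can be sketched in a few lines of plain JavaScript. This is an illustrative model, not LightRate's actual API; the class and method names are made up for the example:

```javascript
// Minimal token bucket sketch. Time is passed in explicitly (in seconds)
// so the behavior is easy to follow and test.
class TokenBucket {
  constructor(refillRate, burstRate, now = Date.now() / 1000) {
    this.refillRate = refillRate; // tokens added per second
    this.burstRate = burstRate;   // maximum tokens the bucket can hold
    this.tokens = burstRate;      // bucket starts full
    this.lastRefill = now;        // timestamp of the last refill
  }

  tryConsume(count, now = Date.now() / 1000) {
    // Refill lazily based on elapsed time, capped at the burst rate
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.tokens + elapsed * this.refillRate, this.burstRate);
    this.lastRefill = now;

    if (this.tokens >= count) {
      this.tokens -= count; // enough tokens: the request succeeds
      return true;
    }
    return false; // not enough tokens: the request is rate limited
  }
}
```

Everything that follows in this page (refill, burst cap, per-request cost) is a variation on this loop.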

Visual Example

Let’s say you have a token bucket with:

  • Refill Rate: 10 tokens/second
  • Burst Rate: 100 tokens

Timeline:

```
Time 0s:  [████████████████████] 100/100 tokens (full)
          ↓ Request: consume 5 tokens
Time 0s:  [███████████████████ ]  95/100 tokens
Time 1s:  [████████████████████] 100/100 tokens (refilled 10, capped at burst)
          ↓ Request: consume 50 tokens
Time 1s:  [██████████          ]  50/100 tokens
Time 6s:  [████████████████████] 100/100 tokens (refilled 50 over 5 seconds)
          ↓ Request: consume 10 tokens
Time 6s:  [██████████████████  ]  90/100 tokens
```

Step-by-Step Breakdown

  1. Initial State: Bucket starts full with 100 tokens
  2. Request 1: Consume 5 tokens → 95 tokens remaining
  3. After 1 second: 95 + 10 = 105 tokens, capped at 100 by the burst rate
  4. Request 2: Consume 50 tokens → 50 tokens remaining
  5. After 5 more seconds: 50 + (10 × 5) = 100 tokens (capped at the burst rate)
  6. Request 3: Consume 10 tokens → 90 tokens remaining
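The timeline above can be replayed with straight arithmetic (plain JavaScript, independent of any LightRate API):

```javascript
// Replay the example: refill = 10 tokens/sec, burst = 100
const refillRate = 10;
const burstRate = 100;

let tokens = burstRate;                                 // Time 0s: bucket starts full

tokens -= 5;                                            // Request 1: consume 5
const afterRequest1 = tokens;                           // 95

tokens = Math.min(tokens + 1 * refillRate, burstRate);  // 1 second later: capped at 100
tokens -= 50;                                           // Request 2: consume 50
const afterRequest2 = tokens;                           // 50

tokens = Math.min(tokens + 5 * refillRate, burstRate);  // 5 more seconds: back to 100
tokens -= 10;                                           // Request 3: consume 10
const afterRequest3 = tokens;                           // 90

console.log(afterRequest1, afterRequest2, afterRequest3); // 95 50 90
```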

Why Token Buckets?

Token buckets offer several advantages over other rate limiting algorithms:

1. Burst Handling

Allows short bursts of traffic above the average rate. Users can accumulate tokens during idle periods and spend them quickly when needed.

Example:

  • Refill Rate: 10 tokens/sec
  • Burst Rate: 100 tokens
  • User makes no requests for 10 seconds → bucket fills to 100 tokens
  • User can now make 100 requests instantly
  • Afterward, limited to 10 requests/sec sustained

2. Smooth Rate Limiting

Tokens refill continuously, not in fixed time windows. Unlike fixed window algorithms, there’s no “reset” period where limits suddenly become available again.

Token Bucket:

Continuous refill: 10 tokens/sec always refilling

Fixed Window (for comparison):

```
0-60s:  600 requests allowed
60s:    Counter resets
        Potential burst of 1200 requests at the window boundary!
```
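The boundary problem is easy to demonstrate with a toy fixed window counter (a sketch for comparison only; the window size and limit are the example's numbers):

```javascript
// Toy fixed window counter: 600 requests per 60-second window
const WINDOW = 60;
const LIMIT = 600;
const counts = new Map(); // windowStart → request count

function allowFixedWindow(now) {
  const windowStart = Math.floor(now / WINDOW) * WINDOW;
  const used = counts.get(windowStart) || 0;
  if (used >= LIMIT) return false;
  counts.set(windowStart, used + 1);
  return true;
}

// 600 requests just before the boundary and 600 just after all succeed:
// 1200 requests in ~0.2 seconds, double the intended rate.
let allowed = 0;
for (let i = 0; i < 600; i++) if (allowFixedWindow(59.9)) allowed++;
for (let i = 0; i < 600; i++) if (allowFixedWindow(60.1)) allowed++;
console.log(allowed); // 1200
```

A token bucket has no such boundary: capacity is restored continuously, so the worst-case burst is always bounded by the burst rate.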

3. Flexibility

Different operations can consume different numbers of tokens:

```javascript
// Cheap operation - consume 1 token
await client.consumeTokens(userId, 1, 'getProfile');

// Expensive operation - consume 10 tokens
await client.consumeTokens(userId, 10, 'generateReport');

// Very expensive - consume 100 tokens
await client.consumeTokens(userId, 100, 'bulkExport');
```

4. Fairness

Unused capacity accumulates (up to burst limit) for later use. Users who make fewer requests get to use their capacity when they need it.

Refill Rate vs Burst Rate

Refill Rate

The refill rate determines sustained throughput - how many requests per second a user can make over time.

Examples:

  • 1 token/sec = 1 request/sec sustained = ~86,400 requests/day
  • 10 tokens/sec = 10 requests/sec sustained = ~864,000 requests/day
  • 100 tokens/sec = 100 requests/sec sustained = ~8.6M requests/day

Set refill rate based on your average expected load, not peak load.

Burst Rate

The burst rate determines peak capacity - how many requests can be made instantly.

Examples:

  • Burst of 10 = Can handle 10 instant requests
  • Burst of 100 = Can handle 100 instant requests
  • Burst of 1000 = Can handle 1000 instant requests

Rule of thumb: Set burst rate to 5-10× the refill rate to handle peaks while preventing abuse.

Finding the Right Balance

Conservative (API with expensive operations):

```
Refill Rate: 5 tokens/sec
Burst Rate:  25 tokens
Result:      5 req/sec sustained, bursts of 25
```

Moderate (typical public API):

```
Refill Rate: 10 tokens/sec
Burst Rate:  100 tokens
Result:      10 req/sec sustained, bursts of 100
```

Generous (read-heavy API):

```
Refill Rate: 100 tokens/sec
Burst Rate:  1000 tokens
Result:      100 req/sec sustained, bursts of 1000
```

Per-User Token Buckets

LightRate creates separate token buckets for each user identifier. This provides fair and isolated rate limiting.

How It Works

```
Application: Email Service
Rule: sendEmail (10 tokens/sec, burst 100)

User alice → Bucket A (10/sec, burst 100)
User bob   → Bucket B (10/sec, burst 100)
User carol → Bucket C (10/sec, burst 100)
```

Each user has their own bucket:

  • Alice’s usage doesn’t affect Bob’s limits
  • Bob can’t exhaust Carol’s tokens
  • Each user gets equal access
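Per-user isolation falls out naturally from keying bucket state by identifier. A sketch of the idea (the key layout and starting values here are illustrative, not LightRate's internals):

```javascript
// One bucket per (application, user, rule) triple, created on demand
const buckets = new Map();

function bucketFor(appId, userId, ruleId) {
  const key = `${appId}:${userId}:${ruleId}`;
  if (!buckets.has(key)) {
    // New users start with a full bucket (burst = 100 in this example)
    buckets.set(key, { tokens: 100, lastRefill: Date.now() / 1000 });
  }
  return buckets.get(key);
}

// alice and bob get independent state:
const a = bucketFor('email', 'alice', 'sendEmail');
a.tokens -= 90; // alice burns through most of her tokens
const b = bucketFor('email', 'bob', 'sendEmail');
console.log(a.tokens, b.tokens); // 10 100
```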

Benefits

1. Fairness

```
❌ Without per-user buckets:
   Alice makes 1000 requests → exhausts shared limit
   Bob makes 1 request → rate limited (unfair!)

✅ With per-user buckets:
   Alice makes 1000 requests → only Alice rate limited
   Bob makes 1 request → succeeds (fair!)
```

2. Isolation

  • One user can’t DoS other users
  • Misbehaving clients don’t affect others
  • Each user pays for their own usage

3. Scalability

  • Redis can handle millions of separate buckets
  • Buckets are created on-demand
  • Unused buckets are automatically cleaned up

User Identifiers

User identifiers can be anything that uniquely identifies a user or client:

  • User IDs: user_12345, alice@example.com
  • API Keys: sk_live_abc123, api_xyz789
  • IP Addresses: 192.168.1.1, 10.0.0.5
  • Session IDs: sess_abc123, token_xyz789
  • Organization IDs: org_enterprise_123

Choose user identifiers carefully! They determine how rate limits are applied. Using IP addresses might rate limit users behind a shared NAT gateway together.
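One common pattern is to prefer a strong identifier (API key, user ID) and fall back to the client IP only for anonymous traffic. A hedged sketch; the function and prefix are hypothetical, not a LightRate convention:

```javascript
// Pick the most specific identifier available for rate limiting.
// Falling back to IP means anonymous clients behind a shared NAT
// share one bucket - an accepted trade-off in this sketch.
function userIdentifier(apiKey, clientIp) {
  return apiKey ? apiKey : `ip:${clientIp}`;
}

console.log(userIdentifier('sk_live_abc123', '10.0.0.5')); // "sk_live_abc123"
console.log(userIdentifier(null, '10.0.0.5'));             // "ip:10.0.0.5"
```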

Implementation Details

Storage

Token bucket state is persisted in Redis:

```
Key:   lightrate:bucket:app_123:user_456:rule_789
Value: { tokens: 85.5, lastRefill: 1699564800.123 }
```

Precision

LightRate uses sub-second precision for token refill calculations:

```javascript
// Calculate tokens refilled since last check
const elapsed = currentTime - lastRefillTime;
const tokensToAdd = elapsed * refillRate;
const newTokens = Math.min(currentTokens + tokensToAdd, burstRate);
```

Atomic Operations

Token consumption is atomic to prevent race conditions:

```javascript
// Redis Lua script ensures atomicity: the read-check-write sequence
// runs as a single operation, so concurrent requests can't both
// spend the same tokens. (Simplified: refill logic omitted.)
const script = `
  local tokens = tonumber(redis.call('GET', KEYS[1]) or 0)
  local requested = tonumber(ARGV[1])
  if tokens >= requested then
    redis.call('SET', KEYS[1], tokens - requested)
    return 1
  else
    return 0
  end
`;
```

Efficiency

Tokens are calculated on-demand, not with background jobs:

  • No scheduled tasks needed
  • No CPU waste recalculating unused buckets
  • Scales to millions of users
  • Only active buckets use resources

Best Practices

1. Set Appropriate Refill Rates

Base refill rates on your service’s capacity:

```javascript
// Consider your database/API limits
const dbMaxQPS = 1000; // Your DB can handle 1000 QPS
const users = 100;     // Expected concurrent users
const refillRate = Math.floor(dbMaxQPS / users); // 10 tokens/sec per user
```

2. Allow Reasonable Bursts

Give users flexibility while preventing abuse:

```
// Good: 5-10x multiplier
refillRate: 10
burstRate: 50-100

// Too strict: no burst handling
refillRate: 10
burstRate: 10

// Too loose: potential abuse
refillRate: 10
burstRate: 10000
```

3. Monitor and Adjust

Track rate limit metrics:

  • How often are limits hit?
  • What’s the average token consumption?
  • Are legitimate users being blocked?
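Even a simple counter pair answers the first question. A minimal sketch with made-up names (any real metrics client would work the same way):

```javascript
// Track allowed vs. rate-limited requests to measure how often limits hit
const metrics = { allowed: 0, limited: 0 };

function record(wasAllowed) {
  if (wasAllowed) metrics.allowed++;
  else metrics.limited++;
}

// After some traffic, the limited fraction tells you whether limits
// are too tight (high) or effectively unused (near zero).
record(true);
record(true);
record(false);
const limitedRate = metrics.limited / (metrics.allowed + metrics.limited);
console.log(limitedRate.toFixed(2)); // "0.33"
```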

4. Start Conservative

It’s easier to increase limits than decrease them:

```
// Start here
refillRate: 5, burstRate: 25

// Increase based on data
refillRate: 10, burstRate: 50

// Scale as needed
refillRate: 50, burstRate: 500
```

Next Steps

  • Applications - Learn how to organize token buckets in applications
  • Rules - Create specific rate limiting rules
  • Quick Start - Implement your first rate limit