
Token Bucket Algorithm

LightRate uses the token bucket algorithm, a widely used rate limiting technique that provides both smoothness and flexibility.

How It Works

A token bucket has two key parameters:

  • Refill Rate: How many tokens are added per second
  • Burst Rate: Maximum number of tokens the bucket can hold

Each request consumes one or more tokens. If tokens are available, the request succeeds. If not, the request is rate limited.
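The core mechanic can be sketched in a few lines of plain JavaScript. This is an illustrative model, not LightRate's actual API; the class and method names are made up for the example:

```javascript
// Minimal token bucket sketch. Time is passed in explicitly (in seconds)
// so the behavior is easy to follow and test.
class TokenBucket {
  constructor(refillRate, burstRate, now = Date.now() / 1000) {
    this.refillRate = refillRate; // tokens added per second
    this.burstRate = burstRate;   // maximum tokens the bucket can hold
    this.tokens = burstRate;      // bucket starts full
    this.lastRefill = now;        // timestamp of the last refill
  }

  tryConsume(count, now = Date.now() / 1000) {
    // Refill lazily based on elapsed time, capped at the burst rate
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.tokens + elapsed * this.refillRate, this.burstRate);
    this.lastRefill = now;

    if (this.tokens >= count) {
      this.tokens -= count; // enough tokens: the request succeeds
      return true;
    }
    return false; // not enough tokens: the request is rate limited
  }
}
```

Everything that follows in this page (refill, burst cap, per-request cost) is a variation on this loop.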

Visual Example

Let’s say you have a token bucket with:

  • Refill Rate: 10 tokens/second
  • Burst Rate: 100 tokens

Timeline:

```
Time 0s:  [████████████████████] 100/100 tokens (full)
          ↓ Request: consume 5 tokens
Time 0s:  [███████████████████ ]  95/100 tokens
Time 1s:  [████████████████████] 100/100 tokens (refilled 10, capped at burst)
          ↓ Request: consume 50 tokens
Time 1s:  [██████████          ]  50/100 tokens
Time 6s:  [████████████████████] 100/100 tokens (refilled 50 over 5 seconds)
          ↓ Request: consume 10 tokens
Time 6s:  [██████████████████  ]  90/100 tokens
```

Step-by-Step Breakdown

  1. Initial State: Bucket starts full with 100 tokens
  2. Request 1: Consume 5 tokens → 95 tokens remaining
  3. After 1 second: 95 + 10 = 105 tokens, capped at 100 by the burst rate
  4. Request 2: Consume 50 tokens → 50 tokens remaining
  5. After 5 more seconds: 50 + (10 × 5) = 100 tokens (capped at the burst rate)
  6. Request 3: Consume 10 tokens → 90 tokens remaining
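The timeline above can be replayed with straight arithmetic (plain JavaScript, independent of any LightRate API):

```javascript
// Replay the example: refill = 10 tokens/sec, burst = 100
const refillRate = 10;
const burstRate = 100;

let tokens = burstRate;                                 // Time 0s: bucket starts full

tokens -= 5;                                            // Request 1: consume 5
const afterRequest1 = tokens;                           // 95

tokens = Math.min(tokens + 1 * refillRate, burstRate);  // 1 second later: capped at 100
tokens -= 50;                                           // Request 2: consume 50
const afterRequest2 = tokens;                           // 50

tokens = Math.min(tokens + 5 * refillRate, burstRate);  // 5 more seconds: back to 100
tokens -= 10;                                           // Request 3: consume 10
const afterRequest3 = tokens;                           // 90

console.log(afterRequest1, afterRequest2, afterRequest3); // 95 50 90
```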

Why Token Buckets?

Token buckets offer several advantages over other rate limiting algorithms:

1. Burst Handling

Allows short bursts of traffic above the average rate. Users can accumulate tokens during idle periods and spend them quickly when needed.

Example:

  • Refill Rate: 10 tokens/sec
  • Burst Rate: 100 tokens
  • User makes no requests for 10 seconds → bucket fills to 100 tokens
  • User can now make 100 requests instantly
  • Afterward, limited to 10 requests/sec sustained

2. Smooth Rate Limiting

Tokens refill continuously, not in fixed time windows. Unlike fixed window algorithms, there’s no “reset” period where limits suddenly become available again.

Token Bucket:

Continuous refill: 10 tokens/sec always refilling

Fixed Window (for comparison):

```
0-60s:  600 requests allowed
60s:    Counter resets
        Potential burst of 1200 requests at the window boundary!
```
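The boundary problem is easy to demonstrate with a toy fixed window counter (a sketch for comparison only; the window size and limit are the example's numbers):

```javascript
// Toy fixed window counter: 600 requests per 60-second window
const WINDOW = 60;
const LIMIT = 600;
const counts = new Map(); // windowStart → request count

function allowFixedWindow(now) {
  const windowStart = Math.floor(now / WINDOW) * WINDOW;
  const used = counts.get(windowStart) || 0;
  if (used >= LIMIT) return false;
  counts.set(windowStart, used + 1);
  return true;
}

// 600 requests just before the boundary and 600 just after all succeed:
// 1200 requests in ~0.2 seconds, double the intended rate.
let allowed = 0;
for (let i = 0; i < 600; i++) if (allowFixedWindow(59.9)) allowed++;
for (let i = 0; i < 600; i++) if (allowFixedWindow(60.1)) allowed++;
console.log(allowed); // 1200
```

A token bucket has no such boundary: capacity is restored continuously, so the worst-case burst is always bounded by the burst rate.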

3. Flexibility

Different operations can consume different numbers of tokens:

```javascript
// Cheap operation - consume 1 token
await client.consumeTokens(userId, 1, 'getProfile');

// Expensive operation - consume 10 tokens
await client.consumeTokens(userId, 10, 'generateReport');

// Very expensive - consume 100 tokens
await client.consumeTokens(userId, 100, 'bulkExport');
```

4. Fairness

Unused capacity accumulates (up to burst limit) for later use. Users who make fewer requests get to use their capacity when they need it.

Refill Rate vs Burst Rate

Refill Rate

The refill rate determines sustained throughput - how many requests per second a user can make over time.

Examples:

  • 1 token/sec = 1 request/sec sustained = ~86,400 requests/day
  • 10 tokens/sec = 10 requests/sec sustained = ~864,000 requests/day
  • 100 tokens/sec = 100 requests/sec sustained = ~8.6M requests/day

Set refill rate based on your average expected load, not peak load.

Burst Rate

The burst rate determines peak capacity - how many requests can be made instantly.

Examples:

  • Burst of 10 = Can handle 10 instant requests
  • Burst of 100 = Can handle 100 instant requests
  • Burst of 1000 = Can handle 1000 instant requests

Rule of thumb: Set burst rate to 5-10× the refill rate to handle peaks while preventing abuse.

Finding the Right Balance

Conservative (API with expensive operations):

```
Refill Rate: 5 tokens/sec
Burst Rate:  25 tokens
Result:      5 req/sec sustained, bursts of 25
```

Moderate (typical public API):

```
Refill Rate: 10 tokens/sec
Burst Rate:  100 tokens
Result:      10 req/sec sustained, bursts of 100
```

Generous (read-heavy API):

```
Refill Rate: 100 tokens/sec
Burst Rate:  1000 tokens
Result:      100 req/sec sustained, bursts of 1000
```

Per-User Token Buckets

LightRate creates separate token buckets for each user identifier. This provides fair and isolated rate limiting.

How It Works

```
Application: Email Service
Rule: sendEmail (10 tokens/sec, burst 100)

User alice → Bucket A (10/sec, burst 100)
User bob   → Bucket B (10/sec, burst 100)
User carol → Bucket C (10/sec, burst 100)
```

Each user has their own bucket:

  • Alice’s usage doesn’t affect Bob’s limits
  • Bob can’t exhaust Carol’s tokens
  • Each user gets equal access
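Per-user isolation falls out naturally from keying bucket state by identifier. A sketch of the idea (the key layout and starting values here are illustrative, not LightRate's internals):

```javascript
// One bucket per (application, user, rule) triple, created on demand
const buckets = new Map();

function bucketFor(appId, userId, ruleId) {
  const key = `${appId}:${userId}:${ruleId}`;
  if (!buckets.has(key)) {
    // New users start with a full bucket (burst = 100 in this example)
    buckets.set(key, { tokens: 100, lastRefill: Date.now() / 1000 });
  }
  return buckets.get(key);
}

// alice and bob get independent state:
const a = bucketFor('email', 'alice', 'sendEmail');
a.tokens -= 90; // alice burns through most of her tokens
const b = bucketFor('email', 'bob', 'sendEmail');
console.log(a.tokens, b.tokens); // 10 100
```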

Benefits

1. Fairness

```
❌ Without per-user buckets:
   Alice makes 1000 requests → exhausts shared limit
   Bob makes 1 request → rate limited (unfair!)

✅ With per-user buckets:
   Alice makes 1000 requests → only Alice rate limited
   Bob makes 1 request → succeeds (fair!)
```

2. Isolation

  • One user can’t DoS other users
  • Misbehaving clients don’t affect others
  • Each user pays for their own usage

3. Scalability

  • Redis can handle millions of separate buckets
  • Buckets are created on-demand
  • Unused buckets are automatically cleaned up

User Identifiers

User identifiers can be anything that uniquely identifies a user or client:

  • User IDs: user_12345, alice@example.com
  • API Keys: sk_live_abc123, api_xyz789
  • IP Addresses: 192.168.1.1, 10.0.0.5
  • Session IDs: sess_abc123, token_xyz789
  • Organization IDs: org_enterprise_123

Choose user identifiers carefully! They determine how rate limits are applied. Using IP addresses might rate limit users behind a shared NAT gateway together.
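One common pattern is to prefer a strong identifier (API key, user ID) and fall back to the client IP only for anonymous traffic. A hedged sketch; the function and prefix are hypothetical, not a LightRate convention:

```javascript
// Pick the most specific identifier available for rate limiting.
// Falling back to IP means anonymous clients behind a shared NAT
// share one bucket - an accepted trade-off in this sketch.
function userIdentifier(apiKey, clientIp) {
  return apiKey ? apiKey : `ip:${clientIp}`;
}

console.log(userIdentifier('sk_live_abc123', '10.0.0.5')); // "sk_live_abc123"
console.log(userIdentifier(null, '10.0.0.5'));             // "ip:10.0.0.5"
```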

Implementation Details

Storage

Token bucket state is persisted in Redis:

```
Key:   lightrate:bucket:app_123:user_456:rule_789
Value: { tokens: 85.5, lastRefill: 1699564800.123 }
```

Precision

LightRate uses sub-second precision for token refill calculations:

```javascript
// Calculate tokens refilled since last check
const elapsed = currentTime - lastRefillTime;
const tokensToAdd = elapsed * refillRate;
const newTokens = Math.min(currentTokens + tokensToAdd, burstRate);
```

Atomic Operations

Token consumption is atomic to prevent race conditions:

```javascript
// Redis Lua script ensures atomicity: the read-check-write sequence
// runs as a single operation, so concurrent requests can't both
// spend the same tokens. (Simplified: refill logic omitted.)
const script = `
  local tokens = tonumber(redis.call('GET', KEYS[1]) or 0)
  local requested = tonumber(ARGV[1])
  if tokens >= requested then
    redis.call('SET', KEYS[1], tokens - requested)
    return 1
  else
    return 0
  end
`;
```

Efficiency

Tokens are calculated on-demand, not with background jobs:

  • No scheduled tasks needed
  • No CPU waste recalculating unused buckets
  • Scales to millions of users
  • Only active buckets use resources

Best Practices

1. Set Appropriate Refill Rates

Base refill rates on your service’s capacity:

```javascript
// Consider your database/API limits
const dbMaxQPS = 1000; // Your DB can handle 1000 QPS
const users = 100;     // Expected concurrent users
const refillRate = Math.floor(dbMaxQPS / users); // 10 tokens/sec per user
```

2. Allow Reasonable Bursts

Give users flexibility while preventing abuse:

```
// Good: 5-10x multiplier
refillRate: 10
burstRate: 50-100

// Too strict: no burst handling
refillRate: 10
burstRate: 10

// Too loose: potential abuse
refillRate: 10
burstRate: 10000
```

3. Monitor and Adjust

Track rate limit metrics:

  • How often are limits hit?
  • What’s the average token consumption?
  • Are legitimate users being blocked?
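Even a simple counter pair answers the first question. A minimal sketch with made-up names (any real metrics client would work the same way):

```javascript
// Track allowed vs. rate-limited requests to measure how often limits hit
const metrics = { allowed: 0, limited: 0 };

function record(wasAllowed) {
  if (wasAllowed) metrics.allowed++;
  else metrics.limited++;
}

// After some traffic, the limited fraction tells you whether limits
// are too tight (high) or effectively unused (near zero).
record(true);
record(true);
record(false);
const limitedRate = metrics.limited / (metrics.allowed + metrics.limited);
console.log(limitedRate.toFixed(2)); // "0.33"
```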

4. Start Conservative

It’s easier to increase limits than decrease them:

```
// Start here
refillRate: 5, burstRate: 25

// Increase based on data
refillRate: 10, burstRate: 50

// Scale as needed
refillRate: 50, burstRate: 500
```

Next Steps

  • Applications - Learn how to organize token buckets in applications
  • Rules - Create specific rate limiting rules
  • Quick Start - Implement your first rate limit