Token Bucket Algorithm
LightRate uses the token bucket algorithm, a widely used rate limiting technique that provides both smoothness and flexibility.
How It Works
A token bucket has two key parameters:
- Refill Rate: How many tokens are added per second
- Burst Rate: Maximum number of tokens the bucket can hold
Each request consumes one or more tokens. If tokens are available, the request succeeds. If not, the request is rate limited.
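In code, that decision is a simple check. A minimal sketch (refill is ignored for the moment; `bucket` is a hypothetical in-memory object, not LightRate's API):

```javascript
// Decide whether a request may proceed, given the current token count.
// Deducts the cost and returns true if enough tokens remain; otherwise
// returns false and leaves the bucket unchanged.
function tryConsume(bucket, cost) {
  if (bucket.tokens >= cost) {
    bucket.tokens -= cost;
    return true; // request allowed
  }
  return false; // request rate limited
}

const bucket = { tokens: 3 };
console.log(tryConsume(bucket, 1)); // allowed, 2 tokens left
console.log(tryConsume(bucket, 5)); // denied, still 2 tokens left
```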
Visual Example
Let’s say you have a token bucket with:
- Refill Rate: 10 tokens/second
- Burst Rate: 100 tokens
Timeline:
```
Time 0s:  [████████████████████] 100/100 tokens (full)
          ↓ Request: consume 5 tokens
Time 0s:  [███████████████████ ]  95/100 tokens
Time 1s:  [████████████████████] 100/100 tokens (refilled 10, capped at burst)
          ↓ Request: consume 50 tokens
Time 1s:  [██████████          ]  50/100 tokens
Time 6s:  [████████████████████] 100/100 tokens (refilled 50 over 5 seconds)
          ↓ Request: consume 10 tokens
Time 6s:  [██████████████████  ]  90/100 tokens
```

Step-by-Step Breakdown
- Initial State: Bucket starts full with 100 tokens
- Request 1: Consume 5 tokens → 95 tokens remaining
- After 1 second: 95 + 10 = 105 tokens (capped at 100 due to burst rate)
- Request 2: Consume 50 tokens → 50 tokens remaining
- After 5 seconds: 50 + (10 × 5) = 100 tokens (capped at burst rate)
- Request 3: Consume 10 tokens → 90 tokens remaining
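The timeline above can be replayed with a small bucket that refills lazily from a simulated clock. This is an illustrative sketch of the technique, not LightRate's actual implementation:

```javascript
// A token bucket with on-demand ("lazy") refill: tokens are topped up
// based on elapsed time whenever the bucket is touched, never by a timer.
class TokenBucket {
  constructor(refillRate, burstRate) {
    this.refillRate = refillRate; // tokens added per second
    this.burstRate = burstRate;   // maximum tokens the bucket can hold
    this.tokens = burstRate;      // start full
    this.lastRefill = 0;          // simulated clock, in seconds
  }

  refill(now) {
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.tokens + elapsed * this.refillRate, this.burstRate);
    this.lastRefill = now;
  }

  consume(now, cost) {
    this.refill(now);
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}

// Replay the timeline: 10 tokens/sec, burst of 100.
const b = new TokenBucket(10, 100);
b.consume(0, 5);  // t=0s: 100 -> 95
b.consume(1, 50); // t=1s: refill to 100 (capped), then 100 -> 50
b.consume(6, 10); // t=6s: refill 50 over 5s to 100, then 100 -> 90
console.log(b.tokens); // 90
```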
Why Token Buckets?
Token buckets offer several advantages over other rate limiting algorithms:
1. Burst Handling
Allows short bursts of traffic above the average rate. Users can accumulate tokens during idle periods and spend them quickly when needed.
Example:
- Refill Rate: 10 tokens/sec
- Burst Rate: 100 tokens
- User makes no requests for 10 seconds → bucket fills to 100 tokens
- User can now make 100 requests instantly
- Afterward, limited to 10 requests/sec sustained
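The accumulation in this example is just the refill formula applied over the idle period, starting from an assumed-empty bucket (a sketch; the numbers match the example above):

```javascript
// Tokens available after an idle period, capped at the burst rate.
function tokensAfterIdle(startTokens, refillRate, burstRate, idleSeconds) {
  return Math.min(startTokens + refillRate * idleSeconds, burstRate);
}

// 10 tokens/sec, burst 100: an empty bucket is full again after 10 idle seconds.
console.log(tokensAfterIdle(0, 10, 100, 10)); // 100 -> enough for 100 instant requests
// Any longer idle period still yields only 100 (the burst cap).
console.log(tokensAfterIdle(0, 10, 100, 60)); // 100
```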
2. Smooth Rate Limiting
Tokens refill continuously, not in fixed time windows. Unlike fixed window algorithms, there’s no “reset” period where limits suddenly become available again.
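The boundary effect of fixed windows is easy to demonstrate: a client that bursts just before and just after a window reset gets double the intended rate. A toy simulation of a fixed-window counter (illustrative only, not LightRate code):

```javascript
// Fixed-window counter: allows up to `limit` requests per `windowSeconds`.
function makeFixedWindow(limit, windowSeconds) {
  let windowStart = 0;
  let count = 0;
  return function allow(now) {
    if (now - windowStart >= windowSeconds) {
      windowStart = now - (now % windowSeconds); // new window; counter resets
      count = 0;
    }
    if (count < limit) {
      count += 1;
      return true;
    }
    return false;
  };
}

// 600 requests per 60s window. Burst 600 at t=59.9s and 600 more at t=60.1s:
const allow = makeFixedWindow(600, 60);
let allowed = 0;
for (let i = 0; i < 600; i++) if (allow(59.9)) allowed++;
for (let i = 0; i < 600; i++) if (allow(60.1)) allowed++;
console.log(allowed); // 1200 requests in ~0.2 seconds
```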
Token Bucket:

```
Continuous refill: 10 tokens/sec, always refilling
```

Fixed Window (for comparison):

```
0-60s: 600 requests allowed
60s:   Counter resets
Potential burst of 1200 requests at the boundary!
```

3. Flexibility
Different operations can consume different numbers of tokens:
```javascript
// Cheap operation - consume 1 token
await client.consumeTokens(userId, 1, 'getProfile');

// Expensive operation - consume 10 tokens
await client.consumeTokens(userId, 10, 'generateReport');

// Very expensive - consume 100 tokens
await client.consumeTokens(userId, 100, 'bulkExport');
```

4. Fairness
Unused capacity accumulates (up to burst limit) for later use. Users who make fewer requests get to use their capacity when they need it.
Refill Rate vs Burst Rate
Refill Rate
The refill rate determines sustained throughput: how many requests per second a user can make over time.
Examples:
- 1 token/sec = 1 request/sec sustained = ~86,400 requests/day
- 10 tokens/sec = 10 requests/sec sustained = ~864,000 requests/day
- 100 tokens/sec = 100 requests/sec sustained = ~8.6M requests/day
Set refill rate based on your average expected load, not peak load.
Burst Rate
The burst rate determines peak capacity: how many requests can be made instantly.
Examples:
- Burst of 10 = Can handle 10 instant requests
- Burst of 100 = Can handle 100 instant requests
- Burst of 1000 = Can handle 1000 instant requests
Rule of thumb: Set burst rate to 5-10× the refill rate to handle peaks while preventing abuse.
Finding the Right Balance
Conservative (API with expensive operations):

```
Refill Rate: 5 tokens/sec
Burst Rate:  25 tokens
Result: 5 req/sec sustained, bursts of 25
```

Moderate (typical public API):

```
Refill Rate: 10 tokens/sec
Burst Rate:  100 tokens
Result: 10 req/sec sustained, bursts of 100
```

Generous (read-heavy API):

```
Refill Rate: 100 tokens/sec
Burst Rate:  1000 tokens
Result: 100 req/sec sustained, bursts of 1000
```

Per-User Token Buckets
LightRate creates separate token buckets for each user identifier. This provides fair and isolated rate limiting.
How It Works
```
Application: Email Service
Rule: sendEmail (10 tokens/sec, burst 100)

User alice → Bucket A (10/sec, burst 100)
User bob   → Bucket B (10/sec, burst 100)
User carol → Bucket C (10/sec, burst 100)
```

Each user has their own bucket:
- Alice’s usage doesn’t affect Bob’s limits
- Bob can’t exhaust Carol’s tokens
- Each user gets equal access
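Per-user isolation falls out naturally from keying bucket state by user identifier. A sketch of the idea using an in-memory map (LightRate itself stores buckets in Redis; refill is omitted here for brevity):

```javascript
// One bucket per user identifier, created lazily on first use.
const buckets = new Map();

function getBucket(userId, refillRate, burstRate) {
  if (!buckets.has(userId)) {
    buckets.set(userId, { tokens: burstRate, refillRate, burstRate });
  }
  return buckets.get(userId);
}

function tryConsume(userId, cost) {
  const bucket = getBucket(userId, 10, 100); // rule: 10 tokens/sec, burst 100
  if (bucket.tokens >= cost) {
    bucket.tokens -= cost;
    return true;
  }
  return false;
}

// Alice exhausting her bucket has no effect on Bob.
for (let i = 0; i < 100; i++) tryConsume('alice', 1);
console.log(tryConsume('alice', 1)); // false: Alice is rate limited
console.log(tryConsume('bob', 1));   // true: Bob is unaffected
```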
Benefits
1. Fairness
```
❌ Without per-user buckets:
   Alice makes 1000 requests → exhausts shared limit
   Bob makes 1 request       → rate limited (unfair!)

✅ With per-user buckets:
   Alice makes 1000 requests → only Alice rate limited
   Bob makes 1 request       → succeeds (fair!)
```

2. Isolation
- One user can’t DoS other users
- Misbehaving clients don’t affect others
- Each user pays for their own usage
3. Scalability
- Redis can handle millions of separate buckets
- Buckets are created on-demand
- Unused buckets are automatically cleaned up
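One common cleanup strategy (an assumption about the mechanism; the section above states only that unused buckets are cleaned up automatically) is to expire a bucket once it has been idle long enough to refill completely, since a full bucket is indistinguishable from a brand-new one:

```javascript
// After burstRate / refillRate idle seconds a bucket is full again, so
// deleting it loses no information: a fresh bucket also starts full.
function secondsUntilFull(refillRate, burstRate) {
  return burstRate / refillRate;
}

function evictIdleBuckets(buckets, now) {
  for (const [key, b] of buckets) {
    if (now - b.lastRefill >= secondsUntilFull(b.refillRate, b.burstRate)) {
      buckets.delete(key);
    }
  }
}

const buckets = new Map([
  ['alice', { tokens: 40, lastRefill: 0, refillRate: 10, burstRate: 100 }],
  ['bob',   { tokens: 90, lastRefill: 8, refillRate: 10, burstRate: 100 }],
]);
evictIdleBuckets(buckets, 12); // alice idle 12s >= 10s -> evicted
console.log([...buckets.keys()]); // ['bob']
```

In a Redis-backed store the same idea maps to setting a key TTL of roughly `burstRate / refillRate` seconds on each write.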
User Identifiers
User identifiers can be anything that uniquely identifies a user or client:
- User IDs: `user_12345`, `alice@example.com`
- API Keys: `sk_live_abc123`, `api_xyz789`
- IP Addresses: `192.168.1.1`, `10.0.0.5`
- Session IDs: `sess_abc123`, `token_xyz789`
- Organization IDs: `org_enterprise_123`
Choose user identifiers carefully! They determine how rate limits are applied. Using IP addresses might rate limit users behind a shared NAT gateway together.
Implementation Details
Storage
Token bucket state is persisted in Redis:
```
Key: lightrate:bucket:app_123:user_456:rule_789
Value: {
  tokens: 85.5,
  lastRefill: 1699564800.123
}
```

Precision
LightRate uses sub-second precision for token refill calculations:
```javascript
// Calculate tokens refilled since the last check (sub-second precision)
const elapsed = currentTime - lastRefillTime; // seconds, fractional
const tokensToAdd = elapsed * refillRate;
const newTokens = Math.min(currentTokens + tokensToAdd, burstRate);
```

Atomic Operations
Token consumption is atomic to prevent race conditions:
```javascript
// Redis Lua script ensures atomicity (simplified: the refill step is
// omitted here for brevity)
const script = `
  local tokens = tonumber(redis.call('GET', KEYS[1]) or '0')
  local requested = tonumber(ARGV[1])
  if tokens >= requested then
    redis.call('SET', KEYS[1], tokens - requested)
    return 1
  else
    return 0
  end
`;
```

Efficiency
Tokens are calculated on-demand, not with background jobs:
- No scheduled tasks needed
- No CPU waste recalculating unused buckets
- Scales to millions of users
- Only active buckets use resources
Best Practices
1. Set Appropriate Refill Rates
Base refill rates on your service’s capacity:
```javascript
// Consider your database/API limits
const dbMaxQPS = 1000; // Your DB can handle 1000 QPS
const users = 100;     // Expected concurrent users
const refillRate = Math.floor(dbMaxQPS / users); // 10 tokens/sec per user
```

2. Allow Reasonable Bursts
Give users flexibility while preventing abuse:
```
// Good: 5-10x multiplier
refillRate: 10
burstRate: 50-100   ✅

// Too strict: no burst handling
refillRate: 10
burstRate: 10       ❌

// Too loose: potential abuse
refillRate: 10
burstRate: 10000    ❌
```

3. Monitor and Adjust
Track rate limit metrics:
- How often are limits hit?
- What’s the average token consumption?
- Are legitimate users being blocked?
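A minimal way to answer these questions is to count outcomes alongside each rate-limit decision. A sketch (the `recordDecision` helper and metric shape are illustrative, not part of LightRate's API):

```javascript
// Running counters for rate-limit decisions, keyed by rule name.
const metrics = new Map();

function recordDecision(rule, allowed, tokensRequested) {
  const m = metrics.get(rule) ?? { allowed: 0, denied: 0, tokens: 0 };
  if (allowed) {
    m.allowed += 1;
    m.tokens += tokensRequested; // total tokens actually consumed
  } else {
    m.denied += 1;
  }
  metrics.set(rule, m);
}

// Fraction of requests that were rate limited.
function hitRate(rule) {
  const m = metrics.get(rule);
  return m ? m.denied / (m.allowed + m.denied) : 0;
}

recordDecision('sendEmail', true, 1);
recordDecision('sendEmail', true, 1);
recordDecision('sendEmail', false, 1);
console.log(hitRate('sendEmail')); // 0.333... (1 in 3 requests rate limited)
```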
4. Start Conservative
It’s easier to increase limits than decrease them:
```
// Start here
refillRate: 5,
burstRate: 25

// Increase based on data
refillRate: 10,
burstRate: 50

// Scale as needed
refillRate: 50,
burstRate: 500
```

Next Steps
- Applications - Learn how to organize token buckets in applications
- Rules - Create specific rate limiting rules
- Quick Start - Implement your first rate limit