Design a Rate Limiter for an API Gateway (Token Bucket / Sliding Window / Redis-Based)

1) Problem Clarification

We need to stop users and applications from abusing the API or overwhelming backend systems.
A rate limiter enforces:

X requests per minute per user
burst tolerance
a different quota for each subscription plan

2) Requirements

Functional

✔ per-user and per-IP throttling
✔ plan-based throttling (Free, Premium, Enterprise)
✔ burst handling
✔ rejection or queueing when the limit is exceeded
✔ logging and audit trail

Non-functional

✔ latency < 2 ms per decision
✔ horizontally scalable
✔ consistent enforcement across gateway instances
✔ defined failure mode (fail-open or fail-closed; see section 11)

3) Architecture Overview

Client → API Gateway → Rate Limiter → Upstream Service

The gateway calls the rate limiter before forwarding each request upstream.
The rate limiter is typically backed by:
- a Redis cluster
- a Token Bucket or Sliding Window algorithm
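
The request flow above can be sketched as a small gateway handler. This is a minimal illustration, not a definitive implementation: `Response`, `handle`, `is_allowed`, and `forward` are hypothetical names, and returning HTTP 429 (Too Many Requests) on rejection is an assumption the original does not specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    status: int
    body: str


def handle(user_id: str,
           is_allowed: Callable[[str], bool],
           forward: Callable[[str], Response]) -> Response:
    """Consult the limiter first; forward upstream only when allowed."""
    if not is_allowed(user_id):
        # Assumption: 429 is the conventional status for throttled requests.
        return Response(status=429, body="rate limit exceeded")
    return forward(user_id)
```

Passing the limiter and the upstream call in as callables keeps the handler independent of whichever algorithm or store sits behind `is_allowed`.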

4) Algorithms Comparison

Algorithm	Good for	Drawback
Fixed Window	Simplicity	Up to 2x the limit can pass around a window boundary
Sliding Window	Precision	More compute and memory per key
Token Bucket	Burst-friendly traffic	Slightly more complex state (token count plus timestamp)
Leaky Bucket	Fair queueing at a steady rate	Queued requests can be delayed

➡ Recommended: a Token Bucket + Sliding Window hybrid
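
To make the fixed-window boundary problem concrete, here is a small in-memory comparison of a fixed window counter and a sliding window log. It is a single-process sketch under stated assumptions: the class names, the `accepted` helper, and the sample timestamps are all made up for illustration.

```python
from collections import deque


class FixedWindow:
    """Counts requests per aligned window of `window` seconds."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.window_id, self.count = None, 0

    def allow(self, now):
        wid = int(now // self.window)
        if wid != self.window_id:          # a new window resets the counter
            self.window_id, self.count = wid, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLog:
    """Keeps timestamps of accepted requests from the last `window` seconds."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.stamps = deque()

    def allow(self, now):
        # Drop timestamps that have slid out of the window.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False


def accepted(limiter, times):
    return sum(limiter.allow(t) for t in times)


# Five requests just before the 60 s boundary and five just after it.
times = [59.0 + i * 0.1 for i in range(5)] + [60.0 + i * 0.1 for i in range(5)]
print(accepted(FixedWindow(5, 60), times))       # 10: both windows fill up
print(accepted(SlidingWindowLog(5, 60), times))  # 5: the limit holds across the boundary
```

The sliding window log pays for its precision by storing one timestamp per accepted request, which is the extra compute and memory cost noted in the table.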

5) Token Bucket Model

Every user has a bucket:

capacity = limit (e.g. 100 tokens for 100 req/min)
refill rate = limit / 60 tokens per second (about 1.67 tokens/sec for 100 req/min)

When a request arrives:

If a token is available → serve the request and decrement the bucket
If not → reject or queue the request
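
The model above can be sketched as a single-process Python class. This is a minimal sketch, assuming lazy refill (tokens are topped up when a request arrives rather than by a timer) and a bucket that starts full; the class name, the `cost` parameter, and the injectable `clock` are illustrative choices, not part of the original design.

```python
import time


class TokenBucket:
    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.clock = clock
        self.tokens = float(capacity)      # assumption: bucket starts full
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = max(0.0, now - self.last)
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now

    def allow(self, cost=1.0):
        """Return True and consume `cost` tokens if enough are available."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# 100 req/min: capacity 100, refill 100/60 tokens per second.
bucket = TokenBucket(capacity=100, refill_per_sec=100 / 60)
print(bucket.allow())  # True: a fresh bucket is full
```

Injecting the clock makes the class easy to test with a fake time source; in production the default `time.monotonic` avoids jumps caused by wall-clock adjustments.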

6) Distributed State Store

The gateway runs on multiple nodes, so the limit must be enforced against shared, global state.

Solution → a Redis cluster, using atomic commands:

INCR
DECR
a Lua script when several steps (read, refill, decrement) must run atomically
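
A Lua script is needed because a token bucket check is a read, a refill computation, and a write, and those steps must not interleave across gateway nodes. The sketch below shows one possible script and a thin Python wrapper. Several things here are assumptions rather than part of the original: the hash fields `tokens` and `ts`, the client-supplied `now` (which is sensitive to clock skew between gateways), the TTL of twice the full refill time, and the function name `allow_request`. The `client` is assumed to expose a redis-py style `eval(script, numkeys, *keys_and_args)` method; the script has not been run against a live Redis here.

```python
TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])   -- tokens per second
local now      = tonumber(ARGV[3])   -- seconds, supplied by the caller

local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1])
local ts     = tonumber(state[2])
if tokens == nil or ts == nil then   -- first request: start with a full bucket
  tokens = capacity
  ts = now
end

tokens = math.min(capacity, tokens + math.max(0, now - ts) * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return allowed
"""


def allow_request(client, key, capacity, refill_per_sec, now):
    """Run the whole check atomically on the Redis node that owns `key`."""
    result = client.eval(TOKEN_BUCKET_LUA, 1, key, capacity, refill_per_sec, now)
    return result == 1
```

Because the script touches only `KEYS[1]`, it stays compatible with Redis Cluster, where a script may only access keys that live in the same hash slot.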

7) Data Structures

Redis key patterns:

rate:<user_id>:bucket
rate:<ip>:bucket
rate:<plan>:usage

The TTL on each key matches the refill window, so idle keys expire on their own.
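
As a small illustration of the key patterns, the helpers below build the keys and derive a TTL. The function names are hypothetical, and the TTL rule is an assumption: a bucket left untouched for a full refill would be full again anyway, so its stored state can safely expire after that long.

```python
import math


def bucket_key(subject: str) -> str:
    """Key for a per-user or per-IP bucket, e.g. rate:42:bucket."""
    return f"rate:{subject}:bucket"


def plan_usage_key(plan: str) -> str:
    """Key for aggregate usage of a plan, e.g. rate:free:usage."""
    return f"rate:{plan}:usage"


def idle_ttl_seconds(capacity: int, refill_per_sec: float) -> int:
    """Seconds an empty bucket needs to refill completely."""
    return math.ceil(capacity / refill_per_sec)


print(bucket_key("42"))           # rate:42:bucket
print(plan_usage_key("free"))     # rate:free:usage
print(idle_ttl_seconds(60, 1.0))  # 60
```

Note that user IDs and IP addresses share the `rate:<...>:bucket` pattern as written, so the two namespaces must not overlap in practice.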

8) Quota by Plan

Example:

Free: 60 req/min
Premium: 600 req/min
Enterprise: 5,000 req/min

The gateway looks up the user's plan and assigns the matching token bucket spec.
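
The plan lookup can be sketched as a small table that converts each per-minute quota into a bucket spec. `BucketSpec`, `PLAN_LIMITS_PER_MIN`, and `spec_for` are hypothetical names, and falling back to the Free quota for an unknown plan is an assumption, not something the original states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketSpec:
    capacity: int
    refill_per_sec: float


# Quotas from the example above, in requests per minute.
PLAN_LIMITS_PER_MIN = {"free": 60, "premium": 600, "enterprise": 5000}


def spec_for(plan: str) -> BucketSpec:
    # Assumption: unknown plans get the most restrictive (Free) quota.
    per_min = PLAN_LIMITS_PER_MIN.get(plan.lower(), PLAN_LIMITS_PER_MIN["free"])
    return BucketSpec(capacity=per_min, refill_per_sec=per_min / 60)


print(spec_for("Premium"))  # BucketSpec(capacity=600, refill_per_sec=10.0)
```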

9) Burst Handling

Token bucket deliberately permits a burst for as long as the bucket still holds tokens, while the refill rate caps the sustained rate.
This is an advantage over fixed window: the burst size is an explicit setting (the bucket capacity) rather than a side effect of window boundaries.
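
How large a burst can get follows from simple arithmetic: in any interval, a bucket can hand out at most the tokens it held at the start (at most its capacity) plus whatever was refilled during the interval. The helper below is a hypothetical illustration of that bound.

```python
import math


def max_requests(capacity: int, refill_per_sec: float, seconds: float) -> int:
    """Upper bound on requests admitted in any interval of `seconds`."""
    return math.floor(capacity + refill_per_sec * seconds)


# Free plan modelled as capacity 60, refill 1 token/sec:
print(max_requests(60, 1.0, 0))   # 60: an instantaneous burst of a full bucket
print(max_requests(60, 1.0, 60))  # 120: full bucket plus one minute of refill
```

One consequence worth noting: if capacity equals the per-minute quota, a client can briefly reach twice that quota within one minute. A smaller capacity tightens the bound at the cost of less burst tolerance.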

10) Consistency Across Nodes

Redis is the central store, so every gateway instance reads and writes the same counters.
We avoid keeping counters in local memory, because local copies drift from the shared limit; the only exception is a lightweight cache for hot paths.

11) Failure Handling

Case 1: Redis is down
→ fail-open, or degrade to a fallback bucket in local memory
(fail-open is usually preferred, so that a limiter outage does not take the API down)

Case 2: Gateway is overloaded
→ queue requests, or decide from a local token cache and skip the Redis round trip
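
The Redis-down case can be sketched as a wrapper around the remote check. This is a minimal sketch: `check_with_fallback`, `remote_check`, and `local_check` are hypothetical names, and `STORE_ERRORS` uses Python's built-in exception types as stand-ins. A real client library raises its own exception classes, which would need to be added to the tuple.

```python
# Assumption: built-in exceptions stand in for the store client's own
# connection and timeout errors.
STORE_ERRORS = (ConnectionError, TimeoutError)


def check_with_fallback(remote_check, local_check=None):
    """Ask the shared store; degrade gracefully if it is unreachable."""
    try:
        return remote_check()
    except STORE_ERRORS:
        if local_check is not None:
            # Degraded mode: per-node in-memory bucket, limits are approximate.
            return local_check()
        # Fail-open: let the request through rather than reject all traffic.
        return True
```

With a local fallback, each gateway node enforces the limit independently, so the effective global limit during an outage is roughly the per-node limit multiplied by the number of nodes.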

12) Observability and Audit

Track:

rejected requests
limiter latency
false-positive rejects (requests rejected while under quota)
Redis utilization
quota consumption heatmap per plan and per user
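
A few of the signals above can be captured with a very small in-process recorder. This is only a sketch with hypothetical names (`LimiterMetrics`, `record`, `reject_ratio`); a production gateway would normally export these values to its metrics system instead of holding them in memory.

```python
from collections import Counter


class LimiterMetrics:
    def __init__(self):
        self.decisions = Counter()   # (plan, outcome) -> count
        self.latencies_ms = []       # one entry per limiter decision

    def record(self, plan, allowed, latency_ms):
        outcome = "allowed" if allowed else "rejected"
        self.decisions[(plan, outcome)] += 1
        self.latencies_ms.append(latency_ms)

    def reject_ratio(self, plan):
        """Share of a plan's requests that were rejected (0.0 if none seen)."""
        allowed = self.decisions[(plan, "allowed")]
        rejected = self.decisions[(plan, "rejected")]
        total = allowed + rejected
        return rejected / total if total else 0.0
```

Keying the counter by plan is what makes a per-plan consumption heatmap possible later; adding the user ID to the key would give the per-user view.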

13) Optimization

atomic bucket check in a Lua script
batched refill
local fast-path cache for premium users
adaptive quota tuning
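
One way to read the local fast-path idea is token leasing: a gateway node takes a batch of tokens from the shared store in one round trip and then answers requests from memory until the batch runs out. The sketch below illustrates that interpretation only; `LeasedTokens`, `lease`, and `batch_size` are hypothetical names, and the original does not say how its fast path works.

```python
class LeasedTokens:
    """Serve decisions locally from a batch of tokens leased from the store."""

    def __init__(self, lease, batch_size):
        # lease(n) is assumed to return how many tokens were granted (0..n).
        self.lease = lease
        self.batch_size = batch_size
        self.local = 0

    def allow(self):
        if self.local == 0:
            # One round trip to the shared store per batch, not per request.
            self.local = self.lease(self.batch_size)
        if self.local > 0:
            self.local -= 1
            return True
        return False
```

The trade-off is precision: tokens leased by one node cannot be used by another, so a larger batch means fewer round trips but looser global enforcement.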

14) Future Enhancements

ML-based anomaly and abuse detection
dynamic plan quotas
per-endpoint rate limits
API billing integration based on usage