Distributed Rate Limiter (Token Bucket, Sliding Window, Clustered)

1) Problem Clarification

EN

A rate limiter caps how many requests a user or service may make within a time window, in order to:

prevent abuse
protect the backend
enforce fair usage
mitigate DoS/DDoS traffic

VI

A rate limiter restricts the number of requests within a time period in order to:

counter abuse
protect the backend
ensure fairness
counter DDoS

2) Requirements

EN

Functional

✔ per-user limiting
✔ per-IP limiting
✔ per-API limiting
✔ burst tolerance
✔ distributed consistency

Non-functional

✔ highly available
✔ low latency (< 2 ms added per check)
✔ scalable horizontally
✔ fail-safe mode

VI

Functional

✔ limit per user
✔ per IP
✔ per API
✔ allow small bursts
✔ consistency in a distributed environment

Non-functional

✔ HA
✔ very low latency
✔ scales well
✔ fail-safe when the cache fails

3) Core Algorithms

A) Fixed Window Counter

EN

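A simple integer counter per key that resets every N seconds.

Pros: easy to implement, constant memory per key
Cons: boundary burst problem (a client can send up to twice the limit in a short span that straddles two windows)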

VI

Easy to implement, but it allows bursts at the window boundary.
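The fixed window counter can be sketched in a few lines. This is a minimal in-memory illustration, not a production implementation; the class name `FixedWindowLimiter` and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_s`-second window, per key."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.counts = {}  # key -> (window_index, count)

    def allow(self, key):
        window = int(self.clock() // self.window_s)
        seen_window, count = self.counts.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window started: reset the counter
        if count >= self.limit:
            self.counts[key] = (window, count)
            return False
        self.counts[key] = (window, count + 1)
        return True
```

With limit=3 and a 60-second window, three requests at t=59 s and three more at t=60 s are all accepted, which is the boundary burst described above.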

B) Sliding Window Log

EN

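Store a timestamp for every request and count the ones that fall inside the trailing window.

Pros: accurate
Cons: heavy memory (grows with the number of requests kept in the window)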

VI

Accurate but memory-hungry, and hard to use at large scale.
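A sliding window log might look like the sketch below. It keeps one deque of timestamps per key, which makes the memory cost visible; `SlidingWindowLog` and its `clock` parameter are names chosen for this illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window_s` seconds."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.logs = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        log = self.logs[key]
        # Drop timestamps that have slid out of the window.
        while log and log[0] <= now - self.window_s:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```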

C) Sliding Window Counter (Production-grade)

EN

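Blend of fixed window and sliding log: keep one counter for the current fixed window and one for the previous window, and weight the previous counter by how much of that window still overlaps the sliding window:

estimated_count = previous_window_count × (1 − elapsed_fraction_of_current_window) + current_window_count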

VI

Uses two time buckets to smooth the estimate.
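The weighted two-bucket estimate can be sketched as follows. This is one common way to write it, assuming requests in the previous window were spread evenly; the class and field names are hypothetical.

```python
import time


class SlidingWindowCounter:
    """Approximate a sliding window using the current and previous fixed windows."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.state = {}  # key -> (window_index, current_count, previous_count)

    def allow(self, key):
        now = self.clock()
        idx = int(now // self.window_s)
        seen, curr, prev = self.state.get(key, (idx, 0, 0))
        if idx == seen + 1:
            prev, curr = curr, 0  # rolled over into the next window
        elif idx > seen + 1:
            prev, curr = 0, 0     # idle for more than one full window
        elapsed = (now % self.window_s) / self.window_s
        estimate = prev * (1.0 - elapsed) + curr
        if estimate + 1 > self.limit:
            self.state[key] = (idx, curr, prev)
            return False
        self.state[key] = (idx, curr + 1, prev)
        return True
```

For example, with limit=10 and a 60-second window, 10 requests in the previous window and 15 seconds elapsed in the current one give an estimate of 10 × 0.75 = 7.5, so two more requests fit before the limit is reached.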

D) Token Bucket (Most widely used)

EN

The bucket holds tokens, up to a fixed capacity. Every request consumes 1 token and is rejected when no token is left.
Tokens refill at a constant rate.

Pros:
✔ supports bursts (up to the bucket capacity)
✔ smooth control
✔ predictable

VI

The bucket holds tokens, and each request consumes 1 token.

Pros:
✔ supports bursts
✔ smooth control
✔ predictable
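A single-process token bucket can be sketched as below, using lazy refill (tokens are topped up when a request arrives rather than by a background timer). The names `TokenBucket`, `rate`, `capacity` and `cost` are assumptions for the example.

```python
import time


class TokenBucket:
    """Refill `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Lazy refill: add the tokens earned since the previous call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Here `capacity` bounds the burst size and `rate` bounds the long-run average, which is why the two can be tuned independently.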

E) Leaky Bucket

EN

Requests enter a bucket (a queue) and leak out at a constant rate; when the bucket is full, new requests are rejected.

Good for enforcing stable throughput.

VI

Limits the rate at which requests drain out.
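The sketch below shows the "meter" variant of a leaky bucket, which only decides accept or reject. A queue-based variant would instead hold requests and release them at the leak rate. `LeakyBucket`, `leak_rate` and `capacity` are illustrative names.

```python
import time


class LeakyBucket:
    """Each request adds 1 unit; the bucket drains `leak_rate` units per second."""

    def __init__(self, leak_rate, capacity, clock=time.monotonic):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.clock = clock
        self.level = 0.0
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain whatever leaked out since the previous call.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1.0 > self.capacity:
            return False  # bucket would overflow
        self.level += 1.0
        return True
```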

4) Distributed Architecture

Client → API Gateway → Rate Limit Service → Redis/Memcached Cluster → Backend

VI

Client → Gateway → Rate Limit Service → Redis Cluster → Backend

5) Redis-based Distributed Rate Limiter

EN

Why Redis?

✔ atomic operations
✔ TTL support
✔ fast
✔ clusterable

Typical Redis commands:

INCR
EXPIRE
GETSET (deprecated since Redis 6.2 in favor of SET with the GET option)
Lua scripts (EVAL / EVALSHA) for atomic multi-step algorithms

VI

Redis is a good fit because it is:

✔ atomic
✔ TTL-aware
✔ very fast
✔ clusterable

Use INCR, EXPIRE, or Lua to guarantee atomicity.

6) Implementing Token Bucket in Redis

EN

Variables stored in Redis:

tokens
last_refill_timestamp

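Lua script (runs atomically on the Redis server):

Calculate the tokens refilled since last_refill_timestamp, capped at the bucket capacity
If enough tokens → consume them and allow
Else → reject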

VI

Redis stores:

tokens
last_refill_timestamp

The Lua script computes:

the refill
if there are enough tokens → let the request through
if not → block it
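The steps above could be expressed as a Lua script loaded through redis-py's `register_script`. This is a sketch under several assumptions: the state lives in one hash per user under a hypothetical `rl:<user_id>` key, the caller supplies the current time (so application clocks must be roughly in sync), and the key expires after twice the time needed to refill an empty bucket.

```python
import time

TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local rate     = tonumber(ARGV[1])  -- tokens added per second
local capacity = tonumber(ARGV[2])  -- maximum bucket size
local now      = tonumber(ARGV[3])  -- caller-supplied time, in seconds
local cost     = tonumber(ARGV[4])  -- tokens this request consumes

local state  = redis.call('HMGET', key, 'tokens', 'last_refill_timestamp')
local tokens = tonumber(state[1]) or capacity  -- new key: start with a full bucket
local last   = tonumber(state[2]) or now

tokens = math.min(capacity, tokens + math.max(0, now - last) * rate)
local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end

redis.call('HSET', key, 'tokens', tokens, 'last_refill_timestamp', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return allowed
"""


def make_limiter(client, rate, capacity):
    """Return allow(user_id) backed by the script above.

    `client` is expected to be a redis-py client, e.g. redis.Redis().
    """
    script = client.register_script(TOKEN_BUCKET_LUA)

    def allow(user_id, cost=1):
        key = f"rl:{user_id}"  # hypothetical key naming scheme
        return script(keys=[key], args=[rate, capacity, time.time(), cost]) == 1

    return allow
```

Because the whole read-refill-consume-write sequence runs inside one script, two gateway instances checking the same user cannot interleave their updates.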

7) Clustered Design

EN

Use:

consistent hashing to distribute users across Redis shards
replication for HA
fallback local limiter when Redis fails

VI

Use:

consistent hashing to split users across shards
replicas to ensure HA
a local fallback rate limit when Redis is down
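Client-side sharding with consistent hashing might be sketched as below. Note that Redis Cluster itself assigns keys to fixed hash slots rather than a hash ring, so this applies when the application picks the shard. The class name `HashRing`, the node names and the choice of 100 virtual nodes are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Map keys to nodes so that adding a node only moves keys onto that node."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[i][1]
```

Usage: `HashRing(["redis-a", "redis-b", "redis-c"]).node_for("user:42")` returns the shard that owns that user's counters.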

8) Multi-Tenant Rate Limiting

EN

Different tenants (clients) have different limits:

tenantA → 1000 req/min
tenantB → 50,000 req/min
tenantC → pay-per-use burst

VI

Multi-tenant:

tenant A → 1000 req/min
tenant B → 50,000 req/min
tenant C → pay per request
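Per-tenant limits are often kept in a lookup table with a default for unknown tenants. The sketch below uses the two fixed quotas from the list above; the `burst` values and the default plan are made-up numbers for illustration, and the pay-per-use tenant is left out because its limit is not specified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantLimit:
    requests_per_minute: int
    burst: int  # token bucket capacity (hypothetical values below)

    @property
    def refill_per_second(self):
        return self.requests_per_minute / 60.0


LIMITS = {
    "tenantA": TenantLimit(1_000, burst=50),
    "tenantB": TenantLimit(50_000, burst=2_000),
}
DEFAULT_LIMIT = TenantLimit(60, burst=10)  # assumed fallback plan


def limit_for(tenant_id):
    """Look up a tenant's limit, falling back to the default plan."""
    return LIMITS.get(tenant_id, DEFAULT_LIMIT)
```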

9) Per-Endpoint Rate Limit

EN

Each endpoint gets a limit suited to its risk and traffic profile:

/login → strict limit  
/search → burst-friendly  
/payment → strict limit, tightened security

VI

Each endpoint has a different limit:

/login → tight limit  
/search → allow bursts  
/payment → especially strict
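One way to apply per-endpoint limits is to resolve a policy by path prefix and include that prefix in the counter key, so each user has a separate bucket per endpoint. All numbers, the `rl:` key scheme and the function names below are illustrative assumptions.

```python
# (prefix, policy) pairs; rate is tokens per second, capacity is the burst size.
ENDPOINT_POLICIES = [
    ("/login", {"rate": 5 / 60, "capacity": 5}),     # strict
    ("/payment", {"rate": 10 / 60, "capacity": 3}),  # strict, small burst
    ("/search", {"rate": 20, "capacity": 100}),      # burst-friendly
]
DEFAULT_POLICY = {"rate": 10, "capacity": 20}


def policy_for(path):
    """Return (matched_prefix, policy) for a request path."""
    for prefix, policy in ENDPOINT_POLICIES:
        if path == prefix or path.startswith(prefix + "/"):
            return prefix, policy
    return "*", DEFAULT_POLICY


def bucket_key(user_id, path):
    """Build the counter key so each user/endpoint pair is limited separately."""
    prefix, _ = policy_for(path)
    return f"rl:{user_id}:{prefix}"
```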

10) Rate Limiting at API Gateway

EN

Common systems:

Kong
NGINX + Lua
Envoy
Istio

Gateway enforces distributed limits via Redis.

VI

Usually deployed at the API Gateway:

Kong
NGINX
Envoy
Istio

The gateway uses Redis to check the rate.

11) Handling Failures

EN

If Redis is down:

fail-open (allow requests; favors availability)
fail-closed (deny requests; favors protection)
prefer a hybrid mode based on the SLA

VI

When Redis fails:

fail-open (let everything through)
fail-closed (block everything)
or a hybrid based on the SLA
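The failure modes can be wrapped around any remote check. In the sketch below, `guarded_check` and its parameters are hypothetical; the `errors` tuple must be set to whatever exceptions the Redis client in use actually raises (redis-py's connection errors, for instance, are its own exception classes, not the built-in ones).

```python
def guarded_check(remote_check, key, *, fallback=None, fail_open=True,
                  errors=(ConnectionError, TimeoutError)):
    """Run the shared-store check; degrade gracefully if the store is unreachable.

    remote_check: callable(key) -> bool, backed by the shared store.
    fallback:     optional callable(key) -> bool, e.g. a local in-process limiter.
    fail_open:    decision used when the store is down and no fallback is given.
    """
    try:
        return remote_check(key)
    except errors:
        if fallback is not None:
            return fallback(key)
        return fail_open
```

A hybrid setup could pass `fail_open=True` for read-heavy endpoints and `fail_open=False` for sensitive ones.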

12) Anti-abuse & Detection

EN

Combine rate limiter with:

IP reputation
geo rules
device fingerprinting
anomaly detection

VI

Combine rate limiting with:

IP reputation scoring
geo rules
device fingerprinting
anomaly detection

13) Observability

EN

Track:

throttle count
token bucket refill rate
per-user usage
latency added by rate limiter

VI

Track:

number of blocked requests
token refill rate
usage per user
latency added by the rate limiter
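A few of these metrics can be collected by wrapping the limiter call. The sketch below keeps everything in memory for illustration; a real deployment would more likely export to a metrics system. `LimiterMetrics` and its methods are hypothetical names.

```python
import time
from collections import Counter


class LimiterMetrics:
    """Record per-key decisions and the latency added by each check."""

    def __init__(self):
        self.decisions = Counter()  # (key, "allowed" | "throttled") -> count
        self.latencies_ms = []

    def observe(self, key, check):
        """Call check(key), record the outcome and elapsed time, return the decision."""
        start = time.perf_counter()
        allowed = check(key)
        self.latencies_ms.append((time.perf_counter() - start) * 1000.0)
        self.decisions[(key, "allowed" if allowed else "throttled")] += 1
        return allowed

    def throttle_count(self, key=None):
        """Total throttled requests, optionally for a single key."""
        return sum(
            count for (k, outcome), count in self.decisions.items()
            if outcome == "throttled" and (key is None or k == key)
        )
```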

14) Choosing the Right Algorithm

EN

Use Case	Algorithm
API requests	Token Bucket
Payment	Fixed Window / Strict
Streaming events	Leaky Bucket
UI search bar	Sliding Window

VI

Use case	Algorithm
API	Token Bucket
Payment	Fixed Window
Streaming	Leaky Bucket
Search bar	Sliding Window