1) Problem Clarification / Làm rõ bài toán
EN
We need a distributed notification service that pushes alerts to mobile/web users.
Questions to clarify:
- One-to-one push, broadcast, or segmented push?
- Do we need delivery guarantees?
- Do we support silent pushes?
- Platform targets (APNs, FCM, Web Push)?
- Do we allow scheduling?
- Are retries required?
VI
Thiết kế hệ thống push thông báo đến mobile/web.
Cần hỏi rõ:
- Push cá nhân? broadcast? theo segment?
- Yêu cầu delivery guarantee không?
- Có hỗ trợ silent push không?
- Target: APNs, FCM, Web push?
- Có scheduling không?
- Có retry không?
2) Requirements / Yêu cầu hệ thống
EN – Functional
✔ Device registration
✔ Send push to users / segments
✔ Delivery tracking
✔ Retry on failure
✔ Priority types (high vs normal)
VI – Chức năng
✔ Đăng ký thiết bị
✔ Gửi push theo user/segment
✔ Theo dõi trạng thái gửi
✔ Retry khi lỗi
✔ Mức ưu tiên (high/normal)
EN – Non-Functional
✔ Horizontally scalable (tens of millions of users, millions of pushes per minute at peak)
✔ Low latency fan-out
✔ High delivery reliability
✔ Cost efficiency
VI – Phi chức năng
✔ Scale cực lớn
✔ Fan-out trễ thấp
✔ Đảm bảo độ tin cậy
✔ Chi phí tối ưu
3) Scale Estimation / Ước lượng tải
EN
Assume:
- 50M active users
- 10M pushes/min at peak during broadcasts
- Up to 40% of sends may fail and need a retry
VI
Giả định:
- 50M người dùng active
- Broadcast peak 10M push/phút
- 40% retry tiềm năng
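The assumptions above translate into per-second rates as follows. This is a rough sketch only: it assumes each failed send is retried exactly once, which the notes do not state, and the function name is made up for illustration.

```python
# Back-of-the-envelope throughput from the stated assumptions.
PEAK_PUSHES_PER_MIN = 10_000_000  # broadcast peak
RETRY_FRACTION = 0.40             # assumption: 40% of sends are retried once


def peak_rates(pushes_per_min: int, retry_fraction: float) -> dict:
    """Convert a per-minute peak into per-second rates, including retries."""
    first_attempts = pushes_per_min / 60
    retries = first_attempts * retry_fraction
    return {
        "first_attempts_per_sec": round(first_attempts),
        "retries_per_sec": round(retries),
        "total_per_sec": round(first_attempts + retries),
    }


rates = peak_rates(PEAK_PUSHES_PER_MIN, RETRY_FRACTION)
print(rates)
```

The total is the number the queue and the push adapters have to absorb at peak, so it is a reasonable starting point for sizing partitions and consumers.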
4) High-Level Architecture / Kiến trúc tổng quan
Client → Device Registry Service → Notification Service → Queue → Push Adapters → APNs/FCM/Web
↓
Status Store
EN
- Registry maps user IDs to device tokens
- Notification service builds the payload
- Queue buffers traffic bursts
- Adapters integrate with APNs/FCM/Web Push
VI
- Registry lưu token thiết bị
- Service build payload
- Queue chống burst
- Adapter call APNs/FCM/Web push
5) Device Registry / Đăng ký device
EN
Store mapping:
DEVICE (
user_id,
device_token,
platform ENUM(ios,android,web),
status,
last_activeTs
)
VI
Lưu mapping:
DEVICE (
user_id,
device_token,
platform,
status,
last_activeTs
)
Index by user_id for multi-device delivery.
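A minimal in-memory sketch of this mapping, indexed by user_id so that one lookup returns every device of a user. The class and method names are hypothetical, last_active_ts mirrors last_activeTs, and a real registry would live in a database rather than a dict.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Device:
    user_id: str
    device_token: str
    platform: str              # "ios", "android" or "web"
    status: str = "active"     # assumption: "active" or "invalid"
    last_active_ts: float = 0.0


class DeviceRegistry:
    """Hypothetical registry keyed by user_id, then by device_token."""

    def __init__(self):
        self._by_user = defaultdict(dict)

    def register(self, device: Device) -> None:
        # Re-registering the same token overwrites the old record.
        self._by_user[device.user_id][device.device_token] = device

    def active_devices(self, user_id: str) -> list:
        devices = self._by_user.get(user_id, {})
        return [d for d in devices.values() if d.status == "active"]

    def invalidate(self, user_id: str, device_token: str) -> None:
        device = self._by_user.get(user_id, {}).get(device_token)
        if device is not None:
            device.status = "invalid"


registry = DeviceRegistry()
registry.register(Device("u1", "tok-ios", "ios"))
registry.register(Device("u1", "tok-web", "web"))
registry.invalidate("u1", "tok-web")
print([d.device_token for d in registry.active_devices("u1")])
```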
6) Queue-based fan-out / Fan-out dựa trên queue
EN
Peak broadcast traffic must not overload push gateways.
=> Use Kafka / SQS:
- Partition keyed on segment or user shard
- Consumer groups scale horizontally
VI
Broadcast peak không thể push trực tiếp.
=> Kafka/SQS:
- Partition theo segment/user shard
- Consumer group scale ngang
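To illustrate keying partitions on a user shard, here is a small sketch of a stable key-to-partition mapping. It is not how any particular Kafka or SQS client hashes keys (real clients ship their own partitioners); the function name and the SHA-256 choice are assumptions made for the example.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a segment or user key to a partition, stable across processes."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo.
    return int.from_bytes(digest[:8], "big") % num_partitions


p = partition_for("user:12345", 64)
print(p)
```

Because the same key always lands on the same partition, all pushes for one user are handled by one consumer in order, while different users spread across the consumer group.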
7) Delivery Semantics / Đảm bảo gửi
EN
We aim for at-least-once delivery:
- Retry if gateway fails
- Deduplicate using message IDs
Status store tracks:
- sent
- delivered
- opened
VI
Mục tiêu at least once:
- Retry khi gateway fail
- Dedupe theo msg_id
Status theo dõi:
- sent
- delivered
- opened
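The combination of retries and message-ID dedupe can be sketched like this. The store is an in-memory stand-in, the names are hypothetical, and the status is recorded only after the send succeeds so that a failed attempt is still retried when the queue redelivers the message.

```python
STATUS_ORDER = {"sent": 0, "delivered": 1, "opened": 2}


class StatusStore:
    """Hypothetical status store keyed by (msg_id, device_token)."""

    def __init__(self):
        self._status = {}

    def get(self, msg_id, device_token):
        return self._status.get((msg_id, device_token))

    def advance(self, msg_id, device_token, status):
        # Only move forward: sent -> delivered -> opened.
        current = self.get(msg_id, device_token)
        if current is None or STATUS_ORDER[status] > STATUS_ORDER[current]:
            self._status[(msg_id, device_token)] = status


def deliver_once(store, msg_id, device_token, send) -> bool:
    """Send unless this (message, device) pair was already sent."""
    if store.get(msg_id, device_token) is not None:
        return False  # duplicate from a queue redelivery, skip it
    send(msg_id, device_token)  # if this raises, nothing is recorded
    store.advance(msg_id, device_token, "sent")
    return True


store = StatusStore()
sent_log = []
first = deliver_once(store, "m1", "tok-a", lambda m, t: sent_log.append((m, t)))
second = deliver_once(store, "m1", "tok-a", lambda m, t: sent_log.append((m, t)))
print(first, second, sent_log)
```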
8) Push Adapter Layer / Tầng giao tiếp APNs/FCM
EN
Adapter normalizes format:
- APNs HTTP/2
- FCM REST/gRPC
- Web push (VAPID)
VI
Adapter chuyển payload:
- APNs HTTP/2
- FCM REST/gRPC
- Web push với VAPID
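As an illustration of what "normalizing" could mean, the sketch below turns one internal notification into three provider-shaped requests. The function names and the internal fields are hypothetical. The provider field names (apns-priority, the aps dictionary, the FCM HTTP v1 message object, the Web Push Urgency and TTL headers) follow the public conventions as best understood here and should be checked against the provider documentation before use.

```python
import json


def to_apns(title: str, body: str, high_priority: bool) -> dict:
    return {
        "headers": {
            "apns-push-type": "alert",
            "apns-priority": "10" if high_priority else "5",
        },
        "body": {"aps": {"alert": {"title": title, "body": body}}},
    }


def to_fcm(token: str, title: str, body: str, high_priority: bool) -> dict:
    return {
        "message": {
            "token": token,
            "notification": {"title": title, "body": body},
            "android": {"priority": "HIGH" if high_priority else "NORMAL"},
        }
    }


def to_web_push(title: str, body: str, high_priority: bool) -> dict:
    # The Web Push body is opaque to the push service; JSON is one choice.
    return {
        "headers": {
            "Urgency": "high" if high_priority else "normal",
            "TTL": "3600",  # assumption: keep undelivered pushes for 1 hour
        },
        "body": json.dumps({"title": title, "body": body}),
    }


apns_req = to_apns("Sale", "Ends today", high_priority=True)
fcm_req = to_fcm("tok-android", "Sale", "Ends today", high_priority=False)
web_req = to_web_push("Sale", "Ends today", high_priority=True)
print(apns_req["headers"], fcm_req["message"]["android"], web_req["headers"])
```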
Retry strategy:
- exponential backoff
- jitter
- DLQ for poison payloads
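A minimal sketch of this retry strategy, using exponential backoff with full jitter. The exception classes and function names are hypothetical, the DLQ is a plain list, and sending exhausted retries to the DLQ as well is an assumption that goes slightly beyond "poison payloads".

```python
import random
import time


class TransientError(Exception):
    """Gateway failure that is worth retrying (timeouts, 5xx, throttling)."""


class PoisonPayloadError(Exception):
    """Payload the gateway will never accept; retrying is pointless."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def send_with_retry(send, payload, dlq, max_attempts=5, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except PoisonPayloadError:
            break  # go straight to the DLQ
        except TransientError:
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    dlq.append(payload)
    return None
```

Passing a fake sleep function makes the loop testable without waiting, and the jitter keeps many consumers from retrying against the gateway at the same instant.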
9) Scheduling & Campaign Service / Lịch chiến dịch
EN
Optional: allow scheduled or recurring notification campaigns.
Components:
- Cron-based scheduler
- Precomputed recipient list
- Fan-out through queue
VI
Tùy chọn: scheduler cho marketing/campaign.
Gồm:
- Cron scheduler
- Tính sẵn recipient list
- Fan-out qua queue
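The flow from schedule to fan-out can be sketched as below. It is simplified on purpose: campaigns carry a one-shot timestamp instead of a cron expression, the recipient list is precomputed at scheduling time, and the returned pairs stand in for messages that would be published to the queue. All names are hypothetical.

```python
import heapq


class CampaignScheduler:
    """Hypothetical scheduler: a min-heap ordered by run time."""

    def __init__(self):
        self._heap = []

    def schedule(self, run_at: float, campaign_id: str, recipients) -> None:
        # Recipients are frozen here, i.e. the precomputed recipient list.
        heapq.heappush(self._heap, (run_at, campaign_id, tuple(recipients)))

    def pop_due(self, now: float) -> list:
        """Return (campaign_id, user_id) pairs for every campaign due by now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, campaign_id, recipients = heapq.heappop(self._heap)
            due.extend((campaign_id, user_id) for user_id in recipients)
        return due


scheduler = CampaignScheduler()
scheduler.schedule(200.0, "weekly-digest", ["u3"])
scheduler.schedule(100.0, "flash-sale", ["u1", "u2"])
print(scheduler.pop_due(now=150.0))
```

A periodic tick would call pop_due with the current time and enqueue each returned pair, so the scheduler itself never talks to the push gateways.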
10) Rate Limiting / Giới hạn tốc độ
EN
- Global limit per gateway provider
- User-level rate limit to avoid spam
- Segment-based throttling
VI
- Limit theo gateway
- Limit theo user
- Throttling theo segment
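One common way to implement such limits is a token bucket, sketched below. The notes do not name an algorithm, so this choice is an assumption, as are the class name and parameters. The same class could back a per-gateway, per-user, or per-segment limit by keeping one bucket per key.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate_per_sec=1.0, capacity=2.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
print(decisions)
```

Taking the clock as an argument keeps the bucket deterministic in tests; production code would pass a monotonic clock reading.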
11) Failure Handling / Xử lý lỗi
EN
Cases:
- Token expired → remove token
- APNs reports an unregistered token (HTTP 410 on the HTTP/2 API; the legacy feedback service is retired) → invalidate token
- Burst spike → queue buffer
- Queue outage → degrade mode
VI
Các lỗi:
- Token hết hạn → xóa mapping
- APNs báo token không còn hợp lệ (HTTP 410 Unregistered) → xóa token
- Peak burst → đệm qua queue
- Queue chết → degrade mode
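The token-related cases above can be driven by the gateway's HTTP response. The mapping below is a deliberately coarse sketch: it assumes 404 and 410 mean the token or subscription is gone, and it ignores the provider-specific error bodies (which also report bad tokens under other status codes). The enum and function names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    DONE = "done"
    REMOVE_TOKEN = "remove_token"
    RETRY = "retry"
    DEAD_LETTER = "dead_letter"


def classify(status_code: int) -> Action:
    """Map a gateway HTTP status to what the adapter should do next."""
    if 200 <= status_code < 300:
        return Action.DONE
    if status_code in (404, 410):
        return Action.REMOVE_TOKEN   # token expired or unregistered
    if status_code == 429 or status_code >= 500:
        return Action.RETRY          # throttled or gateway-side failure
    return Action.DEAD_LETTER        # other 4xx: retrying will not help


print([classify(code).value for code in (200, 410, 429, 503, 400)])
```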
12) Observability / Giám sát
EN
Monitor:
- delivery success rate
- queue lag
- retry count
- device invalidation rate
- notification latency
VI
Giám sát:
- delivery success rate
- queue lag
- số retry
- rate invalid token
- latency push
13) Cost Considerations / Tối ưu chi phí
EN
- Batch sending to FCM/APNs
- Priority-based routing
- Cold (tiered) storage for analytics data
- Lower retention for raw logs
VI
- Push batch
- Routing ưu tiên
- Storage tier phân cấp
- Retention thấp cho raw logs
14) Future Enhancements / Mở rộng tương lai
EN
- ML-based targeting
- Rule-based segmentation
- A/B testing notifications
- Per-device QoS
VI
- Targeting bằng ML
- Segment rule-based
- A/B testing
- QoS theo device