1) Problem Clarification / Làm rõ bài toán
EN
Distributed SQL databases combine:
- ACID transactions
- SQL query support
- horizontal scalability
- multi-region replication
They behave like PostgreSQL but scale like a NoSQL system.
VI
Distributed SQL là sự kết hợp giữa:
- ACID
- SQL
- scale ngang nhiều node
- replicate đa vùng
Trông giống PostgreSQL nhưng scale như NoSQL.
2) High-Level Architecture
Client → Query Router → Distributed KV Store → Raft/Paxos Groups → Replicated Shards
↓
SQL Layer (Parser, Optimizer, Planner)
VI
Client → Router → Distributed KV → Raft groups → Shards
Ở trên là lớp SQL (parser/optimizer).
3) Key Concepts (khái niệm cốt lõi)
EN
A) Sharding
Data split into ranges or tablets.
B) Replication
Each shard replicated via Raft or Paxos.
C) Consensus
Ensures safety & leader-based writes.
D) Distributed Query Execution
SQL runs across multiple shards.
VI
A) Sharding
Dữ liệu chia thành range/tablet.
B) Replication
Mỗi shard replicate bằng Raft/Paxos.
C) Consensus
Đảm bảo an toàn dữ liệu, leader-based write.
D) Query phân tán
SQL chạy qua nhiều shard.
4) Data Sharding Model
EN
Spanner, Yugabyte, CockroachDB shard data by key range:
[0–100), [100–200), ...
Hot shards auto-split.
VI
Sharding theo range key:
[0–100), [100–200) …
Shard nóng sẽ auto-split.
5) Replication via Consensus
EN
Each shard has:
- 1 leader
- 2–4 followers
Writes go to leader → consensus commit → followers updated.
VI
Mỗi shard có:
- 1 leader
- 2–4 follower
Ghi qua leader → consensus → replicate.
6) Google Spanner’s TrueTime (Magic Component)
EN
Problem: clock skew across machines affects transaction ordering.
Spanner solves with TrueTime API:
TT.now() returns [earliest, latest]
Guarantees bounded clock uncertainty (ɛ).
This enables external consistency (strict serializability).
VI
Vấn đề: đồng hồ lệch giữa máy → ảnh hưởng thứ tự transaction.
Spanner dùng TrueTime:
TT.now() → [earliest, latest]
TrueTime có sai số ɛ được đo lường.
Cho phép Spanner đạt external consistency.
7) Timestamp-Based Transactions
EN
Commit wait:
Database waits until:
current_time > commit_timestamp uncertainty window
Ensures no future transaction can appear earlier.
VI
Commit phải chờ:
time hiện tại > commit_timestamp + uncertainty
Đảm bảo thứ tự giao dịch đúng tuyệt đối.
8) Distributed SQL Query Execution
EN
Queries run in parallel:
- local index lookups
- remote shard scan
- distributed joins
- distributed aggregate
Query planner chooses best strategy.
VI
Query chạy phân tán:
- lookup local
- shard scan
- join phân tán
- aggregate phân tán
Planner chọn chiến lược tối ưu.
9) Secondary Indexes in a Distributed SQL DB
EN
Indexes also sharded & replicated.
Two types:
- Local index (same shard as data)
- Global index (sharded independently)
Problems:
- index write amplification
- index → cross-shard consistency
VI
Index cũng sharded & replicated.
2 loại:
- Local index
- Global index
Vấn đề:
- write amplification
- consistency cross-shard
10) Multi-Region Replication
EN
Modes:
- Read-local / Write-global
- Multi-primary (geo-distributed)
- Follower reads for low-latency queries
VI
Kiểu replication:
- đọc local / ghi global
- multi-primary
- follower read để giảm latency
11) Transactions in Distributed SQL
EN
Transactions use:
- 2PL (two-phase locking)
- MVCC (multi-version concurrency control)
- timestamp ordering
Most use MVCC for snapshot isolation.
VI
Transaction dùng:
- 2PL
- MVCC
- timestamp ordering
Hầu hết dùng MVCC.
12) ACID in Distributed System
EN
✔ Atomicity → via MVCC + consensus
✔ Consistency → schema enforcement
✔ Isolation → snapshot isolation / serializable
✔ Durability → log replication
VI
✔ Nguyên tử → MVCC + consensus
✔ Nhất quán → schema
✔ Cô lập → SI/Serializable
✔ Bền vững → log replicate
13) Fault Tolerance
EN
Leader failure → auto re-election in Raft.
Region failure → failover to nearest healthy region.
Node failure → shard rebalancing.
VI
Leader chết → bầu leader mới.
Region lỗi → failover.
Node hỏng → rebalance shard.
14) Schema Management
EN
DDL must also be distributed:
- schema change transactions
- online schema migration
- versioned metadata
VI
DDL cũng phân tán:
- chạy qua consensus
- online migration
- version metadata
15) Scaling Strategy
EN
Scale by:
- adding nodes → auto rebalancing
- splitting hot ranges
- scattering leaders across nodes
- optimizing locality (geo-partitioning)
VI
Scale bằng:
- thêm node → tự rebalance
- split shard nóng
- phân leader đều
- geo-partitioning
16) Observability
EN
Metrics:
- Raft log lag
- commit latency
- hot shard detection
- contention rate
- read/write amplification
VI
Theo dõi:
- độ trễ log Raft
- latency commit
- shard nóng
- contention
- write/read amplification
17) Real-World Examples
Google Spanner
- TrueTime
- externally consistent
- global scale
Yugabyte
- PostgreSQL-compatible
- Raft-based
- high performance
CockroachDB
- auto-sharding
- PostgreSQL wire protocol
- strong consistency
VI
Spanner
- TrueTime
- consistency mạnh toàn cầu
Yugabyte
- tương thích PostgreSQL
- hiệu năng cao
CockroachDB
- tự sharding
- dùng Raft
- mạnh về consistency
[…] Designing Distributed SQL Database (Google Spanner / Yugabyte / CockroachDB) […]