
Designing Distributed SQL Database (Google Spanner / Yugabyte / CockroachDB)

1) Problem Clarification


Distributed SQL databases combine:

  • ACID transactions
  • SQL query support
  • horizontal scalability
  • multi-region replication

They behave like PostgreSQL but scale like a NoSQL system.


2) High-Level Architecture

Client → Query Router → SQL Layer (Parser, Optimizer, Planner) → Distributed KV Store → Raft/Paxos Groups → Replicated Shards


3) Key Concepts


A) Sharding

Data is split into contiguous key ranges (tablets).

B) Replication

Each shard is replicated via Raft or Paxos.

C) Consensus

Ensures replica safety; writes are serialized through a leader.

D) Distributed Query Execution

A single SQL query can run across multiple shards.


4) Data Sharding Model


Spanner, Yugabyte, and CockroachDB shard data by key range:

[0–100), [100–200), ...

Hot shards are automatically split.
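The range lookup and hot-shard split can be sketched in a few lines (a toy model; `RangeSharder` and its split policy are illustrative, not any system's actual API):

```python
import bisect

class RangeSharder:
    """Toy range sharder: shard i owns the half-open range
    [boundaries[i], boundaries[i+1])."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)

    def shard_for(self, key):
        # Binary search for the shard whose range contains `key`.
        return bisect.bisect_right(self.boundaries, key) - 1

    def split(self, shard_idx, split_key):
        # Auto-split a hot shard at `split_key` into two smaller ranges.
        lo = self.boundaries[shard_idx]
        assert lo < split_key, "split point must fall inside the range"
        bisect.insort(self.boundaries, split_key)

sharder = RangeSharder([0, 100, 200])
print(sharder.shard_for(150))  # 1 → key 150 falls in [100, 200)
sharder.split(1, 150)          # [100, 200) becomes [100, 150) and [150, 200)
print(sharder.shard_for(150))  # 2 → key 150 now falls in [150, 200)
```

Splitting only inserts a boundary; in a real system the new range would also be moved to another node to spread the load.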


5) Replication via Consensus


Each shard has:

  • 1 leader
  • 2–4 followers

Writes go to leader → consensus commit → followers updated.
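The write path above can be sketched as a quorum commit (hypothetical `Replica`/`leader_write` names; real Raft adds terms, log matching, and retries):

```python
class Replica:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.log = []

    def append(self, entry):
        if not self.alive:
            return False  # a crashed follower never acks
        self.log.append(entry)
        return True

def leader_write(leader, followers, entry):
    # Writes always go through the shard leader.
    leader.append(entry)
    acks = 1  # the leader counts toward the quorum
    for f in followers:
        if f.append(entry):
            acks += 1
    quorum = (1 + len(followers)) // 2 + 1
    return acks >= quorum  # committed only once a majority acked

leader = Replica("leader")
followers = [Replica("f1"), Replica("f2", alive=False)]
print(leader_write(leader, followers, "SET x = 1"))  # True: 2 of 3 acked
```

With one dead follower out of three replicas the write still commits; losing a majority would make it fail, which is exactly the availability/safety trade consensus buys.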


6) Google Spanner’s TrueTime (Magic Component)


Problem: clock skew across machines affects transaction ordering.

Spanner solves this with the TrueTime API:

TT.now() returns [earliest, latest]

Guarantees bounded clock uncertainty (ɛ).

This enables external consistency (strict serializability).
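A toy model of the TrueTime interval (here `EPSILON` is an assumed constant; real TrueTime derives ɛ from GPS and atomic-clock references):

```python
import time

EPSILON = 0.007  # assumed uncertainty bound ɛ, e.g. 7 ms

def tt_now():
    # Returns an interval guaranteed to contain the true time.
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def tt_after(ts):
    # True only once the true time is certainly later than ts.
    return tt_now()[0] > ts

def tt_before(ts):
    # True only while the true time is certainly earlier than ts.
    return tt_now()[1] < ts

earliest, latest = tt_now()
print(latest - earliest)  # interval width = 2ɛ
```

The point of the interval shape: any ordering decision made only when `tt_after`/`tt_before` returns True is safe regardless of which machine's clock produced the timestamp.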


7) Timestamp-Based Transactions


Commit wait:

The database waits until:

TT.now().earliest > commit_timestamp

i.e. until the uncertainty window around the commit timestamp has certainly passed. This ensures no later transaction can be assigned an earlier timestamp.
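Commit wait can be sketched with a toy interval clock (assuming a fixed uncertainty bound `EPSILON`; a real implementation waits on TrueTime itself, not `time.sleep`):

```python
import time

EPSILON = 0.007  # assumed clock-uncertainty bound ɛ

def tt_now():
    t = time.time()
    return (t - EPSILON, t + EPSILON)  # [earliest, latest]

def commit_wait(commit_ts):
    # Block until even the earliest possible true time is past commit_ts,
    # so no later transaction can be assigned an earlier timestamp.
    while tt_now()[0] <= commit_ts:
        time.sleep(0.001)

start = time.time()
commit_ts = tt_now()[1]  # commit timestamp: latest possible current time
commit_wait(commit_ts)
print(time.time() - start)  # roughly 2ɛ elapsed
```

The wait is on the order of 2ɛ, which is why Spanner invests so heavily in keeping ɛ small: commit latency is directly tied to clock quality.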


8) Distributed SQL Query Execution


Queries run in parallel across shards:

  • local index lookups
  • remote shard scans
  • distributed joins
  • distributed aggregates

The query planner chooses the best strategy.
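A distributed aggregate can be sketched as scatter-gather over toy in-memory shards; the key point is the shape of the partial result: AVG must ship (sum, count) rather than per-shard averages:

```python
# Toy shards: each inner list stands in for the rows on one shard.
shards = [
    [3, 5, 7],   # shard 0
    [10, 20],    # shard 1
    [1],         # shard 2
]

def local_partial(rows):
    # Runs on each shard in parallel; returns a mergeable partial result.
    return (sum(rows), len(rows))

def merge_avg(partials):
    # Coordinator merges partials into the final AVG.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

partials = [local_partial(rows) for rows in shards]
print(merge_avg(partials))  # 46 / 6 ≈ 7.67
```

Averaging the per-shard averages would weight shard 2's single row as heavily as shard 0's three rows, which is why the partial must carry the count.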


9) Secondary Indexes in a Distributed SQL DB


Indexes are also sharded and replicated.

Two types:

  • Local index (stored on the same shard as the data)
  • Global index (sharded independently, by the indexed column)

Problems:

  • index write amplification
  • cross-shard index consistency
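Write amplification can be illustrated with a toy write planner (the row layout and the `writes_for_insert` helper are hypothetical):

```python
def writes_for_insert(row, local_indexes, global_indexes):
    # One logical INSERT fans out into a base-table write plus one
    # write per secondary index.
    writes = [("base", row["id"])]
    for col in local_indexes:
        # Local index: lives on the same shard as the row → no extra
        # coordination needed.
        writes.append(("local_idx:" + col, row[col]))
    for col in global_indexes:
        # Global index: sharded by the indexed column → possibly a
        # remote shard, turning the insert into a distributed txn.
        writes.append(("global_idx:" + col, row[col]))
    return writes

row = {"id": 42, "email": "a@b.c", "country": "VN"}
print(writes_for_insert(row, ["country"], ["email"]))
# one logical insert → three physical writes (3x amplification)
```

Each global index in the list adds both a write and a potential cross-shard participant, which is the consistency problem the bullet above refers to.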


10) Multi-Region Replication


Modes:

  • Read-local / Write-global
  • Multi-primary (geo-distributed)
  • Follower reads for low-latency queries
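A follower read can be sketched as a freshness check (assumed logical timestamps and staleness model):

```python
def serve_read(follower_applied_ts, read_ts, max_staleness):
    # A follower may answer locally if it has applied the replicated log
    # to within `max_staleness` of the requested read timestamp;
    # otherwise the read is forwarded to the (possibly remote) leader.
    return read_ts - follower_applied_ts <= max_staleness

print(serve_read(follower_applied_ts=95, read_ts=100, max_staleness=10))  # True: serve locally
print(serve_read(follower_applied_ts=80, read_ts=100, max_staleness=10))  # False: go to leader
```

This is the trade behind follower reads: the client accepts bounded staleness in exchange for skipping a cross-region round trip.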


11) Transactions in Distributed SQL


Transactions use:

  • 2PL (two-phase locking)
  • MVCC (multi-version concurrency control)
  • timestamp ordering

Most systems use MVCC to provide snapshot isolation.
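A minimal MVCC sketch (toy in-memory store): each write keeps its commit timestamp, and a reader at snapshot T sees the newest version at or before T, so writers never block readers:

```python
class MVCCStore:
    def __init__(self):
        self.versions = {}  # key → sorted list of (commit_ts, value)

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.versions[key].sort()

    def read(self, key, snapshot_ts):
        # Newest version visible at the snapshot, or None if the key
        # did not exist yet at that time.
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

db = MVCCStore()
db.write("x", "old", commit_ts=10)
db.write("x", "new", commit_ts=20)
print(db.read("x", snapshot_ts=15))  # old — snapshot at 15 predates ts=20
print(db.read("x", snapshot_ts=25))  # new
```

Real engines add garbage collection of old versions and conflict checks at commit time; the read path above is the core of snapshot isolation.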


12) ACID in Distributed System


✔ Atomicity → distributed commit (2PC) layered over consensus groups
✔ Consistency → constraint & schema enforcement
✔ Isolation → snapshot isolation / serializable (MVCC)
✔ Durability → replicated write-ahead log


13) Fault Tolerance


Leader failure → auto re-election in Raft.
Region failure → failover to nearest healthy region.
Node failure → shard rebalancing.
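Raft-style re-election can be sketched with randomized timeouts (simplified: real Raft also checks terms and log freshness before granting votes):

```python
import random

def elect_new_leader(nodes, failed_leader):
    # Surviving followers race on randomized election timeouts;
    # the first timer to fire campaigns and, in this sketch, wins.
    alive = [n for n in nodes if n != failed_leader]
    timeouts = {n: random.uniform(0.150, 0.300) for n in alive}
    return min(timeouts, key=timeouts.get)

print(elect_new_leader(["n1", "n2", "n3"], failed_leader="n1"))
```

The randomized window is what makes split votes unlikely: two followers rarely time out at the same instant.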


14) Schema Management


DDL must also be distributed:

  • schema change transactions
  • online schema migration
  • versioned metadata
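Versioned metadata can be sketched as a compatibility check between adjacent schema versions (a toy example; real systems stage changes through intermediate states so nodes on neighboring versions can coexist):

```python
# Hypothetical versioned schema catalog: version → table → columns.
schema_versions = {
    1: {"users": ["id", "name"]},
    2: {"users": ["id", "name", "email"]},  # additive change
}

def compatible(old_v, new_v):
    # An adjacent version pair is safe to roll out online when the new
    # version only adds columns, so readers on the old version still work.
    for table, cols in schema_versions[old_v].items():
        if not set(cols).issubset(schema_versions[new_v].get(table, [])):
            return False
    return True

print(compatible(1, 2))  # True: v2 only adds "email"
```

This is why "add column NOT NULL with default" is typically staged as nullable-add then backfill then constraint: each step stays compatible with its neighbor.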


15) Scaling Strategy


Scale by:

  • adding nodes → auto rebalancing
  • splitting hot ranges
  • scattering leaders across nodes
  • optimizing locality (geo-partitioning)
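Rebalancing after adding a node can be sketched with a naive placement policy (round-robin here; real systems also weigh load, leader placement, and locality):

```python
def place_shards(shards, nodes):
    # Naive round-robin placement: shard i goes to node i mod N.
    return {shard: nodes[i % len(nodes)] for i, shard in enumerate(shards)}

shards = ["s0", "s1", "s2", "s3", "s4", "s5"]
print(place_shards(shards, ["n1", "n2"]))        # 3 shards per node
print(place_shards(shards, ["n1", "n2", "n3"]))  # after adding n3: 2 per node
```

Adding a node shrinks each node's share automatically; production rebalancers additionally throttle the data movement this reassignment implies.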


16) Observability


Metrics:

  • Raft log lag
  • commit latency
  • hot shard detection
  • contention rate
  • read/write amplification
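Raft log lag, the first metric above, can be computed as the gap between the leader's last log index and each follower's applied index (a sketch with assumed index values):

```python
def raft_log_lag(leader_index, follower_indexes):
    # Lag per follower: how many log entries behind the leader it is.
    # Sustained lag flags a slow or partitioned replica before it
    # drops out of the quorum.
    return {f: leader_index - idx for f, idx in follower_indexes.items()}

print(raft_log_lag(100, {"f1": 100, "f2": 97, "f3": 42}))
# {'f1': 0, 'f2': 3, 'f3': 58} → f3 is badly lagging
```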


17) Real-World Examples

Google Spanner

  • TrueTime
  • externally consistent
  • global scale

Yugabyte

  • PostgreSQL-compatible
  • Raft-based
  • high performance

CockroachDB

  • auto-sharding
  • Raft-based replication
  • PostgreSQL wire protocol
  • strong consistency
