1) Problem Clarification / Làm rõ bài toán
EN
We need to understand how data is divided to scale systems. People often confuse sharding and partitioning, but they aren’t the same.
VI
Cần hiểu cách chia dữ liệu để scale hệ thống.
Nhiều người nhầm sharding và partitioning, nhưng không giống nhau.
2) Definition / Định nghĩa
EN
✔ Partitioning = dividing data inside a single node or logical instance
✔ Sharding = distributing data across multiple nodes / servers
VI
✔ Partitioning = chia data bên trong 1 node / instance
✔ Sharding = chia data sang nhiều node / nhiều server
3) Why Needed? / Tại sao cần?
EN
- reduce contention
- improve performance
- scale storage capacity
- reduce index scanning
VI
- giảm tranh chấp
- tăng performance
- mở rộng dung lượng
- giảm index scan
4) Partitioning Types / Loại partitioning
EN
Range partitioning — based on value intervals
Hash partitioning — distribute evenly via hash
List partitioning — categorical grouping
All three work inside one DB instance.
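The three strategies above can be sketched as small key-to-partition functions. This is only an illustration: the boundary values, the partition count, and the category map are made-up assumptions, and a real database engine does this selection internally.

```python
import bisect
import zlib

# Hypothetical range boundaries: exclusive upper bounds of partitions 0, 1, 2.
# Anything >= 300 falls into an overflow partition with index 3.
RANGE_BOUNDS = [100, 200, 300]


def range_partition(value: int) -> int:
    """Range partitioning: pick the interval that contains the value."""
    return bisect.bisect_right(RANGE_BOUNDS, value)


def hash_partition(key: str, num_partitions: int = 4) -> int:
    """Hash partitioning: spread keys evenly with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical category-to-partition map for list partitioning.
LIST_MAP = {"VN": "p_asia", "JP": "p_asia", "DE": "p_europe", "FR": "p_europe"}


def list_partition(category: str) -> str:
    """List partitioning: group rows by an explicit list of categories."""
    return LIST_MAP.get(category, "p_default")
```

CRC32 is used instead of Python's built-in `hash()` because `hash()` on strings changes between interpreter runs, which would move keys to different partitions after a restart.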
VI
Partitioning bên trong DB:
- range
- hash
- list (theo category)
5) Sharding Types / Loại sharding
EN
Vertical sharding: split tables by feature/domain, so each node owns different tables
Horizontal sharding: split the rows of one table across nodes
Horizontal sharding = scaled-out database.
VI
Vertical sharding: tách bảng theo domain
Horizontal sharding: chia hàng sang nhiều node
Horizontal sharding = database scale-out.
6) Routing Layer / Lớp định tuyến
EN
Sharding requires routing logic (partitioning does not, because the database engine picks the partition itself):
- client-side routing
- proxy router
- central shard-map lookup
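A minimal sketch of client-side routing through a shard map, assuming hash-based placement on the user ID. The `Shard` and `ShardMap` classes, the 32-shard layout, and the connection strings are hypothetical names for illustration, not part of any real driver.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str
    dsn: str  # hypothetical connection string


class ShardMap:
    """Maps a user ID to the shard that owns it."""

    def __init__(self, shards):
        self._shards = list(shards)
        if not self._shards:
            raise ValueError("at least one shard is required")

    def shard_for(self, user_id: int) -> Shard:
        # A stable hash keeps the mapping identical across processes.
        key = str(user_id).encode("utf-8")
        return self._shards[zlib.crc32(key) % len(self._shards)]


# Assumed layout: 32 shards with made-up host names.
SHARD_MAP = ShardMap(
    Shard(f"shard-{i:02d}", f"mysql://db{i:02d}.example.internal/app")
    for i in range(32)
)
```

The same lookup could live in a proxy or a central service instead; the client-side variant only means every application process carries a copy of the map.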
VI
Sharding cần routing:
- client routing
- proxy router
- lookup shard map
Partitioning không cần routing.
7) Example / Ví dụ minh họa
EN
Partitioning usage:
A PostgreSQL table partitioned by month for reporting.
Sharding usage:
User IDs distributed across 32 MySQL clusters.
VI
Partitioning:
1 table PostgreSQL partition theo tháng.
Sharding:
User ID chia sang 32 cụm MySQL khác nhau.
8) Failover & Consistency / Lỗi & nhất quán
EN
Partitioning failure: the failure stays local to one partition inside the same DB instance.
Sharding failure: one node is lost → that node's subset of users/data is unavailable.
VI
Partitioning lỗi = lỗi nội bộ DB.
Sharding lỗi = 1 node mất → 1 phần user/dữ liệu không truy cập được.
9) Migration & Rebalancing / Migration & cân bằng lại
EN
Partitioning migration: merging/splitting partitions within one DB.
Sharding migration: resharding the user base across nodes → expensive and operationally complex.
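To see why resharding is expensive, the sketch below counts how many keys change owner when the shard count changes. It assumes the simplest placement rule, `user_id % shard_count`, which the notes do not specify; other placement schemes behave differently.

```python
def moved_fraction(num_keys: int, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose owning shard changes under modulo placement."""
    moved = sum(
        1 for key in range(num_keys) if key % old_shards != key % new_shards
    )
    return moved / num_keys


# Growing from 32 to 33 shards relocates almost every key.
one_more = moved_fraction(10_560, 32, 33)
# Doubling from 32 to 64 shards relocates half of the keys.
doubled = moved_fraction(6_400, 32, 64)
```

Under this assumption, adding a single shard moves about 97% of the keys, while doubling the shard count moves 50%. Every moved key means data copied between nodes while traffic is still being served.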
VI
Partitioning migration = gộp/chia trong một database.
Sharding migration = chia lại dữ liệu giữa nhiều DB → phức tạp.
10) Access Patterns / Pattern truy cập
EN
Partitioning improves index locality.
Sharding forces you to:
- avoid cross-shard joins
- fan out queries that do not filter by the shard key
- run a query coordinator to merge per-shard results
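A fan-out query with a coordinator can be sketched as scatter-gather: ask every shard for its local answer, then merge. The in-memory `SHARDS` list and the function names are hypothetical stand-ins for real per-shard queries.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards: each maps user_id -> order total.
SHARDS = [
    {1: 30, 4: 75},
    {2: 90, 5: 10},
    {3: 55, 6: 80},
]


def query_shard(shard: dict, limit: int) -> list:
    """Local top-N on one shard (stands in for a per-shard SQL query)."""
    return heapq.nlargest(limit, shard.items(), key=lambda kv: kv[1])


def top_users(limit: int) -> list:
    """Coordinator: fan out to every shard, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: query_shard(shard, limit), SHARDS)
        merged = [row for part in partials for row in part]
    return heapq.nlargest(limit, merged, key=lambda kv: kv[1])
```

Each shard returns at most `limit` rows, so the coordinator merges a small result set rather than pulling all the data to one place. The cost still grows with the number of shards, since every one of them has to answer.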
VI
Partitioning tăng locality trong DB.
Sharding bắt buộc:
- tránh join cross shard
- fan-out query
- coordinator query
11) When to use Partitioning? / Khi dùng partitioning?
EN
- large table performance
- time-series
- archived records
VI
Dùng partitioning khi:
- bảng lớn
- time-series
- archive
12) When to use Sharding? / Khi dùng sharding?
EN
- DB instance runs out of capacity
- millions of users
- write-heavy workloads
VI
Dùng sharding khi:
- 1 DB không chứa nổi nữa
- user rất lớn
- workload write-heavy
13) Combined Approach / Kết hợp cả hai
EN
Real systems often use both, for example:
Cassandra = sharded + partitioned
ClickHouse = sharded cluster + partitioned storage
VI
Thực tế dùng cả hai:
Ví dụ:
Cassandra = sharding + partitioning
ClickHouse = sharding + partitioning
14) Architecture Lessons / Bài học kiến trúc
EN
- partition first, shard later
- avoid premature sharding
- keep a shard catalog (shard map) for routing
- avoid cross-shard joins
VI
- partition trước, shard khi cần
- tránh sharding quá sớm
- cần catalog routing
- tránh join cross shard
15) Diagram Summary / Tóm tắt bằng sơ đồ
EN
Partitioning:
DB instance
├── table partition A
├── table partition B
└── table partition C
Sharding:
Cluster
├── DB shard 1
├── DB shard 2
└── DB shard 3
VI
Partitioning:
1 DB
├── partition A
└── partition B
Sharding:
Cluster
├── node 1
├── node 2
└── node 3