Distributed DBMS
What is a Distributed Database?
A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. Users interact with it as if it were a single database, but data is physically stored across multiple sites (nodes).
Key advantages:
- Improved performance through local data access
- Higher availability — failure of one node doesn't bring down the system
- Scalability — add more nodes to handle more data/load
- Geographic distribution — data closer to users
Data Fragmentation
Fragmentation divides a relation into smaller pieces stored at different sites:
| Type | Description | Example |
|---|---|---|
| Horizontal Fragmentation | Rows are divided among sites (like partitioning) | Customers in US stored at US site; EU customers at EU site |
| Vertical Fragmentation | Columns are divided among sites | Employee name/dept at HQ; salary/benefits at HR site |
| Mixed Fragmentation | Combination of horizontal and vertical | US customers' names at US site; US customers' orders at order site |
Data Replication
Replication stores copies of data at multiple sites to improve availability and read performance.
| Strategy | Description | Trade-off |
|---|---|---|
| Full Replication | Every site has a complete copy of the database | Best read performance; expensive writes (update all copies) |
| No Replication | Each fragment stored at exactly one site | No redundancy; site failure = data unavailable |
| Partial Replication | Some fragments replicated, others not | Balance between availability and update cost |
| Synchronous Replication | All replicas updated before transaction commits | Strong consistency; higher latency |
| Asynchronous Replication | Primary commits first; replicas updated later | Lower latency; possible stale reads |
Two-Phase Commit (2PC)
The Two-Phase Commit protocol ensures atomicity of distributed transactions — either all sites commit or all abort.
Phase 1 — Prepare (Voting):
- The coordinator sends a PREPARE message to all participants.
- Each participant writes a PREPARE record to its log and replies VOTE-COMMIT (ready) or VOTE-ABORT (cannot commit).
Phase 2 — Commit/Abort:
- If all participants voted COMMIT, the coordinator sends COMMIT to all. Otherwise, it sends ABORT.
- Each participant commits or aborts and sends an ACK to the coordinator.
- The coordinator writes a COMPLETE record to its log.
Problem: 2PC is a blocking protocol — if the coordinator crashes after Phase 1, participants are blocked waiting for a decision. This is addressed by Three-Phase Commit (3PC).
CAP Theorem
The CAP Theorem (Brewer's Theorem) states that a distributed system can guarantee at most two of the following three properties simultaneously:
| Property | Description |
|---|---|
| Consistency (C) | Every read receives the most recent write or an error. All nodes see the same data at the same time. |
| Availability (A) | Every request receives a response (not necessarily the latest data). The system is always operational. |
| Partition Tolerance (P) | The system continues to operate even when network partitions (communication failures between nodes) occur. |
Since network partitions are unavoidable in distributed systems, the real choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance):
- CP systems: MongoDB, HBase, ZooKeeper — sacrifice availability during partitions
- AP systems: Cassandra, CouchDB, DynamoDB — sacrifice consistency during partitions
- CA systems: Traditional RDBMS (MySQL, PostgreSQL) — not partition tolerant (single node)
BASE vs ACID
| Property | ACID (Traditional RDBMS) | BASE (NoSQL / Distributed) |
|---|---|---|
| Consistency | Strong consistency — always consistent | Basically Available — may be temporarily inconsistent |
| State | Consistent after every transaction | Soft state — state may change over time without input |
| Availability | May sacrifice availability for consistency | Eventually consistent — will become consistent over time |
| Use case | Banking, financial systems, ERP | Social media, e-commerce, real-time analytics |
| Examples | MySQL, PostgreSQL, Oracle | Cassandra, DynamoDB, MongoDB |
Ready to Level Up Your Skills?
Explore 500+ free tutorials across 20+ languages and frameworks.