A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. Users interact with it as if it were a single database, but data is physically stored across multiple sites (nodes).
Key advantages:
Fragmentation divides a relation into smaller pieces stored at different sites:
| Type | Description | Example |
|---|---|---|
| Horizontal Fragmentation | Rows are divided among sites (like partitioning) | Customers in US stored at US site; EU customers at EU site |
| Vertical Fragmentation | Columns are divided among sites | Employee name/dept at HQ; salary/benefits at HR site |
| Mixed Fragmentation | Combination of horizontal and vertical | US customers' names at US site; US customers' orders at order site |
Replication stores copies of data at multiple sites to improve availability and read performance.
| Strategy | Description | Trade-off |
|---|---|---|
| Full Replication | Every site has a complete copy of the database | Best read performance; expensive writes (update all copies) |
| No Replication | Each fragment stored at exactly one site | No redundancy; site failure = data unavailable |
| Partial Replication | Some fragments replicated, others not | Balance between availability and update cost |
| Synchronous Replication | All replicas updated before transaction commits | Strong consistency; higher latency |
| Asynchronous Replication | Primary commits first; replicas updated later | Lower latency; possible stale reads |
The Two-Phase Commit protocol ensures atomicity of distributed transactions - either all sites commit or all abort.
Phase 1 - Prepare (Voting):
Phase 2 - Commit/Abort:
Problem: 2PC is a blocking protocol - if the coordinator crashes after Phase 1, participants are blocked waiting for a decision. This is addressed by Three-Phase Commit (3PC).
The CAP Theorem (Brewer's Theorem) states that a distributed system can guarantee at most two of the following three properties simultaneously:
| Property | Description |
|---|---|
| Consistency (C) | Every read receives the most recent write or an error. All nodes see the same data at the same time. |
| Availability (A) | Every request receives a response (not necessarily the latest data). The system is always operational. |
| Partition Tolerance (P) | The system continues to operate even when network partitions (communication failures between nodes) occur. |
Since network partitions are unavoidable in distributed systems, the real choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance):
| Property | ACID (Traditional RDBMS) | BASE (NoSQL / Distributed) |
|---|---|---|
| Consistency | Strong consistency - always consistent | Basically Available - may be temporarily inconsistent |
| State | Consistent after every transaction | Soft state - state may change over time without input |
| Availability | May sacrifice availability for consistency | Eventually consistent - will become consistent over time |
| Use case | Banking, financial systems, ERP | Social media, e-commerce, real-time analytics |
| Examples | MySQL, PostgreSQL, Oracle | Cassandra, DynamoDB, MongoDB |
Explore 500+ free tutorials across 20+ languages and frameworks.