Tutorials Logic

Distributed DBMS

What is a Distributed Database?

A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. Users interact with it as if it were a single database, but data is physically stored across multiple sites (nodes).

Key advantages:

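  • Improved performance: more data is accessed locally, avoiding network round trips
  • Higher availability: the failure of one node does not bring down the whole system, and data replicated elsewhere stays accessible
  • Scalability: add more nodes to handle more data or load
  • Geographic distribution: data can be kept closer to the users who access it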

Data Fragmentation

Fragmentation divides a relation into smaller pieces stored at different sites:

Type | Description | Example
Horizontal fragmentation | Rows are divided among sites (like partitioning) | US customers stored at the US site; EU customers at the EU site
Vertical fragmentation | Columns are divided among sites; every fragment keeps the primary key so rows can be rejoined | Employee name/department at HQ; salary/benefits at the HR site
Mixed fragmentation | A horizontal fragment is further split vertically (or vice versa) | US customers' names and addresses at the US site; US customers' credit data at the finance site
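
A minimal Python sketch of the three fragmentation types, assuming a toy in-memory `CUSTOMERS` relation whose rows are dicts; the column names, site variables, and helper functions are hypothetical. It also shows why fragmentation is safe: a horizontal split is undone by a union, and a vertical split is undone by a join on the key that every vertical fragment keeps.

```python
# Toy relation: each row is a dict. Column and site names are hypothetical.
CUSTOMERS = [
    {"id": 1, "name": "Ana", "region": "US", "credit_limit": 500},
    {"id": 2, "name": "Ben", "region": "EU", "credit_limit": 300},
    {"id": 3, "name": "Cy", "region": "US", "credit_limit": 900},
]

def horizontal_fragment(rows, predicate):
    """Selection: keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, columns, key="id"):
    """Projection: keep some columns, always including the key so fragments can be rejoined."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows]

def union(*fragments):
    """Reconstruct a horizontally fragmented relation."""
    return [row for frag in fragments for row in frag]

def join_on_key(left, right, key="id"):
    """Reconstruct a vertically fragmented relation by joining on the shared key."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]

# Horizontal: one fragment per regional site.
us_site = horizontal_fragment(CUSTOMERS, lambda r: r["region"] == "US")
eu_site = horizontal_fragment(CUSTOMERS, lambda r: r["region"] == "EU")

# Vertical: descriptive columns at HQ, financial columns at the finance site.
hq_site = vertical_fragment(CUSTOMERS, ["name", "region"])
finance_site = vertical_fragment(CUSTOMERS, ["credit_limit"])

# Mixed: vertical fragments of the US horizontal fragment.
us_names = vertical_fragment(us_site, ["name", "region"])
us_credit = vertical_fragment(us_site, ["credit_limit"])

# Reconstruction is lossless in all three cases.
assert sorted(union(us_site, eu_site), key=lambda r: r["id"]) == CUSTOMERS
assert join_on_key(hq_site, finance_site) == CUSTOMERS
assert join_on_key(us_names, us_credit) == us_site
```

Keeping the key column in every vertical fragment is the design choice that matters: without it, the join has nothing to match on and the original rows cannot be rebuilt.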

Data Replication

Replication stores copies of data at multiple sites to improve availability and read performance.

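Strategy | Description | Trade-off
Full replication | Every site holds a complete copy of the database | Best read performance and availability; expensive writes (every copy must be updated)
No replication | Each fragment is stored at exactly one site | No redundancy; if a site fails, its data is unavailable
Partial replication | Some fragments are replicated, others are not | Balances availability against update cost
Synchronous replication | All replicas are updated before the transaction commits | Strong consistency; higher write latency
Asynchronous replication | The primary commits first; replicas are updated later | Lower write latency; reads from replicas may be stale

The first three rows describe where copies are placed; the last two describe how updates reach those copies, so a replicated system makes one choice from each group.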
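
To make the synchronous/asynchronous trade-off concrete, here is a small Python sketch with a hypothetical `Primary` class and in-memory `Replica` objects. It deliberately ignores failures, networking, and ordering guarantees; it only shows when a follower sees a write relative to the moment the primary reports the commit.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)

class Primary:
    """Hypothetical primary that replicates every write to its followers."""

    def __init__(self, followers, synchronous):
        self.local = Replica("primary")
        self.followers = followers
        self.synchronous = synchronous
        self.pending = []  # updates not yet shipped (asynchronous mode only)

    def write(self, key, value):
        self.local.data[key] = value
        if self.synchronous:
            # Commit is reported only after every follower has applied the update.
            for follower in self.followers:
                follower.data[key] = value
        else:
            # Commit is reported immediately; followers catch up later.
            self.pending.append((key, value))
        return "committed"

    def flush(self):
        """Ship queued updates to followers (e.g., a background replication job)."""
        for key, value in self.pending:
            for follower in self.followers:
                follower.data[key] = value
        self.pending.clear()

sync_follower = Replica("f1")
Primary([sync_follower], synchronous=True).write("x", 1)

async_follower = Replica("f2")
primary = Primary([async_follower], synchronous=False)
primary.write("x", 1)
stale = async_follower.data.get("x")   # None: the follower has not seen the write yet
primary.flush()
```

In synchronous mode the follower already holds the value when `write` returns; in asynchronous mode a read from the follower returns nothing until `flush` runs, which is the stale-read window listed in the table.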

Two-Phase Commit (2PC)

The Two-Phase Commit protocol ensures atomicity of distributed transactions — either all sites commit or all abort.

Phase 1 — Prepare (Voting):

  1. The coordinator sends a PREPARE message to all participants.
  2. A participant that can commit writes a PREPARE (ready) record to its log and replies VOTE-COMMIT; a participant that cannot commit writes an ABORT record and replies VOTE-ABORT.

Phase 2 — Commit/Abort:

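  1. If all participants voted COMMIT, the coordinator decides COMMIT; if any participant voted ABORT or failed to reply in time, it decides ABORT. It writes the decision to its log and then sends it to all participants.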
  2. Each participant commits or aborts and sends an ACK to the coordinator.
  3. The coordinator writes a COMPLETE record to its log.
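
The two phases can be sketched as a single-process Python simulation. Assumptions: `Participant`, `two_phase_commit`, and the `can_commit` flag are hypothetical names; messages are plain method calls; logs are Python lists rather than forced writes to stable storage; and no site crashes or times out, so only the all-commit path and the vote-abort path are shown.

```python
from enum import Enum

class Vote(Enum):
    COMMIT = "VOTE-COMMIT"
    ABORT = "VOTE-ABORT"

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []          # stands in for the participant's stable log
        self.state = "INIT"

    def prepare(self):
        """Phase 1: record the vote locally, then reply to the coordinator."""
        if self.can_commit:
            self.log.append("PREPARE")
            self.state = "PREPARED"
            return Vote.COMMIT
        self.log.append("ABORT")
        self.state = "ABORTED"   # a VOTE-ABORT participant may abort at once
        return Vote.ABORT

    def finish(self, decision):
        """Phase 2: apply the coordinator's decision (if still pending) and acknowledge."""
        if self.state == "PREPARED":
            self.log.append(decision)
            self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"
        return "ACK"

def two_phase_commit(participants, coordinator_log):
    # Phase 1: collect every participant's vote.
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(v is Vote.COMMIT for v in votes) else "ABORT"
    coordinator_log.append(decision)          # decision is logged before it is sent
    # Phase 2: broadcast the decision and collect ACKs.
    acks = [p.finish(decision) for p in participants]
    if len(acks) == len(participants):        # every participant acknowledged
        coordinator_log.append("COMPLETE")
    return decision

log = []
sites = [Participant("A"), Participant("B", can_commit=False)]
print(two_phase_commit(sites, log), log)  # ABORT ['ABORT', 'COMPLETE']
```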

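Problem: 2PC is a blocking protocol. If the coordinator crashes after Phase 1, any participant that voted VOTE-COMMIT can neither commit nor abort on its own, so it must wait, still holding its locks, until the coordinator recovers. Three-Phase Commit (3PC) adds an extra phase to avoid this blocking, but only under assumptions such as no network partitions and bounded message delays.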
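
Why a prepared participant blocks can be shown with a small decision function. This is a simplified Python sketch of the idea textbooks call a cooperative termination protocol: a participant whose coordinator has gone silent asks the peers it can still reach what they know. The function name and the state strings are hypothetical.

```python
def on_coordinator_timeout(own_state, peer_states):
    """Decide what a participant may do after losing contact with the coordinator.

    own_state and each entry of peer_states is one of: "INIT" (not yet voted),
    "PREPARED" (voted COMMIT, outcome unknown), "COMMITTED", "ABORTED".
    peer_states holds only the peers this participant can still reach.
    """
    if own_state == "INIT":
        # Has not voted, so the coordinator cannot have decided COMMIT: abort unilaterally.
        return "ABORTED"
    if own_state in ("COMMITTED", "ABORTED"):
        # Outcome already known locally.
        return own_state
    # own_state == "PREPARED": this participant promised to commit and must not guess.
    if "COMMITTED" in peer_states:
        return "COMMITTED"
    if "ABORTED" in peer_states or "INIT" in peer_states:
        # A peer aborted or never voted, so the global decision cannot be COMMIT.
        return "ABORTED"
    # Every reachable peer is also PREPARED: only the coordinator knows the outcome.
    return "BLOCKED"
```

The last branch is the blocking case: once every reachable participant has voted COMMIT and none has heard the decision, either choice could contradict what the coordinator logged, so the participant can only wait.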

CAP Theorem

The CAP Theorem (Brewer's Theorem) states that a distributed system can guarantee at most two of the following three properties simultaneously:

Property | Description
Consistency (C) | Every read receives the most recent write or an error; all nodes appear to hold the same data at the same time.
Availability (A) | Every request received by a non-failing node gets a non-error response, though not necessarily the latest data.
Partition tolerance (P) | The system keeps operating even when the network drops or delays messages between nodes (a network partition).

Since network partitions are unavoidable in distributed systems, the real choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance):

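  • CP systems (in their typical default configurations): MongoDB, HBase, ZooKeeper; they sacrifice availability during a partition
  • AP systems (in their typical default configurations): Cassandra, CouchDB, DynamoDB; they sacrifice consistency during a partition
  • "CA" systems: traditional single-node RDBMS deployments (MySQL, PostgreSQL) are often given this label, but only because a single node has no network to partition; once such a database is replicated across sites, it too must choose between C and A during a partition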
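
A toy Python simulation of the CP/AP choice, assuming a hypothetical `TwoNodeStore` with two replicas and a `partitioned` flag standing in for a network failure. With only two replicas neither side of the partition holds a majority, so the CP store refuses requests while the AP store keeps answering and lets the replicas diverge. Real systems are more nuanced (quorums, tunable consistency levels).

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None

class TwoNodeStore:
    """Hypothetical two-replica store; partitioned=True means the replicas cannot communicate."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        if not self.partitioned:
            # Replicas can talk: apply the write to both so they stay identical.
            self.a.value = value
            self.b.value = value
            return "ok"
        if self.mode == "CP":
            # Consistency first: refuse rather than let the replicas diverge.
            raise RuntimeError("unavailable: other replica unreachable")
        # Availability first: accept locally; the other replica is now stale.
        replica.value = value
        return "ok"

    def read(self, replica):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this replica holds the latest write, so return an error.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP (or no partition): answer with whatever this replica has, possibly stale.
        return replica.value

ap = TwoNodeStore("AP")
ap.write(ap.a, 1)
ap.partitioned = True
ap.write(ap.a, 2)
print(ap.read(ap.a), ap.read(ap.b))  # 2 1
```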

BASE vs ACID

Property | ACID (traditional RDBMS) | BASE (NoSQL / distributed)
Consistency | Strong consistency: the database is always consistent | Eventually consistent: replicas converge once updates stop arriving
State | Consistent after every transaction | Soft state: state may change over time without new input, as updates propagate
Availability | May sacrifice availability to preserve consistency | Basically Available: the system keeps responding, possibly with stale data
Use case | Banking, financial systems, ERP | Social media, e-commerce, real-time analytics
Examples | MySQL, PostgreSQL, Oracle | Cassandra, DynamoDB, MongoDB
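
Eventual consistency (the "E" in BASE) can be sketched with two replicas that accept writes independently and later reconcile. This Python sketch assumes last-writer-wins conflict resolution by timestamp and a shared clock, both simplifications; the `Replica` class and `anti_entropy` function are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    # key -> (timestamp, value); assumes timestamps come from a shared clock (a simplification)
    store: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        self.store[key] = (ts, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

def anti_entropy(r1, r2):
    """Reconcile two replicas: for every key, both keep the entry with the newest timestamp."""
    for key in set(r1.store) | set(r2.store):
        entries = [e for e in (r1.store.get(key), r2.store.get(key)) if e is not None]
        newest = max(entries, key=lambda e: e[0])
        r1.store[key] = newest
        r2.store[key] = newest

r1, r2 = Replica(), Replica()
r1.put("cart", ["book"], ts=1)          # write accepted by replica 1
r2.put("cart", ["book", "pen"], ts=2)   # later write accepted by replica 2
print(r1.get("cart"), r2.get("cart"))   # soft state: the replicas disagree
anti_entropy(r1, r2)
print(r1.get("cart"), r2.get("cart"))   # converged
```

Last-writer-wins is simple but silently discards the losing write; systems that cannot afford that keep multiple versions or use mergeable data types instead.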
