A DBMS must be able to recover from various types of failures while maintaining the ACID properties of transactions:
| Failure Type | Description | Recovery Method |
|---|---|---|
| Transaction Failure | Logical error (divide by zero, constraint violation) or system error (deadlock) | Transaction rollback (UNDO) |
| System Crash | Power failure, OS crash - volatile memory (buffer) is lost, disk is intact | Log-based recovery (REDO/UNDO) |
| Disk Failure | Head crash, bad sectors - disk data is lost or corrupted | Backup + archive log restore |
| Network Failure | Communication failure in distributed systems | Two-phase commit, retry protocols |
The most common recovery technique. Every database modification is recorded in a log (also called a journal or write-ahead log) before it is applied to the database.
Each log record contains:
The Write-Ahead Logging (WAL) protocol is the foundation of log-based recovery. It has two rules:
This ensures that if a crash occurs, the log always has enough information to either redo committed transactions or undo uncommitted ones.
| Operation | Purpose | When Applied |
|---|---|---|
| REDO | Re-apply changes of committed transactions that may not have been written to disk | Transaction has COMMIT in log but changes may be in buffer only |
| UNDO | Reverse changes of uncommitted transactions | Transaction has no COMMIT in log (was in progress when crash occurred) |
Without checkpoints, recovery would require scanning the entire log from the beginning. A checkpoint is a snapshot of the database state written to disk at regular intervals.
Checkpoint process:
Recovery with checkpoints: Scan the log only from the most recent checkpoint. Transactions that committed after the checkpoint need REDO; transactions that were active at the checkpoint and didn't commit need UNDO.
Shadow paging is an alternative to log-based recovery. The database maintains two page tables:
On commit, the current page tl-table becomes the new shadow. On abort, simply discard the current page tl-table and restore the shadow. Disadvantage: Causes data fragmentation and is less efficient than WAL for most workloads.
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is the industry-standard recovery algorithm used by most modern DBMS (IBM DB2, SQL Server, PostgreSQL). It has three phases:
| Backup Type | Description | Recovery Time | Storage |
|---|---|---|---|
| Full Backup | Complete copy of the entire database | Fastest restore | Largest |
| Incremental Backup | Only changes since the last backup (full or incremental) | Slowest restore (chain of backups) | Smallest |
| Differential Backup | All changes since the last full backup | Medium restore (full + one differential) | Medium |
| Archive Log Backup | Backup of transaction logs for point-in-time recovery | Enables recovery to any point in time | Varies |
Explore 500+ free tutorials across 20+ languages and frameworks.