Database Recovery
Types of Failures
A DBMS must be able to recover from various types of failures while maintaining the ACID properties of transactions:
| Failure Type | Description | Recovery Method |
|---|---|---|
| Transaction Failure | Logical error (divide by zero, constraint violation) or system error (deadlock) | Transaction rollback (UNDO) |
| System Crash | Power failure, OS crash — volatile memory (buffer) is lost, disk is intact | Log-based recovery (REDO/UNDO) |
| Disk Failure | Head crash, bad sectors — disk data is lost or corrupted | Backup + archive log restore |
| Network Failure | Communication failure in distributed systems | Two-phase commit, retry protocols |
Log-Based Recovery
The most common recovery technique. Every database modification is recorded in a log (also called a journal or write-ahead log) before it is applied to the database.
Each log record contains:
- Transaction ID — which transaction made the change
- Data item — which data was modified
- Old value (before image) — value before the change (for UNDO)
- New value (after image) — value after the change (for REDO)
- Log record type — START, COMMIT, ABORT, UPDATE
Write-Ahead Logging (WAL)
The Write-Ahead Logging (WAL) protocol is the foundation of log-based recovery. It has two rules:
- Before a data item is written to disk, the UNDO portion of its log record must be written to stable storage (log).
- Before a transaction commits, all its log records (including the COMMIT record) must be written to stable storage.
This ensures that if a crash occurs, the log always has enough information to either redo committed transactions or undo uncommitted ones.
REDO and UNDO Operations
| Operation | Purpose | When Applied |
|---|---|---|
| REDO | Re-apply changes of committed transactions that may not have been written to disk | Transaction has COMMIT in log but changes may be in buffer only |
| UNDO | Reverse changes of uncommitted transactions | Transaction has no COMMIT in log (was in progress when crash occurred) |
Checkpointing
Without checkpoints, recovery would require scanning the entire log from the beginning. A checkpoint is a snapshot of the database state written to disk at regular intervals.
Checkpoint process:
- Suspend all new transactions temporarily
- Write all dirty buffer pages to disk
- Write a CHECKPOINT record to the log
- Resume transactions
Recovery with checkpoints: Scan the log only from the most recent checkpoint. Transactions that committed after the checkpoint need REDO; transactions that were active at the checkpoint and didn't commit need UNDO.
Shadow Paging
Shadow paging is an alternative to log-based recovery. The database maintains two page tables:
- Current page table: Points to the current (modified) pages in the buffer
- Shadow page table: Points to the stable (pre-transaction) pages on disk
On commit, the current page table becomes the new shadow. On abort, simply discard the current page table and restore the shadow. Disadvantage: Causes data fragmentation and is less efficient than WAL for most workloads.
ARIES Recovery Algorithm
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is the industry-standard recovery algorithm used by most modern DBMS (IBM DB2, SQL Server, PostgreSQL). It has three phases:
- Analysis Phase: Scan the log forward from the last checkpoint to identify which transactions were active at the time of crash and which pages were dirty.
- REDO Phase: Scan the log forward from the earliest dirty page and redo all logged operations to bring the database to the state at the time of crash.
- UNDO Phase: Scan the log backward and undo all operations of transactions that were active (uncommitted) at the time of crash.
Backup Strategies
| Backup Type | Description | Recovery Time | Storage |
|---|---|---|---|
| Full Backup | Complete copy of the entire database | Fastest restore | Largest |
| Incremental Backup | Only changes since the last backup (full or incremental) | Slowest restore (chain of backups) | Smallest |
| Differential Backup | All changes since the last full backup | Medium restore (full + one differential) | Medium |
| Archive Log Backup | Backup of transaction logs for point-in-time recovery | Enables recovery to any point in time | Varies |
Ready to Level Up Your Skills?
Explore 500+ free tutorials across 20+ languages and frameworks.