Database Recovery WAL, Checkpointing, ARIES: Tutorial, Examples, FAQs & Interview Tips

Types of Failures

A DBMS must be able to recover from various types of failures while maintaining the ACID properties of transactions:

Failure Type	Description	Recovery Method
Transaction Failure	Logical error (divide by zero, constraint violation) or system error (deadlock)	Transaction rollback (UNDO)
System Crash	Power failure, OS crash - volatile memory (buffer) is lost, disk is intact	Log-based recovery (REDO/UNDO)
Disk Failure	Head crash, bad sectors - disk data is lost or corrupted	Backup + archive log restore
Network Failure	Communication failure in distributed systems	Two-phase commit, retry protocols

Log-Based Recovery

The most common recovery technique. Every database modification is recorded in a log (also called a journal or write-ahead log) before it is applied to the database.

Each log record contains:

Transaction ID - which transaction made the change
Data item - which data was modified
Old value (before image) - value before the change (for UNDO)
New value (after image) - value after the change (for REDO)
Log record type - START, COMMIT, ABORT, UPDATE

Write-Ahead Logging (WAL)

The Write-Ahead Logging (WAL) protocol is the foundation of log-based recovery. It has two rules:

Before a data item is written to disk, the UNDO portion of its log record must be written to stable storage (log).
Before a transaction commits, all its log records (including the COMMIT record) must be written to stable storage.

This ensures that if a crash occurs, the log always has enough information to either redo committed transactions or undo uncommitted ones.

REDO and UNDO Operations

Operation	Purpose	When Applied
REDO	Re-apply changes of committed transactions that may not have been written to disk	Transaction has COMMIT in log but changes may be in buffer only
UNDO	Reverse changes of uncommitted transactions	Transaction has no COMMIT in log (was in progress when crash occurred)

Checkpointing

Without checkpoints, recovery would require scanning the entire log from the beginning. A checkpoint is a snapshot of the database state written to disk at regular intervals.

Checkpoint process:

Suspend all new transactions temporarily
Write all dirty buffer pages to disk
Write a CHECKPOINT record to the log
Resume transactions

Recovery with checkpoints: Scan the log only from the most recent checkpoint. Transactions that committed after the checkpoint need REDO; transactions that were active at the checkpoint and didn't commit need UNDO.

Shadow Paging

Shadow paging is an alternative to log-based recovery. The database maintains two page tables:

Current page table: Points to the current (modified) pages in the buffer
Shadow page table: Points to the stable (pre-transaction) pages on disk

On commit, the current page tl-table becomes the new shadow. On abort, simply discard the current page tl-table and restore the shadow. Disadvantage: Causes data fragmentation and is less efficient than WAL for most workloads.

ARIES Recovery Algorithm

ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is the industry-standard recovery algorithm used by most modern DBMS (IBM DB2, SQL Server, PostgreSQL). It has three phases:

Analysis Phase: Scan the log forward from the last checkpoint to identify which transactions were active at the time of crash and which pages were dirty.
REDO Phase: Scan the log forward from the earliest dirty page and redo all logged operations to bring the database to the state at the time of crash.
UNDO Phase: Scan the log backward and undo all operations of transactions that were active (uncommitted) at the time of crash.

Backup Strategies

Backup Type	Description	Recovery Time	Storage
Full Backup	Complete copy of the entire database	Fastest restore	Largest
Incremental Backup	Only changes since the last backup (full or incremental)	Slowest restore (chain of backups)	Smallest
Differential Backup	All changes since the last full backup	Medium restore (full + one differential)	Medium
Archive Log Backup	Backup of transaction logs for point-in-time recovery	Enables recovery to any point in time	Varies

Previous Next