Top 50 DBMS Interview Questions
Curated questions covering normalization, ACID properties, transactions, indexing, ER models, SQL, concurrency control, and database design.
What is a DBMS?
A Database Management System (DBMS) is software that manages databases - storing, retrieving, and manipulating data while providing security, integrity, and concurrency control. Examples: MySQL, PostgreSQL, Oracle, MongoDB.
What is the difference between DBMS and RDBMS?
- DBMS - stores data as files. No relationship between tables. No ACID guarantees.
- RDBMS - stores data in related tables. Enforces relationships via foreign keys. Supports ACID transactions. Examples: MySQL, PostgreSQL, Oracle.
What are ACID properties?
- Atomicity - transaction is all-or-nothing. If any part fails, entire transaction rolls back.
- Consistency - transaction brings database from one valid state to another.
- Isolation - concurrent transactions execute as if sequential. Intermediate states not visible.
- Durability - committed transactions persist even after system failure.
What is normalization? Explain 1NF, 2NF, and 3NF.
- 1NF - atomic values, no repeating groups, each row unique.
- 2NF - in 1NF + no partial dependency (every non-key attribute depends on entire primary key).
- 3NF - in 2NF + no transitive dependency (non-key attributes depend only on primary key).
- BCNF - stricter 3NF: every determinant must be a candidate key.
What is denormalization?
Denormalization intentionally introduces redundancy to improve read performance. Used for read-heavy workloads, complex JOIN queries, or analytics. Trade-off: faster reads but slower writes and more storage.
What is the difference between primary key and foreign key?
- Primary Key - uniquely identifies each row. Cannot be NULL. Only one per table.
- Foreign Key - references the primary key of another table. Enforces referential integrity. Can be NULL.
What is an ER diagram?
An Entity-Relationship (ER) diagram visually represents the database schema. Components: entities (rectangles), attributes (ovals), relationships (diamonds), cardinality (1:1, 1:N, M:N).
What are the different types of SQL commands?
- DDL - CREATE, ALTER, DROP, TRUNCATE. Defines schema.
- DML - SELECT, INSERT, UPDATE, DELETE. Manipulates data.
- DCL - GRANT, REVOKE. Controls access.
- TCL - COMMIT, ROLLBACK, SAVEPOINT. Manages transactions.
What is the difference between DELETE, TRUNCATE, and DROP?
- DELETE - DML. Removes specific rows. Can be rolled back. Triggers fire.
- TRUNCATE - DDL. Removes all rows. Cannot be rolled back. Faster. Resets auto-increment.
- DROP - DDL. Removes entire table including structure. Cannot be rolled back.
What are SQL JOINs?
- INNER JOIN - rows with matching values in both tables.
- LEFT JOIN - all rows from left + matching from right (NULL if no match).
- RIGHT JOIN - all rows from right + matching from left.
- FULL OUTER JOIN - all rows from both tables.
- CROSS JOIN - Cartesian product.
- SELF JOIN - table joined with itself.
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;What is an index?
An index is a data structure (typically B-tree) that speeds up data retrieval without scanning the entire table. Trade-off: faster SELECT but slower INSERT/UPDATE/DELETE. Create indexes on frequently queried columns, foreign keys, and WHERE/ORDER BY columns.
What is the difference between clustered and non-clustered index?
- Clustered index - physically sorts and stores rows in index order. Only one per table. Primary key is usually the clustered index.
- Non-clustered index - separate structure with pointers to rows. Multiple allowed per table.
What is a transaction?
A transaction is a sequence of operations treated as a single unit. Use BEGIN/COMMIT to save changes, ROLLBACK to undo.
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;What are transaction isolation levels?
- READ UNCOMMITTED - dirty reads possible.
- READ COMMITTED - only reads committed data.
- REPEATABLE READ - same query returns same results within transaction.
- SERIALIZABLE - highest isolation. Prevents phantom reads.
What is a deadlock?
A deadlock occurs when transactions wait for each other to release locks, creating a circular dependency. Resolution: timeout, deadlock detection, or prevention by acquiring locks in consistent order.
What is the difference between WHERE and HAVING?
- WHERE - filters rows before grouping. Cannot use aggregate functions.
- HAVING - filters groups after GROUP BY. Can use aggregate functions.
SELECT department, COUNT(*) as cnt
FROM employees
WHERE salary > 30000
GROUP BY department
HAVING COUNT(*) > 5;What is a view?
A view is a virtual table based on a SELECT query. Does not store data physically. Benefits: simplifies complex queries, provides security, and presents data in a specific format.
CREATE VIEW active_users AS
SELECT id, name, email FROM users WHERE status = "active";
SELECT * FROM active_users;What is a stored procedure?
A stored procedure is a precompiled set of SQL statements stored in the database. Benefits: reusability, reduced network traffic, better performance, and centralized business logic.
DELIMITER //
CREATE PROCEDURE GetUser(IN userId INT)
BEGIN
SELECT * FROM users WHERE id = userId;
END //
DELIMITER ;
CALL GetUser(1);What is the difference between stored procedure and function?
- Stored Procedure - can return multiple result sets. Uses OUT parameters. Called with CALL. Can perform DML.
- Function - must return a single value. Called within SQL expressions. Cannot perform DML in most DBMS.
What is a trigger?
A trigger automatically executes in response to INSERT, UPDATE, or DELETE events. Types: BEFORE and AFTER. Used for auditing, enforcing business rules, and maintaining derived data.
CREATE TRIGGER before_update
BEFORE UPDATE ON users
FOR EACH ROW
BEGIN
INSERT INTO audit_log(user_id, changed_at) VALUES (OLD.id, NOW());
END;What is the difference between UNION and UNION ALL?
- UNION - combines result sets and removes duplicates. Slower.
- UNION ALL - combines result sets keeping all rows. Faster.
What are window functions?
Window functions perform calculations across related rows without collapsing them into groups. Common: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM() OVER(), AVG() OVER().
SELECT name, salary, department,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;What is the difference between optimistic and pessimistic locking?
- Pessimistic - locks the record when read. Prevents others from modifying. High-conflict scenarios.
- Optimistic - no lock on read. Checks version at update time. Fails if another transaction modified. Low-conflict scenarios.
What is a composite key?
A composite key is a primary key made up of two or more columns. Used when no single column uniquely identifies a row. Example: order_items table with (order_id, product_id) as composite primary key.
What is the difference between SQL and NoSQL?
- SQL - structured, tabular, fixed schema, ACID, vertical scaling. Best for complex queries and relationships.
- NoSQL - flexible schema (document, key-value, graph, column), horizontal scaling, eventual consistency. Best for large-scale unstructured data.
What is a candidate key?
- Candidate key - minimal set of attributes that uniquely identifies a row. A table can have multiple candidate keys.
- Primary key - the chosen candidate key. Only one per table. Cannot be NULL.
- Super key - any set of attributes that uniquely identifies a row (may be non-minimal).
What is the difference between a natural join and equi-join?
- Equi-join - joins on equality of specified columns. Duplicate join columns appear in result.
- Natural join - automatically joins on all columns with the same name. Duplicate columns removed.
-- Equi-join
SELECT * FROM users u JOIN orders o ON u.id = o.user_id;
-- Natural join
SELECT * FROM users NATURAL JOIN orders;What is the difference between a surrogate key and a natural key?
- Natural key - derived from actual data (email, SSN). Has business meaning. Can change.
- Surrogate key - artificial key with no business meaning (auto-increment ID, UUID). Never changes. Preferred for primary keys.
What is the difference between a hash index and B-tree index?
- B-tree - supports range queries, ORDER BY, GROUP BY, and equality. Most versatile. Default in most DBMS.
- Hash - only supports equality comparisons. Faster for exact lookups but no range queries.
What is the difference between a full table scan and index scan?
- Full table scan (COLLSCAN) - reads every row. O(n). Used when no suitable index exists.
- Index scan (IXSCAN) - uses index to find rows. O(log n). Much faster for selective queries.
- Use EXPLAIN to check which scan type is used.
What is a correlated subquery?
- Non-correlated - executes once independently. Result is reused.
- Correlated - references outer query columns. Executes once per outer row. Can be slow.
-- Non-correlated
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders);
-- Correlated (slow)
SELECT * FROM users u
WHERE (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) > 5;What is a materialized view?
- Regular view - virtual. Query re-executed every access. Always current data.
- Materialized view - stores query result physically. Must be refreshed. Faster for complex queries.
What is the difference between OLTP and OLAP?
- OLTP - many short concurrent transactions. Normalized schema. Fast writes. Examples: banking, e-commerce.
- OLAP - complex analytical queries on large datasets. Denormalized (star/snowflake schema). Fast reads. Examples: data warehouses.
What is a data warehouse?
A data warehouse is a centralized repository for structured data from multiple sources, optimized for analytical queries. Uses ETL (Extract, Transform, Load) processes. Common schemas: star schema and snowflake schema.
What is the difference between star schema and snowflake schema?
- Star schema - fact table at center connected directly to dimension tables. Denormalized. Simpler queries. Faster.
- Snowflake schema - dimension tables are normalized into multiple related tables. More complex queries. Less storage.
What is database sharding?
Sharding distributes data across multiple database servers (shards) based on a shard key. Each shard holds a subset of the data. Enables horizontal scaling. Challenges: cross-shard queries, rebalancing, and consistency.
What is database replication?
- Master-slave replication - master handles writes, slaves handle reads. Improves read scalability.
- Master-master replication - both nodes handle writes. Requires conflict resolution.
- Synchronous - waits for replica confirmation. Slower but consistent.
- Asynchronous - does not wait. Faster but may have replication lag.
What is the CAP theorem?
- CAP theorem - distributed system can only guarantee two of: Consistency, Availability, Partition tolerance.
- CA - consistent and available but not partition-tolerant (traditional RDBMS).
- CP - consistent and partition-tolerant but not always available (MongoDB, HBase).
- AP - available and partition-tolerant but eventually consistent (Cassandra, DynamoDB).
What is the difference between a B-tree and B+ tree index?
- B-tree - stores data in both internal and leaf nodes. Faster for single-key lookups.
- B+ tree - stores data only in leaf nodes. Leaf nodes linked for efficient range scans. Used by most RDBMS (MySQL InnoDB, PostgreSQL).
What is a covering index?
- Regular index - contains only indexed columns. DBMS must do a second lookup to fetch other columns.
- Covering index - contains all columns needed by a query. No second lookup needed. Faster for specific queries.
-- Covering index for: SELECT name, email FROM users WHERE age > 25
CREATE INDEX idx_covering ON users(age, name, email);
-- All needed columns in index - no table lookup!What is the difference between a check constraint and a trigger for validation?
- CHECK constraint - declarative. Enforced by DBMS automatically. Simpler. Cannot reference other tables.
- Trigger - procedural. More flexible. Can reference other tables. More overhead.
CREATE TABLE products (
price DECIMAL(10,2) CHECK (price > 0),
stock INT CHECK (stock >= 0)
);What is the difference between shared lock and exclusive lock?
- Shared lock (S lock) - multiple transactions can hold simultaneously. Used for reads.
- Exclusive lock (X lock) - only one transaction can hold. Used for writes. Blocks all other locks.
What is the difference between horizontal and vertical database scaling?
- Vertical scaling - adding more CPU/RAM/storage to a single server. Limited by hardware ceiling.
- Horizontal scaling - adding more database servers. Requires sharding or replication. More complex but unlimited scaling.
What is the difference between a schema migration and data migration?
- Schema migration - changes to database structure (adding columns, creating tables). Managed with Flyway or Liquibase.
- Data migration - moving or transforming data from one format/location to another. Often done with ETL processes.
What is the difference between a graph database and relational database?
- Relational - data in tables. Relationships via JOINs. Efficient for structured data.
- Graph database (Neo4j) - data as nodes and edges. Relationships are first-class citizens. Efficient for highly connected data (social networks, recommendations).
What is the difference between a sequence and auto-increment?
- AUTO_INCREMENT (MySQL) - automatically generates unique integers for a column. Table-specific.
- SEQUENCE (PostgreSQL, Oracle) - a database object generating unique numbers. Can be shared across tables. More flexible.
-- MySQL
CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY);
-- PostgreSQL
CREATE SEQUENCE user_id_seq;
CREATE TABLE users (id INT DEFAULT nextval('user_id_seq') PRIMARY KEY);What is the difference between eventual consistency and strong consistency?
- Strong consistency - all reads reflect the most recent write. All nodes see same data. Slower.
- Eventual consistency - reads may return stale data temporarily. Nodes eventually converge. Faster, more available.
What is the difference between a primary replica and secondary replica?
- Primary replica - receives all write operations. Source of truth.
- Secondary replica - receives replicated data from primary. Handles reads. Provides failover.
- Replication lag - delay between write on primary and its appearance on secondary.
What is the difference between OLTP and HTAP?
- OLTP - optimized for transactional workloads (INSERT, UPDATE, DELETE).
- HTAP (Hybrid Transactional/Analytical Processing) - handles both transactional and analytical workloads in the same system. Examples: TiDB, SingleStore.
What is the difference between a relational model and document model?
- Relational - data in tables with fixed schemas. Relationships via foreign keys. Strong consistency. SQL queries.
- Document - data as self-contained documents (JSON/BSON). Flexible schema. Embedded relationships. Better for hierarchical data.