PostgreSQL interview questions covering SQL, indexes, joins, transactions, constraints, query plans, JSONB, partitioning, and performance.
PostgreSQL is an open-source object-relational database known for correctness, SQL standards support, extensibility, strong transactions, advanced indexing, JSONB, full-text search, partitioning, replication, and mature operational features. It is often chosen when teams need reliability, complex queries, and strong data integrity.
MVCC, or Multi-Version Concurrency Control, lets readers and writers work concurrently by keeping multiple row versions. Readers see a consistent snapshot without blocking ordinary writers, and writers create new row versions instead of overwriting rows in place. MVCC improves concurrency but creates dead tuples that vacuum must clean.
A transaction groups SQL statements so they commit together or roll back together. PostgreSQL transactions provide ACID behavior: atomicity, consistency, isolation, and durability. Use transactions for workflows such as transferring funds, creating orders, or updating several related tables.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
PostgreSQL supports Read Committed, Repeatable Read, and Serializable. Read Committed is the default and each statement sees a fresh committed snapshot. Repeatable Read keeps one snapshot for the transaction. Serializable provides the strongest isolation and may abort transactions to prevent anomalies.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- read and write data
COMMIT;
A deadlock happens when transactions wait on each other in a cycle, so none can proceed. PostgreSQL detects deadlocks and aborts one transaction. Prevent them by locking rows in consistent order, keeping transactions short, and avoiding unnecessary locks.
EXPLAIN shows the query plan PostgreSQL expects to use. EXPLAIN ANALYZE actually runs the query and reports real timing and row counts. It is essential for diagnosing slow queries because it reveals scans, joins, estimates, sort operations, and whether indexes are used.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
B-tree is the default PostgreSQL index type. It works well for equality, range searches, ordering, and many common WHERE clauses. It is commonly used for primary keys, foreign keys, timestamps, status fields, and lookup columns.
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
A composite index includes multiple columns. Column order matters because PostgreSQL can use the leftmost prefix effectively. For example, an index on (customer_id, created_at) helps queries filtering by customer_id and ordering or filtering by created_at.
CREATE INDEX idx_orders_customer_created
ON orders(customer_id, created_at DESC);
A partial index indexes only rows matching a predicate. It is useful when queries frequently target a subset, such as active users, unpaid invoices, or non-deleted records. Partial indexes are smaller and faster when the predicate matches query patterns.
CREATE INDEX idx_users_active_email
ON users(email)
WHERE deleted_at IS NULL;
An expression index indexes the result of an expression instead of a raw column. It is useful for case-insensitive lookups, date extraction, computed keys, or normalized values. The query must use a matching expression for the index to help.
CREATE INDEX idx_users_lower_email ON users (lower(email));
SELECT * FROM users WHERE lower(email) = 'a@example.com';
GIN indexes are useful for composite values where a row can contain many searchable elements, such as JSONB, arrays, and full-text search vectors. They are powerful for containment and membership queries but can be larger and more expensive to update than B-tree indexes.
CREATE INDEX idx_events_payload_gin ON events USING gin(payload);
GiST indexes are generalized search tree indexes used for data types and operators such as geometric data, ranges, nearest-neighbor searches, and extensions like PostGIS. They are flexible but should be chosen based on operator support and query patterns.
PostgreSQL may skip an index if the table is small, the predicate is not selective, statistics are stale, the query expression does not match the index, the data type cast prevents usage, or a sequential scan is estimated cheaper. Use EXPLAIN ANALYZE before assuming the planner is wrong.
A primary key uniquely identifies rows in a table. A foreign key enforces that a value references an existing row in another table. Foreign keys protect data integrity and make relationships explicit, but they also add checks during writes.
CREATE TABLE orders (
id bigserial PRIMARY KEY,
customer_id bigint NOT NULL REFERENCES customers(id)
);
Constraints enforce rules in the database, such as NOT NULL, UNIQUE, CHECK, PRIMARY KEY, FOREIGN KEY, and EXCLUDE. They protect data even when multiple applications or scripts write to the same database.
ALTER TABLE products
ADD CONSTRAINT price_positive CHECK (price > 0);
UPSERT inserts a row or updates an existing row when a uniqueness conflict occurs. PostgreSQL implements it with INSERT ... ON CONFLICT. It is useful for idempotent writes, counters, sync jobs, and external identifiers.
INSERT INTO users(email, name)
VALUES ('a@example.com', 'Asha')
ON CONFLICT (email)
DO UPDATE SET name = EXCLUDED.name;
RETURNING returns values from inserted, updated, or deleted rows. It avoids an extra SELECT after writes and is commonly used to get generated IDs, timestamps, or updated values.
INSERT INTO posts(title)
VALUES ('Hello')
RETURNING id, created_at;
Common Table Expressions, or CTEs, define named temporary result sets using WITH. They improve readability for complex queries and recursive logic. In modern PostgreSQL, simple CTEs may be inlined, but materialization behavior can still matter for performance.
WITH recent_orders AS (
SELECT * FROM orders WHERE created_at >= now() - interval '7 days'
)
SELECT customer_id, count(*)
FROM recent_orders
GROUP BY customer_id;
Window functions calculate values across related rows without collapsing the result like GROUP BY. They are useful for ranking, running totals, previous or next row comparisons, and per-group analytics.
SELECT
customer_id,
total,
row_number() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders;
JSONB stores JSON data in a binary format that supports indexing and efficient querying. It is useful for flexible attributes, event payloads, and integration data. Do not use JSONB as a replacement for relational modeling when fields need strong constraints and joins.
SELECT * FROM events
WHERE payload @> '{"type":"signup"}'::jsonb;
Arrays can model small lists of scalar values, such as tags or flags, when relational joins are not needed. Avoid arrays when individual elements need foreign keys, frequent updates, complex querying, or independent lifecycle management.
SELECT * FROM posts WHERE 'postgres' = ANY(tags);
A view is a stored query exposed like a table. Views simplify repeated queries, hide complexity, and provide stable interfaces. They do not store data by default, so performance depends on the underlying query.
CREATE VIEW active_users AS
SELECT id, email FROM users WHERE deleted_at IS NULL;
A materialized view stores query results physically. It can speed expensive reports or dashboards, but it becomes stale until refreshed. Use it when data freshness requirements tolerate refresh intervals.
CREATE MATERIALIZED VIEW daily_sales AS
SELECT date_trunc('day', created_at) AS day, sum(total) AS revenue
FROM orders
GROUP BY 1;
REFRESH MATERIALIZED VIEW daily_sales;
Stored functions and procedures run logic inside the database. They can centralize data-heavy operations and reduce round trips, but overusing them can hide business logic, complicate testing, and couple application behavior tightly to the database.
Triggers run automatically in response to table events such as INSERT, UPDATE, or DELETE. They are useful for audit logs, denormalized counters, and invariants. They can also surprise developers because side effects happen outside visible application code.
Partitioning splits a logical table into smaller physical partitions, commonly by range, list, or hash. It helps manage large tables, retention, and query pruning when predicates match the partition key. Poor partition strategy can add complexity without improving performance.
CREATE TABLE events (
id bigserial,
created_at date NOT NULL,
payload jsonb
) PARTITION BY RANGE (created_at);
VACUUM cleans up dead row versions created by MVCC so space can be reused and table bloat is controlled. VACUUM also helps prevent transaction ID wraparound. Regular vacuuming is essential for write-heavy databases.
Autovacuum automatically runs vacuum and analyze based on table activity. If it is too weak for a write-heavy workload, tables can bloat, queries can slow down, and wraparound risk can grow. Monitor autovacuum activity, dead tuples, and table bloat.
ANALYZE collects table statistics used by the query planner. Stale or inaccurate statistics can cause bad query plans. Autovacuum usually runs analyze automatically, but manual ANALYZE can help after large data changes.
ANALYZE orders;
Connection pooling reuses database connections instead of opening one per request. PostgreSQL connections are relatively heavy, so high-traffic apps often use PgBouncer or application pools. Pool size should match database capacity and workload, not simply the number of web requests.
WAL, or Write-Ahead Logging, records changes before they are applied to data files. It supports crash recovery, replication, and point-in-time recovery. Heavy write workloads can be limited by WAL volume, disk throughput, or replication lag.
Replication copies data from a primary server to one or more replicas. Streaming replication is commonly used for read scaling, high availability, and disaster recovery. Replicas can lag behind the primary, so applications must understand read-after-write requirements.
Point-in-time recovery, or PITR, restores a backup and replays WAL to a specific time before an incident. It is useful after accidental deletes, bad migrations, or corruption. PITR requires base backups, archived WAL, and tested restore procedures.
PostgreSQL uses roles for users and groups. Privileges control access to databases, schemas, tables, sequences, functions, and more. Production systems should use least privilege and avoid running applications as superusers.
CREATE ROLE app_user LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE appdb TO app_user;
GRANT SELECT, INSERT, UPDATE ON users TO app_user;
A schema is a namespace inside a database that groups objects such as tables, views, and functions. search_path controls which schemas PostgreSQL checks when an object name is not qualified. In production, qualify important objects or manage search_path carefully to avoid security and ambiguity problems.
CREATE SCHEMA billing;
CREATE TABLE billing.invoices (id bigserial PRIMARY KEY);
SET search_path TO billing, public;
Row-level security, or RLS, restricts which rows a role can see or modify based on policies. It is useful for multi-tenant systems, but policies must be tested carefully because a mistake can expose or block data.
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON invoices
USING (tenant_id = current_setting('app.tenant_id')::uuid);
Full-text search tokenizes documents and queries into lexemes, then ranks matches. It is useful for basic search without a separate search engine. Use tsvector, tsquery, and GIN indexes for performant search.
CREATE INDEX idx_posts_search
ON posts USING gin(to_tsvector('english', title || ' ' || body));
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || body) @@ plainto_tsquery('postgres');
Advisory locks are application-defined locks managed by PostgreSQL. They are useful for leader election, scheduled job locking, or protecting shared resources. They require discipline because PostgreSQL does not know what business object the lock represents.
SELECT pg_try_advisory_lock(12345);
-- do protected work
SELECT pg_advisory_unlock(12345);
Joins combine rows from tables based on conditions. PostgreSQL can use nested loop, hash join, or merge join strategies depending on data size, indexes, statistics, and sort order. EXPLAIN helps show which strategy was chosen.
INNER JOIN returns only matching rows from both tables. LEFT JOIN returns all rows from the left table and matching rows from the right table, using NULLs where no match exists. LEFT JOIN is useful for optional relationships and missing-data reports.
SELECT customers.email, orders.id
FROM customers
LEFT JOIN orders ON orders.customer_id = customers.id;
Use logs, pg_stat_statements, EXPLAIN ANALYZE, application tracing, and database metrics. Look at total time, mean time, rows read, rows returned, shared buffer hits, temporary files, lock waits, and query frequency.
pg_stat_statements is an extension that tracks normalized query statistics. It helps identify high-cost queries by total time, average time, calls, rows, and I/O. It is one of the most useful extensions for performance tuning.
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Prepared statements let PostgreSQL parse and plan a statement once, then execute it multiple times with different parameters. They can improve repeated query performance and help prevent SQL injection when parameters are bound correctly. Be aware that generic plans are not always optimal for every parameter distribution.
PREPARE find_user(text) AS
SELECT id, email FROM users WHERE email = $1;
EXECUTE find_user('a@example.com');
A lock wait happens when one transaction waits for another transaction lock. Long lock waits can slow or block production traffic. Monitor blocking queries, keep transactions short, and avoid running unsafe migrations during peak traffic.
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
Use CREATE INDEX CONCURRENTLY for large production tables so writes are not blocked the same way as a normal index build. It takes longer and cannot run inside a transaction block, but it is safer for live systems.
CREATE INDEX CONCURRENTLY idx_orders_created_at
ON orders(created_at);
Migrations can lock tables, rewrite data, backfill rows, or create heavy indexes. Safe migrations split schema changes from data backfills, use concurrent indexes, batch updates, set timeouts, and test on production-like data volume.
Query optimization means improving query shape, indexes, statistics, schema design, and configuration so PostgreSQL can execute work efficiently. Start with measurements: EXPLAIN ANALYZE, pg_stat_statements, row counts, selectivity, and real latency.
Common mistakes include using superuser accounts for apps, weak passwords, open network access, missing TLS, overbroad grants, unsafe dynamic SQL, unencrypted backups, exposed stateful replicas, and storing secrets in logs or migration files.
Common anti-patterns include missing indexes on foreign keys, indexing every column blindly, long idle transactions, unbounded connection pools, using JSONB for strongly relational data, ignoring autovacuum, running blocking migrations in peak hours, and tuning without EXPLAIN.
A strong demo schema uses primary keys, foreign keys, constraints, indexes that match queries, timestamps, and realistic relationships. Explain both data integrity and access patterns, because good schema design depends on how data is queried and updated.
CREATE TABLE customers (
id bigserial PRIMARY KEY,
email text NOT NULL UNIQUE,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE orders (
id bigserial PRIMARY KEY,
customer_id bigint NOT NULL REFERENCES customers(id),
total numeric(12,2) NOT NULL CHECK (total >= 0),
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX idx_orders_customer_created
ON orders(customer_id, created_at DESC);
Explore 500+ free tutorials across 20+ languages and frameworks.