System Design Interview Questions: Answers, Coding Prep & FAQs

01

How should you start a system design interview?

Start by clarifying requirements before drawing architecture. Ask about users, core features, read/write patterns, latency goals, availability, scale, data retention, security, and constraints. A strong answer avoids designing a huge system before understanding what problem must be solved.

02

What functional and non-functional requirements should you clarify?

Functional requirements describe what the system does, such as uploading photos or sending messages. Non-functional requirements describe qualities such as latency, throughput, availability, durability, security, cost, and compliance. Both shape design decisions.

Example

Functional: create order, pay order, view order history
Non-functional: p95 latency < 300ms, 99.9% availability, audit logs retained 1 year

03

How do you perform capacity estimation?

Estimate traffic, storage, bandwidth, and compute from assumptions. Example: 10 million daily active users, 5 requests per user per day, and 500 bytes per request gives 50 million requests/day and about 25 GB/day of request payload before overhead. Use estimates to size services and identify bottlenecks, not to claim precision.

Example

QPS = daily_requests / 86,400
50,000,000 / 86,400 ~= 579 QPS
Peak QPS may be 3x to 10x average
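The same arithmetic can be written as a small Python helper so the assumptions are easy to change during practice. This is a sketch. The function name is hypothetical, the 5x peak factor is an assumed midpoint of the 3x to 10x range above, and 1 GB is taken as 10^9 bytes of payload with no overhead.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_active_users: int, requests_per_user: int,
             bytes_per_request: int, peak_factor: float = 5.0) -> dict:
    """Back-of-envelope traffic and storage estimate (request payload only)."""
    daily_requests = daily_active_users * requests_per_user
    avg_qps = daily_requests / SECONDS_PER_DAY
    return {
        "daily_requests": daily_requests,
        "avg_qps": round(avg_qps),
        "peak_qps": round(avg_qps * peak_factor),
        # Decimal units: 1 GB = 10**9 bytes
        "storage_gb_per_day": daily_requests * bytes_per_request / 10**9,
    }


print(estimate(10_000_000, 5, 500))
```

With the inputs above, it reports 50 million requests/day, about 579 average QPS, about 2,894 peak QPS at the assumed 5x factor, and 25 GB/day.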

04

How do you design APIs in system design?

API design should define resources, operations, request/response shapes, authentication, pagination, error format, idempotency, and versioning. Align APIs with client use cases rather than mirroring database tables.

Example

POST /v1/orders
GET /v1/orders/{orderId}
GET /v1/users/{userId}/orders?cursor=abc&limit=20
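Clients usually treat the `cursor` parameter in the last endpoint as opaque. The sketch below shows keyset (cursor) pagination. It assumes orders are returned newest first by an increasing integer `order_id`, and that the cursor is simply the last ID seen, base64-encoded. The in-memory list stands in for a database query, and the function names are hypothetical.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_id: int) -> str:
    """Opaque cursor: clients pass it back unchanged and never parse it."""
    raw = json.dumps({"last_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]


def list_orders(orders: list, cursor: Optional[str] = None, limit: int = 20) -> dict:
    """One page of orders, newest first, plus the cursor for the next page."""
    rows = sorted(orders, key=lambda o: o["order_id"], reverse=True)
    if cursor is not None:
        last_id = decode_cursor(cursor)
        # Keyset condition, like SQL: WHERE order_id < :last_id ORDER BY order_id DESC
        rows = [o for o in rows if o["order_id"] < last_id]
    page = rows[:limit]
    next_cursor = encode_cursor(page[-1]["order_id"]) if len(rows) > limit else None
    return {"items": page, "next_cursor": next_cursor}


orders = [{"order_id": i} for i in range(1, 46)]
first = list_orders(orders)
second = list_orders(orders, cursor=first["next_cursor"])
print([o["order_id"] for o in second["items"]][:3])
```

OFFSET paging must skip rows on every request. The keyset condition avoids that, so it stays efficient on deep pages. It also does not repeat or skip rows when newer orders arrive between requests.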

05

How do you choose between REST, GraphQL, and gRPC?

REST is simple and works well for resource-based public APIs. GraphQL helps clients fetch flexible shapes but needs careful authorization and query cost controls. gRPC is efficient for internal service-to-service communication with strongly typed contracts and streaming support.

06

How do you design a data model?

Start from access patterns and consistency needs. Identify entities, relationships, primary keys, indexes, retention, and query patterns. A relational model is often best for transactions and joins, while NoSQL can fit high-scale key-value, document, or event workloads.
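Take the order example from question 02. The "view order history" access pattern points to a composite index on (user_id, created_at). The sketch below uses Python's built-in `sqlite3` module, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    status      TEXT NOT NULL,
    total_cents INTEGER NOT NULL,          -- integer cents avoid floating-point money errors
    created_at  TEXT NOT NULL              -- ISO-8601 timestamps sort correctly as text
);
-- Serves "order history for one user, newest first" without scanning every order
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
""")

conn.execute("INSERT INTO users VALUES (1, 'ada@example.com')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [(10, 1, "PAID", 2500, "2024-01-01"), (11, 1, "CREATED", 900, "2024-02-01")],
)
history = conn.execute(
    "SELECT order_id FROM orders WHERE user_id = ? ORDER BY created_at DESC LIMIT 20",
    (1,),
).fetchall()
print(history)
```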

07

How do read/write ratios affect architecture?

Read-heavy systems often benefit from caching, replicas, CDNs, and denormalized read models. Write-heavy systems need batching, partitioning, queues, backpressure, and careful database write capacity. The ratio helps decide where to optimize first.

08

What is horizontal scaling?

Horizontal scaling adds more machines or instances instead of making one machine larger. It works best with stateless services because any instance can handle any request. Stateful components require partitioning, replication, or coordination.

09

Why are stateless services useful?

Stateless services do not keep user session or workflow state in local memory. They are easier to scale, replace, deploy, and load balance. State should live in databases, caches, queues, or external stores.

Example

Client -> Load Balancer -> API instance A/B/C
Session state -> Redis or database, not local process memory

10

What does a load balancer do?

A load balancer distributes traffic across healthy service instances. It can provide health checks, TLS termination, routing, sticky sessions, rate limits, and failover. Bad health checks or uneven routing can cause outages even when servers are running.
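The sketch below shows round-robin selection that skips instances that failed their health check. Health flags are set by hand here. A real load balancer would update them from periodic probes. The class and instance names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    def __init__(self, instances: list):
        self.instances = instances
        self.healthy = {name: True for name in instances}
        self._cycle = itertools.cycle(instances)

    def mark(self, name: str, is_healthy: bool) -> None:
        """Record the latest health-check result for one instance."""
        self.healthy[name] = is_healthy

    def pick(self) -> str:
        # Try each instance at most once per call, skipping unhealthy ones
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy instances")


lb = RoundRobinBalancer(["api-a", "api-b", "api-c"])
lb.mark("api-b", False)               # api-b failed its health check
print([lb.pick() for _ in range(4)])  # traffic alternates between api-a and api-c
```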

11

What is database replication?

Replication copies data from one database node to others. It improves read scalability and availability, but replicas can lag. Systems that read from replicas must handle stale reads or route critical reads to the primary.

12

What is database sharding?

Sharding splits data across multiple database nodes by a shard key, such as user_id or tenant_id. It increases write and storage capacity but makes cross-shard queries, rebalancing, transactions, and operational work more complex.

Example

shard_id = hash(user_id) % number_of_shards
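In Python, the built-in `hash()` of a string is randomized per process. A stable digest is therefore the safer choice when every server must agree on the shard. The sketch below uses SHA-256 from `hashlib`, and the function name is hypothetical. It also measures how many keys change shards when the shard count changes under plain modulo.

```python
import hashlib


def shard_for(user_id: str, number_of_shards: int) -> int:
    # SHA-256 is stable across processes and machines, unlike Python's
    # built-in hash() for str, which is randomized per process.
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % number_of_shards


keys = [f"user:{i}" for i in range(10_000)]
moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys change shards going from 4 to 5 shards")
```

For uniformly distributed hashes, roughly 80% of keys move. Only keys whose hash leaves the same remainder mod 4 and mod 5 stay put, which is 4 of every 20 residues. This is why resharding with plain modulo is expensive and why question 14 covers consistent hashing.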

13

How do you choose a shard key?

Choose a key that distributes data evenly, matches common queries, avoids hot spots, and minimizes cross-shard operations. A poor shard key can overload one shard or force expensive fan-out queries.

14

What is consistent hashing?

Consistent hashing maps keys and nodes onto a ring so adding or removing nodes moves only a subset of keys. It is useful for caches, distributed storage, and sharded systems where rebalancing cost matters.
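The sketch below is a consistent hash ring with virtual nodes. Each physical node is placed at several ring positions to even out load, and a key belongs to the first position clockwise from its own hash. The class name, the `node#i` labels, and the count of 100 virtual nodes are arbitrary choices for illustration.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; each node gets several virtual nodes to even out load."""

    def __init__(self, nodes: list, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list = []   # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        # First virtual node at or after the key's position, wrapping past the end
        idx = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("cache-d")
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved after adding a fourth node")
```

Adding a fourth node moves roughly a quarter of the keys, and every moved key lands on the new node. Plain modulo hashing, by contrast, moves most keys.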

15

How does caching improve system design?

Caching stores frequently used data closer to the user or service, reducing latency and backend load. Caches can be browser caches, CDN caches, application caches, distributed caches, or database query caches. They add invalidation and consistency tradeoffs.

16

What is cache-aside?

In cache-aside, the application checks the cache first. On a miss, it reads from the database, writes the result to cache with a TTL, and returns it. On updates, it invalidates or refreshes the cache.

Example

def get(key):
    value = cache.get(key)
    if value is None:                   # cache miss
        value = database.read(key)
        cache.set(key, value, ttl=300)  # expire after 300 seconds
    return value
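The cache-aside pseudocode can be made runnable with in-memory stand-ins. In the sketch below, a dict plays the database and a small TTL cache plays Redis or Memcached. All names are hypothetical. The update path writes the source of truth and then deletes the cached entry. As in the pseudocode, a stored value of `None` cannot be told apart from a miss.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-process stand-in for a shared cache such as Redis or Memcached."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._entries: dict = {}   # key -> (value, expires_at)
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None or entry[1] <= self._clock():
            self._entries.pop(key, None)   # lazily drop expired entries
            return None
        return entry[0]

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (value, self._clock() + ttl)

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)


database = {"user:1": {"name": "Ada"}}   # hypothetical source of truth
cache = TTLCache()


def get_value(key: str) -> Any:
    value = cache.get(key)
    if value is None:                    # miss: read the database, then populate the cache
        value = database[key]
        cache.set(key, value, ttl=300)
    return value


def update_value(key: str, value: Any) -> None:
    database[key] = value                # write the source of truth first
    cache.delete(key)                    # then invalidate so the next read reloads
```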

17

What is cache invalidation?

Cache invalidation removes or updates stale cached data. Strategies include TTLs, explicit delete on write, versioned keys, event-driven invalidation, and write-through caching. The risk is serving stale data or causing cache stampedes.

18

What is a cache stampede?

A cache stampede happens when many requests miss the same key at once and all hit the database. Mitigations include locks, request coalescing, TTL jitter, stale-while-revalidate, and prewarming important keys.
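Request coalescing means that on a miss, only one caller per key queries the database while the others wait and reuse its result. The in-process sketch below illustrates two of the mitigations above, coalescing with a per-key `threading.Lock` and TTL jitter. Across several servers, the lock would have to live in a shared store instead. The class and function names are hypothetical. The jitter helper is separate because this coalescing cache has no expiry.

```python
import random
import threading
import time
from typing import Any, Callable


class CoalescingCache:
    """On a miss, one caller per key loads from the database; others wait and reuse it."""

    def __init__(self, load: Callable[[str], Any]):
        self._load = load
        self._values: dict = {}
        self._key_locks: dict = {}
        self._guard = threading.Lock()   # protects _key_locks itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        if key in self._values:          # fast path: hit
            return self._values[key]
        with self._lock_for(key):
            if key not in self._values:  # re-check: another caller may have loaded it
                self._values[key] = self._load(key)
            return self._values[key]


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    """Randomize TTLs by +/-10% so keys cached together do not expire together."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)


database_calls = []


def slow_database_read(key: str) -> str:
    database_calls.append(key)
    time.sleep(0.05)                     # simulate a slow query
    return f"value-for-{key}"


cache = CoalescingCache(slow_database_read)
threads = [threading.Thread(target=cache.get, args=("hot-key",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(database_calls))  # 1: the other 19 callers waited for the first load
```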

19

What is a CDN?

A Content Delivery Network caches static or semi-static content near users. It reduces latency, bandwidth cost, and origin load. CDNs are common for images, videos, downloads, scripts, styles, and public API responses that can be cached safely.

20

What is object storage used for?

Object storage stores large blobs such as images, videos, backups, exports, and documents. Instead of putting large files in a database, store metadata in the database and file content in object storage.

Example

Database: file_id, owner_id, object_key, content_type, size
Object storage: uploads/2026/06/file.png

21

Why use message queues?

Queues decouple producers from consumers, absorb traffic spikes, enable retries, and move slow work out of request paths. They are useful for emails, payments, video processing, notifications, indexing, and integration workflows.

Example

API -> Queue -> Worker -> Email provider
API returns quickly while worker handles slow email sending
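The flow above can be sketched in-process with Python's `queue` module and a worker thread. The sketch assumes a hypothetical `send_email` standing in for the provider call. Failed jobs are re-queued a bounded number of times and then parked in a dead-letter list. A production system would use a durable broker and add a backoff delay between retries.

```python
import queue
import threading

MAX_RETRIES = 3
email_queue: queue.Queue = queue.Queue()
sent: list = []
dead_letters: list = []


def send_email(job: dict) -> None:
    """Hypothetical stand-in for a slow call to an email provider."""
    sent.append(job["to"])


def worker() -> None:
    while True:
        job = email_queue.get()
        try:
            send_email(job)
        except Exception:
            retries = job.get("retries", 0)
            if retries < MAX_RETRIES:
                email_queue.put({**job, "retries": retries + 1})   # try again later
            else:
                dead_letters.append(job)                           # park for inspection
        finally:
            email_queue.task_done()


def handle_signup(email: str) -> dict:
    # The API enqueues and returns at once instead of waiting on the provider
    email_queue.put({"to": email, "template": "welcome"})
    return {"status": "accepted"}


threading.Thread(target=worker, daemon=True).start()
print(handle_signup("ada@example.com"))
email_queue.join()   # demo only: wait until the worker drains the queue
print(sent)
```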

22

What is event-driven architecture?

Event-driven architecture publishes events when facts occur, such as OrderPaid or UserRegistered. Consumers react asynchronously. It improves decoupling and scalability, but introduces eventual consistency, ordering, replay, and idempotency challenges.

23

What is idempotency?

Idempotency means repeating the same operation has the same effect as running it once. It is essential for retries, payments, order creation, and message processing. Use idempotency keys or unique constraints to prevent duplicate side effects.

Example

POST /payments
Header: Idempotency-Key: pay_123

If the key was already processed, return the original result instead of charging again.
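The sketch below shows server-side idempotency keys. The first request with a key performs the side effect and stores the response, and any retry with the same key gets the stored response back. The service and field names are hypothetical, and the charge is only simulated. A real system would persist keys in a database, typically under a unique constraint, rather than in a dict.

```python
import threading
import uuid


class PaymentService:
    def __init__(self):
        self._responses: dict = {}   # idempotency key -> stored response
        self._lock = threading.Lock()
        self.charges = 0             # counts real side effects, for the demo

    def create_payment(self, idempotency_key: str, amount_cents: int) -> dict:
        with self._lock:
            stored = self._responses.get(idempotency_key)
            if stored is not None:
                if stored["amount_cents"] != amount_cents:
                    raise ValueError("idempotency key reused with different parameters")
                return stored        # replay: return the original result, no new charge
            self.charges += 1        # stand-in for calling the payment provider
            response = {"payment_id": str(uuid.uuid4()),
                        "amount_cents": amount_cents, "status": "charged"}
            self._responses[idempotency_key] = response
            return response


service = PaymentService()
first = service.create_payment("pay_123", 5000)
retry = service.create_payment("pay_123", 5000)   # e.g. client retried after a timeout
print(first == retry, service.charges)            # True 1
```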

24

How do you design rate limiting?

Rate limiting restricts requests by user, IP, API key, tenant, or endpoint. Common algorithms include fixed window, sliding window, token bucket, and leaky bucket. The design must define limits, burst behavior, storage, and error responses.
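Token bucket is the usual choice when limited bursts should be allowed. The sketch below implements it for a single client. Tokens refill continuously at `rate` per second up to `capacity`, and each request spends one. The injectable clock is an assumption that makes the behavior testable. A multi-instance deployment would typically keep one bucket per user, IP, or API key in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # caller would respond with HTTP 429 Too Many Requests


now = [0.0]   # fake clock for the demo
bucket = TokenBucket(capacity=5, rate=1, clock=lambda: now[0])
print([bucket.allow() for _ in range(6)])   # burst of 5 allowed, 6th rejected
now[0] = 2.0                                # two seconds later: two tokens refilled
print([bucket.allow() for _ in range(3)])   # [True, True, False]
```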

25

What is backpressure?

Backpressure slows or rejects incoming work when downstream systems cannot keep up. It protects databases, queues, and services from collapse. Techniques include bounded queues, rate limits, circuit breakers, load shedding, and retry budgets.
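The sketch below combines two of the techniques above, a bounded queue and load shedding. When the buffer is full, new work is rejected immediately rather than queued without limit. The capacity of 100 and the HTTP status mapping are illustrative assumptions.

```python
import queue

work_queue: queue.Queue = queue.Queue(maxsize=100)   # bound chosen arbitrarily


def submit(job) -> int:
    """Accept work only while the queue has room; otherwise shed load right away."""
    try:
        work_queue.put_nowait(job)
        return 202   # Accepted: a worker will process it
    except queue.Full:
        return 503   # Service Unavailable: the caller should back off and retry later


# No consumer is running, so the queue fills up and later submissions are rejected
results = [submit(i) for i in range(105)]
print(results.count(202), results.count(503))   # 100 5
```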

26

What is a circuit breaker?

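A circuit breaker stops calling a failing dependency after errors cross a threshold. It moves from closed to open when that threshold is reached, then to half-open after a cooldown to test recovery: a successful test closes the circuit again, while a failure reopens it. It prevents cascading failure and gives dependencies time to recover.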

Example

closed: send requests normally
open: fail fast
half-open: allow a few test requests
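The three states above can be sketched as a small state machine. This version is simplified and single-threaded. It lets calls through in half-open until the first result decides, instead of capping the number of test requests. The threshold, timeout, and injectable clock are assumptions for illustration.

```python
import time
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], Any]) -> Any:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency unavailable, failing fast")
            self.state = "half-open"             # cooldown over: let a test request through
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self.failures = 0                        # success closes the circuit
        self.state = "closed"
        return result

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed test request, or too many failures while closed, opens the circuit
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()


now = [0.0]   # fake clock so the demo does not need to sleep
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10, clock=lambda: now[0])


def flaky_dependency():
    raise ConnectionError("dependency down")


for _ in range(2):
    try:
        breaker.call(flaky_dependency)
    except ConnectionError:
        pass
print(breaker.state)   # open
```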

27

What is eventual consistency?

Eventual consistency means replicas or derived views may be temporarily stale but converge later. It is common in distributed systems, caches, search indexes, and async workflows. Product behavior must make stale states acceptable or visible.

28

What is strong consistency?

Strong consistency means reads observe the latest successful write according to the system contract. It simplifies user expectations but may increase latency, reduce availability during partitions, or require coordination across nodes.

29

How do you choose between SQL and NoSQL?

Choose SQL for relational data, joins, constraints, transactions, and flexible querying. Choose NoSQL when access patterns are simple but scale, availability, or schema flexibility dominate. Many real systems use both for different parts.

30

What is search indexing?

Search indexing creates searchable representations of data in systems like Elasticsearch, OpenSearch, Solr, or database full-text indexes. Search is often eventually consistent because documents are indexed asynchronously after source data changes.

31

How do you design real-time updates?

Real-time updates can use WebSockets, Server-Sent Events, push notifications, or polling. Choose based on directionality, scale, firewall compatibility, latency needs, and client type. WebSockets fit bidirectional communication; SSE fits server-to-client streams.

32

How do you design a multi-region system?

A multi-region design runs services in more than one geographic region for latency, availability, or disaster recovery. Key decisions include active-active vs active-passive, data replication, conflict resolution, routing, failover, and compliance.

33

What is failover?

Failover moves traffic or leadership from a failed component to a healthy one. It can be automatic or manual. Good failover design includes health checks, data replication, runbooks, testing, and clear rollback paths.

34

What are RPO and RTO?

RPO, Recovery Point Objective, is how much data loss is acceptable. RTO, Recovery Time Objective, is how long recovery may take. These numbers drive backup frequency, replication design, standby capacity, and disaster recovery cost.

Example

RPO = 5 minutes means at most 5 minutes of data loss
RTO = 30 minutes means service should recover within 30 minutes

35

What are SLIs, SLOs, and SLAs?

SLI is the measured signal, such as successful request latency. SLO is the internal target, such as 99.9% of requests under 300 ms. SLA is the external commitment, often with contractual consequences.
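The example SLO can be checked against raw latency samples. Its complement gives the error budget, meaning the number of requests allowed to miss the target. The sketch below assumes each sample is the latency in milliseconds of a successful request. The sample values are made up, and 50 million requests/day reuses the estimate from question 03. The function names are hypothetical.

```python
def latency_sli(latencies_ms: list, threshold_ms: float = 300) -> float:
    """Fraction of successful requests served faster than the threshold."""
    return sum(1 for ms in latencies_ms if ms < threshold_ms) / len(latencies_ms)


def error_budget(slo: float, total_requests: int) -> int:
    """How many requests may miss the target while the SLO is still met."""
    return round((1 - slo) * total_requests)


samples = [120, 95, 310, 180, 240, 90, 150, 280, 200, 130]
sli = latency_sli(samples)
print(sli, sli >= 0.999)                  # 0.9 False: this sample misses a 99.9% SLO
print(error_budget(0.999, 50_000_000))    # 50000 slow requests allowed per day at 50M/day
```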

36

What is observability?

Observability is the ability to understand system behavior from outputs such as metrics, logs, traces, events, and profiles. A well-designed system exposes enough signals to diagnose failures without guessing.

37

What should you monitor in a production system?

Monitor request rate, error rate, latency, saturation, queue depth, cache hit rate, database connections, replication lag, CPU, memory, disk, dependency health, and business KPIs. Tie alerts to user impact, not every noisy internal metric.

38

How do logs, metrics, and traces differ?

Logs capture discrete events and context. Metrics are numeric time-series signals. Traces follow a request across services. Together they help answer what happened, how often it happens, and where time was spent.

39

What is distributed tracing?

Distributed tracing follows a request through multiple services using trace IDs and spans. It helps identify slow dependencies, fan-out patterns, retries, and service boundaries. It is especially useful in microservices.
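Trace IDs usually travel between services in a header. The sketch below uses the W3C Trace Context `traceparent` format, which is version, 32-hex-digit trace ID, 16-hex-digit parent span ID, and flags. Each service keeps the trace ID and creates its own span ID. The function names are hypothetical, and the validation is simplified (for example, all-zero IDs are not rejected). In practice, a tracing library such as OpenTelemetry handles propagation.

```python
import re
import secrets

TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def start_span(incoming_headers: dict) -> dict:
    """Continue the caller's trace if a valid traceparent arrived, else start a new trace."""
    match = TRACEPARENT.match(incoming_headers.get("traceparent", ""))
    if match:
        trace_id, parent_id, flags = match.groups()
    else:
        trace_id, parent_id, flags = secrets.token_hex(16), None, "01"
    span_id = secrets.token_hex(8)   # new ID for this service's unit of work
    return {"trace_id": trace_id, "span_id": span_id, "parent_id": parent_id, "flags": flags}


def outgoing_headers(span: dict) -> dict:
    # Same trace ID downstream; this span becomes the parent of the next hop
    return {"traceparent": f"00-{span['trace_id']}-{span['span_id']}-{span['flags']}"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
span = start_span(incoming)
print(outgoing_headers(span))
```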

40

How do you design authentication and authorization?

Authentication verifies identity; authorization controls access. Design choices include sessions, JWTs, OAuth, API keys, RBAC, ABAC, tenant isolation, token expiration, revocation, and audit logging.

41

How do you handle file uploads at scale?

Use pre-signed object storage uploads, store metadata in a database, process files asynchronously, scan for malware, generate thumbnails in workers, and serve files through a CDN. Avoid proxying large files through app servers when possible.

Example

Client -> API asks for signed upload URL
Client -> Object Storage direct upload
Storage event -> Queue -> Worker processes file

42

How would you design URL shortening?

Core requirements include creating short codes, redirecting quickly, preventing collisions, analytics, expiration, abuse controls, and high read traffic. A simple design stores the mapping from short code to long URL in a key-value store and caches popular redirects.

Example

POST /urls -> { code: "aB91x" }
GET /aB91x -> 302 Location: https://example.com/long-url
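One common way to avoid collisions is to assign each URL a unique integer ID and encode it in base 62 (digits plus lowercase and uppercase letters). Seven characters give 62^7, about 3.5 trillion codes. The sketch below assumes a hypothetical in-process counter as the ID source and a dict as the key-value store. Sequential IDs produce guessable codes, so some designs shuffle IDs or use random codes with a collision check.

```python
import string
from itertools import count
from typing import Optional

ALPHABET = string.digits + string.ascii_letters   # 62 characters: 0-9, a-z, A-Z


def encode_base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


_next_id = count(1)            # hypothetical ID source; production might use a DB sequence
url_table: dict = {}           # stand-in for the key-value store


def shorten(long_url: str) -> str:
    code = encode_base62(next(_next_id))
    url_table[code] = long_url
    return code


def resolve(code: str) -> Optional[str]:
    return url_table.get(code)   # caller answers 302 with Location, or 404 if None


code = shorten("https://example.com/long-url")
print(code, resolve(code))
```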

43

How would you design a notification system?

A notification system needs templates, user preferences, channel routing, retries, deduplication, rate limits, provider failover, and delivery tracking. Use queues so slow providers do not block user requests.

44

How would you design a chat system?

A chat system needs message storage, conversation membership, real-time delivery, offline sync, unread counts, push notifications, ordering, and abuse controls. WebSockets can handle online delivery, while persistent storage and queues handle offline and retry behavior.

45

How would you design a news feed?

A news feed can use fanout-on-write, fanout-on-read, or a hybrid approach. Fanout-on-write precomputes feeds for fast reads but can be expensive for celebrities. Fanout-on-read computes on demand but can be slower. Hybrid designs treat high-fanout users differently.
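The hybrid approach can be sketched with in-memory structures. Posts from ordinary authors are pushed into each follower's precomputed feed at write time. Posts from high-fanout authors are pulled and merged at read time. The threshold is deliberately tiny for the demo, and every name is hypothetical. The sketch also ignores edge cases such as an author crossing the threshold after earlier posts were already fanned out.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3   # tiny cutoff for the demo; real systems use a much larger one
FEED_LENGTH = 500         # keep only the newest N entries per precomputed feed

followers = defaultdict(set)   # author -> users who follow them
following = defaultdict(set)   # user -> authors they follow
posts = defaultdict(list)      # author -> [(timestamp, post_id)]
feeds = defaultdict(lambda: deque(maxlen=FEED_LENGTH))   # user -> precomputed feed


def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author: str) -> bool:
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author: str, ts: int, post_id: str) -> None:
    posts[author].append((ts, post_id))
    if not is_celebrity(author):
        for user in followers[author]:          # fanout-on-write: push into each feed
            feeds[user].appendleft((ts, post_id))
    # Celebrity posts are not pushed; readers pull them in read_feed


def read_feed(user: str, limit: int = 20) -> list:
    items = list(feeds[user])
    for author in following[user]:
        if is_celebrity(author):                # fanout-on-read: merge at request time
            items.extend(posts[author][-limit:])
    return [post_id for _, post_id in sorted(items, reverse=True)[:limit]]


follow("u1", "alice")
for user in ("u1", "u2", "u3"):
    follow(user, "celeb")
publish("alice", 1, "a1")
publish("celeb", 2, "c1")
publish("alice", 3, "a2")
print(read_feed("u1"))   # ['a2', 'c1', 'a1']
```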

46

How would you design a payment system?

Payment systems need idempotency, audit logs, secure token handling, state machines, retries, provider webhooks, reconciliation, fraud checks, and strong consistency around money-moving records. Never rely only on client-side payment status.
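The state machine and audit log can be sketched together. Every transition is checked against an explicit allow-list and recorded. A repeated event, such as a webhook delivered twice, is ignored rather than applied again. The states and transitions are illustrative assumptions, not any provider's actual model. An out-of-order stale event raises an error here, where a real handler might log and ignore it.

```python
from enum import Enum


class PaymentState(Enum):
    CREATED = "created"
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    FAILED = "failed"
    REFUNDED = "refunded"


# Allowed transitions; anything else is rejected rather than silently applied
TRANSITIONS = {
    PaymentState.CREATED: {PaymentState.AUTHORIZED, PaymentState.FAILED},
    PaymentState.AUTHORIZED: {PaymentState.CAPTURED, PaymentState.FAILED},
    PaymentState.CAPTURED: {PaymentState.REFUNDED},
    PaymentState.FAILED: set(),
    PaymentState.REFUNDED: set(),
}


class Payment:
    def __init__(self, payment_id: str):
        self.payment_id = payment_id
        self.state = PaymentState.CREATED
        self.audit_log: list = []   # (from_state, to_state, source) entries

    def transition(self, new_state: PaymentState, source: str) -> bool:
        """Apply a transition; return False for a duplicate of the current state."""
        if new_state == self.state:
            return False            # replayed event: nothing to do
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.audit_log.append((self.state, new_state, source))
        self.state = new_state
        return True


payment = Payment("pay_1")
payment.transition(PaymentState.AUTHORIZED, source="api")
payment.transition(PaymentState.CAPTURED, source="provider_webhook")
print(payment.transition(PaymentState.CAPTURED, source="provider_webhook"))  # False
print(payment.state.name, len(payment.audit_log))                           # CAPTURED 2
```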

47

How would you design a leaderboard?

Leaderboards need ranking, score updates, top-N queries, nearby-rank queries, tie handling, and periodic resets. Redis sorted sets are a common fit for real-time rankings, while durable storage keeps historical records.

Example

ZADD leaderboard 1200 user:42
ZREVRANGE leaderboard 0 9 WITHSCORES
ZREVRANK leaderboard user:42

48

What are common system design mistakes?

Common mistakes include skipping requirements, overengineering too early, ignoring data consistency, forgetting failure modes, missing observability, assuming unlimited cache capacity, picking trendy tools without discussing tradeoffs, and not explaining bottlenecks or alternatives.

49

How should you present tradeoffs in a system design interview?

State the decision, why it fits the requirements, what it costs, and when you would change it. Example: choose cache-aside for simple read scaling, but mention stale data, invalidation, stampede prevention, and metrics needed to validate it.

50

What is a good final checklist for a system design answer?

End by reviewing requirements, bottlenecks, scaling path, failure modes, data consistency, security, observability, cost, and rollout plan. This shows you can operate the design, not just draw boxes.

Example

Checklist:
- Requirements met?
- Bottlenecks identified?
- Failure modes handled?
- Data consistency clear?
- Metrics/logs/traces planned?
- Security and cost addressed?

