The most important system design questions revolve around large-scale distributed systems that must handle immense load and data volumes while navigating a variety of complex technical challenges.

Core Design Topics

  • Scalability & Availability: How to design a system that can handle increasing load (scaling horizontally vs. vertically, load balancing) and remain operational despite failures (fault tolerance, failover, replication).
  • Data Storage & Management: Understanding the differences between SQL vs. NoSQL databases, when to use which, and strategies like Sharding/Data Partitioning and data replication.
  • Performance Optimization: Techniques to reduce latency and increase throughput, such as Caching (CDN, in-memory caches, cache-aside strategy) and database indexing.
  • Distributed System Concepts: Key principles like the CAP Theorem (Consistency, Availability, Partition Tolerance), different consistency models, and the use of message queues (e.g., Kafka, RabbitMQ) for asynchronous communication.
  • API Design & Communication: Designing well-structured APIs (e.g., REST, gRPC) and selecting the right communication protocols (e.g., HTTP vs. WebSockets).

1. Design a Social Media Feed (e.g., Instagram, Twitter, News Feed)

This design question tests your understanding of trade-offs between write and read latency, data consistency, and how to efficiently manage massive read traffic for personalized content.

Core Challenge: Efficiently generating, aggregating, and delivering a personalized, up-to-date feed to hundreds of millions of users with extremely low latency, despite high write volume (posting).

Key Concepts:

  • Feed Generation Model: Deciding between Fanout-on-Write (Push Model) and Fanout-on-Read (Pull Model) and understanding the performance implications of each.

    • Push Model (Fanout-on-Write): A post is immediately written to all follower inboxes. Pros: Fast feed loading. Cons: High write load, expensive for “celebrity” accounts with millions of followers.

    • Pull Model (Fanout-on-Read): A feed is generated on-demand when the user requests it by querying all followed users’ posts. Pros: Efficient for celebrities. Cons: Slow feed loading.

    • Hybrid Model: Often the best solution, using Push for most users and Pull for celebrities (see the sketch after this list).

  • Data Storage:

    • Database: A fast, scalable database (e.g., NoSQL like Cassandra or a sharded relational DB) for storing posts, user data, and the graph (who follows whom).

    • Caching: Extensive use of a distributed cache (like Memcached/Redis) for user profiles, hot posts, and pre-calculated feed chunks to handle the massive read load.

  • Media Storage (CDN): Utilizing a Content Delivery Network (CDN) to store and serve static media (images, videos) to minimize latency and offload the main servers. This involves pre-processing media into various formats/resolutions.

  • Feed Ranking/Algorithm: The mechanism used to determine the order of posts (e.g., recency, engagement, machine learning models) before delivery.
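
To make the push/pull trade-off concrete, here is a minimal in-memory sketch of a hybrid fanout. It assumes a hypothetical CELEBRITY_THRESHOLD and plain Python dictionaries standing in for the follower graph, post store, and per-user inboxes that a real system would keep in a database and a distributed cache.

```python
import heapq
import time
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000          # hypothetical cutoff for switching to pull

followers = defaultdict(set)          # author_id -> set of follower ids
posts_by_author = defaultdict(list)   # author_id -> [(timestamp, post_text)]
inboxes = defaultdict(list)           # user_id -> pre-materialized feed entries

def publish(author_id, post_text):
    entry = (time.time(), post_text)
    posts_by_author[author_id].append(entry)
    # Fanout-on-write: push only for authors below the celebrity threshold.
    if len(followers[author_id]) < CELEBRITY_THRESHOLD:
        for follower_id in followers[author_id]:
            inboxes[follower_id].append(entry)

def read_feed(user_id, following, limit=20):
    # Start from the pre-materialized inbox (push side) ...
    candidates = list(inboxes[user_id])
    # ... and pull celebrity posts on demand at read time (pull side).
    for author_id in following:
        if len(followers[author_id]) >= CELEBRITY_THRESHOLD:
            candidates.extend(posts_by_author[author_id])
    # Merge and return the newest posts first.
    return heapq.nlargest(limit, candidates, key=lambda entry: entry[0])
```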

2. Design a URL Shortening Service (e.g., TinyURL, Bitly)

This design question tests your understanding of mapping services, unique ID generation, and balancing the critical requirements of low latency redirection and high availability.

Core Challenge: Efficiently generating unique, short aliases for very long URLs and, more critically, performing extremely fast, highly available lookups and redirects for those short links at massive scale.

Key Concepts:

  • ID/Key Generation: The mechanism used to create the short unique string (the key). Common approaches include:
    • Base62/Base64 Conversion: Converting a sequential integer ID (from a separate service) into a 6-10 character string; Base62 uses only the alphanumeric characters 0-9, a-z, and A-Z (a sketch follows this list).
    • Hashing: Using cryptographic hashing (like SHA-256) on the original URL, then taking the first few characters. This risks collisions (two URLs generating the same short key), which must be handled.
    • Distributed Unique ID Generator: A dedicated service (e.g., using Snowflake) to dispense sequential IDs to generate the short key, ensuring uniqueness across the system.
  • Data Storage: A highly optimized database (often a NoSQL key-value store like Cassandra or DynamoDB) is used to store the mapping of short_key → long_URL because the primary operation is a simple, fast lookup.
  • Redirection: Implementing a fast HTTP redirect (usually a 301 for permanent moves or 302/307 for temporary ones) to send the user from the short link to the original long URL.
  • Availability vs. Consistency: High availability is paramount for the redirect service. A slight delay in seeing a newly created link is acceptable (eventual consistency is often fine), but the redirect service must never go down.
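
A minimal sketch of the Base62 approach, assuming the sequential integer ID comes from a separate ID-generation service:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into a short Base62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode_base62(s: str) -> int:
    """Recover the integer ID from a Base62 string."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

# Example: an ID around 125 billion still encodes to only 7 characters.
assert decode_base62(encode_base62(125_000_000_000)) == 125_000_000_000
```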

3. Design a Real-time Communication System (e.g., WhatsApp, Messenger)

This design question tests your understanding of low-latency networking, handling state, ensuring message delivery, and scaling highly concurrent, bidirectional connections.

Core Challenge: Establishing and maintaining persistent, two-way connections between users to send messages instantaneously, ensuring every message is delivered reliably (in practice, at-least-once delivery plus deduplication, so it appears exactly once), even if the user is offline.

Key Concepts:

  • Persistent Connections: Utilizing WebSockets or long polling as the primary communication protocol to establish a persistent, bidirectional channel between the client and the server, enabling low-latency, real-time message exchange.
  • Message Processing and Buffering:
    • Message Queues (e.g., Kafka, RabbitMQ): Essential for decoupling the client-facing services from the message processing backend. They handle bursts of messages, provide a buffer, and ensure messages aren’t lost if a downstream service fails.
    • State Management: Tracking which connections are active and which users are currently online or offline. This typically uses a highly available cache (like Redis).
  • Message Delivery Guarantees: Implementing mechanisms to ensure reliable delivery, addressing scenarios like network failure and offline users:
    • Acknowledgment (ACK) System: The recipient’s device sends an ACK back to the server upon receiving a message. The server then marks the message as delivered.
    • Storage (Offline Messages): Messages for offline users are persisted in a database until they reconnect. Upon reconnection, the user’s connection service queries the database and sends the backlog.
  • Push Notifications: Integrating with platform-specific services (e.g., FCM for Android, APNs for iOS) to wake up the recipient’s mobile application when a new message arrives while the app is in the background or closed.
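
A minimal in-memory sketch of the ACK and offline-message flow. The connection objects and their push method are hypothetical stand-ins for a WebSocket session; in production the pending store would be a durable database and the connection registry a Redis-backed presence service.

```python
from collections import defaultdict

active_connections = {}               # user_id -> live connection object (assumed)
pending_messages = defaultdict(list)  # user_id -> messages awaiting delivery/ACK

def send_message(recipient_id, message):
    # message is a dict containing at least an "id" field.
    # Persist first so the message survives a crash or an offline recipient.
    pending_messages[recipient_id].append(message)
    conn = active_connections.get(recipient_id)
    if conn is not None:
        conn.push(message)  # push over the WebSocket; delivery not yet confirmed

def on_ack(recipient_id, message_id):
    # The recipient's device confirmed receipt; safe to drop the stored copy.
    pending_messages[recipient_id] = [
        m for m in pending_messages[recipient_id] if m["id"] != message_id
    ]

def on_reconnect(recipient_id, conn):
    # Register the new connection and replay the backlog accumulated while offline.
    active_connections[recipient_id] = conn
    for message in pending_messages[recipient_id]:
        conn.push(message)
```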

4. Platform Design (Video Streaming & File Sharing)

This design question tests your understanding of handling large binary objects (BLOBs), managing petabytes of storage, ensuring high-speed access globally, and dealing with various security and processing challenges.

Core Challenge: Efficiently and reliably storing, processing, and distributing large media (video) or user files (documents) to a global user base with extremely high availability and low latency.

Key Concepts:

  • Large File Uploads: Designing a resilient mechanism to handle files that may take minutes or hours to upload.

    • Chunking/Resumable Uploads: Splitting large files into smaller chunks and allowing the client to resume the upload from the last successfully transferred chunk if the connection drops (a sketch follows this list).

    • Direct Upload to Storage: Using signed URLs to allow clients to upload directly to cloud storage (e.g., S3), bypassing the main application servers.

  • Content Delivery Networks (CDN): Essential for minimizing latency. Static assets (videos, images, completed files) are cached on edge servers geographically closer to the end-user, significantly improving download/streaming speed.

  • Video Encoding/Transcoding (YouTube): The process of converting the raw uploaded video file into various standardized formats, resolutions (e.g., 480p, 720p, 1080p), and bitrates. This ensures the video can be played on different devices and adjusts quality based on the user’s current network bandwidth (Adaptive Bitrate Streaming).

  • Security & Access Control (Google Drive): Implementing robust authentication and authorization checks to ensure that files are only accessible to the owner and explicitly shared users. This involves complex permission management within the storage system and application layer.

  • Storage Tiers: Using a combination of storage solutions: high-speed hot storage (SSD) for frequently accessed files and cold storage (HDD, tape) for archives or rarely accessed files to optimize cost.
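
A minimal client-side sketch of a chunked, resumable upload. The upload_chunk and get_committed_offset callables are hypothetical hooks into the storage client, and signed_url stands for the pre-signed destination in blob storage.

```python
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per chunk (tunable)

def resumable_upload(path, upload_chunk, get_committed_offset, signed_url):
    """Upload a large file chunk by chunk, resuming after a dropped connection.

    upload_chunk(url, data, offset) and get_committed_offset(url) are
    hypothetical callables provided by the storage client.
    """
    offset = get_committed_offset(signed_url)  # where the server says we left off
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            upload_chunk(signed_url, chunk, offset)
            offset += len(chunk)
    return offset  # total bytes committed
```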

5. Design an API Rate Limiter

This design question tests your understanding of traffic control, distributed synchronization, distributed counters, and protecting backend services from overload.

Core Challenge: Implementing a mechanism to control the rate at which users or services can access an API, ensuring fair usage and preventing system overload or abuse (e.g., Denial of Service attacks).

Key Concepts:

  • Rate Limiting Algorithms: Choosing the correct strategy to define and enforce limits:

    • Sliding Window Log: The most accurate but most expensive method. It stores a timestamp for every request and calculates the number of requests within the window.

    • Sliding Window Counter: A highly scalable and popular approach. It divides the time window into smaller intervals and uses counters for each, offering a better balance of accuracy and performance than the simple Fixed Window.

    • Token Bucket/Leaky Bucket: Used to smooth out bursts of traffic. In the token bucket, requests consume tokens that refill at a fixed rate (permitting short bursts); in the leaky bucket, queued requests drain at a constant rate (see the token-bucket sketch after this list).

  • Distributed Counters: The rate limiter must work across multiple servers. A distributed cache (like Redis) is essential for storing and atomically incrementing the counters that track usage for each user/API key. Atomicity is critical to prevent race conditions.

  • Placement of the Limiter: Deciding where the rate limiter sits in the architecture:

    • Gateway/Proxy Layer: Placing it early (e.g., Nginx, API Gateway) protects the entire downstream infrastructure.

    • Application Layer: Placing it within the application logic for finer-grained, per-API endpoint control.

  • Handling Over-Limit Requests: Defining the system’s behavior when a limit is exceeded, typically returning an HTTP status code 429 (Too Many Requests) and providing a Retry-After header.
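
As one concrete example, a minimal token-bucket limiter. This is a single-process sketch; a distributed deployment would keep the bucket state in Redis and update it atomically (e.g., via a Lua script) instead of in a Python object.

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429 and a Retry-After header

# One bucket per API key: 5 requests/second steady state, bursts of up to 10.
limiter = TokenBucket(rate=5, capacity=10)
```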

6. Design a Typeahead/Autocomplete System (e.g., Google Search Suggestions)

This design question tests your understanding of data structures optimized for prefix matching, latency-critical read operations, and efficiently handling extremely large dictionaries of queries.

Core Challenge: Providing highly relevant, ranked search suggestions instantaneously (usually < 50 ms) as a user types, sourcing suggestions from a massive, constantly updated corpus of historical queries.

Key Concepts:

  • Data Structure: Utilizing a Trie (Prefix Tree) or a variation like a Radix Tree is fundamental. These tree structures are specifically designed for efficient prefix matching, allowing the system to quickly traverse the tree based on the characters typed by the user (a sketch follows this list).
  • Ranking/Scoring: Suggestions are not just based on the prefix match, but also their relevance and popularity.
    • The system must calculate and store a score (e.g., frequency of past searches, recency, location) associated with the completed word/phrase in the Trie node.
    • This score is used to return the Top K most relevant suggestions for the given prefix.
  • Caching: Extensive use of distributed caching (e.g., Redis) is critical. Since the system is highly read-intensive and latency-sensitive, the entire Trie structure or highly popular prefixes are typically loaded and maintained in memory across the serving cluster.
  • Distributed Architecture: Sharding the data is necessary due to the massive corpus size. Sharding can be done based on the first one or two characters of the query (e.g., all queries starting with ‘A’ go to Server 1), or using Consistent Hashing for more even distribution.
  • Real-time Updates: Designing a pipeline to ingest and update the Trie with new popular queries quickly, typically by processing logs in near real-time and pushing incremental updates to the serving cluster.
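
A minimal Trie sketch that returns the top K suggestions for a prefix, assuming the score is simply a search frequency. A production system would precompute and cache the top K per node rather than walk the subtree at query time.

```python
import heapq

class TrieNode:
    __slots__ = ("children", "score")

    def __init__(self):
        self.children = {}
        self.score = 0  # > 0 only for nodes that terminate a known query

class AutocompleteTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, phrase: str, score: int = 1):
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.score += score

    def top_k(self, prefix: str, k: int = 5):
        # Walk down to the node representing the typed prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect all completions under that node, then keep the best k by score.
        results, stack = [], [(node, prefix)]
        while stack:
            cur, text = stack.pop()
            if cur.score > 0:
                results.append((cur.score, text))
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        return [text for _, text in heapq.nlargest(k, results)]
```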

7. Design a Distributed Caching System (e.g., Memcached, Redis Cluster)

This design question tests your understanding of data partitioning, fault tolerance, and improving read performance.

Core Challenge: Storing frequently accessed data across multiple servers for fast retrieval while ensuring high availability and efficient scaling.

Key Concepts:

  • Consistent Hashing: Critical for distributing cache keys evenly across nodes and minimizing cache misses when a node is added or removed.
  • Cache Eviction Policies: Implementing algorithms like LRU (Least Recently Used) or LFU (Least Frequently Used) to manage cache size when memory is full.
  • Data Replication: Deciding on the number of replicas needed for fault tolerance, allowing the system to handle node failures without losing cached data.
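
A minimal consistent-hashing ring with virtual nodes, assuming MD5 as the hash function. Adding or removing a node only remaps the keys that fall on its segments of the ring.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        # Each physical node is placed at many virtual points for smoother balance.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
owner = ring.get_node("user:42")  # the cache node responsible for this key
```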

8. Design a Global Distributed Data Store (e.g., Google Spanner)

This is a highly advanced question that challenges your knowledge of achieving both high availability and strong consistency across multiple geographical regions.

Core Challenge: Designing a system that is globally distributed yet provides strong transactional consistency (ACID properties).

Key Concepts:

  • Two-Phase Commit (2PC) / Distributed Transactions: Mechanisms to ensure that a transaction is either committed or aborted across all participating nodes.
  • Global Clock/TrueTime: The need for a highly accurate, synchronized clock across all data centers to serialize transactions globally and guarantee external consistency.
  • Data Partitioning/Sharding: Segmenting the data across regions and replicas while managing cross-region queries.

9. Design a Web Crawler (e.g., Googlebot)

This tests your ability to handle massive asynchronous workloads, manage state, and deal with external systems (the internet).

Core Challenge: Systematically fetching and indexing billions of web pages from the internet efficiently, politely, and at high throughput.

Key Concepts:

  • Frontier: A service responsible for storing and prioritizing the URLs yet to be crawled, often implemented as a highly reliable distributed queue.
  • Politeness: Implementing logic to respect robots.txt rules and limit the crawl rate to avoid overwhelming target websites.
  • Duplicate Detection: Using techniques like shingling and MinHash to identify and discard pages with identical or near-identical content.

10. Design an Ad Server (e.g., Google Ads or Facebook Ads Backend)

This scenario focuses on extreme low-latency processing, auction mechanisms, and personalization at scale.

Core Challenge: Selecting and serving a personalized ad to a user in under 100 milliseconds from the moment a web page loads.

Key Concepts:

  • Inverted Index: Essential for quickly finding all relevant ads for a given keyword or user demographic.
  • Auction Engine: The core component that runs the bidding process among eligible ads to determine which ad wins and at what price.
  • Real-Time Data Pipelines: Using data streams (like Kafka) to collect user events and update ad performance metrics almost instantly.

11. Design a Stock Exchange/Trading Platform

This problem is a deep dive into concurrency, strong consistency, and high-frequency, low-latency transaction processing.

Core Challenge: Building an Order Matching Engine that handles extremely high volumes of buy and sell orders with precise sequencing and strong consistency guarantees.

Key Concepts:

  • Order Book Data Structure: Using specialized structures (like a Price-Time Priority Queue) to efficiently match buy and sell orders.
  • Concurrency Control: Utilizing techniques like single-writer architecture (e.g., LMAX Disruptor pattern) or pessimistic locking to maintain strong transactional integrity on the central order book.
  • Low Latency Networking: Using high-performance network protocols and infrastructure (like dedicated hardware or kernel bypass) to minimize communication time.
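
A minimal price-time priority matching sketch for limit orders, assuming single-threaded access to the book (the single-writer idea) and using heaps keyed by price and arrival sequence.

```python
import heapq
import itertools

seq = itertools.count()  # arrival order: enforces time priority within a price level
bids = []                # max-heap via negated price: entries are [-price, seq, qty, order_id]
asks = []                # min-heap: entries are [price, seq, qty, order_id]

def submit(side, price, qty, order_id):
    """Match an incoming limit order against the opposite side, rest any remainder."""
    trades = []  # (buyer_id, seller_id, price, qty)
    if side == "buy":
        while qty > 0 and asks and asks[0][0] <= price:
            ask = asks[0]
            fill = min(qty, ask[2])
            trades.append((order_id, ask[3], ask[0], fill))
            qty -= fill
            ask[2] -= fill
            if ask[2] == 0:
                heapq.heappop(asks)
        if qty > 0:
            heapq.heappush(bids, [-price, next(seq), qty, order_id])
    else:
        while qty > 0 and bids and -bids[0][0] >= price:
            bid = bids[0]
            fill = min(qty, bid[2])
            trades.append((bid[3], order_id, -bid[0], fill))
            qty -= fill
            bid[2] -= fill
            if bid[2] == 0:
                heapq.heappop(bids)
        if qty > 0:
            heapq.heappush(asks, [price, next(seq), qty, order_id])
    return trades
```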

12. Design a Fraud Detection System

This design tests your ability to handle real-time data streams, apply machine learning models, and execute high-speed decision-making.

Core Challenge: Analyzing transactions in real time (low latency) to detect and flag suspicious activities before they are finalized.

Key Concepts:

  • Stream Processing: Using technologies like Apache Flink or Kafka Streams to process transactional data continuously as it arrives.
  • Rule Engine/ML Integration: Architecture for loading and applying pre-trained machine learning models or heuristic rules to the data stream for scoring.
  • Windowing: Defining sliding or tumbling time windows to calculate user or account aggregates (e.g., number of transactions in the last 5 minutes) crucial for fraud detection features.

13. Design a Live Commenting/Polling System (e.g., Live Sports Events, Twitch Chat)

This focuses on managing massive fanout, state, and low-latency message delivery for transient data.

Core Challenge: Broadcasting high-volume, real-time messages (comments, votes) to millions of concurrent viewers with minimal delay, and handling the temporary, high load.

Key Concepts:

  • WebSockets/Streaming: Utilizing persistent connections (WebSockets or proprietary protocols) to push messages instantly to clients.
  • Sharding by Event: Segmenting the chat/comment stream based on the live event (e.g., “Game ID,” “Stream ID”) to distribute the load across many servers.
  • Hot Spot Management: Techniques for isolating and scaling up components handling viral or extremely popular events.

14. Design a Distributed Counter Service (e.g., Counting Facebook Likes or Twitter Retweets)

This problem focuses on system reliability, concurrency control, and the consistency trade-offs for numerical data.

Core Challenge: Incrementing a massive number of counters (likes, views) concurrently and quickly, while maintaining high accuracy and availability.

Key Concepts:

  • Eventual Consistency: Often preferred for performance; updates are initially batched and aggregated in a distributed cache (like Redis) before being written back to the persistent store.
  • Atomic Operations: Using native database or cache features (like Redis’s INCR command) to ensure that concurrent increments don’t overwrite each other.
  • Sharding: Distributing the vast number of counters across multiple database or cache shards to handle the read/write load.
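
A minimal sketch of sharded counters: each logical counter is spread across N shards so concurrent increments hit different keys, and reads sum the shards. With Redis, each shard would be bumped with the atomic INCR/INCRBY command.

```python
import random
from collections import defaultdict

NUM_SHARDS = 16
shards = defaultdict(int)  # (counter_name, shard_id) -> partial count

def increment(counter_name, amount=1):
    # Pick a random shard so hot counters don't contend on a single key.
    shard_id = random.randrange(NUM_SHARDS)
    shards[(counter_name, shard_id)] += amount  # Redis: INCRBY counter:{name}:{shard}

def read(counter_name):
    # Reads are rarer than writes; summing the shards is acceptable.
    return sum(shards[(counter_name, s)] for s in range(NUM_SHARDS))

increment("post:123:likes")
increment("post:123:likes")
assert read("post:123:likes") == 2
```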

15. Design a Distributed Configuration Management System (e.g., ZooKeeper, etcd)

This is a meta-system design that addresses how distributed applications manage their own state and configuration.

Core Challenge: Storing critical, shared configuration data and service metadata that must be highly available and provide strong consistency for all reading clients.

Key Concepts:

  • Consensus Algorithms: Using protocols like Raft or Paxos to ensure that all nodes in the cluster agree on the same value for the configuration state.
  • Watchers/Notifications: Implementing a mechanism that allows client services to be notified immediately when a configuration value changes, rather than constantly polling.
  • Strong Consistency: The system must guarantee linearizability, meaning all clients see the configuration changes in the same order at the same time.

16. Design a Geospatial Service for Finding Points of Interest (POIs) (e.g., Yelp, Google Maps POIs)

Similar to the ride-sharing problem, this design explores advanced techniques for indexing and querying static geographical data.

Core Challenge: Storing millions of static locations (restaurants, stores) and efficiently querying the database to return all points within a specific geographic area (e.g., bounding box or radius).

Key Concepts:

  • Geospatial Databases: Leveraging databases with native geospatial indexing capabilities (e.g., PostGIS, MongoDB’s GeoSpatial Index).
  • Indexing Structures: Using structures like R-trees or Geohashing for efficient spatial lookups; Geohashing in particular encodes 2D coordinates into a 1D, searchable string.
  • Reverse Geocoding: The process of converting coordinates back into a human-readable address.

17. Design a Real-time Telemetry and Monitoring System (e.g., Prometheus, Datadog)

This is a critical system for any tech company, testing your ability to handle massive time-series data ingest and querying.

Core Challenge: Collecting metrics, logs, and traces from millions of application instances in real-time, storing them efficiently, and allowing for rapid, aggregated querying.

Key Concepts:

  • Time-Series Database (TSDB): Choosing a specialized database (like Prometheus, InfluxDB, or a custom solution built on Cassandra) optimized for time-stamped data.
  • Data Aggregation: Implementing roll-up processes to downsample and summarize old data (e.g., aggregate 1-minute resolution data into 1-hour resolution after a week).
  • Alerting Pipeline: Designing the component responsible for evaluating incoming metrics against defined thresholds and triggering notifications.

18. Design a Single Sign-On (SSO) System (e.g., Auth0, Corporate SSO)

This scenario focuses on authentication, authorization, and secure communication protocols.

Core Challenge: Enabling a user to authenticate once and gain access to multiple independent software systems securely and seamlessly.

Key Concepts:

  • Protocols: Understanding and applying standards like OAuth 2.0 and OpenID Connect (OIDC) for authentication and authorization flows.
  • Token Management: Generating, validating, and revoking secure tokens (like JWTs - JSON Web Tokens) that carry user information across services.
  • Session Management: Centralized control over user sessions and the implementation of session termination/revocation.

19. Design a Data Processing Pipeline for Analytics (Batch ETL)

This question shifts focus from real-time services to large-scale, offline data movement and transformation.

Core Challenge: Building a robust, fault-tolerant pipeline to move vast amounts of raw data from operational databases (OLTP) to an analytical data warehouse (OLAP) for business intelligence.

Key Concepts:

  • ETL/ELT: Understanding the phases (Extract, Transform, Load or Extract, Load, Transform) and the tools used (e.g., Apache Spark, Hadoop MapReduce).
  • Workflow Orchestration: Using tools (like Apache Airflow) to manage dependencies between different data processing jobs and handle scheduling/retries.
  • Data Lake/Warehouse: Designing the target storage architecture (e.g., AWS S3, Snowflake) for columnar storage and efficient querying.

20. Design an Image Hosting and Upload Service (e.g., Imgur, Flickr)

This combines file storage, image manipulation, and global content delivery.

Core Challenge: Handling high-volume image uploads, generating multiple resized and optimized formats, and serving them quickly worldwide.

Key Concepts:

  • Asynchronous Processing: Using a Message Queue to decouple the image upload process from the heavy, time-consuming Image Resizing/Processing Service.
  • Content Delivery Network (CDN): Essential for caching various image formats and serving them with low latency.
  • Deduplication: Techniques to avoid storing the same image multiple times, often by hashing the image content and checking the hash against an index.
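
A minimal content-hash deduplication sketch. The save_blob callable is a hypothetical hook that writes bytes to blob storage and returns a storage key, and the in-memory index stands in for a database table keyed by digest.

```python
import hashlib

blob_index = {}  # sha256 digest -> storage key of the already-stored image

def store_image(data: bytes, save_blob) -> str:
    """Return the storage key, reusing an existing blob for identical content."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in blob_index:
        return blob_index[digest]  # identical image already stored; skip the upload
    key = save_blob(data)          # hypothetical write to blob storage
    blob_index[digest] = key
    return key
```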

21. Design a Distributed API Gateway

This tests your understanding of edge-level traffic management, security, and service orchestration.

Core Challenge: Creating a unified, centralized entry point for all client requests, which provides routing, security, monitoring, and request throttling before requests reach the backend microservices.

Key Concepts:

  • Reverse Proxy & Load Balancing: The fundamental role of the gateway in distributing traffic across microservices.
  • Authentication & Authorization: Offloading security tasks (like token validation) from individual microservices to the gateway.
  • Circuit Breaker Pattern: Implementing logic to automatically stop forwarding requests to a failing service to prevent cascading failures.
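
A minimal circuit-breaker sketch: after a threshold of consecutive failures the breaker opens and fails fast, then allows a trial call after a cooldown. The thresholds and exception handling are illustrative only.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (traffic flows)

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")  # protect the backend
            self.opened_at = None  # half-open: let one trial request through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit and resets the count
        return result
```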

22. Design a Load Balancer (L4 and L7)

This deep-dive question explores how traffic is managed at the network and application layers, testing reliability and routing algorithms.

Core Challenge: Distributing incoming network traffic efficiently across a group of backend servers to maximize throughput and ensure high availability.

Key Concepts:

  • L4 vs. L7: Understanding the difference between Layer 4 (Transport, uses IP/Port) and Layer 7 (Application, uses Headers/Cookies/URL Path) load balancing and their respective pros/cons.
  • Health Checks: Mechanisms to continuously monitor the health of backend servers and remove failing ones from the rotation instantly.
  • Algorithms: Comparing and contrasting various traffic distribution algorithms, such as Round Robin, Least Connections, and Weighted Algorithms.
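
Minimal sketches of two of these algorithms, Round Robin and Least Connections, over an in-memory server list; a health-check loop would simply add or remove entries from the list.

```python
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round Robin: hand out servers in a fixed rotation.
_rotation = itertools.cycle(servers)
def round_robin():
    return next(_rotation)

# Least Connections: pick the server currently handling the fewest requests.
active = {s: 0 for s in servers}
def least_connections():
    server = min(active, key=active.get)
    active[server] += 1  # the caller must decrement when the request finishes
    return server
```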

23. Design an Identity Verification Service (KYC/AML)

This addresses complex, multi-step workflows, integration with third-party services, and compliance requirements.

Core Challenge: Creating a robust, auditable system to verify a user’s identity (Know Your Customer/KYC) and screen against watchlists (Anti-Money Laundering/AML).

Key Concepts:

  • Workflow Engine: Using a system (like a state machine or a dedicated workflow service) to manage the complex, multi-step process (e.g., capture ID → facial recognition → database lookup → manual review).
  • Audit Logging: Ensuring every step and decision is logged immutably for regulatory compliance.
  • Third-Party Integration: Designing secure, reliable interfaces to communicate with external identity and government databases.

24. Design a Recommendation System (e.g., Netflix Movie Suggestions)

A challenging problem that integrates data science concepts with scalable, low-latency infrastructure.

Core Challenge: Generating highly relevant and personalized suggestions for millions of users with low latency based on their history and similarities to others.

Key Concepts:

  • Candidate Generation: The process of quickly selecting a small subset of items (candidates) that might be relevant to the user (e.g., using collaborative filtering or content-based methods).
  • Scoring and Ranking: Using machine learning models to assign a relevance score to each candidate and ordering them before presentation.
  • Offline vs. Near Real-time: Differentiating between batch processing for model training and feature extraction versus fast, online serving of results.

25. Design a Personalized Search Engine (Focusing on the Backend)

This extends the standard search problem by introducing complex user history and context into the ranking algorithm.

Core Challenge: Improving search result relevance by dynamically incorporating a user’s past queries, clicked results, and profile information into the ranking score.

Key Concepts:

  • Inverted Index: The core data structure for fast text search, mapping words to document IDs.
  • Ranking Function: Modifying the standard term frequency/inverse document frequency (TF-IDF) or BM25 score with Personalization Features (e.g., penalizing results the user has already seen or boosting results from preferred categories).
  • Low Latency Feature Store: A highly performant service (often a distributed cache) to store and retrieve the user’s specific context and features required by the ranking model.

26. Design a Distributed File System (e.g., HDFS, Google File System)

This is a foundational problem that tests your understanding of data integrity, block management, and fault tolerance at the storage layer.

Core Challenge: Storing very large files reliably and efficiently across a cluster of commodity hardware servers.

Key Concepts:

  • Master/Slave Architecture: The role of the NameNode (Master) for managing metadata and the DataNodes (Slaves) for storing the actual file blocks.
  • Block Replication: Breaking files into fixed-size blocks and replicating each block across multiple Datanodes to guarantee fault tolerance and availability.
  • Data Integrity: Implementing mechanisms (e.g., checksums) to detect corruption of data blocks over time and initiating re-replication.

27. Design a Ride-Sharing Service (e.g., Uber, Lyft)

This design question tests your understanding of real-time geospatial processing, low-latency communication, matching algorithms, and managing millions of concurrent connections and moving objects.

Core Challenge: Instantly matching a rider with the nearest suitable driver, calculating accurate estimated times of arrival (ETAs) based on real-time traffic, and tracking the ride’s state from request to completion with minimal latency.

Key Concepts:

  • Real-time Geospatial Indexing: This is the heart of the system. To find the “nearest” driver efficiently among millions of locations, the system cannot use simple database queries. Instead, it must use specialized spatial indexing structures:

    • Geohashing / S2 Library (or H3): Dividing the entire map into a hierarchical grid of cells. Driver locations are indexed by the cell they currently occupy, allowing for rapid, region-based lookup of nearby drivers (see the sketch after this list).

    • In-Memory Storage: Real-time driver locations are stored in a low-latency, in-memory data store (like Redis or an in-house solution) to handle the massive read/write throughput from continuous GPS updates.

  • Low-Latency Communication: Using WebSockets or MQTT to maintain persistent, bidirectional connections between the server and the rider/driver applications. This is necessary for real-time location updates (driver to server) and instant ride request notifications (server to driver).

  • Dispatch/Matching Engine: The service responsible for finding and assigning the best driver based on criteria beyond just distance (e.g., driver rating, ETA, vehicle type, current surge pricing). This is often a separate, optimized service.

  • ETA Calculation and Mapping: Using a Graph Database (or an external map API) where intersections are nodes and road segments are edges. Edge weights are dynamically adjusted based on real-time traffic data (often derived from driver locations) to calculate the most accurate route and Estimated Time of Arrival.

  • Trip State Machine: Using a robust finite state machine to track the lifecycle of a trip (Requested → Accepted → En Route → Arrived → Picked Up → Dropped Off → Completed). This is critical for reliable billing, notifications, and fault tolerance.
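
A minimal sketch of the grid-cell idea behind the nearest-driver lookup, using simple lat/lng bucketing in place of a real Geohash/S2/H3 library. Drivers are indexed by the cell they occupy, and a query scans the rider's cell plus its eight neighbors.

```python
import math
from collections import defaultdict

CELL_SIZE = 0.01  # roughly 1 km per cell near the equator (illustrative)

cells = defaultdict(dict)  # (cell_x, cell_y) -> {driver_id: (lat, lng)}

def cell_of(lat, lng):
    return (int(lat // CELL_SIZE), int(lng // CELL_SIZE))

def update_driver(driver_id, lat, lng, previous_cell=None):
    # Called on every GPS ping; move the driver into its current cell.
    if previous_cell is not None:
        cells[previous_cell].pop(driver_id, None)
    cell = cell_of(lat, lng)
    cells[cell][driver_id] = (lat, lng)
    return cell

def nearest_drivers(lat, lng, limit=5):
    cx, cy = cell_of(lat, lng)
    candidates = []
    # Scan the rider's cell and the 8 surrounding cells.
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for driver_id, (dlat, dlng) in cells[(cx + dx, cy + dy)].items():
                dist = math.hypot(dlat - lat, dlng - lng)
                candidates.append((dist, driver_id))
    return [driver_id for _, driver_id in sorted(candidates)[:limit]]
```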

28. Design a Music Streaming Service (e.g., Apple Music, Spotify)

This design question tests your ability to handle petabytes of data, ensure high-quality, continuous media streaming globally, and build a highly effective recommendation engine.

Core Challenge: Storing and serving millions of audio files at multiple quality levels to hundreds of millions of concurrent users worldwide with minimal buffering, while providing a personalized discovery experience.

Key Concepts:

  • Content Storage and Transcoding:

    • Blob Storage: Utilizing massive, highly available, and durable storage (like S3/Google Cloud Storage) for the master copies of all audio files.

    • Transcoding Pipeline: Converting the raw, high-fidelity audio files into various formats (e.g., AAC, MP3, FLAC) and bitrates (e.g., 96kbps, 320kbps, lossless) to suit different devices and network conditions.

  • Global Content Delivery Network (CDN): Essential for streaming. Popular and frequently accessed songs are cached aggressively on edge servers closer to the user to minimize latency and ensure a seamless, buffer-free playback experience.

  • Adaptive Bitrate Streaming (ABS): The client application continuously monitors the user’s network speed and dynamically switches between different quality versions of the song (transcoded bitrates) to ensure continuous playback without interruption.

  • Metadata Storage and Search:

    • Metadata Database: A highly scalable database (e.g., Cassandra, sharded relational DB) stores all song information (metadata) like title, artist, album, genre, and lyrics. This is crucial for search and recommendations.

    • Search Engine: Implementing a dedicated search infrastructure (e.g., Elasticsearch) for fast, full-text search across millions of song titles, artists, and album names.

  • Recommendation Engine (ML Service): A machine learning pipeline that analyzes user behavior (plays, skips, playlist additions, time of day) and uses techniques like Collaborative Filtering to generate highly personalized daily playlists and discover new music, which is crucial for user retention.