HLD Classic Problems
Designing a News Feed: Twitter / X
A feed timeline must deliver each author's posts to potentially millions of followers in near real time. There are two primary strategies:
- Push Model (Fan-out-on-write): When an author posts a tweet, the system immediately inserts it into the pre-computed feed timelines of all their followers in Redis.
Limitation: Breaks during **Celebrity writes** (e.g. if a user has 100M followers, a single write triggers 100M Redis operations).
- Pull Model (Fan-out-on-read): Tweets are written to a database. When a follower refreshes their feed, the system queries the active authors they follow, pulls their latest posts, and merges them dynamically.
Limitation: High read latency if a user follows thousands of active accounts.
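The two strategies above can be contrasted with a minimal in-memory sketch. It is illustrative only: the class names `PushFeed` and `PullFeed`, the `FEED_SIZE` cap, and the use of plain dictionaries in place of Redis and a database are all assumptions, and posts are assumed to arrive in timestamp order.

```python
import heapq
from collections import defaultdict, deque
from itertools import islice

FEED_SIZE = 100  # hypothetical cap on each pre-computed timeline


class PushFeed:
    """Fan-out-on-write: every post is copied into each follower's timeline."""

    def __init__(self, followers):
        self.followers = followers  # author -> set of follower ids
        self.timelines = defaultdict(lambda: deque(maxlen=FEED_SIZE))

    def post(self, author, ts, text):
        # One write per follower: this loop is the cost that explodes
        # for an author with millions of followers.
        for follower in self.followers.get(author, ()):
            self.timelines[follower].appendleft((ts, author, text))

    def read(self, user, limit=10):
        # Reads are cheap: the timeline is already assembled, newest first.
        return list(islice(self.timelines[user], limit))


class PullFeed:
    """Fan-out-on-read: posts are stored once and merged at read time."""

    def __init__(self, following):
        self.following = following      # user -> set of authors they follow
        self.posts = defaultdict(list)  # author -> posts, oldest first

    def post(self, author, ts, text):
        self.posts[author].append((ts, author, text))  # a single write

    def read(self, user, limit=10):
        # One newest-first stream per followed author, merged lazily.
        # The work grows with the number of followed authors.
        streams = (reversed(self.posts[a]) for a in self.following.get(user, ()))
        merged = heapq.merge(*streams, key=lambda p: p[0], reverse=True)
        return list(islice(merged, limit))
```

The push version pays at write time (the loop in `post`), while the pull version pays at read time (the merge in `read`), which mirrors the two limitations listed above.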
Designing Video Streaming: YouTube / Netflix
Architecting large-scale video delivery requires highly automated ingestion and low-latency global delivery pipelines:
Ingestion Pipeline
Videos are uploaded in raw formats. A pipeline splits them into short segments (e.g. 5-10 s), transcodes each segment into standard resolutions (1080p, 720p, 480p), and packages the output for a standard streaming protocol: **HLS** (HTTP Live Streaming) or **DASH** (Dynamic Adaptive Streaming over HTTP). Transcoded segments are cached on the edge servers of global **Content Delivery Networks (CDNs)**, so each viewer streams from a location geographically close to them.
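As a rough sketch of the chunking and packaging steps, the snippet below computes segment boundaries and builds an HLS master playlist that lists one variant stream per resolution. The bitrates, the 6-second segment length, and the `<name>/index.m3u8` path layout are assumptions chosen for illustration; the actual transcoding would be done by an external encoder and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    bandwidth_bps: int  # assumed peak bitrate, illustrative only


# Hypothetical encoding ladder for the three resolutions in the text.
RENDITIONS = [
    Rendition("1080p", 1920, 1080, 5_000_000),
    Rendition("720p", 1280, 720, 2_800_000),
    Rendition("480p", 854, 480, 1_400_000),
]


def segment_ranges(duration_s, segment_s=6.0):
    """Split a video of duration_s seconds into (start, end) segment ranges."""
    ranges = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        ranges.append((start, end))
        start = end
    return ranges


def master_playlist(renditions):
    """Build an HLS master playlist pointing at one media playlist per rendition."""
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/index.m3u8")  # assumed path layout
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(segment_ranges(20))
    print(master_playlist(RENDITIONS))
```

The player fetches the master playlist first and then switches between the listed variants as its measured bandwidth changes, which is why each resolution is segmented on the same boundaries.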
Designing Spatial Services: Uber / Ola
Matching passengers with nearby drivers requires fast spatial lookups. Computing the distance from a rider to every driver row in a conventional database does not scale to a large fleet whose positions change constantly.
Instead, we divide the physical map into a grid of discrete cells using a **Spatial Indexing** scheme: **Geohash** (base-32 alphanumeric strings) or **Uber's H3** (hexagonal cells). Driver coordinates are mapped to their cell's bucket in RAM. Finding nearby drivers then becomes a fast key-value lookup on the rider's cell and, in practice, its neighboring cells.
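The following is a minimal sketch of the Geohash variant, assuming an in-memory dictionary as the bucket store. `DriverIndex` and its methods are hypothetical names, and `nearby` checks only the rider's own cell, so a production version would also need to query the adjacent cells to catch drivers just across a cell boundary.

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=6):
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate longitude, latitude, longitude, ...
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


class DriverIndex:
    """Hypothetical in-memory index: geohash cell -> set of driver ids."""

    def __init__(self, precision=6):
        self.precision = precision
        self.cells = defaultdict(set)
        self.driver_cell = {}  # driver id -> current cell

    def update(self, driver_id, lat, lon):
        cell = geohash_encode(lat, lon, self.precision)
        old = self.driver_cell.get(driver_id)
        if old is not None and old != cell:
            self.cells[old].discard(driver_id)  # driver left its old cell
        self.cells[cell].add(driver_id)
        self.driver_cell[driver_id] = cell

    def nearby(self, lat, lon):
        # Single key lookup; neighboring cells are not checked in this sketch.
        cell = geohash_encode(lat, lon, self.precision)
        return set(self.cells.get(cell, ()))
```

A shorter geohash covers a larger cell, so `precision` trades search radius against the number of drivers returned per lookup.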
Designing a Distributed Rate Limiter
A rate limiter mitigates DDoS attacks, API abuse, and server exhaustion by capping the number of requests a user can make within a time window. Common algorithms include:
- Token Bucket: A bucket holds up to $N$ tokens, refilled at a constant rate. Each request consumes a token, and requests are rejected when the bucket is empty. Allows bursts of up to $N$ requests.
- Sliding Window Log: Logs timestamps of requests in a Redis sorted set. Highly accurate but memory intensive.
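- Sliding Window Counter: Keeps only two counters, one for the previous fixed window and one for the current window, and estimates the rolling count by weighting the previous counter by how much of it still overlaps the sliding window. Slightly approximate, but uses far less memory than a log.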
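Two of the algorithms above, sketched for a single process. This is a simplified illustration: a distributed limiter would keep this state in a shared store such as Redis and update it atomically, which is not shown. The class names and the injectable `clock` parameter (added so the behavior can be tested deterministically) are assumptions.

```python
import time


class TokenBucket:
    """Bucket of up to `capacity` tokens, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Lazily add the tokens earned since the last call, capped at capacity.
        earned = (now - self.last) * self.refill_per_sec
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SlidingWindowCounter:
    """Approximates a rolling window from two fixed-window counters."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.current_window = None  # index of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        window = int(now // self.window_s)
        if window != self.current_window:
            # Carry the old count over only if we moved to the very next window.
            is_adjacent = (
                self.current_window is not None
                and window == self.current_window + 1
            )
            self.previous_count = self.current_count if is_adjacent else 0
            self.current_count = 0
            self.current_window = window
        # Fraction of the current window that has elapsed; the rest of the
        # sliding window still overlaps the previous fixed window.
        elapsed = (now % self.window_s) / self.window_s
        estimated = self.previous_count * (1 - elapsed) + self.current_count
        if estimated + 1 <= self.limit:
            self.current_count += 1
            return True
        return False
```

The token bucket stores two numbers per user and the sliding window counter stores three, whereas a sliding window log stores one timestamp per request, which is the memory difference the list above refers to.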