Shorten URL
Let’s design a URL shortener for interview prep, fully in HLD notes format (no code), going from functional requirements → non-functional requirements → API endpoints → design approach → components → data model → scaling strategy.
I’ll keep it structured so you can present it directly in an interview.
1️⃣ Problem Statement
Design a service like Bitly that converts long URLs into short, unique aliases and redirects when someone accesses the short link.
2️⃣ Functional Requirements
Must-have
- Shorten URL
  - User provides a long URL; the system returns a unique short URL.
- Redirect
  - Accessing a short URL redirects to the original long URL.
- Analytics (Basic)
  - Count the number of times each short URL is used (click count).
- Custom Alias (Optional)
  - Users can provide their own alias if it is available.
Good-to-have
- Expiration date for short links.
- Support for user accounts and a history of the links each user created.
3️⃣ Non-Functional Requirements
- Scalability – Handle high read traffic (redirects) and moderate write traffic (shortening).
- Low Latency – Redirect in under 50 ms for a good user experience.
- High Availability – The system should be available 99.99% of the time.
- Fault Tolerance – Survive cache or database failures.
- Consistency – Once created, a short URL must always resolve to the same long URL.
- Analytics Delay – Analytics can be eventually consistent (real-time counts are not required).
4️⃣ Assumptions & Constraints
- Traffic pattern:
  - Reads: 100k requests/sec (redirects)
  - Writes: 10k requests/sec (shorten requests)
  - Read-to-write ratio ≈ 10:1
- Short URL length: 6–8 characters.
- Retention: Short URLs never expire unless an expiration date is explicitly set.
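The 6–8 character length can be sanity-checked against the write rate above. Here is a quick back-of-envelope sketch in Python. It assumes a 62-character alphabet and a sustained 10k writes/sec, both taken from the assumptions above:

```python
# Back-of-envelope keyspace check for base62 aliases.
# Both inputs come from the assumptions above; results are estimates only.
ALPHABET_SIZE = 62          # 0-9, a-z, A-Z
WRITES_PER_SEC = 10_000     # assumed sustained shorten rate
SECONDS_PER_DAY = 86_400


def keyspace(length: int) -> int:
    """Number of distinct aliases of exactly `length` base62 characters."""
    return ALPHABET_SIZE ** length


def days_to_exhaust(length: int) -> float:
    """Days until every alias of this length is used at the assumed write rate."""
    return keyspace(length) / (WRITES_PER_SEC * SECONDS_PER_DAY)


if __name__ == "__main__":
    for n in (6, 7, 8):
        print(f"{n} chars: {keyspace(n):,} aliases, ~{days_to_exhaust(n):,.0f} days")
```

At that rate, the 6-character keyspace (about 56.8 billion codes) would run out in roughly two months. Seven characters last about 11 years. If the write rate really is sustained, that argues for 7 or more characters.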
5️⃣ API Endpoints
Base URL: https://short.ly
POST /shorten
Request:
{
"longUrl": "https://example.com/very/long/url",
"customAlias": "myalias" // optional
}
Response:
{
"shortUrl": "https://short.ly/Ab3dE9"
}
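GET /{alias}
- 302 redirect to the original long URL. Use 302 rather than 301: browsers cache 301s, so later clicks would skip the service and its analytics.
GET /stats/{alias}
- Returns the click count for the alias.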
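For POST /shorten, the service should reject malformed input before touching storage. Below is a minimal validation sketch. Three rules in it are assumptions for illustration, not part of the API above: custom aliases must be 6–8 base62 characters, route names are reserved, and URLs are capped at 2048 characters.

```python
import re
from urllib.parse import urlparse

# Assumptions for illustration: alias format, reserved words, and URL length cap.
ALIAS_RE = re.compile(r"[0-9A-Za-z]{6,8}")
RESERVED_ALIASES = {"shorten", "stats"}  # route names that must not become aliases
MAX_URL_LENGTH = 2048  # hypothetical limit


def validate_long_url(long_url: str) -> bool:
    """Accept only absolute http(s) URLs with a host, within the length limit."""
    if len(long_url) > MAX_URL_LENGTH:
        return False
    parsed = urlparse(long_url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_custom_alias(alias: str) -> bool:
    """Format check only; availability is decided later by the DB's unique key."""
    return ALIAS_RE.fullmatch(alias) is not None and alias.lower() not in RESERVED_ALIASES
```

These are format checks only. Whether a custom alias is actually free is decided by the uniqueness constraint at write time.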
6️⃣ High-Level Design Approach
We’ll design for a read-heavy workload:
- Use a cache-first approach for redirects.
- Use a persistent DB for durability.
- Record analytics asynchronously so they don’t slow down redirects.
7️⃣ Component Diagram (Textual)
Clients
- Web browser / mobile app
Services
- API Gateway / Load Balancer
  - Routes requests to the shortening or redirect service.
- Shortening Service
  - Generates the alias.
  - Stores the mapping in the DB and the cache.
- Redirect Service
  - Looks up the alias in the cache → on a miss, reads the DB → redirects.
- Analytics Service
  - Listens to click events and updates counters asynchronously.
Storage
- Persistent DB: MySQL (sharded) / Cassandra.
- Cache: Redis.
- Message queue: Kafka (click events).
8️⃣ Data Model
Table: url_map
| Column | Type | Notes |
|---|---|---|
| alias | VARCHAR | Unique short key |
| long_url | TEXT | Original URL |
| created_at | DATETIME | |
| expire_at | DATETIME | Optional expiration |
| click_count | BIGINT | Can be eventually consistent |
Indexes:
- Primary key: alias
- Optional index on long_url for deduplication.
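To make the table concrete, here is a sketch that uses Python's built-in sqlite3 as a stand-in for MySQL/Cassandra. It shows the primary key on alias doubling as the collision check for custom aliases. SQLite's types are looser than MySQL's, and the index name is hypothetical.

```python
import sqlite3

# SQLite stands in for the real store (sharded MySQL / Cassandra) only to show
# the schema; column types are simplified and the index name is hypothetical.
SCHEMA = """
CREATE TABLE url_map (
    alias       VARCHAR(8) PRIMARY KEY,
    long_url    TEXT NOT NULL,
    created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expire_at   DATETIME,
    click_count BIGINT NOT NULL DEFAULT 0
);
CREATE INDEX idx_url_map_long_url ON url_map (long_url);
"""


def insert_mapping(conn: sqlite3.Connection, alias: str, long_url: str) -> bool:
    """Insert a mapping; return False if the alias is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO url_map (alias, long_url) VALUES (?, ?)",
                (alias, long_url),
            )
        return True
    except sqlite3.IntegrityError:  # primary-key (alias) collision
        return False


def lookup(conn: sqlite3.Connection, alias: str):
    """Return the long URL for an alias, or None if it does not exist."""
    row = conn.execute(
        "SELECT long_url FROM url_map WHERE alias = ?", (alias,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(insert_mapping(conn, "myalias", "https://example.com/a"))  # True
    print(insert_mapping(conn, "myalias", "https://example.com/b"))  # False: alias taken
    print(lookup(conn, "myalias"))  # https://example.com/a
```

In MySQL, an index on a TEXT column needs a prefix length. That is one reason to index a hash of long_url instead of the raw column.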
9️⃣ Alias Generation Strategies
- Base62 encoding of ID
  - Generate a sequential ID and convert it to base62.
- Random String
  - Generate a random 6–8 character string and check for collisions.
- Hash + Collision Handling
  - Hash the long URL, truncate the hash, and resolve collisions.
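Here are the three strategies above, sketched in Python. Several choices are illustrative, not prescribed by the notes:
- the alphabet order (digits, lowercase, uppercase)
- the 7-character default
- salting the hash with an attempt number to resolve collisions

The in-memory `taken` set stands in for a conditional insert against the DB.

```python
import hashlib
import secrets

# Alphabet order is an illustrative choice: digits, then a-z, then A-Z.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Strategy 1: encode a non-negative (e.g. sequential) ID as base62."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


def random_alias(length: int = 7) -> str:
    """Strategy 2: cryptographically random alias; the caller retries on collision."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def hash_alias(long_url: str, length: int = 7, attempt: int = 0) -> str:
    """Strategy 3: truncated hash of the URL, salted with the attempt number."""
    digest = hashlib.sha256(f"{long_url}#{attempt}".encode()).digest()
    value = int.from_bytes(digest, "big") % BASE ** length  # keep `length` digits
    return base62_encode(value).rjust(length, ALPHABET[0])  # left-pad to fixed width


def pick_hash_alias(long_url: str, taken: set, length: int = 7, max_attempts: int = 5) -> str:
    """Resolve collisions by re-hashing with the next attempt number."""
    for attempt in range(max_attempts):
        candidate = hash_alias(long_url, length, attempt)
        if candidate not in taken:  # in production: a conditional insert in the DB
            return candidate
    raise RuntimeError("too many collisions")
```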
🔟 Scaling Strategy
Reads
- Cache hot keys in Redis (cache-hit latency ~1 ms).
- Use LRU eviction + TTL.
- Use multiple Redis shards to handle the traffic.
Writes
- Shard the DB by alias prefix or hash.
- Replication (primary for writes, read replicas for reads if SQL).
Analytics
- Publish click events to Kafka.
- Consumers aggregate clicks and write them to the DB periodically.
Fault Tolerance
- Multiple service instances across availability zones.
- Cache replication (Redis Cluster / Sentinel).
- DB replication + failover.
1️⃣1️⃣ Bottlenecks & Solutions
- Cache Miss Storm: Pre-warm hot keys; add a local in-memory cache.
- Hot URL: Partition counters or use atomic increments in Redis.
- DB Shard Hotspot: Hash the alias to distribute load evenly.
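Here is a sketch of partitioned counters for a hot URL. Each increment goes to one of N sub-counters picked at random, and a read sums them. The class name and shard count are hypothetical. In Redis, each sub-counter would be a separate key incremented with INCR.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """Click counter split into N sub-counters so a hot alias does not funnel
    every increment into one key/row. Hypothetical sketch: in Redis each
    sub-counter could be its own key, e.g. INCR clicks:<alias>:<i>."""

    def __init__(self, num_shards: int = 16) -> None:
        self.num_shards = num_shards
        self._counts = defaultdict(int)  # (alias, shard index) -> partial count

    def increment(self, alias: str, amount: int = 1) -> None:
        shard = random.randrange(self.num_shards)  # spread writes across sub-counters
        self._counts[(alias, shard)] += amount

    def total(self, alias: str) -> int:
        # Reads pay the cost instead: sum every sub-counter (.get avoids creating keys).
        return sum(self._counts.get((alias, i), 0) for i in range(self.num_shards))
```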
URL Shortener — HLD Diagram & Notes
This section contains the architecture diagram, sequence flows, and interview-ready talking points for a URL shortener (no code). Use it in an interview to explain the components and flows.
1. Architecture (Component Diagram — ASCII)
+------------+     +------------------+     +-------------+
|   Client   | --> | Load Balancer /  | --> | API Gateway |
| (browser)  |     | CDN (optional)   |     +-------------+
+------------+     +------------------+            |
        +------------------------------------------+-----------+
        |                                                      |
        v                                                      v
 +--------------+  lookup / set  +-----------+   warm   +-------------+
 | Redirect Svc | -------------> |   Cache   | <------- | Shorten Svc |
 | (stateless)  |                |  (Redis)  |          | (stateless) |
 +--------------+                +-----------+          +-------------+
    |    |                                                     |
    |    | read on cache miss                                  | write mapping
    |    +-----------------------+         +-------------------+
    |                            v         v
    | click events   +-------------------------------+
    v                | Persistent Store (DB / NoSQL) |
 +---------------+   | e.g. MySQL (sharded) /        |
 | Message Queue |   | Cassandra                     |
 |    (Kafka)    |   +-------------------------------+
 +---------------+                   ^
    |                                | batched click counts
    v                                |
 +----------------+                  |
 | Analytics Svc  | -----------------+
 |  (consumers)   |
 +----------------+
2. Sequence Diagram — Create Short URL (high-level)
Client -> API Gateway: POST /shorten { longUrl }
API Gateway -> Shorten Svc: validate URL, check custom alias
Shorten Svc -> DB: reserve ID / insert mapping (fail if the alias already exists)
Shorten Svc -> Cache: set(shortCode -> longUrl)
Shorten Svc -> Client: return shortUrl
Note: With base62(id) generation, the service first obtains a unique ID (from a DB insert or an ID service), then converts it to the shortCode.
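Here is the create flow above as a single-process sketch. Everything in it is hypothetical:
- the `ShortenService` class
- its dict-based `db` and `cache`
- the starting ID of 62^5, chosen so generated codes are 6 characters

A real service would rely on the DB's unique key rather than a check-then-set.

```python
import itertools
from typing import Optional

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62(n: int) -> str:
    """Encode a non-negative integer in base62."""
    s = ""
    while True:
        n, r = divmod(n, 62)
        s = ALPHABET[r] + s
        if n == 0:
            return s


class ShortenService:
    """Hypothetical in-memory stand-in: `db` plays url_map, `cache` plays Redis."""

    def __init__(self, base_url: str = "https://short.ly", first_id: int = 62 ** 5):
        self.base_url = base_url
        self.db = {}     # short code -> long URL (durable store in reality)
        self.cache = {}  # short code -> long URL (Redis in reality)
        # Stands in for a DB auto-increment or ID service; 62**5 encodes as "100000".
        self._ids = itertools.count(first_id)

    def _insert_if_absent(self, code: str, long_url: str) -> bool:
        # A real DB does this atomically via the primary key, not check-then-set.
        if code in self.db:
            return False
        self.db[code] = long_url
        return True

    def shorten(self, long_url: str, custom_alias: Optional[str] = None) -> str:
        if custom_alias is not None:
            if not self._insert_if_absent(custom_alias, long_url):
                raise ValueError(f"alias {custom_alias!r} is already taken")
            code = custom_alias
        else:
            # A generated code may clash with a custom alias claimed earlier: skip it.
            while True:
                code = base62(next(self._ids))
                if self._insert_if_absent(code, long_url):
                    break
        self.cache[code] = long_url  # warm the cache only after the durable write
        return f"{self.base_url}/{code}"
```

Note the loop in the generated-alias branch. A base62 ID can coincide with a custom alias someone already claimed, so the service skips taken codes rather than failing.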
3. Sequence Diagram — Redirect
Client -> LB -> Redirect Svc: GET /{shortCode}
Redirect Svc -> Cache (Redis): GET shortCode
alt cache hit
Cache -> Redirect Svc: longUrl
Redirect Svc -> Client: 302 -> longUrl
Redirect Svc -> Queue: publish click event
else cache miss
Redirect Svc -> DB: SELECT long_url FROM url_map WHERE short_code = ?
DB -> Redirect Svc: longUrl
Redirect Svc -> Cache: set(shortCode -> longUrl)
Redirect Svc -> Client: 302 -> longUrl
Redirect Svc -> Queue: publish click event
end
Queue -> Analytics Svc: click event stream
Analytics Svc -> DB: increment click counters (batch)
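The redirect sequence above is the classic cache-aside pattern. Here is a minimal sketch with three stand-ins:
- `cache` is a dict standing in for Redis.
- `db_get` is a hypothetical DB lookup that raises ConnectionError when the DB is down.
- `publish` stands in for a non-blocking Kafka send.

```python
from typing import Callable, Dict, Optional, Tuple


def redirect(
    code: str,
    cache: Dict[str, str],
    db_get: Callable[[str], Optional[str]],
    publish: Callable[[dict], None],
) -> Tuple[int, Dict[str, str]]:
    """Cache-aside redirect; returns (HTTP status, response headers)."""
    long_url = cache.get(code)
    if long_url is None:  # cache miss: fall back to the DB
        try:
            long_url = db_get(code)
        except ConnectionError:  # DB down and key not cached: degrade gracefully
            return 503, {}
        if long_url is None:
            return 404, {}
        cache[code] = long_url  # populate the cache for the next request
    publish({"shortCode": code})  # click event; a non-blocking send in a real service
    return 302, {"Location": long_url}
```

Cached keys keep redirecting even when the DB is down, which matches the fault-tolerance goal. Only cache misses degrade to an error.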
4. Sequence Diagram — Analytics Aggregation (background)
Redirect Svc -> Kafka/RabbitMQ: click events
Analytics Consumer -> Aggregator: group by shortCode
Aggregator -> DB (batch): update click_count (or write to timeseries store)
Note: Keep click counts eventually consistent to avoid adding latency to redirects.
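Here is a sketch of the consumer-side batching. Events are grouped by shortCode, so each batch issues one increment per code instead of one per click. Three things are assumptions: the event shape ({"shortCode": ...}), the batch size, and the dict standing in for the url_map table.

```python
from collections import Counter
from typing import Dict, Iterable, List


def aggregate_clicks(events: Iterable[dict]) -> Counter:
    """Group a batch of click events by shortCode."""
    return Counter(e["shortCode"] for e in events)


def flush(batch_counts: Counter, click_count: Dict[str, int]) -> None:
    """One increment per shortCode per batch. Against SQL this would be
    UPDATE url_map SET click_count = click_count + ? WHERE short_code = ?"""
    for code, n in batch_counts.items():
        click_count[code] = click_count.get(code, 0) + n


def consume(events: Iterable[dict], click_count: Dict[str, int], batch_size: int = 500) -> int:
    """Read events in batches; return how many batched writes were issued."""
    batch: List[dict] = []
    flushes = 0
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            flush(aggregate_clicks(batch), click_count)
            flushes += 1
            batch.clear()
    if batch:  # flush the final partial batch
        flush(aggregate_clicks(batch), click_count)
        flushes += 1
    return flushes
```

With at-least-once delivery, a replayed batch can over-count slightly. That is acceptable since click counts are eventually consistent anyway.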
5. Components & Responsibilities
- API Gateway / Load Balancer: Rate limiting, auth (optional), routing.
- Shorten Service: Validate input, generate alias, handle custom alias, write mapping.
- Redirect Service: Fast path for redirects (cache-first); publishes click events.
- Cache (Redis): Primary read cache for redirects; a high hit rate reduces DB load.
- DB / Persistent Store: Durable mapping storage. SQL if you need relational features; NoSQL (Cassandra) for high write throughput at scale.
- Message Queue (Kafka): Buffers click events for asynchronous processing.
- Analytics Service: Batch processing and aggregation of click metrics.
6. Data Model (example)
Table: url_map
- short_code (PK): string (6–8 chars)
- long_url: text
- created_at: timestamp
- expire_at: timestamp (nullable)
- owner_id: optional user ID
- meta: optional JSON for tags, campaign
- click_count: stored as an eventually consistent value (can be derived)
Indexes: primary key on short_code; optional index on hash(long_url) for deduplication.
7. Alias Generation Options (talking points)
- Base62(id): Insert a row into the DB (get the auto-increment ID) → convert to base62. Simple but predictable.
- Distributed ID (Snowflake) + Base62: Avoids a central DB for ID allocation.
- Random 6–8 char string: Generate randomly; on collision, retry. Good distribution, unpredictable.
- Hash(URL) + collision resolution: Deterministic for the same longUrl, but collisions must be handled.
Mention the trade-offs: predictability vs security, collision probability, duplicate detection.
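Here is a sketch of a Snowflake-style generator. It uses the commonly described layout: a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence. The epoch (2024-01-01 UTC) and the class name are hypothetical.

```python
import threading
import time

# Hypothetical custom epoch: 2024-01-01T00:00:00Z in milliseconds.
EPOCH_MS = 1_704_067_200_000


class SnowflakeLikeIdGenerator:
    """64-bit IDs: 41-bit ms timestamp | 10-bit worker ID | 12-bit sequence."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def _now_ms(self) -> int:
        return int(time.time() * 1000) - EPOCH_MS

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wraparound
                if self.sequence == 0:  # 4096 IDs already issued this ms
                    while now <= self.last_ms:  # spin until the next millisecond
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.worker_id << 12) | self.sequence
```

One caveat is worth raising in an interview. A full 64-bit ID encodes to as many as 11 base62 characters (62^10 < 2^63 < 62^11), which is longer than the 6–8 characters assumed earlier. You would either shrink the bit fields or accept longer codes.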
8. Caching Strategy
- Cache shortCode -> longUrl in Redis with TTL and LRU eviction.
- Warm/populate the cache on shorten and on first redirect.
- Consider a local in-memory LRU on each app instance for ultra-hot keys.
- For hot keys that cause heavy traffic, replicate them or pin them to multiple cache nodes.
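The local in-memory LRU for ultra-hot keys can be as small as an OrderedDict with per-entry expiry. Here is a sketch. The capacity and TTL values are up to you, and the injectable clock exists only to make it testable.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class LocalLRUCache:
    """Small in-process LRU with per-entry TTL, sitting in front of Redis."""

    def __init__(self, capacity: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:  # expired: drop it and report a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self.clock() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```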
9. Database Sharding & Partitioning
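- Shard by a hash of shortCode (hash ranges) to distribute reads/writes evenly.
- If using sequential IDs (base62), avoid write hotspots. Range-partitioning on sequential keys sends every new write to the same shard, so hash the key or randomize the alias. A distributed ID generator also removes the central ID-allocation bottleneck.
- Use read replicas for read-heavy queries (if using SQL).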
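Shard routing by hash can be sketched in a few lines. NUM_SHARDS is hypothetical. Python's built-in hash() is salted per process for strings, so a fixed digest (BLAKE2b here) keeps the mapping stable across instances.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(short_code: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a short code to a shard via a stable hash.

    Python's built-in hash() is randomized per process for str, so a fixed
    digest is used to keep routing identical across app instances."""
    digest = hashlib.blake2b(short_code.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards


if __name__ == "__main__":
    from collections import Counter

    # Sequential-looking codes still spread across shards once hashed.
    counts = Counter(shard_for(f"{i:06d}") for i in range(10_000))
    print(sorted(counts.items()))
```

Plain modulo routing moves most keys when the shard count changes. Consistent hashing, or a fixed set of logical shards mapped onto physical nodes, avoids that.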
10. Handling Hot Keys
- Use multi-level caching (local + Redis).
- Rate-limit redirects per alias if abuse is suspected.
- Serve reads from multiple cache replicas.
11. Availability & Fault Tolerance
- Deploy services in multiple AZs / data centers.
- Run Redis in Cluster/Sentinel mode with failover.
- DB replication and automated failover.
- If the DB is unavailable, keep serving redirects for cached keys and return a graceful error (e.g. 503) for misses.
12. Security & Abuse Prevention
- Validate target URLs (block known malicious domains).
- Rate-limit shorten requests per IP/account.
- Use CAPTCHA or email verification for bulk creation.
- Monitor for mass-creation patterns.
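Per-IP or per-account rate limiting is often done with a token bucket. Here is a single-process sketch. The rate and burst values are placeholders. A multi-instance deployment would keep the bucket state in a shared store such as Redis rather than in memory.

```python
import time
from typing import Callable, Dict


class TokenBucketLimiter:
    """Per-key token bucket: refills `rate` tokens/sec, bursts up to `capacity`.
    The key could be a client IP, account ID, or alias (hypothetical policy)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, tuple] = {}  # key -> (tokens, last_refill_time)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)  # refill
        if tokens >= 1:
            self._buckets[key] = (tokens - 1, now)  # spend one token
            return True
        self._buckets[key] = (tokens, now)  # over the limit: reject
        return False
```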
13. Monitoring & Metrics (what to mention in an interview)
- Redirect latency, error rates, RPS (per endpoint).
- Cache hit ratio, DB replica lag, queue backlog.
- Number of short URLs created per minute.
- Alerts for abnormal spikes or cache-miss storms.
14. Interview Talking Script (30–60s)
"I would build the system as two main stateless services, a shortening service and a redirect service, behind a load balancer. Shorten writes to a durable store and warms the cache; redirect is cache-first and publishes click events to Kafka for async analytics. For alias generation I'd start with base62 encoding of a DB-generated ID for simplicity. I'd move to a distributed ID generator (e.g. Snowflake) once a central ID allocator becomes a bottleneck, and use random strings if unpredictability is required. To scale, I'd use Redis to cache hot keys, shard the DB by alias hash, and replicate services across availability zones. The key trade-offs are predictability vs simplicity in alias generation, and eventual consistency for analytics to keep redirects fast."