system-architecture
π―Skillfrom yennanliu/cs_basics
system-architecture skill from yennanliu/cs_basics
Installation
npx skills add https://github.com/yennanliu/cs_basics --skill system-architectureSkill Details
System design and architecture expert for creating scalable distributed systems. Covers system design interviews, architecture patterns, and real-world case studies like Netflix, Twitter, Uber. Use when designing systems, writing architecture docs, or preparing for system design interviews.
Overview
# System Architecture Expert
When to use this Skill
Use this Skill when:
- Designing distributed systems
- Writing system design documentation
- Preparing for system design interviews
- Creating architecture diagrams
- Analyzing trade-offs between design choices
- Reviewing or improving existing system designs
System Design Framework
1. Requirements Gathering (5-10 minutes)
Functional Requirements:
- What are the core features?
- What actions can users perform?
- What are the inputs and outputs?
Non-Functional Requirements:
- Scale: How many users? How much data?
- Performance: Latency requirements? (p50, p95, p99)
- Availability: What uptime is needed? (99.9%, 99.99%)
- Consistency: Strong or eventual consistency?
Constraints:
- Budget limitations
- Technology stack constraints
- Team expertise
- Timeline
Example Questions:
```
- How many daily active users?
- What's the read:write ratio?
- What's the average data size?
- What's the peak load vs average load?
- Do we need real-time updates?
- Can we have data loss?
```
2. Capacity Estimation (Back-of-the-envelope)
Calculate:
```
Traffic:
- DAU = 100M users
- Each user makes 10 requests/day
- QPS = 100M * 10 / 86400 β 11,574 QPS
- Peak QPS = 2-3x average β 30,000 QPS
Storage:
- 100M users * 1KB per user = 100GB
- With 3x replication = 300GB
- Growth: 300GB * 365 days = 109.5TB/year
Bandwidth:
- QPS * average request size
- 11,574 * 10KB = 115.74MB/s
```
Memory/Cache:
- 80-20 rule: 20% of data gets 80% of traffic
- Cache = 20% of total data for hot data
3. High-Level Design
Core Components:
- Client Layer (Web, Mobile, Desktop)
- API Gateway / Load Balancer
- Application Servers (Business logic)
- Cache Layer (Redis, Memcached)
- Database (SQL, NoSQL, or both)
- Message Queue (Kafka, RabbitMQ)
- Object Storage (S3, GCS)
- CDN (CloudFront, Akamai)
Draw Architecture:
```
[Clients] β [CDN]
β
[Load Balancer]
β
[Application Servers]
β β β
[Cache] [DB] [Queue] β [Workers]
β
[Object Storage]
```
4. Database Design
SQL vs NoSQL Decision:
Use SQL when:
- ACID transactions required
- Complex queries with JOINs
- Structured data with relationships
- Examples: PostgreSQL, MySQL
Use NoSQL when:
- Massive scale (horizontal scaling)
- Flexible schema
- High write throughput
- Examples: Cassandra, DynamoDB, MongoDB
Sharding Strategy:
- Hash-based:
user_id % num_shards - Range-based: Users 1-100M on shard 1
- Geographic: US users on US shard
- Consistent hashing: For even distribution
Schema Design:
```sql
-- Example: URL Shortener
CREATE TABLE urls (
id BIGSERIAL PRIMARY KEY,
short_url VARCHAR(10) UNIQUE NOT NULL,
long_url TEXT NOT NULL,
user_id BIGINT,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP,
click_count INT DEFAULT 0,
INDEX (short_url),
INDEX (user_id)
);
```
5. Deep Dive Components
Caching Strategy:
- Cache-Aside: App reads from cache, loads from DB on miss
- Write-Through: Write to cache and DB together
- Write-Behind: Write to cache, async write to DB
Eviction Policies:
- LRU (Least Recently Used) - Most common
- LFU (Least Frequently Used)
- TTL (Time To Live)
Load Balancing:
- Round Robin: Simple, equal distribution
- Least Connections: Route to least busy server
- Consistent Hashing: Minimize redistribution
- Weighted: Based on server capacity
Message Queue Patterns:
- Pub/Sub: One-to-many (notifications)
- Work Queue: Task distribution (job processing)
- Fan-out: Broadcast to multiple queues
6. Scalability Patterns
Horizontal Scaling:
- Add more servers
- Use load balancers
- Stateless application servers
- Session stored in cache/DB
Vertical Scaling:
- Add more CPU/RAM to servers
- Limited by hardware
- Simpler but has limits
Microservices:
```
Monolith:
[Single App] β [DB]
Microservices:
[User Service] β [User DB]
[Post Service] β [Post DB]
[Feed Service] β [Feed DB]
```
Benefits:
- Independent scaling
- Technology flexibility
- Fault isolation
Drawbacks:
- Increased complexity
- Network latency
- Distributed transactions
7. Reliability & Availability
Replication:
- Master-Slave: One writer, multiple readers
- Master-Master: Multiple writers (conflict resolution needed)
- Multi-region: Geographic redundancy
Failover:
- Active-Passive: Standby server takes over
- Active-Active: Both servers handle traffic
Rate Limiting:
- Token bucket algorithm
- Leaky bucket algorithm
- Fixed window counter
- Sliding window log
Circuit Breaker:
```
States:
Closed β Normal operation
Open β Reject requests immediately
Half-Open β Test if service recovered
```
8. Common System Design Patterns
Content Delivery:
- Use CDN for static assets
- Geo-distributed edge servers
- Cache at edge locations
Data Consistency:
- Strong Consistency: Read reflects latest write (ACID)
- Eventual Consistency: Reads eventually reflect write (BASE)
- CAP Theorem: Choose 2 of 3: Consistency, Availability, Partition Tolerance
API Design:
```
RESTful:
GET /api/users/{id}
POST /api/users
PUT /api/users/{id}
DELETE /api/users/{id}
GraphQL:
query {
user(id: "123") {
name
posts {
title
}
}
}
```
9. System Design Template
Use this structure (based on system_design/00_template.md):
```markdown
# {System Name}
1. Requirements
Functional
- [List core features]
Non-Functional
- Scale: [Users, QPS, Data]
- Performance: [Latency requirements]
- Availability: [Uptime target]
2. Capacity Estimation
- Traffic: [QPS calculations]
- Storage: [Data size, growth]
- Bandwidth: [Network requirements]
3. API Design
```
[endpoint] - [description]
```
4. High-Level Architecture
[Diagram]
5. Database Schema
[Tables and relationships]
6. Detailed Design
Component 1
[Deep dive]
Component 2
[Deep dive]
7. Scalability
[How to scale each component]
8. Trade-offs
[Decisions and alternatives]
```
10. Real-World Examples
Reference case studies in system_design/:
- Netflix: Video streaming, recommendation
- Twitter: Timeline, tweet storage, trending
- Uber: Real-time matching, location tracking
- Instagram: Image storage, feed generation
- WhatsApp: Message delivery, presence
Common Patterns:
- News Feed: Fan-out on write vs fan-out on read
- Rate Limiter: Token bucket with Redis
- URL Shortener: Base62 encoding, hash collision
- Chat System: WebSocket, message queue
- Notification: Push notification service, APNs/FCM
Interview Tips
Time Management:
- Requirements: 10%
- High-level design: 25%
- Deep dive: 50%
- Wrap up: 15%
Communication:
- Think out loud
- Ask clarifying questions
- Discuss trade-offs
- Acknowledge limitations
What interviewers look for:
- Problem-solving approach
- Technical depth
- Trade-off analysis
- Scale awareness
- Communication skills
Common Mistakes to Avoid
- Jumping to solution without requirements
- Over-engineering simple problems
- Under-estimating scale requirements
- Ignoring single points of failure
- Not considering monitoring/alerting
- Forgetting about data consistency
- Missing security considerations
Project Context
- Templates in
system_design/00_template.md - Case studies in
system_design/*.md - Reference materials in
doc/system_design/ - Follow the established documentation pattern
More from this repository4
code-refactor-master skill from yennanliu/cs_basics
markdown-doc-writer skill from yennanliu/cs_basics
java-python-code-reviewer skill from yennanliu/cs_basics
java-developer skill from yennanliu/cs_basics