The Wall
At 100 users, everything was fine.
At 500 users, things slowed.
At 1,000 users, everything broke.
Here's what we learned while scaling.
What Broke First
1. Database Queries
Simple queries that worked fine on small tables became slow.
No indexes. Every lookup was a full table scan.
Fix: Index frequently queried columns.
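A minimal sketch of that fix, using Python's built-in sqlite3 so it runs anywhere. The posts table, the user_id column, and the index name are made up for illustration, and the exact wording of the query plan varies between SQLite versions and other databases.

```python
import sqlite3

# Hypothetical schema: a posts table that is filtered by user_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO posts (user_id, title) VALUES (?, ?)",
    [(i % 100, f"post {i}") for i in range(1000)],
)

def query_plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

lookup = "SELECT title FROM posts WHERE user_id = 42"

# Without an index, SQLite has to scan every row in the table.
plan_before = query_plan(lookup)

# Index the column we filter on, then ask for the plan again.
conn.execute("CREATE INDEX idx_posts_user_id ON posts (user_id)")
plan_after = query_plan(lookup)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan should report a scan of posts. The second should report a search using idx_posts_user_id. Most databases have an equivalent of EXPLAIN, and checking it before and after is a cheap way to confirm an index is actually used.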
2. N+1 Queries
Fetch user, then posts, then comments, then...
Each request spawned dozens of queries.
Fix: Batch queries. Use joins.
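Here is a small sketch of the pattern and the join-based fix, again with sqlite3. The users and posts tables and the query counter are hypothetical, there only to make the difference in query count visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
""")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 11)],
)
conn.executemany(
    "INSERT INTO posts (user_id, title) VALUES (?, ?)",
    [(u, f"post {n} by user{u}") for u in range(1, 11) for n in range(3)],
)

stats = {"queries": 0}  # counts every round trip to the database

def run(sql, params=()):
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

def posts_n_plus_one():
    """One query for the users, then one more query per user."""
    result = {}
    for user_id, name in run("SELECT id, name FROM users"):
        rows = run("SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,))
        result[name] = [title for (title,) in rows]
    return result

def posts_joined():
    """One query total. Note an inner join skips users with no posts."""
    result = {}
    rows = run(
        "SELECT u.name, p.title FROM users u "
        "JOIN posts p ON p.user_id = u.id ORDER BY u.id, p.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result

stats["queries"] = 0
slow = posts_n_plus_one()
n_plus_one_queries = stats["queries"]

stats["queries"] = 0
fast = posts_joined()
joined_queries = stats["queries"]

print(n_plus_one_queries, joined_queries)
```

With 10 users the first version issues 11 queries and the second issues 1, for the same result. ORMs usually hide the loop, so the fix there tends to be an eager-loading option rather than hand-written SQL.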
3. No Caching
Every request hit the database.
Fix: Redis cache for common queries.
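The usual shape of this is cache-aside: check the cache, fall back to the database on a miss, store the result with an expiry. The sketch below uses a tiny in-memory class as a stand-in for Redis so it runs without a server. TTLCache, get_profile, load_profile_from_db, and the key format are all hypothetical names, not our actual code.

```python
import time

class TTLCache:
    """In-memory stand-in for a Redis-style get / set-with-expiry / delete."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired, treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)

cache = TTLCache()
db_hits = []  # records every time we fall through to the "database"

def load_profile_from_db(user_id):
    """Placeholder for an expensive query."""
    db_hits.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}

def get_profile(user_id, ttl_seconds=60):
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                       # cache hit, no database work
    profile = load_profile_from_db(user_id)  # cache miss
    cache.set(key, profile, ttl_seconds)
    return profile

def invalidate_profile(user_id):
    """Call after a write so readers don't see stale data."""
    cache.delete(f"profile:{user_id}")

first = get_profile(7)   # miss, hits the database
second = get_profile(7)  # hit, served from the cache
print(len(db_hits))
```

With a real Redis client the get and set calls map onto GET and SET with an expiry, plus serialization of the value. The part that takes thought is invalidation: pick a TTL you can live with, and delete keys on write where stale data would hurt.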
What Broke Second
1. Background Jobs
Emails, notifications, webhooks.
Our queue wasn't built for that volume.
Fix: Proper job queue. Sidekiq.
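The core idea is that the request handler only enqueues the work and returns, and a separate worker does the slow part. The sketch below shows that shape with Python's standard library. It is an in-process stand-in, not Sidekiq: a real job queue keeps jobs in a durable store and runs workers in their own processes, so jobs survive a restart. handle_signup and send_email are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
sent = []  # stands in for emails actually delivered

def send_email(to):
    """Placeholder for slow work such as an SMTP or API call."""
    sent.append(to)

def worker():
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut the worker down
            jobs.task_done()
            break
        func, args = job
        try:
            func(*args)
        except Exception as exc:
            # A real queue would retry with backoff; here we only log.
            print(f"job failed: {exc!r}")
        finally:
            jobs.task_done()

thread = threading.Thread(target=worker, daemon=True)
thread.start()

def handle_signup(email):
    """Request handler: enqueue the email and return right away."""
    jobs.put((send_email, (email,)))
    return "201 Created"

responses = [handle_signup(f"user{i}@example.com") for i in range(3)]

jobs.join()      # wait for queued jobs to finish (only needed in this demo)
jobs.put(None)   # stop the worker
thread.join()
print(sent)
```

The request path never waits on the email. When volume grows, the things to watch are queue depth and job latency, and the knobs are worker count and retry policy.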
2. File Storage
Uploads went to local disk.
Fix: Object storage. S3 or Cloudflare R2.
3. Session Management
Sessions lived in app server memory.
Lost on every restart. Not shared across servers.
Fix: Redis sessions.
What We Should Have Built First
1. Indexes
Every foreign key. Every filterable column.
Don't wait until queries are slow.
2. Caching Layer
Redis from day one.
Cache expensive queries.
3. Job Queue
Sidekiq for Ruby. BullMQ for Node.
Process async work.
The Scaling Checklist
- Database indexes on foreign keys
- N+1 query elimination
- Redis caching layer
- Job queue for background work
- Object storage for files
- CDN for static assets
- Monitoring and alerts
The Honest Take
Don't optimize prematurely.
But don't ignore scaling signals.
When queries slow, fix them.
When jobs back up, fix them.
Small fixes prevent big rewrites.