When we launched, we were happy just to have 100 daily users. But within months we hit 10,000, then 100,000, and scaling problems piled up faster than users.

We aimed for 1 million users, but the architecture that worked for 1,000 couldn’t keep up. Looking back, here’s the architecture I wish I’d built from day one — and what we learned scaling under pressure.
Phase 1: The Monolith That Worked (Until It Didn’t)
Our first stack was simple:
- Spring Boot app
- MySQL database
- NGINX load balancer
- Everything deployed on one VM
[ Client ] → [ NGINX ] → [ Spring Boot App ] → [ MySQL ]
This setup handled 500 concurrent users easily. But at 5,000 concurrent users:
- CPU maxed out
- Queries slowed down
- Uptime dropped below 99%
Monitoring showed DB locks, GC pauses, and thread contention.
Phase 2: Throwing More Servers at the Problem (But Missing the Real Bottleneck)
We added more app servers behind NGINX:
[ Client ] → [ NGINX ] → [ App1 | App2 | App3 ] → [ MySQL ]
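The load balancing itself was nothing exotic. An NGINX upstream pool along these lines (a simplified sketch with placeholder hostnames, not our exact config) spread traffic across the app servers:

upstream app_servers {
    server app1.internal:8080;
    server app2.internal:8080;
    server app3.internal:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_servers;
    }
}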
It scaled reads fine. But writes still funneled into a single MySQL instance.
Under load tests:
| Concurrent Users | Avg Response Time |
| ---------------- | ----------------- |
| 1,000 | 120ms |
| 5,000 | 480ms |
| 10,000 | 3.2s |
The bottleneck wasn’t CPU — it was the database.
Phase 3: Introducing a Cache
We added Redis as a caching layer for read-heavy queries:
public User getUser(String id) {
    // Check Redis first (redisTemplate is typed as RedisTemplate<String, User>)
    User cached = redisTemplate.opsForValue().get(id);
    if (cached != null) return cached;

    // Cache miss: load from MySQL and cache the result for 10 minutes
    User user = userRepository.findById(id).orElseThrow();
    redisTemplate.opsForValue().set(id, user, 10, TimeUnit.MINUTES);
    return user;
}
This reduced DB load by 60% and cut response times to under 200ms for cached reads.
Benchmark for 1,000 concurrent user profile requests:
| Approach | Avg Latency | DB Queries |
| ---------- | ----------- | ---------- |
| No Cache | 150ms | 1000 |
| With Cache | 20ms | 50 |
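One thing the snippet above glosses over is staleness: a cached profile can be up to 10 minutes old. A minimal way to keep the cache honest on writes (a sketch against the same redisTemplate and repository, not our exact code) is to evict the key whenever the user changes:

public User updateUser(User user) {
    User saved = userRepository.save(user);
    // Evict so the next read repopulates the cache with fresh data
    redisTemplate.delete(saved.getId());
    return saved;
}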
Phase 4: Breaking the Monolith
We broke out core features into microservices:
- User Service
- Post Service
- Feed Service
Each with its own database schema (same DB instance initially).
Inter-service communication used REST APIs:
@RestController
public class FeedController {

    // Thin aggregation endpoint: each call below goes to a separate service
    @GetMapping("/feed/{userId}")
    public Feed getFeed(@PathVariable String userId) {
        User user = userService.getUser(userId);
        List<Post> posts = postService.getPostsForUser(userId);
        return new Feed(user, posts);
    }
}
But chaining REST calls caused latency inflation. One request fanned out into 3–4 internal requests.
At scale, this killed performance.
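Each of those internal calls was itself an HTTP round trip. A rough sketch of what the user lookup inside the feed service could look like with Spring's RestTemplate (the class name, URL, and client setup are illustrative assumptions, not our exact code):

import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class UserServiceClient {

    private final RestTemplate restTemplate = new RestTemplate();

    // Every feed request pays for one of these hops per downstream service
    public User getUser(String userId) {
        return restTemplate.getForObject(
                "http://user-service/users/" + userId, User.class);
    }
}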
Phase 5: Messaging and Asynchronous Processing
We added Kafka for async workflows:
- A user signup publishes a Kafka event
- Downstream services consume the event instead of being called synchronously over REST
// Publish (in the user service, right after signup completes)
kafkaTemplate.send("user-signed-up", newUserId);

// Consume (in a downstream service)
@KafkaListener(topics = "user-signed-up")
public void handleSignup(String userId) {
    recommendationService.prepareWelcomeRecommendations(userId);
}
With Kafka, signup latency dropped from 1.2s to 300ms, since expensive downstream tasks ran out of band.
Phase 6: Scaling the Database
At 500,000 users, our MySQL instance couldn’t keep up — even with caching.
We added:
- Read replicas → split reads/writes
- Sharding → user-based partitions (users 0–999k, 1M–2M, etc.)
- Archive tables → move cold data out of hot paths
Example query router:
// Simple userId-based shard router
if (userId < 1000000) {
    return jdbcTemplate1.query(...);  // shard 1: users 0–999k
} else {
    return jdbcTemplate2.query(...);  // shard 2: users 1M–2M
}
This reduced write contention and query times across shards.
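For the read/write split, one common Spring pattern (a sketch of the general approach, not our exact wiring) is a routing DataSource that sends read-only transactions to a replica:

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;
import org.springframework.transaction.support.TransactionSynchronizationManager;

public class ReplicaAwareRoutingDataSource extends AbstractRoutingDataSource {

    // "primary" and "replica" are keys mapped to real DataSources in configuration
    @Override
    protected Object determineCurrentLookupKey() {
        return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
                ? "replica" : "primary";
    }
}

In practice this usually needs a LazyConnectionDataSourceProxy around it so the read-only flag is applied before a connection is actually checked out.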
Phase 7: Observability
At 100,000+ users, debugging was a nightmare without visibility.
We added:
- Distributed tracing (Jaeger + OpenTelemetry)
- Centralized logs (ELK stack)
- Prometheus + Grafana dashboards
Sample Grafana metrics:
| Metric | Value |
| -------------- | ------- |
| P95 latency | 280ms |
| DB connections | 120/200 |
| Kafka lag | 0 |
Before observability, diagnosing latency spikes took hours. After, minutes.
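If you're wiring this up with OpenTelemetry, most spans come from auto-instrumentation, but a manual span on a hot path can help. A minimal sketch with the OpenTelemetry Java API (the span name, attribute, and buildFeed helper are hypothetical):

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public Feed getFeedTraced(String userId, Tracer tracer) {
    // One span per feed build; it shows up in Jaeger with the user id attached
    Span span = tracer.spanBuilder("feed.build").startSpan();
    span.setAttribute("user.id", userId);
    try (Scope scope = span.makeCurrent()) {
        return buildFeed(userId);  // hypothetical helper doing the real work
    } finally {
        span.end();
    }
}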
Phase 8: CDN and Edge Caching
At 1 million users, 40% of traffic hit static files (images, avatars, JS bundles).
We moved them to Cloudflare CDN with aggressive caching:
| Asset | Origin Latency | CDN Latency |
| ------------------ | -------------- | ----------- |
| /static/app.js | 400ms | 40ms |
| /images/avatar.png | 300ms | 35ms |
This offloaded 70% of traffic from origin servers.
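The origin still has to tell the CDN what it may cache. A minimal Spring MVC sketch of aggressive caching headers on static assets (paths and max-age here are illustrative, not our exact policy):

import java.util.concurrent.TimeUnit;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.CacheControl;
import org.springframework.web.servlet.config.annotation.ResourceHandlerRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class StaticCacheConfig implements WebMvcConfigurer {

    // Long public max-age so Cloudflare can serve these without touching the origin
    @Override
    public void addResourceHandlers(ResourceHandlerRegistry registry) {
        registry.addResourceHandler("/static/**", "/images/**")
                .addResourceLocations("classpath:/static/")
                .setCacheControl(CacheControl.maxAge(30, TimeUnit.DAYS).cachePublic());
    }
}

Fingerprinted filenames make a long max-age safe, since a new deploy changes the URL instead of mutating the content behind it.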
Final Architecture I’d Build Sooner
If I could start over, I’d skip phases and build this earlier:
[ Client ]
↓
[ CDN + Edge Caching ]
↓
[ API Gateway → Service Mesh ]
↓
[ Microservices + Kafka + Redis Cache ]
↓
[ Sharded Database + Read Replicas ]
Key lessons:
- Caching isn’t optional
- DB scaling needs to be designed early
- Async processing is critical
- Observability pays off early
Scaling isn’t about “adding more servers” — it’s about removing bottlenecks at every layer.
Final Benchmark (1 Million Users, 1,000 RPS):
| Metric | Value |
| ------------------ | ------ |
| P95 API Latency | 210ms |
| Error Rate | <0.1% |
| Cache Hit Ratio | 85% |
| DB Query Rate | 50 qps |
| Kafka Consumer Lag | 0 |
Closing Thoughts
Scaling to a million users isn’t about fancy tech — it’s about solving the right problems in the right order.
The architecture that served your first 1,000 users won’t serve the next million.
Plan for failure modes before you hit them.
What architectural mistake cost you the most at scale? I’d love to hear.