Scaling to 1 Million Users: The Architecture I Wish I Knew Sooner

When we launched, we were happy just having 100 daily users. But within months, we hit 10,000, then 100,000. And scaling problems piled up faster than users.

We aimed for 1 million users, but the architecture that worked for 1,000 couldn’t keep up. Looking back, here’s the architecture I wish I’d built from day one — and what we learned scaling under pressure.

Phase 1: The Monolith That Worked (Until It Didn’t)

Our first stack was simple:

  • Spring Boot app
  • MySQL database
  • NGINX load balancer
  • Everything deployed on one VM

[ Client ] → [ NGINX ] → [ Spring Boot App ] → [ MySQL ]

This setup handled 500 concurrent users easily. But at 5,000 concurrent users:

  • CPU maxed out
  • Queries slowed down
  • Uptime dropped below 99%

Monitoring showed DB locks, GC pauses, and thread contention.

Phase 2: Throwing More Servers (But Missing the Real Bottleneck)

We added more app servers behind NGINX:

[ Client ] → [ NGINX ] → [ App1 | App2 | App3 ] → [ MySQL ]

It scaled reads fine. But writes still funneled into a single MySQL instance.

Under load tests:

| Users  | Avg Response Time |
| ------ | ----------------- |
| 1,000  | 120ms             |
| 5,000  | 480ms             |
| 10,000 | 3.2s              |

The bottleneck wasn’t CPU — it was the database.

Phase 3: Introducing a Cache

We added Redis as a caching layer for read-heavy queries:

public User getUser(String id) {
    // Assumes a RedisTemplate<String, User> with value serialization configured
    User cached = redisTemplate.opsForValue().get(id);
    if (cached != null) {
        return cached; // cache hit: skip the database entirely
    }
    User user = userRepository.findById(id).orElseThrow();
    // Cache-aside: populate Redis with a 10-minute TTL so entries expire on their own
    redisTemplate.opsForValue().set(id, user, 10, TimeUnit.MINUTES);
    return user;
}

This reduced DB load by 60% and cut response times to under 200ms for cached reads.
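One caveat with cache-aside: a profile updated inside the TTL window can be served stale for up to 10 minutes. A minimal sketch of how the write path can evict the entry (illustrative, not our exact code; it assumes the same RedisTemplate<String, User> and repository as above, and that User exposes getId()):

public User updateUser(User user) {
    User saved = userRepository.save(user);
    // Evict the cached entry so the next read repopulates it with fresh data
    redisTemplate.delete(saved.getId());
    return saved;
}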

Benchmark for 1,000 concurrent user profile requests:

| Approach   | Avg Latency | DB Queries |
| ---------- | ----------- | ---------- |
| No Cache | 150ms | 1000 |
| With Cache | 20ms | 50 |

Phase 4: Breaking the Monolith

We broke out core features into microservices:

  • User Service
  • Post Service
  • Feed Service

Each with its own database schema (same DB instance initially).

Inter-service communication used REST APIs:

@RestController
public class FeedController {

    private final UserService userService;   // client for the User Service
    private final PostService postService;   // client for the Post Service

    public FeedController(UserService userService, PostService postService) {
        this.userService = userService;
        this.postService = postService;
    }

    @GetMapping("/feed/{userId}")
    public Feed getFeed(@PathVariable String userId) {
        User user = userService.getUser(userId);                // internal REST call #1
        List<Post> posts = postService.getPostsForUser(userId); // internal REST call #2
        return new Feed(user, posts);
    }
}

But chaining REST calls caused latency inflation. One request fanned out into 3–4 internal requests.

At scale, this killed performance.

Phase 5: Messaging and Asynchronous Processing

We added Kafka for async workflows:

  • User signup triggers Kafka event
  • Downstream services consume events instead of synchronous REST

// Publish: the signup flow emits an event and returns immediately
kafkaTemplate.send("user-signed-up", newUserId);

// Consume: downstream services react asynchronously, off the request path
@KafkaListener(topics = "user-signed-up")
public void handleSignup(String userId) {
    recommendationService.prepareWelcomeRecommendations(userId);
}

With Kafka, signup latency dropped from 1.2s to 300ms, since expensive downstream tasks ran out of band.

Phase 6: Scaling the Database

At 500,000 users, our MySQL instance couldn’t keep up — even with caching.

We added:

  • Read replicas → Split reads/writes (see the routing sketch below)
  • Sharding → User-based partitions (users 0–999k, 1M–2M, etc.)
  • Archive tables → Move cold data out of hot paths

Example query router:

// Range-based router: numeric user IDs below 1M go to shard 1, the rest to shard 2
if (userId < 1_000_000) {
    return jdbcTemplate1.query(...);   // shard 1
} else {
    return jdbcTemplate2.query(...);   // shard 2
}

This reduced write contention and query times across shards.
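For the read/write split, one common Spring approach is an AbstractRoutingDataSource that picks a target data source per call. A sketch under assumptions, not necessarily what we shipped (the class name and the "primary"/"replica" keys are illustrative):

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

// Routes reads to a replica and writes to the primary. The keys must match the
// target data sources registered via setTargetDataSources() at configuration time.
public class ReplicaRoutingDataSource extends AbstractRoutingDataSource {

    private static final ThreadLocal<Boolean> READ_ONLY = ThreadLocal.withInitial(() -> false);

    // Called by the service layer (or an AOP aspect) around read-only operations
    public static void markReadOnly(boolean readOnly) {
        READ_ONLY.set(readOnly);
    }

    @Override
    protected Object determineCurrentLookupKey() {
        return READ_ONLY.get() ? "replica" : "primary";
    }
}

Wiring this up means registering the primary and replica DataSources under those keys and setting the flag before each read-only call, so replicas absorb read traffic while the primary handles writes.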

Phase 7: Observability

At 100,000+ users, debugging was a nightmare without visibility.

We added:

  • Distributed tracing (Jaeger + OpenTelemetry)
  • Centralized logs (ELK stack)
  • Prometheus + Grafana dashboards (a minimal metrics sketch follows)
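For the Prometheus + Grafana piece, Spring Boot's Actuator plus Micrometer does most of the heavy lifting. A minimal sketch, assuming micrometer-registry-prometheus is on the classpath (the FeedMetrics class and the feed.assembly.time metric name are hypothetical):

import java.util.function.Supplier;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class FeedMetrics {

    private final Timer feedTimer;

    public FeedMetrics(MeterRegistry registry) {
        // Exposed at /actuator/prometheus and scraped into Grafana dashboards
        this.feedTimer = registry.timer("feed.assembly.time");
    }

    // Wraps feed assembly so its latency is recorded alongside the built-in HTTP metrics
    public Feed timeFeedAssembly(Supplier<Feed> assembly) {
        return feedTimer.record(assembly);
    }
}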

Sample Grafana metrics:

| Metric         | Value   |
| -------------- | ------- |
| P95 latency | 280ms |
| DB connections | 120/200 |
| Kafka lag | 0 |

Before observability, diagnosing latency spikes took hours. After, minutes.

Phase 8: CDN and Edge Caching

At 1 million users, 40% of traffic hit static files (images, avatars, JS bundles).

We moved them to Cloudflare CDN with aggressive caching:

| Asset              | Origin Latency | CDN Latency |
| ------------------ | -------------- | ----------- |
| /static/app.js | 400ms | 40ms |
| /images/avatar.png | 300ms | 35ms |

This offloaded 70% of traffic from origin servers.
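On the origin side, long-lived Cache-Control headers are what let the CDN cache this aggressively. A minimal sketch, assuming the static assets are still served by Spring MVC (paths and max-age values are illustrative):

import java.util.concurrent.TimeUnit;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.CacheControl;
import org.springframework.web.servlet.config.annotation.ResourceHandlerRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class StaticCachingConfig implements WebMvcConfigurer {

    @Override
    public void addResourceHandlers(ResourceHandlerRegistry registry) {
        // Long max-age + public lets Cloudflare and browsers serve these without hitting origin
        registry.addResourceHandler("/static/**", "/images/**")
                .addResourceLocations("classpath:/static/", "classpath:/images/")
                .setCacheControl(CacheControl.maxAge(30, TimeUnit.DAYS).cachePublic());
    }
}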

Final Architecture I’d Build Sooner

If I could start over, I’d skip phases and build this earlier:

[ Client ]
    ↓
[ CDN + Edge Caching ]
    ↓
[ API Gateway → Service Mesh ]
    ↓
[ Microservices + Kafka + Redis Cache ]
    ↓
[ Sharded Database + Read Replicas ]

Key lessons:

  • Caching isn’t optional
  • DB scaling needs to be designed early
  • Async processing is critical
  • Observability pays off early

Scaling isn’t about “adding more servers” — it’s about removing bottlenecks at every layer.

Final Benchmark (1 Million Users, 1,000 RPS):

| Metric             | Value  |
| ------------------ | ------ |
| P95 API Latency | 210ms |
| Error Rate | <0.1% |
| Cache Hit Ratio | 85% |
| DB Query Rate | 50 qps |
| Kafka Consumer Lag | 0 |

Closing Thoughts

Scaling to a million users isn’t about fancy tech — it’s about solving the right problems in the right order.

The architecture that served your first 1,000 users won’t serve the next million.

Plan for failure modes before you hit them.

What architectural mistake cost you the most at scale? I’d love to hear.



