Why Every Senior Engineer Should Master Back-of-the-Envelope Calculations

February 2, 2026

A few years ago, I sat in a design review where a team proposed building a real-time analytics pipeline. They'd spent three weeks designing the architecture. Kafka, Flink, Elasticsearch, Redis — the whole zoo.

A senior architect asked one question: "How many events per second are we expecting?"

"About 50."

Fifty events per second. A single PostgreSQL instance with a basic index handles that without breaking a sweat. The team had designed a system 1,000x more complex than necessary because nobody did the math first.

This is why back-of-the-envelope calculations matter. They're not about getting exact numbers. They're about getting the order of magnitude right so you don't build a rocket ship when you need a bicycle.

The Numbers Every Engineer Should Memorize

Before you can estimate anything, you need reference points. Memorize these:

Latency Numbers

L1 cache reference:                  1 ns
L2 cache reference:                  4 ns
Main memory reference:             100 ns
SSD random read:                   150 μs
HDD random read:                    10 ms
Network round trip (same DC):      500 μs
Network round trip (cross-country): 60 ms
Network round trip (cross-ocean):  150 ms

The key insight: each level is one or more orders of magnitude slower than the one above it. Memory is roughly 1,000x faster than SSD. SSD is about 100x faster than HDD. A round trip within a datacenter is about 100x faster than one across the country.
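These ratios are worth checking yourself rather than taking on faith. A minimal Python sketch, using the nanosecond values from the table above:

# Latency reference points, in nanoseconds (from the table above).
LATENCY_NS = {
    "L1 cache":          1,
    "L2 cache":          4,
    "main memory":       100,
    "SSD random read":   150_000,     # 150 μs
    "HDD random read":   10_000_000,  # 10 ms
    "RTT same DC":       500_000,     # 500 μs
    "RTT cross-country": 60_000_000,  # 60 ms
}

def times_slower(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

print(f"SSD vs memory: {times_slower('SSD random read', 'main memory'):,.0f}x")       # ~1,500x
print(f"HDD vs SSD:    {times_slower('HDD random read', 'SSD random read'):,.0f}x")   # ~67x
print(f"Cross-country vs same DC: {times_slower('RTT cross-country', 'RTT same DC'):,.0f}x")  # ~120x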

Throughput Numbers

Sequential read from SSD:          1 GB/s
Sequential read from HDD:        100 MB/s
Network bandwidth (1 Gbps):      125 MB/s
Network bandwidth (10 Gbps):    1.25 GB/s

Storage Numbers

1 ASCII character:                  1 byte
Average English word:               5 bytes
Average tweet/message:            200 bytes
Average JSON API response:          2 KB
Average web page:                   2 MB
Average photo (compressed):         2 MB
1 minute of HD video:             150 MB

Capacity Numbers

QPS a single web server handles:     1,000 - 10,000
QPS a single database handles:       5,000 - 10,000
QPS Redis handles (single node):   100,000 - 200,000
Requests a load balancer handles:  100,000+

The Framework: 5 Steps to Any Estimate

I use the same framework every time, whether I'm estimating storage for a social media feed or throughput for a payment system.

Step 1: Define the Scale

Start with the number of users and their behavior.

Example: Estimate Twitter's storage needs

Users:              500 million total
Daily active users: 200 million (40% DAU ratio)
Tweets per user:    2 per day (average)
Total tweets/day:   400 million

Step 2: Estimate the Data Size Per Unit

Average tweet:
  - Text:           280 chars → 280 bytes
  - Metadata:       user_id, timestamp, etc. → 200 bytes
  - Media link:     (30% have media) → 100 bytes average
  Total per tweet:  ~600 bytes ≈ 1 KB (round up for safety)

Pro tip: Always round up. It's better to overestimate capacity than underestimate. In system design, the cost of having too much capacity is low; the cost of too little is an outage.

Step 3: Calculate Daily / Monthly / Yearly

Daily storage:      400M tweets × 1 KB = 400 GB/day
Monthly storage:    400 GB × 30 = 12 TB/month
Yearly storage:     12 TB × 12 = 144 TB/year
5-year storage:     144 TB × 5 = 720 TB

Step 4: Factor in Replication and Overhead

Real systems don't store one copy of anything.

Replication factor: 3 (standard for most distributed systems)
Storage with replication: 720 TB × 3 = 2.16 PB
Index overhead: ~30%
Total 5-year storage: 2.16 PB × 1.3 ≈ 2.8 PB

Step 5: Sanity Check

Does 2.8 PB over 5 years for Twitter's text data sound reasonable? Twitter (X) has reported petabyte-scale storage requirements, so yes — we're in the right ballpark.
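The whole five-step walk-through fits in a few lines of Python. A sketch reproducing the numbers above, where every input is one of the assumptions from Steps 1-4:

# Back-of-the-envelope storage estimate for a Twitter-like service,
# reproducing Steps 1-5. All inputs are rough assumptions, not measurements.
DAU             = 200e6   # daily active users (40% of 500M total)
TWEETS_PER_USER = 2       # tweets per user per day, average
BYTES_PER_TWEET = 1_000   # ~600 bytes, rounded up to 1 KB for safety
REPLICATION     = 3       # copies kept by the storage system
INDEX_OVERHEAD  = 1.3     # ~30% extra for indexes

tweets_per_day = DAU * TWEETS_PER_USER              # 400M tweets/day
daily_bytes    = tweets_per_day * BYTES_PER_TWEET   # 400 GB/day
five_year_raw  = daily_bytes * 365 * 5              # single copy, no indexes
total_bytes    = five_year_raw * REPLICATION * INDEX_OVERHEAD

print(f"Daily storage: {daily_bytes / 1e9:,.0f} GB")    # ~400 GB
print(f"5-year total:  {total_bytes / 1e15:,.1f} PB")   # ~2.8 PB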

Real-World Estimation: YouTube's Bandwidth

Let's do a more complex estimation that Google's own capacity planning teams have discussed.

YouTube Statistics:
- 2.5 billion monthly active users
- Average session: 40 minutes/day
- Average video bitrate: 5 Mbps (mix of qualities)

Daily bandwidth calculation

Concurrent viewers (peak):
  2.5B MAU × 0.3 (DAU ratio) × 0.1 (peak fraction)
  = 75 million concurrent viewers

Bandwidth:
  75M viewers × 5 Mbps = 375 Tbps
  375 Tbps ÷ 8 = ~47 TB/s

Wait, that seems too high. Let me reconsider...

Actually, not all users watch simultaneously.
Peak concurrent = total_daily_minutes / minutes_in_day × peak_factor

Total daily minutes: 750M DAU × 40 min = 30 billion minutes
Average concurrent: 30B min / 1440 min = ~20 million
Peak concurrent (2x average): ~40 million

Bandwidth: 40M × 5 Mbps = 200 Tbps = 25 TB/s

YouTube has reported peak bandwidth in the hundreds of terabits per second range. Our estimate of 200 Tbps is reasonable.

Notice what happened: My first estimate was wrong, but the framework helped me catch it. I sanity-checked, found an error in my concurrent user assumption, and corrected it. This iterative process is the whole point.
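Here's the corrected estimate as code, with the DAU ratio and the 2x peak factor carried over as assumptions:

# Peak bandwidth estimate for a YouTube-scale video service,
# following the corrected reasoning above. All inputs are assumptions.
MAU                = 2.5e9
DAU_RATIO          = 0.3     # fraction of MAU active on a given day
MINUTES_PER_VIEWER = 40      # average watch time per DAU per day
BITRATE_MBPS       = 5       # average across quality levels
PEAK_FACTOR        = 2       # peak concurrency vs. daily average

dau                 = MAU * DAU_RATIO               # 750M
total_daily_minutes = dau * MINUTES_PER_VIEWER      # 30B minutes
avg_concurrent      = total_daily_minutes / 1440    # ~20M viewers
peak_concurrent     = avg_concurrent * PEAK_FACTOR  # ~40M viewers

peak_tbps = peak_concurrent * BITRATE_MBPS / 1e6    # Mbps → Tbps
print(f"Peak concurrent viewers: ~{peak_concurrent / 1e6:.0f}M")
print(f"Peak bandwidth: ~{peak_tbps:.0f} Tbps (~{peak_tbps / 8:.0f} TB/s)")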

The Powers of 2 Cheat Sheet

These come up constantly in capacity estimation:

2^10 = 1,024             ≈ 1 Thousand (1 KB)
2^20 = 1,048,576         ≈ 1 Million  (1 MB)
2^30 = 1,073,741,824     ≈ 1 Billion  (1 GB)
2^40 = 1,099,511,627,776 ≈ 1 Trillion (1 TB)

Daily seconds:   86,400     ≈ 10^5
Monthly seconds: 2,592,000  ≈ 2.6 × 10^6
Yearly seconds:  31,536,000 ≈ 3 × 10^7

Quick QPS conversions:

1 million requests/day    = ~12 QPS
10 million requests/day   = ~120 QPS
100 million requests/day  = ~1,200 QPS
1 billion requests/day    = ~12,000 QPS

Memorize these conversions. They let you go from "we have X million daily users" to "we need Y QPS" in seconds.
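The conversion is just division by ~86,400 seconds per day (close enough to 10^5 to do in your head). A one-liner worth keeping around:

SECONDS_PER_DAY = 86_400

def avg_qps(requests_per_day: float) -> float:
    """Average QPS implied by a daily request count."""
    return requests_per_day / SECONDS_PER_DAY

for daily in (1e6, 10e6, 100e6, 1e9):
    print(f"{daily:>15,.0f} req/day → ~{avg_qps(daily):,.0f} QPS")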

Common Estimation Mistakes

Mistake 1: Confusing Peak with Average

Your system needs to handle peak load, not average load. Peak is typically 2-10x average, depending on the application.

E-commerce: Peak (Black Friday) = 10x average
Social media: Peak (evening) = 3x average
B2B SaaS: Peak (Monday morning) = 2x average

Design for peak: estimate the average first, then multiply by the peak factor.
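Carried through the QPS math, with the multipliers above as assumed peak factors:

# Capacity target = average load × workload-specific peak factor.
# Peak factors are the rough multipliers from the list above.
PEAK_FACTOR = {"e-commerce": 10, "social media": 3, "b2b saas": 2}

def design_qps(requests_per_day: float, workload: str) -> float:
    avg = requests_per_day / 86_400
    return avg * PEAK_FACTOR[workload]

# 100M requests/day on an e-commerce site: ~1,200 QPS on average,
# but Black Friday demands provisioning for ~12,000 QPS.
print(f"Design target: ~{design_qps(100e6, 'e-commerce'):,.0f} QPS")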

Mistake 2: Forgetting the 80/20 Rule

80% of traffic typically hits 20% of your data. This has massive implications for caching:

Total data: 10 TB
Hot data (20%): 2 TB
If 2 TB fits in cache → 80% cache hit rate
Your database only handles 20% of read traffic
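The same arithmetic as code, with a hypothetical read load added to make the effect concrete:

# Effect of caching the hot 20% of data, per the 80/20 rule above.
total_data_tb = 10
hot_fraction  = 0.2       # 20% of the data...
hot_traffic   = 0.8       # ...serves 80% of the reads (assumption)
read_qps      = 50_000    # hypothetical read load

cache_size_tb = total_data_tb * hot_fraction    # 2 TB of cache
db_read_qps   = read_qps * (1 - hot_traffic)    # DB sees only 20%

print(f"Cache size needed: {cache_size_tb:.0f} TB")
print(f"Reads reaching the database: {db_read_qps:,.0f} of {read_qps:,} QPS")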

Mistake 3: Ignoring Write Amplification

When you write 1 KB to a database, the actual disk I/O is much higher: the write-ahead log, index updates, and (in LSM-tree stores) compaction all rewrite data on disk.

Rule of thumb: Actual disk writes are 10-30x the logical write size for LSM-tree databases.
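To see what that does to an I/O budget, multiply it through. A sketch using an assumed 20x midpoint of that 10-30x range:

# Effect of write amplification on the disk I/O budget.
# 20x is an assumed midpoint of the 10-30x LSM-tree rule of thumb above.
logical_write_kb = 1
write_qps        = 10_000   # hypothetical write load
amplification    = 20

logical_mb_s  = logical_write_kb * write_qps / 1_000
physical_mb_s = logical_mb_s * amplification

print(f"Logical writes:    {logical_mb_s:,.0f} MB/s")
print(f"Physical disk I/O: {physical_mb_s:,.0f} MB/s")
# 10 MB/s of logical writes becomes ~200 MB/s of physical I/O,
# a meaningful slice of a single SSD's ~1 GB/s sequential throughput.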

Mistake 4: Assuming Linear Scaling

Doubling your servers doesn't double your capacity. Coordination overhead, network bottlenecks, and shared resources erode efficiency as the cluster grows: often to 70-80% at modest scale, and lower still as you keep adding servers.

1 server:    10,000 QPS
2 servers:   18,000 QPS (not 20,000)
10 servers:  70,000 QPS (not 100,000)
100 servers: 500,000 QPS (not 1,000,000)
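One way to fold this into an estimate is a per-decade efficiency penalty. The decay rate below is an illustrative assumption tuned to roughly match the numbers above, not a measured law:

import math

# Crude sub-linear scaling model: per-server efficiency drops as the
# cluster grows. The 25%-per-decade decay is an illustrative assumption.
BASE_QPS = 10_000  # throughput of a single server

def cluster_qps(servers: int, decay_per_decade: float = 0.25) -> float:
    efficiency = max(0.0, 1 - decay_per_decade * math.log10(servers))
    return BASE_QPS * servers * efficiency

for n in (1, 2, 10, 100):
    print(f"{n:>4} servers: ~{cluster_qps(n):,.0f} QPS")
# 1: 10,000 | 2: ~18,500 | 10: 75,000 | 100: 500,000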

The Meta-Skill: Thinking in Orders of Magnitude

The real power of back-of-the-envelope calculations isn't producing exact numbers. It's developing intuition about scale.

When someone proposes a design, you should instantly know whether the expected load justifies the architecture, whether the storage and bandwidth math holds up, and whether the system is over- or under-built by an order of magnitude.

This intuition prevents the most expensive mistake in engineering: building the wrong thing at the wrong scale.

A system designed for 10x your actual needs is fine (growth headroom). A system designed for 1,000x your actual needs is a maintenance nightmare that costs 100x more than necessary. A system designed for 0.1x your actual needs falls over in production.

Get within 10x of the right answer, and you'll make the right architectural decisions. That's all back-of-the-envelope calculations need to do.


The numbers and frameworks in this post draw from Jeff Dean's famous latency numbers, Google's capacity planning methodologies, AWS's back-of-the-envelope estimation guides, and real-world system design interview patterns.
