📡 You're offline — showing cached content
New version available!
Quick Access
PHP Advanced

Database Sharding: Horizontal vs Vertical Partitioning Explained

Understand database sharding — vertical vs horizontal partitioning, shard key selection, consistent hashing, hot shards, and resharding.

EzyCoders Admin December 3, 2025 12 min read 0 views
Database Sharding Horizontal Vertical Guide
Share: Twitter LinkedIn WhatsApp

Database Sharding

When a single database server can no longer handle the load, sharding distributes data across multiple servers. It is the most complex scalability technique for databases — understanding it conceptually is required for senior system design interviews.

Types of Partitioning:

Vertical Sharding: split by table/feature
  DB1: users, sessions, auth
  DB2: posts, comments, likes
  DB3: orders, payments, invoices
  Easy but limited — eventually each DB grows too large

Horizontal Sharding: split rows of same table
  Shard 1: users where id % 4 = 0  (users 4, 8, 12...)
  Shard 2: users where id % 4 = 1  (users 1, 5, 9...)
  Shard 3: users where id % 4 = 2  (users 2, 6, 10...)
  Shard 4: users where id % 4 = 3  (users 3, 7, 11...)

Shard Key Selection

import hashlib

class ShardRouter:
    def __init__(self, shard_count: int, shards: list):
        self.shard_count = shard_count
        self.shards      = shards  # list of DB connections

    def get_shard(self, key: str) -> object:
        # Consistent hashing: distribute evenly
        hash_val = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
        shard_id = hash_val % self.shard_count
        return self.shards[shard_id]

router = ShardRouter(4, [db1, db2, db3, db4])

# All of user 42's data goes to same shard
shard = router.get_shard(user_id=42)
shard.query('INSERT INTO posts (user_id, title) VALUES (?, ?)', [42, 'Guide'])

Problems with Sharding

1. Cross-shard queries are expensive
   SELECT * FROM posts ORDER BY views DESC LIMIT 10
   → Must query ALL shards, merge results in app

2. Cross-shard transactions are complex
   → Requires distributed transactions (2PC) or eventual consistency

3. Hot shards (uneven distribution)
   → Some users are 100x more active than others
   → Solution: consistent hashing with virtual nodes

4. Resharding is painful
   → Adding a 5th shard means moving ~20% of data
   → Solution: consistent hashing minimizes data movement

5. JOINs across shards
   → Cannot JOIN posts table (shard 1) with users table (shard 3)
   → Solution: denormalize, or route to same shard by user_id

Q: When should you shard vs other scaling strategies?

Try these first (much simpler): add read replicas, add Redis caching, optimize queries and indexes, use connection pooling, upgrade server hardware. Shard only when a single database cannot handle write throughput even with optimization, or when data volume exceeds what one server can store (typically 5-10 TB+). Twitter, Facebook, and WhatsApp all shard — but only after exhausting simpler options.

EzyCoders Admin
Written by
EzyCoders Admin

Team Lead and Full-Stack Developer with experience in PHP, JavaScript, SQL, DSA, and System Design. Passionate about software engineering, scalable web technologies, and helping developers prepare for coding interviews and tech careers through practical tutorials and professional guidance.

Comments (0)

No comments yet. Be the first!

Leave a Comment