2024-09-08
System design is about making decisions on how to structure an application so that it can handle growth, remain reliable, and stay maintainable. Whether we are building a small side project or a large production system, understanding the fundamentals helps us make better architectural choices.
Scalability is the ability of a system to handle increasing load. There are two main approaches.
Vertical scaling means adding more resources to a single machine, like more CPU, memory, or storage. It is simple, but it has a ceiling: there is a limit to how powerful a single server can be.
Horizontal scaling means adding more machines and distributing the load across them. This is how most large systems scale. It introduces complexity around data consistency and coordination, but it allows a system to grow well beyond what a single machine can handle.
When we have multiple servers, a load balancer sits in front of them and distributes incoming requests. Common strategies include round-robin, where each new request goes to the next server in a fixed rotation, and least connections, where each request goes to the server with the fewest active connections.
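To make the two strategies concrete, here is a minimal in-memory sketch. The class names and the `pick`, `acquire`, and `release` methods are hypothetical, chosen only for illustration. A real load balancer tracks connections at the network level instead of in a Python dict.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers one after another in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Sends each request to the server with the fewest active connections."""

    def __init__(self, servers):
        # Hypothetical bookkeeping: server name -> open connection count.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the request finishes.
        self.active[server] -= 1
```

Round-robin needs no knowledge of what the servers are doing, which makes it a simple default. Least connections tends to help more when some requests take much longer than others.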
Load balancers also handle health checks. If a server becomes unresponsive, the load balancer stops sending traffic to it until it recovers.
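A rough sketch of the health check idea follows. The `probe` callable stands in for whatever check a real balancer performs (an HTTP ping, a TCP connect), and the `failure_threshold` of two consecutive failures is an assumption made for this example, not something the text above prescribes.

```python
class HealthCheckedPool:
    """Tracks which servers should currently receive traffic."""

    def __init__(self, servers, probe, failure_threshold=2):
        self.servers = list(servers)
        self.probe = probe  # hypothetical callable: server -> bool
        self.failure_threshold = failure_threshold  # assumed value
        self.failures = {server: 0 for server in self.servers}

    def run_checks(self):
        # One round of probing; a real balancer would run this on a timer.
        for server in self.servers:
            if self.probe(server):
                self.failures[server] = 0  # recovered: count resets
            else:
                self.failures[server] += 1

    def healthy(self):
        # Servers below the failure threshold stay in rotation.
        return [
            server
            for server in self.servers
            if self.failures[server] < self.failure_threshold
        ]
```

Requiring more than one failed probe before removing a server avoids dropping it over a single slow response. A successful probe puts it back into rotation.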
Caching stores frequently accessed data in a fast storage layer, usually in memory, to reduce the load on the database. A cache like Redis sits between the application and the database. When data is requested, the application checks the cache first. If the data is there (a cache hit), it is returned immediately. If not (a cache miss), the application fetches it from the database and stores it in the cache for next time.
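The read path described above can be sketched as follows. Plain dicts stand in for both Redis and the database so the example runs on its own. The `CacheAsideReader` name and the hit and miss counters are hypothetical additions for illustration.

```python
class CacheAsideReader:
    """Checks the cache first and falls back to the database on a miss."""

    def __init__(self, database):
        self.database = database  # dict standing in for the real database
        self.cache = {}           # dict standing in for a cache like Redis
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            # Cache hit: return immediately without touching the database.
            self.hits += 1
            return self.cache[key]
        # Cache miss: fetch from the database and store for next time.
        self.misses += 1
        value = self.database[key]
        self.cache[key] = value
        return value
```

The first read of a key pays the database cost, and later reads are served from memory. This sketch never evicts or refreshes entries, so it would serve stale data if the database changed.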
The tricky part of caching is invalidation. When the underlying data changes, the cache needs to be updated or cleared. Common strategies include time-based expiration (TTL), where cached data expires after a set duration, and write-through, where the cache is updated whenever the data is written.
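Here is a small sketch of both strategies, assuming an in-memory cache and a dict in place of the database. The `TTLCache` class, the `write_through` helper, and the injectable `clock` parameter are all hypothetical names introduced for this example. The clock is injectable only so expiry can be demonstrated without sleeping.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so the caller falls back to the database.
            del self._entries[key]
            return None
        return value


def write_through(database, cache, key, value):
    """Writes to the database and updates the cache in the same operation."""
    database[key] = value
    cache.set(key, value)
```

TTL bounds how long stale data can be served, at the cost of some avoidable misses. Write-through keeps the cache current for writes that go through this path, but writes that bypass it still leave stale entries until the TTL expires.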
Choosing the right database depends on the data model and access patterns. Relational databases like PostgreSQL are a strong default for structured data with relationships. Document databases like MongoDB work well when the data is hierarchical or the schema varies between records.
For high read throughput, read replicas can distribute query load. For high write throughput, sharding splits data across multiple database instances based on a partition key.
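One common way to map a partition key to a shard is to hash the key and take the result modulo the number of shards. The sketch below assumes that scheme. The `shard_for` function and the `ShardedStore` class are hypothetical, and each shard is a dict standing in for a separate database instance.

```python
import hashlib


def shard_for(partition_key, shard_count):
    """Maps a partition key to a shard index using a stable hash."""
    # hashlib gives the same result in every process, unlike Python's
    # built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Routes each record to one of several shards by partition key."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard(self, partition_key):
        return self.shards[shard_for(partition_key, len(self.shards))]

    def put(self, partition_key, record):
        self._shard(partition_key)[partition_key] = record

    def get(self, partition_key):
        return self._shard(partition_key).get(partition_key)
```

With plain modulo hashing, changing the shard count remaps most keys, so growing the cluster means moving a lot of data. Queries that span many partition keys also have to touch several shards.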
When parts of a system do not need to respond immediately, a message queue can decouple them. Instead of service A calling service B directly, service A puts a message on a queue, and service B processes it when ready. This improves resilience because if service B is temporarily down, the messages wait in the queue instead of being lost.
Tools like RabbitMQ and Amazon SQS are commonly used for this. Message queues are especially useful for tasks like sending emails, processing images, or generating reports, where the user does not need to wait for the result.
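The decoupling can be illustrated with Python's standard-library `queue.Queue` in place of a real broker. This is only a single-process sketch: the function names, the message shape, and the `sent` list are hypothetical. An in-process queue is not durable, so it does not survive a restart the way a broker-backed queue is meant to.

```python
import queue

email_queue = queue.Queue()  # stand-in for a broker-backed queue
sent = []                    # records what "service B" has processed


def service_a_signup(address):
    """Service A: enqueue the work and return without waiting for it."""
    email_queue.put({"to": address, "template": "welcome"})
    return "signup accepted"


def service_b_drain():
    """Service B: process whatever has accumulated, whenever it is ready."""
    while True:
        try:
            message = email_queue.get_nowait()
        except queue.Empty:
            break
        sent.append(message["to"])  # a real worker would send the email here
        email_queue.task_done()
```

Service A returns as soon as the message is enqueued. If service B is not running, messages accumulate in the queue and are handled in order once it starts draining.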
Every design decision involves trade-offs. Adding a cache improves read performance but introduces consistency challenges. Horizontal scaling improves capacity but adds operational complexity. A message queue improves resilience but adds latency, since work is processed later rather than during the request. Understanding these trade-offs and choosing based on the actual requirements of the system is the core skill of system design.