System Design Primer
Learn how to design large-scale systems. Prep for the system design interview.
Donne Martin
Contents

• Design a key-value cache to save the results of the most recent web server queries
  • Step 1: Outline use cases and constraints
  • Step 2: Create a high level design
  • Step 3: Design core components
  • Step 4: Scale the design
  • Additional talking points
English1 ∙ Japanese2 ∙ Simplified Chinese3 ∙ Traditional Chinese4 | Arabic5 ∙ Bengali6 ∙ Brazilian Portuguese7 ∙ German8 ∙ Greek9 ∙ Hebrew10 ∙ Italian11 ∙ Korean12 ∙ Persian13 ∙ Polish14 ∙ Russian15 ∙ Spanish16 ∙ Thai17 ∙ Turkish18 ∙ Vietnamese19 ∙ French20 | Add Translation21
Help translate22 this guide!
1. README.md
2. README-ja.md
3. README-zh-Hans.md
4. README-zh-TW.md
5. https://github.com/donnemartin/system-design-primer/issues/170
6. https://github.com/donnemartin/system-design-primer/issues/220
7. https://github.com/donnemartin/system-design-primer/issues/40
8. https://github.com/donnemartin/system-design-primer/issues/186
9. https://github.com/donnemartin/system-design-primer/issues/130
10. https://github.com/donnemartin/system-design-primer/issues/272
11. https://github.com/donnemartin/system-design-primer/issues/104
12. https://github.com/donnemartin/system-design-primer/issues/102
13. https://github.com/donnemartin/system-design-primer/issues/110
14. https://github.com/donnemartin/system-design-primer/issues/68
15. https://github.com/donnemartin/system-design-primer/issues/87
16. https://github.com/donnemartin/system-design-primer/issues/136
17. https://github.com/donnemartin/system-design-primer/issues/187
18. https://github.com/donnemartin/system-design-primer/issues/39
19. https://github.com/donnemartin/system-design-primer/issues/127
20. https://github.com/donnemartin/system-design-primer/issues/250
21. https://github.com/donnemartin/system-design-primer/issues/28
22. TRANSLATIONS.md
The System Design Primer
Motivation
Learning how to design scalable systems will help you become a better engineer.
System design is a broad topic. A vast number of resources on system design principles are scattered throughout the web.
This repo is an organized collection of resources to help you learn how to build systems at scale.
In addition to coding interviews, system design is a required component of the technical interview process at
many tech companies.
Practice common system design interview questions and compare your results with sample solutions:
discussions, code, and diagrams.
Additional topics for interview prep:
• Study guide
• How to approach a system design interview question
• System design interview questions, with solutions
• Object-oriented design interview questions, with solutions
• Additional system design interview questions
Anki flashcards
The provided Anki flashcard decks1 use spaced repetition to help you retain key system design concepts.
Looking for resources to help you prep for the Coding Interview5? Check out the sister repo Interactive Coding Challenges6, which contains an additional Anki deck:
• Coding deck7
Contributing
• Fix errors
• Improve sections
• Add new sections
• Translate8
Index of system design topics

Summaries of various system design topics, including pros and cons. Everything is a trade-off.
Each section contains links to more in-depth resources.
• Application layer
• Microservices
• Service discovery
• Database
• Relational database management system (RDBMS)
• Master-slave replication
• Master-master replication
• Federation
• Sharding
• Denormalization
• SQL tuning
• NoSQL
• Key-value store
• Document store
• Wide column store
• Graph Database
• SQL or NoSQL
• Cache
• Client caching
• CDN caching
• Web server caching
• Database caching
• Application caching
• Caching at the database query level
• Caching at the object level
• When to update the cache
• Cache-aside
• Write-through
• Write-behind (write-back)
• Refresh-ahead
• Asynchronism
• Message queues
• Task queues
• Back pressure
• Communication
• Transmission control protocol (TCP)
• User datagram protocol (UDP)
• Remote procedure call (RPC)
• Representational state transfer (REST)
• Security
• Appendix
• Powers of two table
• Latency numbers every programmer should know
• Additional system design interview questions
• Real world architectures
• Company architectures
• Company engineering blogs
• Under development
• Credits
• Contact info
• License
Study guide
Suggested topics to review based on your interview timeline (short, medium, long).
More experienced candidates are generally expected to know more about system design. Architects or team leads
might be expected to know more than individual contributors. Top tech companies are likely to have one or more
design interview rounds.
Start broad and go deeper in a few areas. It helps to know a little about various key system design topics. Adjust the
following guide based on your timeline, experience, what positions you are interviewing for, and which companies you
are interviewing with.
How to approach a system design interview question
• Short timeline - Aim for breadth with system design topics. Practice by solving some interview questions.
• Medium timeline - Aim for breadth and some depth with system design topics. Practice by solving many
interview questions.
• Long timeline - Aim for breadth and more depth with system design topics. Practice by solving most
interview questions.
The system design interview is an open-ended conversation. You are expected to lead it.
You can use the following steps to guide the discussion. To help solidify this process, work through the System design
interview questions with solutions section using the following steps.
Step 1: Outline use cases and constraints

Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.

Step 2: Create a high level design

Outline a high level design with all important components.

Step 3: Design core components

Dive into details for each core component. For example, if you were asked to design a URL shortening service10, discuss:
• Database lookup
Step 4: Scale the design

Identify and address bottlenecks, given the constraints. For example, do you need the following to address scalability issues?
• Load balancer
• Horizontal scaling
• Caching
• Database sharding
Discuss potential solutions and trade-offs. Everything is a trade-off. Address bottlenecks using principles of scalable
system design.
Back-of-the-envelope calculations
You might be asked to do some estimates by hand. Refer to the Appendix for the following resources:

• Powers of two table
• Latency numbers every programmer should know
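For example, a rough traffic estimate might be sketched like this (all numbers are hypothetical, not taken from the text):

```python
# Back-of-the-envelope sketch: estimate average and peak QPS for a
# hypothetical service with 10 million daily active users, each making
# 20 requests per day.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400; rounding to ~100,000 is fine by hand

daily_requests = 10_000_000 * 20              # 200 million requests/day
average_qps = daily_requests / SECONDS_PER_DAY
peak_qps = average_qps * 2                    # common rule of thumb: peak ~2x average

print(f"average QPS ~{average_qps:,.0f}, peak QPS ~{peak_qps:,.0f}")
```

In an interview, the point is the order of magnitude, not the exact figure.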
Check out the following links to get a better idea of what to expect:
System design interview questions with solutions

Common system design interview questions with sample discussions, code, and diagrams.
Solutions linked to content in the solutions/ folder.
| Question | Solution |
|---|---|
| Design Pastebin.com (or Bit.ly) | Solution18 |
| Design the Twitter timeline and search (or Facebook feed and search) | Solution19 |
| Design a web crawler | Solution20 |
| Design Mint.com | Solution21 |
| Design the data structures for a social network | Solution22 |
| Design a key-value store for a search engine | Solution23 |
| Design Amazon's sales ranking by category feature | Solution24 |
| Design a system that scales to millions of users on AWS | Solution25 |
| Add a system design question | Contribute |
Design the Twitter timeline and search (or Facebook feed and search)
Figure 5: Scaled design of the Twitter timeline and search (or Facebook feed and search)
Object-oriented design interview questions with solutions

Common object-oriented design interview questions with sample discussions, code, and diagrams.
| Question | Solution |
|---|---|
| Design a hash map | Solution34 |
| Design a least recently used cache | Solution35 |
| Design a call center | Solution36 |
| Design a deck of cards | Solution37 |
| Design a parking lot | Solution38 |
| Design a chat server | Solution39 |
| Design a circular array | Contribute |
| Add an object-oriented design question | Contribute |
Step 1: Review the scalability video lecture40

First, you'll need a basic understanding of common principles: what they are, how they are used, and their pros and cons.
• Topics covered:
• Vertical scaling
• Horizontal scaling
• Caching
• Load balancing
• Database replication
• Database partitioning
33. solutions/system_design/scaling_aws/README.md
34. solutions/object_oriented_design/hash_table/hash_map.ipynb
35. solutions/object_oriented_design/lru_cache/lru_cache.ipynb
36. solutions/object_oriented_design/call_center/call_center.ipynb
37. solutions/object_oriented_design/deck_of_cards/deck_of_cards.ipynb
38. solutions/object_oriented_design/parking_lot/parking_lot.ipynb
39. solutions/object_oriented_design/online_chat/online_chat.ipynb
40. https://www.youtube.com/watch?v=-W9F__D3oY4
Figure 11: Scaled design of a system that scales to millions of users on AWS
Step 2: Review the scalability article

• Scalability41
• Topics covered:
• Clones42
• Databases43
• Caches44
• Asynchronism45
Next steps
• Performance vs scalability
• Latency vs throughput
• Availability vs consistency
Then we’ll dive into more specific topics such as DNS, CDNs, and load balancers.
Performance vs scalability
A service is scalable if it results in increased performance in a manner proportional to resources added. Generally, increasing performance means serving more units of work, but it can also mean handling larger units of work, such as when datasets grow.1
• If you have a performance problem, your system is slow for a single user.
• If you have a scalability problem, your system is fast for a single user but slow under heavy load.
• A word on scalability46
• Scalability, availability, stability, patterns47
Latency vs throughput
Generally, you should aim for maximal throughput with acceptable latency.
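One way to connect the two is Little's law (concurrency = throughput × latency); a quick sketch with hypothetical numbers:

```python
# Little's law: L = lambda * W, relating in-flight requests (L),
# throughput (lambda, req/s), and latency (W, seconds).
# Hypothetical numbers: 50 ms per request at 2,000 req/s.

latency_s = 0.050          # 50 ms per request
target_throughput = 2000   # requests per second

required_concurrency = target_throughput * latency_s  # L = lambda * W
print(f"need ~{required_concurrency:.0f} requests in flight")  # ~100
```

This shows why cutting latency in half lets the same number of workers sustain twice the throughput.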
41. http://www.lecloud.net/tagged/scalability/chrono
42. http://www.lecloud.net/post/7295452622/scalability-for-dummies-part-1-clones
43. http://www.lecloud.net/post/7994751381/scalability-for-dummies-part-2-database
44. http://www.lecloud.net/post/9246290032/scalability-for-dummies-part-3-cache
45. http://www.lecloud.net/post/9699762917/scalability-for-dummies-part-4-asynchronism
46. http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html
47. http://www.slideshare.net/jboner/scalability-availability-stability-patterns/
Availability vs consistency
CAP theorem
In a distributed computer system, you can only support two of the following guarantees:

• Consistency - Every read receives the most recent write or an error
• Availability - Every request receives a response, without guarantee that it contains the most recent version of the information
• Partition tolerance - The system continues to operate despite arbitrary partitioning due to network failures

Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software trade-off between consistency and availability.
CP - consistency and partition tolerance

Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.
AP - availability and partition tolerance

Responses return the most readily available version of the data on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.

AP is a good choice if the business needs to allow for eventual consistency or when the system needs to continue working despite external errors.
48. https://community.cadence.com/cadence_blogs_8/b/fv/posts/understanding-latency-vs-throughput
49. https://robertgreiner.com/cap-theorem-revisited
Consistency patterns
With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a
consistent view of the data. Recall the definition of consistency from the CAP theorem - Every read receives the most
recent write or an error.
Weak consistency
After a write, reads may or may not see it. A best effort approach is taken.
This approach is seen in systems such as memcached. Weak consistency works well in real time use cases such as
VoIP, video chat, and realtime multiplayer games. For example, if you are on a phone call and lose reception for a few
seconds, when you regain connection you do not hear what was spoken during connection loss.
Eventual consistency
After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.
This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.
Strong consistency

After a write, reads will see it. Data is replicated synchronously.

This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
Availability patterns
There are two complementary patterns to support high availability: fail-over and replication.
50. http://robertgreiner.com/2014/08/cap-theorem-revisited/
51. http://ksat.me/a-plain-english-introduction-to-cap-theorem
52. https://github.com/henryr/cap-faq
53. https://www.youtube.com/watch?v=k-Yaq8AHlFA
54. http://snarfed.org/transactions_across_datacenters_io.html
Fail-over
Active-passive
With active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat
is interrupted, the passive server takes over the active’s IP address and resumes service.
The length of downtime is determined by whether the passive server is already running in ‘hot’ standby or whether it
needs to start up from ‘cold’ standby. Only the active server handles traffic.
Active-passive failover can also be referred to as master-slave failover.
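The heartbeat mechanism described above can be sketched in a few lines (the timeout value is hypothetical):

```python
import time

# Minimal active-passive fail-over sketch (illustrative, not production code).
# The passive server promotes itself if no heartbeat arrives within a timeout.

HEARTBEAT_TIMEOUT_S = 3.0

def should_promote(last_heartbeat: float, now: float,
                   timeout: float = HEARTBEAT_TIMEOUT_S) -> bool:
    """Return True if the passive node should take over the active's role."""
    return (now - last_heartbeat) > timeout

now = time.monotonic()
assert not should_promote(now - 1.0, now)   # recent heartbeat: stay passive
assert should_promote(now - 5.0, now)       # heartbeat missed: take over
```

Real systems also have to guard against split brain, where both nodes believe they are active; a common mitigation is fencing or a quorum of observers.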
Active-active
In active-active, both servers are managing traffic, spreading the load between them.
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are
internal-facing, application logic would need to know about both servers.
Active-active failover can also be referred to as master-master failover.
Disadvantage(s): fail-over

• Fail-over adds more hardware and additional complexity.
• There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.
Replication
• Master-slave replication
• Master-master replication
Availability in numbers
Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in the number of 9s: a service with 99.99% availability is described as having four 9s.
If a service consists of multiple components prone to failure, the service’s overall availability depends on whether the
components are in sequence or in parallel.
In sequence: Overall availability decreases when two components with availability < 100% are in sequence:

Availability (Total) = Availability (Foo) * Availability (Bar)

If both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.

In parallel: Overall availability increases when two components with availability < 100% are in parallel:

Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))

If both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.
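The in-sequence and in-parallel calculations above can be checked with a few lines of Python:

```python
# Components in sequence: availabilities multiply.
# Components in parallel: one minus the product of the failure probabilities.

def sequence(*avail: float) -> float:
    total = 1.0
    for a in avail:
        total *= a
    return total

def parallel(*avail: float) -> float:
    failure = 1.0
    for a in avail:
        failure *= (1 - a)
    return 1 - failure

foo = bar = 0.999  # 99.9% each, i.e. "three 9s"
print(f"in sequence: {sequence(foo, bar):.4%}")   # ~99.80%
print(f"in parallel: {parallel(foo, bar):.4%}")   # ~99.9999%
```

Both functions generalize to any number of components, which is handy for estimating the availability of a whole request path.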
Domain name system

A ___domain name system (DNS) translates a ___domain name such as www.example.com to an IP address.

• NS record (name server) - Specifies the DNS servers for your ___domain/subdomain.
• MX record (mail exchange) - Specifies the mail servers for accepting messages.
• A record (address) - Points a name to an IP address.
• CNAME (canonical) - Points a name to another name or CNAME (example.com to www.example.com) or to
an A record.
Services such as CloudFlare57 and Route 5358 provide managed DNS services. Some DNS services can route traffic
through various methods:
• Latency-based60
• Geolocation-based61
Disadvantage(s): DNS
• Accessing a DNS server introduces a slight delay, although mitigated by caching described above.
• DNS server management could be complex and is generally managed by governments, ISPs, and large companies62.
• DNS services have recently come under DDoS attack63 , preventing users from accessing websites such as Twitter
without knowing Twitter’s IP address(es).
• DNS architecture64
• Wikipedia65
• DNS articles66
Content delivery network

A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from CDN, although
60. https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-latency
61. https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-geo
62. http://superuser.com/questions/472695/who-controls-the-dns-servers/472729
63. http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/
64. https://technet.microsoft.com/en-us/library/dd197427(v=ws.10).aspx
65. https://en.wikipedia.org/wiki/Domain_Name_System
66. https://support.dnsimple.com/categories/dns/
67. https://www.creative-artworks.eu/why-use-a-content-delivery-network-cdn/
some CDNs such as Amazon’s CloudFront support dynamic content. The site’s DNS resolution will tell clients which
server to contact.
Serving content from CDNs can significantly improve performance in two ways:

• Users receive content from data centers close to them
• Your servers do not have to serve requests that the CDN fulfills
Push CDNs
Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing
content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content
expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing
storage.
Sites with a small amount of traffic or sites with content that isn’t often updated work well with push CDNs. Content
is placed on the CDNs once, instead of being re-pulled at regular intervals.
Pull CDNs
Pull CDNs grab new content from your server when the first user requests the content. You leave the content on
your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the
CDN.
A time-to-live (TTL)68 determines how long content is cached. Pull CDNs minimize storage space on the CDN, but
can create redundant traffic if files expire and are pulled before they have actually changed.
Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested
content remaining on the CDN.
Disadvantage(s): CDN
• CDN costs could be significant depending on traffic, although this should be weighed with additional costs you
would incur not using a CDN.
• Content might be stale if it is updated before the TTL expires it.
• CDNs require changing URLs for static content to point to the CDN.
Load balancer
Source: Scalable system design patterns72
Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:

• Preventing requests from going to unhealthy servers
• Preventing overloading resources
• Helping to eliminate a single point of failure
Load balancers can be implemented with hardware (expensive) or with software such as HAProxy.
Additional benefits include:
• SSL termination - Decrypt incoming requests and encrypt server responses so backend servers do not have to
perform these potentially expensive operations
• Removes the need to install X.509 certificates73 on each server
• Session persistence - Issue cookies and route a specific client’s requests to same instance if the web apps do
not keep track of sessions
To protect against failures, it’s common to set up multiple load balancers, either in active-passive or active-active
mode.
Load balancers can route traffic based on various metrics, including:
• Random
• Least loaded
• Session/cookies
• Round robin or weighted round robin74
72. https://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html
73. https://en.wikipedia.org/wiki/X.509
74. https://www.g33kinfo.com/info/round-robin-vs-weighted-round-robin-lb
• Layer 4
• Layer 7
Layer 4 load balancers look at info at the transport layer to decide how to distribute requests. Generally, this involves
the source, destination IP addresses, and ports in the header, but not the contents of the packet. Layer 4 load balancers
forward network packets to and from the upstream server, performing Network Address Translation (NAT)75 .
Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.
At the cost of flexibility, layer 4 load balancing requires less time and computing resources than Layer 7, although the
performance impact can be minimal on modern commodity hardware.
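The layer 7 behavior described above can be sketched as path-based routing; the backend hostnames below are hypothetical, and a layer 4 balancer could not make this choice because it never inspects the application payload:

```python
# Illustrative layer 7 routing sketch: pick a backend pool from the
# request path. Hostnames are made-up placeholders.

BACKEND_POOLS = {
    "/video": ["video-1.internal", "video-2.internal"],
    "/billing": ["billing-secure-1.internal"],
}
DEFAULT_POOL = ["web-1.internal", "web-2.internal"]

def route(path: str):
    """Return the backend pool whose prefix matches the request path."""
    for prefix, pool in BACKEND_POOLS.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL

assert route("/video/cats.mp4") == ["video-1.internal", "video-2.internal"]
assert route("/index.html") == DEFAULT_POOL
```

A real balancer would additionally health-check each pool member and spread load within the chosen pool.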
Horizontal scaling
Load balancers can also help with horizontal scaling, improving performance and availability. Scaling out using
commodity machines is more cost efficient and results in higher availability than scaling up a single server on more
expensive hardware, called Vertical Scaling. It is also easier to hire for talent working on commodity hardware than
it is for specialized enterprise systems.
• Servers should be stateless: they should not contain any user-related data like sessions or profile pictures
• Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a persistent cache
(Redis, Memcached)
• Downstream servers such as caches and databases need to handle more simultaneous connections as upstream
servers scale out
• The load balancer can become a performance bottleneck if it does not have enough resources or if it is not
configured properly.
• Introducing a load balancer to help eliminate a single point of failure results in increased complexity.
• A single load balancer is a single point of failure; configuring multiple load balancers further increases complexity.
75. https://www.nginx.com/resources/glossary/layer-4-load-balancing/
• NGINX architecture76
• HAProxy architecture guide77
• Scalability78
• Wikipedia79
• Layer 4 load balancing80
• Layer 7 load balancing81
• ELB listener config82

Reverse proxy (web server)

Source: Wikipedia83
A reverse proxy is a web server that centralizes internal services and provides unified interfaces to the public. Requests from clients are forwarded to a server that can fulfill them before the reverse proxy returns the server's response to the client.
• Increased security - Hide information about backend servers, blacklist IPs, limit number of connections per
client
• Increased scalability and flexibility - Clients only see the reverse proxy’s IP, allowing you to scale servers
or change their configuration
• SSL termination - Decrypt incoming requests and encrypt server responses so backend servers do not have to
perform these potentially expensive operations
• Static content - Serve static content directly:
  • HTML/CSS/JS
  • Photos
  • Videos
  • Etc
76. https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/
77. http://www.haproxy.org/download/1.2/doc/architecture.txt
78. http://www.lecloud.net/post/7295452622/scalability-for-dummies-part-1-clones
79. https://en.wikipedia.org/wiki/Load_balancing_(computing)
80. https://www.nginx.com/resources/glossary/layer-4-load-balancing/
81. https://www.nginx.com/resources/glossary/layer-7-load-balancing/
82. http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-listener-config.html
83. https://upload.wikimedia.org/wikipedia/commons/6/67/Reverse_proxy_h2g2bob.svg
84. https://en.wikipedia.org/wiki/X.509
Load balancer vs reverse proxy

• Deploying a load balancer is useful when you have multiple servers. Often, load balancers route traffic to a set of servers serving the same function.
• Reverse proxies can be useful even with just one web server or application server, opening up the benefits
described in the previous section.
• Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing.
Application layer
Separating out the web layer from the application layer (also known as platform layer) allows you to scale and
configure both layers independently. Adding a new API results in adding application servers without necessarily
adding additional web servers. The single responsibility principle advocates for small and autonomous services
that work together. Small teams with small services can plan more aggressively for rapid growth.
Microservices
Related to this discussion are microservices91, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal.1
Pinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.
Service Discovery
Systems such as Consul92 , Etcd93 , and Zookeeper94 can help services find each other by keeping track of registered
names, addresses, and ports. Health checks95 help verify service integrity and are often done using an HTTP endpoint.
Both Consul and Etcd have a built in key-value store that can be useful for storing config values and other shared
data.
• Adding an application layer with loosely coupled services requires a different approach from an architectural,
operations, and process viewpoint (vs a monolithic system).
• Microservices can add complexity in terms of deployments and operations.
91. https://en.wikipedia.org/wiki/Microservices
92. https://www.consul.io/docs/index.html
93. https://coreos.com/etcd/docs/latest
94. http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper
95. https://www.consul.io/intro/getting-started/checks.html
96. http://lethain.com/introduction-to-architecting-systems-for-scale
97. http://www.puncsky.com/blog/2016-02-13-crack-the-system-design-interview
98. https://en.wikipedia.org/wiki/Service-oriented_architecture
99. http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper
100. https://cloudncode.wordpress.com/2016/07/22/msa-getting-started/
Database
There are many techniques to scale a relational database: master-slave replication, master-master replication,
federation, sharding, denormalization, and SQL tuning.
Master-slave replication
The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also
replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in
read-only mode until a slave is promoted to a master or a new master is provisioned.
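A sketch of how application code might route queries under master-slave replication; the node names are hypothetical, and real deployments typically use a proxy or a driver feature rather than hand-rolled routing:

```python
import itertools

# Writes go to the master; reads round-robin across the read replicas.

MASTER = "db-master"
SLAVES = ["db-slave-1", "db-slave-2"]
_slave_cycle = itertools.cycle(SLAVES)

def pick_node(sql: str) -> str:
    """Send writes to the master; spread reads across the slaves."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return next(_slave_cycle) if is_read else MASTER

assert pick_node("INSERT INTO users VALUES (1)") == "db-master"
assert pick_node("SELECT * FROM users") in SLAVES
```

Note that this naive classification ignores replication lag: a read issued right after a write may not see it on a replica, which connects back to the consistency patterns above.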
101. https://www.youtube.com/watch?v=kKjm4ehYiMs
102. https://en.wikipedia.org/wiki/Database_transaction
Master-master replication
Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system
can continue to operate with both reads and writes.
103. https://www.slideshare.net/jboner/scalability-availability-stability-patterns
Disadvantage(s): master-master replication

• You'll need a load balancer or you'll need to make changes to your application logic to determine where to write.
• Most master-master systems are either loosely consistent (violating ACID) or have increased write latency due
to synchronization.
• Conflict resolution comes more into play as more write nodes are added and as latency increases.
• See Disadvantage(s): replication for points related to both master-slave and master-master.
Disadvantage(s): replication
• There is a potential for loss of data if the master fails before any newly written data can be replicated to other
nodes.
• Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down with
replaying writes and can’t do as many reads.
• The more read slaves, the more you have to replicate, which leads to greater replication lag.
• On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas
only support writing sequentially with a single thread.
• Replication adds more hardware and additional complexity.
Federation
Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic
database, you could have three databases: forums, users, and products, resulting in less read and write traffic to
each database and therefore less replication lag. Smaller databases result in more data that can fit in memory, which
in turn results in more cache hits due to improved cache locality. With no single central master serializing writes you
can write in parallel, increasing throughput.
Disadvantage(s): federation

• Federation is not effective if your schema requires huge functions or tables.
• You'll need to update your application logic to determine which database to read and write.
• Joining data from two databases is more complex with a server link108.
• Federation adds more hardware and additional complexity.
107. https://www.youtube.com/watch?v=kKjm4ehYiMs
108. http://stackoverflow.com/questions/5145637/querying-data-by-joining-two-tables-in-two-database-on-different-servers
109. https://www.youtube.com/watch?v=kKjm4ehYiMs
Sharding
Sharding distributes data across different databases such that each database can only manage a subset of the data.
Taking a users database as an example, as the number of users increases, more shards are added to the cluster.
Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more
cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes
down, the other shards are still operational, although you’ll want to add some form of replication to avoid data loss.
Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased
throughput.
A common way to shard a table of users is through either the user's last name initial or the user's geographic ___location.
Disadvantage(s): sharding
• You’ll need to update your application logic to work with shards, which could result in complex SQL queries.
• Data distribution can become lopsided in a shard. For example, a set of power users on a shard could result in
increased load to that shard compared to others.
• Rebalancing adds additional complexity. A sharding function based on consistent hashing111 can reduce
the amount of transferred data.
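The consistent hashing approach mentioned above can be sketched as a minimal hash ring (shard names are hypothetical):

```python
import bisect
import hashlib

# Minimal consistent-hash ring: a key maps to the first node clockwise on
# the ring, so adding or removing a shard only remaps nearby keys.

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each node gets several virtual points to spread load evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["shard-a", "shard-b", "shard-c"])
assert ring.node_for("user:42") in {"shard-a", "shard-b", "shard-c"}
assert ring.node_for("user:42") == ring.node_for("user:42")  # stable mapping
```

The virtual points (replicas) are what smooths out the lopsided-distribution problem described above.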
Denormalization
Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. Some RDBMSes such as PostgreSQL115 and Oracle support materialized views116, which handle the work of storing redundant information and keeping redundant copies consistent.
Once data becomes distributed with techniques such as federation and sharding, managing joins across data centers
further increases complexity. Denormalization might circumvent the need for such complex joins.
In most systems, reads can heavily outnumber writes 100:1 or even 1000:1. A read resulting in a complex database
join can be very expensive, spending a significant amount of time on disk operations.
Disadvantage(s): denormalization
• Data is duplicated.
• Constraints can help redundant copies of information stay in sync, which increases complexity of the database
design.
• A denormalized database under heavy write load might perform worse than its normalized counterpart.
• Denormalization117
111. http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html
112. http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html
113. https://en.wikipedia.org/wiki/Shard_(database_architecture)
114. http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html
115. https://en.wikipedia.org/wiki/PostgreSQL
116. https://en.wikipedia.org/wiki/Materialized_view
117. https://en.wikipedia.org/wiki/Denormalization
SQL tuning
SQL tuning is a broad topic and many books118 have been written as reference.
Tighten up the schema

• CHAR effectively allows for fast, random access, whereas with VARCHAR, you must find the end of a string before moving on to the next one.
• Use TEXT for large blocks of text such as blog posts. TEXT also allows for boolean searches. Using a TEXT field
results in storing a pointer on disk that is used to locate the text block.
• Use INT for larger numbers up to 2^32 or 4 billion.
• Use DECIMAL for currency to avoid floating point representation errors.
• Avoid storing large BLOBs; store the ___location where to get the object instead.
• VARCHAR(255) is the largest number of characters that can be counted in an 8-bit number, often maximizing the
use of a byte in some RDBMS.
• Set the NOT NULL constraint where applicable to improve search performance121 .
• Columns that you are querying (SELECT, GROUP BY, ORDER BY, JOIN) could be faster with indices.
• Indices are usually represented as a self-balancing B-tree122 that keeps data sorted and allows searches, sequential
access, insertions, and deletions in logarithmic time.
• Placing an index can keep the data in memory, requiring more space.
• Writes could also be slower since the index also needs to be updated.
• When loading large amounts of data, it might be faster to disable indices, load the data, then rebuild the indices.
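The effect of an index on the query plan can be observed directly. A small sketch using SQLite's EXPLAIN QUERY PLAN (table, column, and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany("INSERT INTO users (email, name) VALUES (?, ?)",
                 [(f"user{i}@example.com", f"user{i}") for i in range(1000)])

# Without an index, the planner falls back to a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("user500@example.com",)).fetchall()
print(plan)  # e.g. a SCAN over users

# With an index, the same query becomes a logarithmic-time B-tree search.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("user500@example.com",)).fetchall()
print(plan)  # e.g. SEARCH users USING INDEX idx_users_email (email=?)
```

The same technique applies to columns used in GROUP BY, ORDER BY, and JOIN clauses; the write-side cost is that every INSERT and UPDATE must also maintain the index.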
Partition tables
• Break up a table by putting hot spots in a separate table to help keep it in memory.
NoSQL
NoSQL is a collection of data items represented in a key-value store, document store, wide column store, or
a graph database. Data is denormalized, and joins are generally done in the application code. Most NoSQL stores
lack true ACID transactions and favor eventual consistency.
BASE is often used to describe the properties of NoSQL databases. In comparison with the CAP Theorem, BASE
chooses availability over consistency.
In addition to choosing between SQL or NoSQL, it is helpful to understand which type of NoSQL database best fits
your use case(s). We’ll review key-value stores, document stores, wide column stores, and graph databases
in the next section.
Key-value store
A key-value store generally allows for O(1) reads and writes and is often backed by memory or SSD. Data stores can
maintain keys in lexicographic order129 , allowing efficient retrieval of key ranges. Key-value stores can allow for storing
of metadata with a value.
Key-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such
as an in-memory cache layer. Since they offer only a limited set of operations, complexity is shifted to the application
layer if additional operations are needed.
A key-value store is the basis for more complex systems such as a document store, and in some cases, a graph
database.
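A minimal sketch of these properties, with a plain dict providing O(1) reads and writes and a sorted key index supporting range scans (the class and method names are hypothetical, not any particular store's API):

```python
import bisect

class KVStore:
    """Toy in-memory key-value store: O(1) point reads/writes via a hash map,
    plus a sorted key list to support efficient key-range retrieval."""

    def __init__(self):
        self._data = {}
        self._keys = []  # kept in lexicographic order for range queries

    def set(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def range(self, lo, hi):
        """Return (key, value) pairs for lo <= key < hi."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return [(k, self._data[k]) for k in self._keys[i:j]]

store = KVStore()
store.set("user:1", "alice")
store.set("user:2", "bob")
store.set("video:9", "cats")
# ";" sorts just after ":", so this scans every key with the "user:" prefix.
print(store.range("user:", "user;"))
```

Note how anything beyond get/set/range (joins, secondary indexes, aggregation) would have to live in the application layer, as the text describes.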
• Key-value database130
• Disadvantages of key-value stores131
• Redis architecture132
• Memcached architecture133
125. http://aiddroid.com/10-tips-optimizing-mysql-queries-dont-suck/
126. http://stackoverflow.com/questions/1217466/is-there-a-good-reason-i-see-varchar255-used-so-often-as-opposed-to-another-l
127. http://stackoverflow.com/questions/1017239/how-do-null-values-affect-performance-in-a-database-search
128. http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html
129. https://en.wikipedia.org/wiki/Lexicographical_order
130. https://en.wikipedia.org/wiki/Key-value_database
131. http://stackoverflow.com/questions/4056093/what-are-the-disadvantages-of-using-a-key-value-table-over-nullable-columns-or
132. http://qnimate.com/overview-of-redis-architecture/
133. https://adayinthelifeof.nl/2011/02/06/memcache-internals/
Document store
A document store is centered around documents (XML, JSON, binary, etc), where a document stores all information
for a given object. Document stores provide APIs or a query language to query based on the internal structure of the
document itself. Note, many key-value stores include features for working with a value’s metadata, blurring the lines
between these two storage types.
Based on the underlying implementation, documents are organized by collections, tags, metadata, or directories.
Although documents can be organized or grouped together, documents may have fields that are completely different
from each other.
Some document stores like MongoDB134 and CouchDB135 also provide a SQL-like language to perform complex queries.
DynamoDB136 supports both key-values and documents.
Document stores provide high flexibility and are often used for working with occasionally changing data.
• Document-oriented database137
• MongoDB architecture138
• CouchDB architecture139
• Elasticsearch architecture140
Wide column store
A wide column store’s basic unit of data is a column (name/value pair). A column can be grouped in column families
(analogous to a SQL table). Super column families further group column families. You can access each column
independently with a row key, and columns with the same row key form a row. Each value contains a timestamp for
versioning and for conflict resolution.
Google introduced Bigtable142 as the first wide column store; it influenced the open-source HBase143, often used in
the Hadoop ecosystem, and Cassandra144 from Facebook. Stores such as Bigtable, HBase, and Cassandra maintain
keys in lexicographic order, allowing efficient retrieval of selective key ranges.
Wide column stores offer high availability and high scalability. They are often used for very large data sets.
142. http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf
143. https://www.edureka.co/blog/hbase-architecture/
144. http://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archIntro.html
145. http://blog.grio.com/2015/11/sql-nosql-a-brief-history.html
146. http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf
147. https://www.edureka.co/blog/hbase-architecture/
148. http://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archIntro.html
Graph database
Abstraction: graph
In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are
optimized to represent complex relationships with many foreign keys or many-to-many relationships.
Graph databases offer high performance for data models with complex relationships, such as a social network. They
are relatively new and are not yet widely used; it might be more difficult to find development tools and resources.
Many graph databases can only be accessed with REST APIs.
SQL or NoSQL
Reasons for SQL:
• Structured data
• Strict schema
• Relational data
• Need for complex joins
• Transactions
• Clear patterns for scaling
• More established: developers, community, code, tools, etc
• Lookups by index are very fast
Reasons for NoSQL:
• Semi-structured data
• Dynamic or flexible schema
• Non-relational data
• No need for complex joins
• Store many TB (or PB) of data
• Very data intensive workload
• Very high throughput for IOPS
Cache
Source: Scalable system design patterns161
Caching improves page load times and can reduce the load on your servers and databases. In this model, the dispatcher
will first look up whether the request has been made before and try to find the previous result to return, in order to
save the actual execution.
Databases often benefit from a uniform distribution of reads and writes across their partitions. Popular items can skew
the distribution, causing bottlenecks. Putting a cache in front of a database can help absorb uneven loads and spikes
in traffic.
Client caching
Caches can be located on the client side (OS or browser), server side, or in a distinct cache layer.
CDN caching
CDNs are considered a type of cache.
Web server caching
Reverse proxies and caches such as Varnish162 can serve static and dynamic content directly. Web servers can also
cache requests, returning responses without having to contact application servers.
Database caching
Your database usually includes some level of caching in a default configuration, optimized for a generic use case.
Tweaking these settings for specific usage patterns can further boost performance.
Application caching
In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage.
Since the data is held in RAM, it is much faster than typical databases where data is stored on disk. RAM is more
limited than disk, so cache invalidation163 algorithms such as least recently used (LRU)164 can help invalidate ‘cold’
entries and keep ‘hot’ data in RAM.
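A minimal LRU sketch using Python's OrderedDict (illustrative only; Memcached and Redis implement eviction internally, with their own policies):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: recently used entries stay 'hot'; when capacity
    is exceeded, the least recently used ('cold') entry is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # insertion order tracks recency

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def set(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the coldest entry

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touching 'a' makes 'b' the least recently used
cache.set("c", 3)      # capacity exceeded: 'b' is evicted
print(cache.get("b"))  # None
```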
Redis has the following additional features:
• Persistence option
• Built-in data structures such as sorted sets and lists
There are multiple levels you can cache that fall into two general categories: database queries and objects:
• Row level
• Query-level
• Fully-formed serializable objects
• Fully-rendered HTML
Generally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.
Query-level caching
Whenever you query the database, hash the query as a key and store the result to the cache. This approach suffers
from expiration issues:
• It is hard to delete a cached result with complex queries
• If one piece of data changes such as a table cell, you need to delete all cached queries that may include the
changed cell
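The hash-the-query approach can be sketched as follows, with a plain dict standing in for Memcached/Redis and a stubbed database call (names are illustrative):

```python
import hashlib

cache = {}    # stand-in for Memcached/Redis
db_calls = 0  # counts how often the 'database' is actually hit

def run_query(sql):
    # stand-in for a real database round trip
    global db_calls
    db_calls += 1
    return [("alice",)]

def cached_query(sql):
    # hash the query text as the cache key
    key = "sql:" + hashlib.sha1(sql.encode()).hexdigest()
    if key not in cache:
        cache[key] = run_query(sql)  # miss: run the query, fill the cache
    return cache[key]                # hit: no database round trip

cached_query("SELECT name FROM users WHERE id = 1")
cached_query("SELECT name FROM users WHERE id = 1")
print(db_calls)  # 1 -- the second call was served from cache
```

The expiration issues above follow directly from this design: the key encodes the whole query text, so a change to one underlying cell gives no way to find which cached keys are now stale.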
Object caching
See your data as an object, similar to what you do with your application code. Have your application assemble the
dataset from the database into a class instance or a data structure(s):
• Remove the object from cache if its underlying data has changed
• Allows for asynchronous processing: workers assemble objects by consuming the latest cached object
Suggestions of what to cache:
• User sessions
• Fully rendered web pages
• Activity streams
• User graph data
162. https://www.varnish-cache.org/
163. https://en.wikipedia.org/wiki/Cache_algorithms
164. https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)
Since you can only store a limited amount of data in cache, you’ll need to determine which cache update strategy
works best for your use case.
Cache-aside
The application is responsible for reading and writing from storage. The cache does not interact with storage directly.
The application does the following:
• Look for entry in cache, resulting in a cache miss
• Load entry from the database
• Add entry to cache
• Return entry
Subsequent reads of data added to cache are fast. Cache-aside is also referred to as lazy loading. Only requested data
is cached, which avoids filling up the cache with data that isn’t requested.
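A cache-aside sketch, with plain dicts standing in for Memcached and the database (the key format and helper names are illustrative):

```python
cache = {}                        # stand-in for Memcached/Redis
db = {12345: {"name": "alice"}}   # stand-in for the database

def get_user(user_id):
    key = f"user.{user_id}"
    user = cache.get(key)
    if user is None:              # cache miss
        user = db.get(user_id)    # load entry from the database
        cache[key] = user         # add entry to cache
    return user                   # return entry

get_user(12345)  # miss: reads the database and fills the cache
get_user(12345)  # hit: served entirely from cache
```

Note the cache never talks to storage itself; the application owns both sides, which is what distinguishes cache-aside from write-through.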
Disadvantage(s): cache-aside
• Each cache miss results in three trips, which can cause a noticeable delay.
• Data can become stale if it is updated in the database. This issue is mitigated by setting a time-to-live (TTL)
which forces an update of the cache entry, or by using write-through.
• When a node fails, it is replaced by a new, empty node, increasing latency.
165. https://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast
166. https://memcached.org/
Write-through
Source: Scalability, availability, stability, patterns167
167. https://www.slideshare.net/jboner/scalability-availability-stability-patterns
The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible
for reading and writing to the database:
Application code:
set_user(12345, {"foo":"bar"})
Cache code:
def set_user(user_id, values):
    user = db.query("UPDATE Users WHERE id = {0}", user_id, values)
    cache.set(user_id, user)
Write-through is a slow overall operation due to the write operation, but subsequent reads of just written data are fast.
Users are generally more tolerant of latency when updating data than reading data. Data in the cache is not stale.
Disadvantage(s): write-through
• When a new node is created due to failure or scaling, the new node will not cache entries until the entry is
updated in the database. Cache-aside in conjunction with write-through can mitigate this issue.
• Most data written might never be read, which can be minimized with a TTL.
Write-behind (write-back)
In write-behind, the application adds or updates an entry in the cache, while the cache asynchronously writes the
entry to the data store, improving write performance.
Disadvantage(s): write-behind
• There could be data loss if the cache goes down prior to its contents hitting the data store.
• It is more complex to implement write-behind than it is to implement cache-aside or write-through.
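A write-behind sketch with a background worker thread draining a queue of pending writes (the queue, worker, and flush are illustrative; real implementations batch and coalesce writes):

```python
import queue
import threading

db = {}                  # stand-in for the data store
writes = queue.Queue()   # pending asynchronous writes
cache = {}

def writer():
    """Background worker: drains the queue and persists entries."""
    while True:
        item = writes.get()
        if item is None:  # shutdown sentinel, unused in this demo
            break
        key, value = item
        db[key] = value
        writes.task_done()

threading.Thread(target=writer, daemon=True).start()

def set_user(user_id, values):
    cache[user_id] = values        # the write hits the cache immediately...
    writes.put((user_id, values))  # ...and reaches the store asynchronously

set_user(12345, {"foo": "bar"})
writes.join()  # flush is normally asynchronous; joined here only for the demo
print(db[12345])
```

The data-loss disadvantage above is visible here: anything still sitting in `writes` when the process dies never reaches `db`.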
168. https://www.slideshare.net/jboner/scalability-availability-stability-patterns
Refresh-ahead
You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.
Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely
to be needed in the future.
Disadvantage(s): refresh-ahead
• Not accurately predicting which items are likely to be needed in the future can result in worse performance than
without refresh-ahead.
Disadvantage(s): cache
• Need to maintain consistency between caches and the source of truth such as the database through cache
invalidation170 .
• Cache invalidation is a difficult problem; there is additional complexity associated with when to update the
cache.
• Need to make application changes such as adding Redis or memcached.
Asynchronism
Message queues
Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message
queue with the following workflow:
• An application publishes a job to the queue, then notifies the user of job status
• A worker picks up the job from the queue, processes it, then signals the job is complete
The user is not blocked and the job is processed in the background. During this time, the client might optionally do
a small amount of processing to make it seem like the task has completed. For example, if posting a tweet, the tweet
could be instantly posted to your timeline, but it could take some time before your tweet is actually delivered to all
of your followers.
Redis179 is useful as a simple message broker but messages can be lost.
RabbitMQ180 is popular but requires you to adapt to the ‘AMQP’ protocol and manage your own nodes.
Amazon SQS181 is hosted but can have high latency and has the possibility of messages being delivered twice.
Task queues
Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling
and can be used to run computationally-intensive jobs in the background.
Celery182 has support for scheduling and primarily has python support.
Back pressure
If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk
reads, and even slower performance. Back pressure183 can help by limiting the queue size, thereby maintaining a
high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients get a
server busy or HTTP 503 status code to try again later. Clients can retry the request at a later time, perhaps with
exponential backoff184 .
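A sketch of back pressure with a bounded queue, returning 503 when full and retrying with exponential backoff (the queue size, status codes as return values, and delays are illustrative):

```python
import queue
import random
import time

jobs = queue.Queue(maxsize=2)  # a bounded queue applies back pressure

def submit(job):
    """Return 202 if the job was accepted, 503 if the queue is full."""
    try:
        jobs.put_nowait(job)
        return 202
    except queue.Full:
        return 503

def submit_with_backoff(job, retries=5):
    """Client side: on 503, wait exponentially longer (with jitter) and retry."""
    for attempt in range(retries):
        status = submit(job)
        if status != 503:
            return status
        time.sleep(min(0.01 * 2 ** attempt, 0.1) * random.random())
    return 503

print(submit("a"), submit("b"), submit("c"))  # 202 202 503
```

Rejecting early keeps latency bounded for the jobs already queued, instead of letting the backlog grow until everything is slow.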
178. https://lethain.com/introduction-to-architecting-systems-for-scale/#platform_layer
179. https://redis.io/
180. https://www.rabbitmq.com/
181. https://aws.amazon.com/sqs/
182. https://docs.celeryproject.org/en/stable/
183. http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html
184. https://en.wikipedia.org/wiki/Exponential_backoff
Disadvantage(s): asynchronism
• Use cases such as inexpensive calculations and realtime workflows might be better suited for synchronous opera-
tions, as introducing queues can add delays and complexity.
185. https://www.youtube.com/watch?v=1KRYH75wgy4
186. http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html
187. https://en.wikipedia.org/wiki/Little%27s_law
188. https://www.quora.com/What-is-the-difference-between-a-message-queue-and-a-task-queue-Why-would-a-task-queue-require-a-message-broker-like-RabbitMQ-Redis-Celery-or-IronMQ-to-function
Communication
Hypertext transfer protocol (HTTP)
HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol:
clients issue requests and servers issue responses with relevant content and completion status info about the request.
HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that
perform load balancing, caching, encryption, and compression.
A basic HTTP request consists of a verb (method) and a resource (endpoint). Below are common HTTP verbs:
• GET - Reads a resource (idempotent, safe, and cacheable)
• POST - Creates a resource or triggers a process that handles data (not idempotent; cacheable if the response
contains freshness info)
• PUT - Creates or replaces a resource (idempotent)
• PATCH - Partially updates a resource (not idempotent; cacheable if the response contains freshness info)
• DELETE - Deletes a resource (idempotent)
189. https://www.escotal.com/osilayer.html
• What is HTTP?190
• Difference between HTTP and TCP191
• Difference between PUT and PATCH192
Transmission control protocol (TCP)
Source: How to make a multiplayer game193
TCP is a connection-oriented protocol over an IP network194. Connection is established and terminated using a
handshake195. All packets sent are guaranteed to reach the destination in the original order and without corruption
through:
• Sequence numbers and checksum fields196 for each packet
• Acknowledgement197 packets and automatic retransmission
If the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the
connection is dropped. TCP also implements flow control198 and congestion control199 . These guarantees cause delays
and generally result in less efficient transmission than UDP.
190. https://www.nginx.com/resources/glossary/http/
191. https://www.quora.com/What-is-the-difference-between-HTTP-protocol-and-TCP-protocol
192. https://laracasts.com/discuss/channels/general-discussion/whats-the-differences-between-put-and-patch?page=1
193. http://www.wildbunny.co.uk/blog/2012/10/09/how-to-make-a-multi-player-game-part-1
194. https://en.wikipedia.org/wiki/Internet_Protocol
195. https://en.wikipedia.org/wiki/Handshaking
196. https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Checksum_computation
197. https://en.wikipedia.org/wiki/Acknowledgement_(data_networks)
198. https://en.wikipedia.org/wiki/Flow_control_(data)
199. https://en.wikipedia.org/wiki/Network_congestion#Congestion_control
To ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory
usage. It can be expensive to have a large number of open connections between web server threads and say, a
memcached200 server. Connection pooling201 can help in addition to switching to UDP where applicable.
TCP is useful for applications that require high reliability but are less time critical. Some examples include web servers,
database info, SMTP, FTP, and SSH.
Use TCP over UDP when:
• You need all of the data to arrive intact
• You want to automatically make a best estimate use of the network throughput
User datagram protocol (UDP)
Source: How to make a multiplayer game202
UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams
might reach their destination out of order or not at all. UDP does not support congestion control. Without the
guarantees that TCP supports, UDP is generally more efficient.
UDP can broadcast, sending datagrams to all devices on the subnet. This is useful with DHCP203 because the client
has not yet received an IP address, which TCP would require to establish a stream.
UDP is less reliable but works well in real time use cases such as VoIP, video chat, streaming, and realtime multiplayer
games.
Use UDP over TCP when:
• You need the lowest latency
• Late data is worse than loss of data
• You want to implement your own error correction
Remote procedure call (RPC)
In an RPC, a client causes a procedure to execute on a different address space, usually a remote server. The procedure
is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from
the client program. RPC is a request-response protocol:
• Client program - Calls the client stub procedure. The parameters are pushed onto the stack like a local
procedure call.
• Client stub procedure - Marshals (packs) procedure id and arguments into a request message.
• Client communication module - OS sends the message from the client to the server.
• Server communication module - OS passes the incoming packets to the server stub procedure.
• Server stub procedure - Unmarshals the request, calls the server procedure matching the procedure id and
passes the given arguments.
• The server response repeats the steps above in reverse order.
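The stub steps above can be sketched with JSON marshalling; a direct function call stands in for the OS communication modules, and the procedure names and wire format are hypothetical:

```python
import json

# --- server side ---
def _add(a, b):
    return a + b

PROCEDURES = {"add": _add}  # procedure id -> implementation

def server_stub(message: bytes) -> bytes:
    """Unmarshal the request, dispatch by procedure id, marshal the result."""
    request = json.loads(message)
    result = PROCEDURES[request["procedure"]](*request["args"])
    return json.dumps({"result": result}).encode()

# --- client side ---
def client_stub(procedure, *args):
    """Marshal (pack) the procedure id and arguments into a request message.
    Here the 'network' is a direct call; a real RPC would send the bytes
    over a socket via the client/server communication modules."""
    message = json.dumps({"procedure": procedure, "args": args}).encode()
    reply = server_stub(message)
    return json.loads(reply)["result"]

print(client_stub("add", 2, 3))  # reads like a local call: 5
```

Frameworks such as gRPC or Thrift generate these stubs for you and use a binary encoding rather than JSON, but the marshalling/dispatch shape is the same.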
GET /someoperation?data=anId
POST /anotheroperation
{
"data":"anId",
"anotherdata": "another value"
}
RPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications,
as you can hand-craft native calls to better fit your use cases.
Choose a native library (aka SDK) when:
• You know your target platform
• You want to control how your “logic” is accessed
• You want to control how error control happens off your library
• Performance and end user experience are your primary concerns
HTTP APIs following REST tend to be used more often for public APIs.
Disadvantage(s): RPC
• RPC clients become tightly coupled to the service implementation.
• A new API must be defined for every new operation or use case.
• It can be difficult to debug RPC.
• You might not be able to leverage existing technologies out of the box. For example, it might require additional
effort to ensure RPC calls are properly cached on caching servers such as Squid215 .
REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed
by the server. The server provides a representation of resources and actions that can either manipulate or get a new
representation of resources. All communication must be stateless and cacheable.
There are four qualities of a RESTful interface:
• Identify resources (URI in HTTP) - use the same URI regardless of any operation.
• Change with representations (Verbs in HTTP) - use verbs, headers, and body.
• Self-descriptive error message (status response in HTTP) - Use status codes, don’t reinvent the wheel.
• HATEOAS216 (HTML interface for HTTP) - your web service should be fully accessible in a browser.
GET /someresources/anId
PUT /someresources/anId
{"anotherdata": "another value"}
REST is focused on exposing data. It minimizes the coupling between client/server and is often used for public HTTP
APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through
headers217 , and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. Being stateless, REST is
great for horizontal scaling and partitioning.
Disadvantage(s): REST
• With REST being focused on exposing data, it might not be a good fit if resources are not naturally organized
or accessed in a simple hierarchy. For example, returning all updated records from the past hour matching a
particular set of events is not easily expressed as a path. With REST, it is likely to be implemented with a
combination of URI path, query parameters, and possibly the request body.
• REST typically relies on a few verbs (GET, POST, PUT, DELETE, and PATCH) which sometimes doesn’t fit
your use case. For example, moving expired documents to the archive folder might not cleanly fit within these
verbs.
• Fetching complicated resources with nested hierarchies requires multiple round trips between the client and
server to render single views, e.g. fetching content of a blog entry and the comments on that entry. For mobile
applications operating in variable network conditions, these multiple roundtrips are highly undesirable.
• Over time, more fields might be added to an API response and older clients will receive all new data fields, even
those that they do not need; as a result, payload sizes bloat and latencies grow.
214. http://etherealbits.com/2012/12/debunking-the-myths-of-rpc-rest/
215. http://www.squid-cache.org/
216. http://restcookbook.com/Basics/hateoas/
217. https://github.com/for-GET/know-your-http-well/blob/master/headers.md
Source: Do you really know why you prefer REST over RPC
Security
Security is a broad topic. Unless you have considerable experience, a security background, or are applying for a
position that requires knowledge of security, you probably won’t need to know more than the basics:
• Encrypt in transit and at rest.
• Sanitize all user inputs or any input parameters exposed to users to prevent XSS and SQL injection.
• Use parameterized queries to prevent SQL injection.
• Use the principle of least privilege.
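As one basics-level example, parameterized queries prevent SQL injection. A SQLite sketch with an illustrative schema and a classic injection payload:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

user_input = "alice' OR '1'='1"  # classic injection payload

# Unsafe: string concatenation lets the input rewrite the query,
# turning the WHERE clause into a condition that matches every row.
leaked = conn.execute(
    "SELECT name FROM users WHERE name = '" + user_input + "'").fetchall()

# Safe: a parameterized query treats the input as data, never as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(leaked)  # [('alice',)] -- the injection matched all rows
print(safe)    # []           -- no user is literally named that
```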
Appendix
You’ll sometimes be asked to do ‘back-of-the-envelope’ estimates. For example, you might need to determine how long
it will take to generate 100 image thumbnails from disk or how much memory a data structure will take. The Powers
of two table and Latency numbers every programmer should know are handy references.
• Powers of two232
229. https://github.com/shieldfy/API-Security-Checklist
230. https://github.com/FallibleInc/security-guide-for-developers
231. https://www.owasp.org/index.php/OWASP_Top_Ten_Cheat_Sheet
232. https://en.wikipedia.org/wiki/Power_of_two
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
Common system design interview questions, with links to resources on how to solve each.
233. https://gist.github.com/jboner/2841832
234. https://gist.github.com/hellerbarde/2843375
235. http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
236. https://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
Question Reference(s)
Design a file sync service like Dropbox youtube.com237
Design a search engine like Google queue.acm.org238 stackexchange.com239 ardendertat.com240 stan
Design a scalable web crawler like Google quora.com242
Design Google docs code.google.com243 neil.fraser.name244
Design a key-value store like Redis slideshare.net245
Design a cache system like Memcached slideshare.net246
Design a recommendation system like Amazon’s hulu.com247 ijcai13.org248
Design a tinyurl system like Bitly n00tc0d3r.blogspot.com249
Design a chat app like WhatsApp highscalability.com250
Design a picture sharing system like Instagram highscalability.com251 highscalability.com252
Design the Facebook news feed function quora.com253 quora.com254 slideshare.net255
Design the Facebook timeline function facebook.com256 highscalability.com257
Design the Facebook chat function erlang-factory.com258 facebook.com259
Design a graph search function like Facebook’s facebook.com260 facebook.com261 facebook.com262
Design a content delivery network like CloudFlare figshare.com263
Design a trending topic system like Twitter’s michael-noll.com264 snikolov.wordpress.com265
Design a random ID generation system blog.twitter.com266 github.com267
Return the top k requests during a time interval cs.ucsb.edu268 wpi.edu269
Design a system that serves data from multiple data centers highscalability.com270
Design an online multiplayer card game indieflashblog.com271 buildnewgames.com272
Design a garbage collection system stuffwithstuff.com273 washington.edu274
237. https://www.youtube.com/watch?v=PE4gwstWhmc
238. http://queue.acm.org/detail.cfm?id=988407
239. http://programmers.stackexchange.com/questions/38324/interview-question-how-would-you-implement-google-search
240. http://www.ardendertat.com/2012/01/11/implementing-search-engines/
241. http://infolab.stanford.edu/~backrub/google.html
242. https://www.quora.com/How-can-I-build-a-web-crawler-from-scratch
243. https://code.google.com/p/google-mobwrite/
244. https://neil.fraser.name/writing/sync/
245. http://www.slideshare.net/dvirsky/introduction-to-redis
246. http://www.slideshare.net/oemebamo/introduction-to-memcached
247. https://web.archive.org/web/20170406065247/http://tech.hulu.com/blog/2011/09/19/recommendation-system.html
248. http://ijcai13.org/files/tutorial_slides/td3.pdf
249. http://n00tc0d3r.blogspot.com/
250. http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html
251. http://highscalability.com/flickr-architecture
252. http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html
253. http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed
254. http://www.quora.com/Activity-Streams/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
255. http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
256. https://www.facebook.com/note.php?note_id=10150468255628920
257. http://highscalability.com/blog/2012/1/23/facebook-timeline-brought-to-you-by-the-power-of-denormaliza.html
258. http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf
259. https://www.facebook.com/note.php?note_id=14218138919&id=9445547199&index=0
260. https://www.facebook.com/notes/facebook-engineering/under-the-hood-building-out-the-infrastructure-for-graph-search/10151347573598920
261. https://www.facebook.com/notes/facebook-engineering/under-the-hood-indexing-and-ranking-in-graph-search/10151361720763920
262. https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-natural-language-interface-of-graph-search/10151432733048920
263. https://figshare.com/articles/Globally_distributed_content_delivery/6605972
264. http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
265. http://snikolov.wordpress.com/2012/11/14/early-detection-of-twitter-trends/
266. https://blog.twitter.com/2010/announcing-snowflake
267. https://github.com/twitter/snowflake/
268. https://www.cs.ucsb.edu/sites/default/files/documents/2005-23.pdf
269. http://davis.wpi.edu/xmdv/docs/EDBT11-diyang.pdf
270. http://highscalability.com/blog/2009/8/24/how-google-serves-data-from-multiple-datacenters.html
271. https://web.archive.org/web/20180929181117/http://www.indieflashblog.com/how-to-create-an-asynchronous-multiplayer-game.html
272. http://buildnewgames.com/real-time-multiplayer/
273. http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/
274. http://courses.cs.washington.edu/courses/csep521/07wi/prj/rick.pdf
Question Reference(s)
Design an API rate limiter https://stripe.com/blog/275
Design a Stock Exchange (like NASDAQ or Binance) Jane Street276 Golang Implementation277 Go Implementation278
Add a system design question Contribute
• Identify shared principles, common technologies, and patterns within these articles
• Study what problems are solved by each component, where it works, where it doesn’t
• Review the lessons learned
275. https://stripe.com/blog/rate-limiters
276. https://youtu.be/b1e4t2k2KJY
277. https://around25.com/blog/building-a-trading-engine-for-a-crypto-exchange/
278. http://bhomnick.net/building-a-simple-limit-order-in-go/
279. https://www.infoq.com/presentations/Twitter-Timeline-Scalability
Company architectures
Company Reference(s)
Amazon Amazon architecture297
Cinchcast Producing 1,500 hours of audio every day298
DataSift Realtime datamining At 120,000 tweets per second299
Dropbox How we’ve scaled Dropbox300
ESPN Operating At 100,000 duh nuh nuhs per second301
Google Google architecture302
Instagram 14 million users, terabytes of photos303 What powers Instagram304
Justin.tv Justin.Tv’s live video broadcasting architecture305
Facebook Scaling memcached at Facebook306 TAO: Facebook’s distributed data store for the social graph307 Facebook’s photo storage308 How Facebook Live Streams To 800,000 Simultaneous Viewers309
Flickr Flickr architecture310
Mailbox From 0 to one million users in 6 weeks311
Netflix A 360 Degree View Of The Entire Netflix Stack312 Netflix: What Happens When You Press Play?313
Pinterest From 0 To 10s of billions of page views a month314 18 million visitors, 10x growth, 12 employees315
Playfish 50 million monthly users and growing316
PlentyOfFish PlentyOfFish architecture317
Salesforce How they handle 1.3 billion transactions a day318
Stack Overflow Stack Overflow architecture319
296. http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper
297. http://highscalability.com/amazon-architecture
298. http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html
299. http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
300. https://www.youtube.com/watch?v=PE4gwstWhmc
301. http://highscalability.com/blog/2013/11/4/espns-architecture-at-scale-operating-at-100000-duh-nuh-nuhs.html
302. http://highscalability.com/google-architecture
303. http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html
304. http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances
305. http://highscalability.com/blog/2010/3/16/justintvs-live-video-broadcasting-architecture.html
306. https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/key-value/fb-memcached-nsdi-2013.pdf
307. https://cs.uwaterloo.ca/~brecht/courses/854-Emerging-2014/readings/data-store/tao-facebook-distributed-datastore-atc-2013.pdf
308. https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf
309. http://highscalability.com/blog/2016/6/27/how-facebook-live-streams-to-800000-simultaneous-viewers.html
310. http://highscalability.com/flickr-architecture
311. http://highscalability.com/blog/2013/6/18/scaling-mailbox-from-0-to-one-million-users-in-6-weeks-and-1.html
312. http://highscalability.com/blog/2015/11/9/a-360-degree-view-of-the-entire-netflix-stack.html
313. http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html
314. http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html
315. http://highscalability.com/blog/2012/5/21/pinterest-architecture-update-18-million-visitors-10x-growth.html
316. http://highscalability.com/blog/2010/9/21/playfishs-social-gaming-architecture-50-million-monthly-user.html
317. http://highscalability.com/plentyoffish-architecture
318. http://highscalability.com/blog/2013/9/23/salesforce-architecture-how-they-handle-13-billion-transacti.html
319. http://highscalability.com/blog/2009/8/5/stack-overflow-architecture.html
Company Reference(s)
TripAdvisor 40M visitors, 200M dynamic page views, 30TB data320
Tumblr 15 billion page views a month321
Twitter Making Twitter 10000 percent faster322 Storing 250
million tweets a day using MySQL323 150M active users,
300K QPS, a 22 MB/S firehose324 Timelines at
scale325 Big and small data at Twitter326 Operations at
Twitter: scaling beyond 100 million users327 How Twitter
Handles 3,000 Images Per Second328
Uber How Uber scales their real-time market
platform329 Lessons Learned From Scaling Uber To 2000
Engineers, 1000 Services, And 8000 Git Repositories330
WhatsApp The WhatsApp architecture Facebook bought for $19
billion331
YouTube YouTube scalability332 YouTube architecture333
• Airbnb Engineering334
• Atlassian Developers335
• AWS Blog336
• Bitly Engineering Blog337
• Box Blogs338
• Cloudera Developer Blog339
• Dropbox Tech Blog340
• Engineering at Quora341
• Ebay Tech Blog342
• Evernote Tech Blog343
• Etsy Code as Craft344
320. http://highscalability.com/blog/2011/6/27/tripadvisor-architecture-40m-visitors-200m-dynamic-page-view.html
321. http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html
322. http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster
323. http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html
324. http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html
325. https://www.infoq.com/presentations/Twitter-Timeline-Scalability
326. https://www.youtube.com/watch?v=5cKTP36HVgI
327. https://www.youtube.com/watch?v=z8LU0Cj6BOU
328. http://highscalability.com/blog/2016/4/20/how-twitter-handles-3000-images-per-second.html
329. http://highscalability.com/blog/2015/9/14/how-uber-scales-their-real-time-market-platform.html
330. http://highscalability.com/blog/2016/10/12/lessons-learned-from-scaling-uber-to-2000-engineers-1000-ser.html
331. http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html
332. https://www.youtube.com/watch?v=w5WVu624fY8
333. http://highscalability.com/youtube-architecture
334. http://nerds.airbnb.com/
335. https://developer.atlassian.com/blog/
336. https://aws.amazon.com/blogs/aws/
337. http://word.bitly.com/
338. https://blog.box.com/blog/category/engineering
339. http://blog.cloudera.com/
340. https://tech.dropbox.com/
341. https://www.quora.com/q/quoraengineering
342. http://www.ebaytechblog.com/
343. https://blog.evernote.com/tech/
344. http://codeascraft.com/
• Facebook Engineering345
• Flickr Code346
• Foursquare Engineering Blog347
• GitHub Engineering Blog348
• Google Research Blog349
• Groupon Engineering Blog350
• Heroku Engineering Blog351
• Hubspot Engineering Blog352
• High Scalability353
• Instagram Engineering354
• Intel Software Blog355
• Jane Street Tech Blog356
• LinkedIn Engineering357
• Microsoft Engineering358
• Microsoft Python Engineering359
• Netflix Tech Blog360
• Paypal Developer Blog361
• Pinterest Engineering Blog362
• Reddit Blog363
• Salesforce Engineering Blog364
• Slack Engineering Blog365
• Spotify Labs366
• Twilio Engineering Blog367
• Twitter Engineering368
• Uber Engineering Blog369
• Yahoo Engineering Blog370
• Yelp Engineering Blog371
• Zynga Engineering Blog372
Looking to add a blog? To avoid duplicating work, consider adding your company blog to the following repo:
345. https://www.facebook.com/Engineering
346. http://code.flickr.net/
347. http://engineering.foursquare.com/
348. https://github.blog/category/engineering
349. http://googleresearch.blogspot.com/
350. https://engineering.groupon.com/
351. https://engineering.heroku.com/
352. http://product.hubspot.com/blog/topic/engineering
353. http://highscalability.com/
354. http://instagram-engineering.tumblr.com/
355. https://software.intel.com/en-us/blogs/
356. https://blogs.janestreet.com/category/ocaml/
357. http://engineering.linkedin.com/blog
358. https://engineering.microsoft.com/
359. https://blogs.msdn.microsoft.com/pythonengineering/
360. http://techblog.netflix.com/
361. https://medium.com/paypal-engineering
362. https://medium.com/@Pinterest_Engineering
363. http://www.redditblog.com/
364. https://developer.salesforce.com/blogs/engineering/
365. https://slack.engineering/
366. https://labs.spotify.com/
367. http://www.twilio.com/engineering
368. https://blog.twitter.com/engineering/
369. http://eng.uber.com/
370. http://yahooeng.tumblr.com/
371. http://engineeringblog.yelp.com/
372. https://www.zynga.com/blogs/engineering
• kilimchoi/engineering-blogs373
Under development
Credits
• Hired in tech374
• Cracking the coding interview375
• High scalability376
• checkcheckzz/system-design-interview377
• shashank88/system_design378
• mmcgrana/services-engineering379
• System design cheat sheet380
• A distributed systems reading list381
• Cracking the system design interview382
Contact info
License
I am providing code and resources in this repository to you under an open source license. Because this is my personal
repository, the license you receive to my code and resources is from me and not my employer (Facebook).
http://creativecommons.org/licenses/by/4.0/
373. https://github.com/kilimchoi/engineering-blogs
374. http://www.hiredintech.com/system-design/the-system-design-process/
375. https://www.amazon.com/dp/0984782850/
376. http://highscalability.com/
377. https://github.com/checkcheckzz/system-design-interview
378. https://github.com/shashank88/system_design
379. https://github.com/mmcgrana/services-engineering
380. https://gist.github.com/vasanthk/485d1c25737e8e72759f
381. http://dancres.github.io/Pages/
382. http://www.puncsky.com/blog/2016-02-13-crack-the-system-design-interview
383. https://github.com/donnemartin
Design the data structures for a social network
Note: This document links directly to relevant areas found in the system design topics1 to avoid duplication. Refer to
the linked content for general talking points, tradeoffs, and alternatives.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss
assumptions.
Without an interviewer to address clarifying questions, we’ll define some use cases and constraints.
Use cases
We’ll scope the problem to handle only the following use cases:
• User searches for someone and sees the shortest path to the searched person
• Service has high availability
State assumptions
Exercise the use of more traditional systems - don’t use graph-specific solutions such as a graph query language like
GraphQL2 or a graph database like Neo4j3 .
Calculate usage
Clarify with your interviewer if you should run back-of-the-envelope usage calculations.
Figure 1: High level design of the data structures for a social network
Use case: User searches for someone and sees the shortest path to the searched person
Clarify with your interviewer how much code you are expected to write.
Without the constraint of millions of users (vertices) and billions of friend relationships (edges), we could solve this
unweighted shortest path task with a general BFS approach:
class Graph(Graph):

    def shortest_path(self, source_key, dest_key):
        if source_key is None or dest_key is None:
            return None
        if source_key is dest_key:
            return [source_key]
        prev_node_keys = self._shortest_path(source_key, dest_key)  # BFS, returns predecessor map
        if prev_node_keys is None:
            return None
        else:
            path_ids = [dest_key]
            prev_node_key = prev_node_keys[dest_key]
            while prev_node_key is not None:
                path_ids.append(prev_node_key)
                prev_node_key = prev_node_keys[prev_node_key]
            return path_ids[::-1]
We won’t be able to fit all users on the same machine, so we’ll need to shard4 users across Person Servers and access
them with a Lookup Service.
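One simple way the Lookup Service could map a user id to a Person Server is to hash the id. The function name and fleet size below are hypothetical illustrations, not part of the original design; note also that plain modulo sharding reshuffles most keys when the server count changes, which consistent hashing avoids.

```python
import hashlib

def person_server_for(person_id, num_servers):
    """Map a person_id to one of `num_servers` shards.

    Uses a stable hash so every caller agrees on the mapping.
    """
    digest = hashlib.md5(str(person_id).encode('utf-8')).hexdigest()
    return int(digest, 16) % num_servers

# Example: route three users across 4 Person Servers
shards = [person_server_for(pid, 4) for pid in (1, 2, 3)]
assert all(0 <= s < 4 for s in shards)
```
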
• The Client sends a request to the Web Server, running as a reverse proxy5
• The Web Server forwards the request to the Search API server
• The Search API server forwards the request to the User Graph Service
• The User Graph Service does the following:
• Uses the Lookup Service to find the Person Server where the current user’s info is stored
• Finds the appropriate Person Server to retrieve the current user’s list of friend_ids
• Runs a BFS search using the current user as the source and the current user’s friend_ids as the ids for
each adjacent_node
• To get the adjacent_node from a given id:
• The User Graph Service will again need to communicate with the Lookup Service to determine
which Person Server stores the adjacent_node matching the given id (potential for optimization)
Clarify with your interviewer how much code you should be writing.
Note: Error handling is excluded below for simplicity. Ask if you should code proper error handling.
class LookupService(object):

    def __init__(self):
        self.lookup = self._init_lookup()  # key: person_id, value: person_server

    def _init_lookup(self):
        ...

    def lookup_person_server(self, person_id):
        return self.lookup[person_id]
4. https://github.com/donnemartin/system-design-primer#sharding
5. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
class PersonServer(object):

    def __init__(self):
        self.people = {}  # key: person_id, value: person

    def people_by_ids(self, ids):
        # Return the Person objects stored on this server for the given ids
        return [self.people[id] for id in ids if id in self.people]
Person implementation:

class Person(object):

    def __init__(self, id, name, friend_ids):
        self.id = id
        self.name = name
        self.friend_ids = friend_ids
class UserGraphService(object):

    def __init__(self, lookup_service):
        self.lookup_service = lookup_service

    def person(self, person_id):
        person_server = self.lookup_service.lookup_person_server(person_id)
        return person_server.people_by_ids([person_id])

    def shortest_path(self, source_key, dest_key):
        ...

We’ll use a public REST API:

$ curl https://social.com/api/v1/friend_search?person_id=1234
Response:
[
    {
        "person_id": "100",
        "name": "foo",
        "link": "https://social.com/foo"
    },
    {
        "person_id": "53",
        "name": "bar",
        "link": "https://social.com/bar"
    },
    {
        "person_id": "1234",
        "name": "baz",
        "link": "https://social.com/baz"
    }
]
Important: Do not simply jump right into the final design from the initial design!
State you would 1) Benchmark/Load Test, 2) Profile for bottlenecks, 3) address bottlenecks while evaluating
alternatives and trade-offs, and 4) repeat. See Design a system that scales to millions of users on AWS8 as a sample
of how to iteratively scale the initial design.
It’s important to discuss what bottlenecks you might encounter with the initial design and how you might address
each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers?
CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?
8. ../scaling_aws/README.md
We’ll introduce some components to complete the design and to address scalability issues. Internal load balancers are
not shown to reduce clutter.
To avoid repeating discussions, refer to the following system design topics9 for main talking points, tradeoffs, and
alternatives:
• DNS10
• Load balancer11
• Horizontal scaling12
• Web server (reverse proxy)13
• API server (application layer)14
• Cache15
• Consistency patterns16
• Availability patterns17
To address the constraint of 400 average read requests per second (higher at peak), person data can be served from a
Memory Cache such as Redis or Memcached to reduce response times and to reduce traffic to downstream services.
This could be especially useful for people who do multiple searches in succession and for people who are well-connected.
Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from
disk takes 80x longer.1
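As one illustration of caching completed path lookups, a TTL-bound cache keyed on the (source, destination) pair might look like the sketch below. The class and field names are hypothetical; a production design would use Redis or Memcached rather than an in-process dict.

```python
import time

class PathCache(object):
    """Toy TTL cache for (source_id, dest_id) -> shortest-path results."""

    def __init__(self, ttl_seconds=300):
        self.ttl_seconds = ttl_seconds
        self._store = {}  # key: (source, dest), value: (expires_at, path)

    def get(self, source_id, dest_id):
        entry = self._store.get((source_id, dest_id))
        if entry is None:
            return None
        expires_at, path = entry
        if time.time() > expires_at:
            del self._store[(source_id, dest_id)]  # expired: drop and miss
            return None
        return path

    def set(self, source_id, dest_id, path):
        self._store[(source_id, dest_id)] = (time.time() + self.ttl_seconds, path)

cache = PathCache(ttl_seconds=60)
cache.set(1, 5, [1, 2, 5])
assert cache.get(1, 5) == [1, 2, 5]
assert cache.get(1, 9) is None
```
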
• Store complete or partial BFS traversals to speed up subsequent lookups in the Memory Cache
• Batch compute offline then store complete or partial BFS traversals to speed up subsequent lookups in a NoSQL
Database
• Reduce machine jumps by batching together friend lookups hosted on the same Person Server
• Shard18 Person Servers by ___location to further improve this, as friends generally live closer to each other
• Do two BFS searches at the same time, one starting from the source, and one from the destination, then merge
the two paths
• Start the BFS search from people with large numbers of friends, as they are more likely to reduce the number
of degrees of separation19 between the current user and the search target
• Set a limit based on time or number of hops before asking the user if they want to continue searching, as searching
could take a considerable amount of time in some cases
• Use a Graph Database such as Neo4j20 or a graph-specific query language such as GraphQL21 (if there were
no constraint preventing the use of Graph Databases)
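The bidirectional search idea above can be sketched on a plain adjacency list. This is a simplified, single-machine illustration under the assumption of an unweighted friendship graph; the sharded version would fetch neighbors through the Lookup Service instead of a local dict.

```python
from collections import deque

def bidirectional_shortest_path(adj, source, dest):
    """Alternate BFS expansion from both ends; join when the frontiers meet.

    `adj` is a dict: node -> list of neighbor nodes.
    Returns a list of nodes on a path, or None if none exists.
    """
    if source == dest:
        return [source]
    prev_s = {source: None}  # predecessor maps let us rebuild the path
    prev_d = {dest: None}
    frontier_s, frontier_d = deque([source]), deque([dest])
    while frontier_s and frontier_d:
        # Expand the smaller frontier for efficiency
        if len(frontier_s) <= len(frontier_d):
            frontier, prev, other = frontier_s, prev_s, prev_d
        else:
            frontier, prev, other = frontier_d, prev_d, prev_s
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nbr in adj.get(node, []):
                if nbr in prev:
                    continue
                prev[nbr] = node
                if nbr in other:
                    return _join(prev_s, prev_d, nbr)
                frontier.append(nbr)
    return None

def _join(prev_s, prev_d, meet):
    # Walk back to the source, then forward to the destination
    left, node = [], meet
    while node is not None:
        left.append(node)
        node = prev_s[node]
    left.reverse()
    right, node = [], prev_d[meet]
    while node is not None:
        right.append(node)
        node = prev_d[node]
    return left + right

adj = {'alice': ['bob'], 'bob': ['alice', 'carol'], 'carol': ['bob']}
assert bidirectional_shortest_path(adj, 'alice', 'carol') == ['alice', 'bob', 'carol']
```
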
Additional topics to dive into, depending on the problem scope and time remaining.
9. https://github.com/donnemartin/system-design-primer#index-of-system-design-topics
10. https://github.com/donnemartin/system-design-primer#___domain-name-system
11. https://github.com/donnemartin/system-design-primer#load-balancer
12. https://github.com/donnemartin/system-design-primer#horizontal-scaling
13. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
14. https://github.com/donnemartin/system-design-primer#application-layer
15. https://github.com/donnemartin/system-design-primer#cache
16. https://github.com/donnemartin/system-design-primer#consistency-patterns
17. https://github.com/donnemartin/system-design-primer#availability-patterns
18. https://github.com/donnemartin/system-design-primer#sharding
19. https://en.wikipedia.org/wiki/Six_degrees_of_separation
20. https://neo4j.com/
21. http://graphql.org/
• Read replicas22
• Federation23
• Sharding24
• Denormalization25
• SQL Tuning26
NoSQL
• Key-value store27
• Document store28
• Wide column store29
• Graph database30
• SQL vs NoSQL31
Caching
• Where to cache
• Client caching32
• CDN caching33
• Web server caching34
• Database caching35
• Application caching36
• What to cache
• Caching at the database query level37
• Caching at the object level38
• When to update the cache
• Cache-aside39
• Write-through40
• Write-behind (write-back)41
• Refresh ahead42
22. https://github.com/donnemartin/system-design-primer#master-slave-replication
23. https://github.com/donnemartin/system-design-primer#federation
24. https://github.com/donnemartin/system-design-primer#sharding
25. https://github.com/donnemartin/system-design-primer#denormalization
26. https://github.com/donnemartin/system-design-primer#sql-tuning
27. https://github.com/donnemartin/system-design-primer#key-value-store
28. https://github.com/donnemartin/system-design-primer#document-store
29. https://github.com/donnemartin/system-design-primer#wide-column-store
30. https://github.com/donnemartin/system-design-primer#graph-database
31. https://github.com/donnemartin/system-design-primer#sql-or-nosql
32. https://github.com/donnemartin/system-design-primer#client-caching
33. https://github.com/donnemartin/system-design-primer#cdn-caching
34. https://github.com/donnemartin/system-design-primer#web-server-caching
35. https://github.com/donnemartin/system-design-primer#database-caching
36. https://github.com/donnemartin/system-design-primer#application-caching
37. https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level
38. https://github.com/donnemartin/system-design-primer#caching-at-the-object-level
39. https://github.com/donnemartin/system-design-primer#cache-aside
40. https://github.com/donnemartin/system-design-primer#write-through
41. https://github.com/donnemartin/system-design-primer#write-behind-write-back
42. https://github.com/donnemartin/system-design-primer#refresh-ahead
• Message queues43
• Task queues44
• Back pressure45
• Microservices46
Communications
• Discuss tradeoffs:
• External communication with clients - HTTP APIs following REST47
• Internal communications - RPC48
• Service discovery49
Security50
Latency numbers51
Ongoing
• Continue benchmarking and monitoring your system to address bottlenecks as they come up
• Scaling is an iterative process
43. https://github.com/donnemartin/system-design-primer#message-queues
44. https://github.com/donnemartin/system-design-primer#task-queues
45. https://github.com/donnemartin/system-design-primer#back-pressure
46. https://github.com/donnemartin/system-design-primer#microservices
47. https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest
48. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
49. https://github.com/donnemartin/system-design-primer#service-discovery
50. https://github.com/donnemartin/system-design-primer#security
51. https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know
Design a web crawler
Note: This document links directly to relevant areas found in the system design topics1 to avoid duplication. Refer to
the linked content for general talking points, tradeoffs, and alternatives.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss
assumptions.
Without an interviewer to address clarifying questions, we’ll define some use cases and constraints.
Use cases
We’ll scope the problem to handle only the following use cases:
• Service crawls a list of urls
• User inputs a search term and sees a list of relevant pages with titles and snippets
Out of scope
• Search analytics
• Personalized search results
• Page rank
State assumptions
Exercise the use of more traditional systems - don’t use existing systems such as solr2 or nutch3.
Calculate usage
Clarify with your interviewer if you should run back-of-the-envelope usage calculations.
We’ll assume we have an initial list of links_to_crawl ranked initially based on overall site popularity. If this is
not a reasonable assumption, we can seed the crawler with popular sites that link to outside content such as Yahoo4,
DMOZ5, etc.
We’ll use a table crawled_links to store processed links and their page signatures.
We could store links_to_crawl and crawled_links in a key-value NoSQL Database. For the ranked links in
links_to_crawl, we could use Redis6 with sorted sets to maintain a ranking of page links. We should discuss the use
cases and tradeoffs between choosing SQL or NoSQL7.
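As an in-process stand-in for the Redis sorted set noted above, a max-priority queue over ranked links could be sketched with a heap. The class and method names are illustrative assumptions, not the primer's API.

```python
import heapq

class LinksToCrawl(object):
    """In-memory stand-in for a Redis sorted set of ranked links.

    Higher popularity scores are crawled first, so we push
    negated scores into Python's min-heap.
    """

    def __init__(self):
        self._heap = []

    def add(self, url, popularity):
        heapq.heappush(self._heap, (-popularity, url))

    def extract_max_priority(self):
        if not self._heap:
            return None
        _, url = heapq.heappop(self._heap)
        return url

links = LinksToCrawl()
links.add('https://popular.example.com', popularity=95)
links.add('https://niche.example.com', popularity=10)
assert links.extract_max_priority() == 'https://popular.example.com'
```
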
The Crawler Service processes each page link by doing the following in a loop:
2. http://lucene.apache.org/solr/
3. http://nutch.apache.org/
4. https://www.yahoo.com/
5. http://www.dmoz.org/
6. https://redis.io/
7. https://github.com/donnemartin/system-design-primer#sql-or-nosql
Step 3: Design core components
Clarify with your interviewer how much code you are expected to write.
PagesDataStore is an abstraction within the Crawler Service that uses the NoSQL Database:
class PagesDataStore(object):

    def extract_max_priority_page(self):
        """Return the highest priority link in `links_to_crawl`."""
        ...

    def reduce_priority_link_to_crawl(self, url):
        """Lower the priority of the given link in `links_to_crawl`."""
        ...

    def crawled_similar(self, signature):
        """Determine if we've already crawled a page matching the signature."""
        ...
Page is an abstraction within the Crawler Service that encapsulates a page, its contents, child urls, and signature:

class Page(object):

    def __init__(self, url, contents, child_urls, signature):
        self.url = url
        self.contents = contents
        self.child_urls = child_urls
        self.signature = signature
Crawler is the main class within Crawler Service, composed of Page and PagesDataStore.

class Crawler(object):

    def __init__(self, data_store):
        self.data_store = data_store

    def crawl(self):
        while True:
            page = self.data_store.extract_max_priority_page()
            if page is None:
                break
            if self.data_store.crawled_similar(page.signature):
                self.data_store.reduce_priority_link_to_crawl(page.url)
            else:
                self.crawl_page(page)
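The crawl_page step invoked in the loop above is not shown; a hedged sketch follows, with a minimal in-memory stand-in for the data store so the flow is runnable. The store's method names mirror the abstractions above but are assumptions here.

```python
from collections import namedtuple

class InMemoryDataStore(object):
    """Minimal stand-in for PagesDataStore (illustration only)."""

    def __init__(self):
        self.links_to_crawl = set()
        self.crawled_links = {}  # key: url, value: signature

    def add_link_to_crawl(self, url):
        if url not in self.crawled_links:
            self.links_to_crawl.add(url)

    def remove_link_to_crawl(self, url):
        self.links_to_crawl.discard(url)

    def insert_crawled_link(self, url, signature):
        self.crawled_links[url] = signature


def crawl_page(data_store, page):
    """One crawl step: queue child links, then record this page as crawled."""
    for url in page.child_urls:
        data_store.add_link_to_crawl(url)
    data_store.remove_link_to_crawl(page.url)
    data_store.insert_crawled_link(page.url, page.signature)


Page = namedtuple('Page', ['url', 'child_urls', 'signature'])

store = InMemoryDataStore()
store.add_link_to_crawl('https://a.example.com')
crawl_page(store, Page('https://a.example.com', ['https://b.example.com'], 'sig-a'))
assert 'https://a.example.com' in store.crawled_links
assert 'https://b.example.com' in store.links_to_crawl
```
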
Handling duplicates
We need to be careful the web crawler doesn’t get stuck in an infinite loop, which happens when the graph contains a
cycle.
Clarify with your interviewer how much code you are expected to write.
class RemoveDuplicateUrls(MRJob):

    def mapper(self, _, line):
        yield line, 1

    def reducer(self, key, values):
        total = sum(values)
        if total == 1:
            yield key, total
Detecting duplicate content is more complex. We could generate a signature based on the contents of each page and
compare the signatures of two pages for similarity. Some potential algorithms are Jaccard index9 and cosine similarity10.
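A minimal Jaccard-based near-duplicate check over word shingles is sketched below. This is an illustration, not the primer's implementation; production systems typically approximate this with MinHash or SimHash to avoid comparing full shingle sets.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard index of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc1 = shingles('the quick brown fox jumps over the lazy dog')
doc2 = shingles('the quick brown fox jumps over a lazy dog')
assert jaccard(doc1, doc1) == 1.0
assert 0.0 < jaccard(doc1, doc2) < 1.0
```
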
9. https://en.wikipedia.org/wiki/Jaccard_index
10. https://en.wikipedia.org/wiki/Cosine_similarity
Pages need to be crawled regularly to ensure freshness. Crawl results could have a timestamp field that indicates the
last time a page was crawled. After a default time period, say one week, all pages should be refreshed. Frequently
updated or more popular sites could be refreshed in shorter intervals.
Although we won’t dive into details on analytics, we could do some data mining to determine the mean time before a
particular page is updated, and use that statistic to determine how often to re-crawl the page.
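The mean-update-time heuristic could be sketched as follows. The function name, the halving factor, and the one-week fallback (matching the default refresh period above) are illustrative assumptions.

```python
def recrawl_interval_seconds(update_timestamps, default=7 * 24 * 3600):
    """Estimate how often to re-crawl a page.

    `update_timestamps` are epoch seconds of observed content changes.
    With fewer than two observations, fall back to the default (one week).
    """
    if len(update_timestamps) < 2:
        return default
    ts = sorted(update_timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    mean_gap = sum(gaps) / len(gaps)
    # Re-crawl somewhat more often than the page appears to change
    return min(default, mean_gap / 2)

# A page observed changing daily gets re-crawled about every 12 hours
daily = [0, 86400, 2 * 86400, 3 * 86400]
assert recrawl_interval_seconds(daily) == 43200.0
```
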
We might also choose to support a Robots.txt file that gives webmasters control of crawl frequency.
Use case: User inputs a search term and sees a list of relevant pages with titles and snippets
• The Client sends a request to the Web Server, running as a reverse proxy11
• The Web Server forwards the request to the Query API server
• The Query API server does the following:
We’ll use a public REST API:

$ curl https://search.com/api/v1/search?query=hello+world
Response:
[
    {
        "title": "foo's title",
        "snippet": "foo's snippet",
        "link": "https://foo.com"
    },
    {
        "title": "bar's title",
        "snippet": "bar's snippet",
        "link": "https://bar.com"
    },
    {
        "title": "baz's title",
        "snippet": "baz's snippet",
        "link": "https://baz.com"
    }
]
Important: Do not simply jump right into the final design from the initial design!
State you would 1) Benchmark/Load Test, 2) Profile for bottlenecks, 3) address bottlenecks while evaluating
alternatives and trade-offs, and 4) repeat. See Design a system that scales to millions of users on AWS14 as a sample
of how to iteratively scale the initial design.
It’s important to discuss what bottlenecks you might encounter with the initial design and how you might address
each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers?
CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?
We’ll introduce some components to complete the design and to address scalability issues. Internal load balancers are
not shown to reduce clutter.
To avoid repeating discussions, refer to the following system design topics15 for main talking points, tradeoffs, and
alternatives:
• DNS16
• Load balancer17
• Horizontal scaling18
• Web server (reverse proxy)19
• API server (application layer)20
• Cache21
• NoSQL22
• Consistency patterns23
• Availability patterns24
Some searches are very popular, while others are only executed once. Popular queries can be served from a Memory
Cache such as Redis or Memcached to reduce response times and to avoid overloading the Reverse Index Service
and Document Service. The Memory Cache is also useful for handling the unevenly distributed traffic and traffic
spikes. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and
from disk takes 80x longer.1
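A capacity-bounded LRU keyed on the normalized query, as an illustrative in-process stand-in for Redis or Memcached (class and parameter names are assumptions):

```python
from collections import OrderedDict

class QueryCache(object):
    """LRU cache mapping a normalized query string to search results."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, query):
        key = query.strip().lower()
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def set(self, query, results):
        key = query.strip().lower()
        self._entries[key] = results
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

cache = QueryCache(max_entries=2)
cache.set('Hello World', ['result1'])
assert cache.get('hello world') == ['result1']  # normalization gives a hit
```
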
• To handle the data size and request load, the Reverse Index Service and Document Service will likely need
to make heavy use of sharding and federation.
• DNS lookup can be a bottleneck; the Crawler Service can keep its own DNS lookup that is refreshed periodically
• The Crawler Service can improve performance and reduce memory usage by keeping many open connections
at a time, referred to as connection pooling25
• Switching to UDP26 could also boost performance
• Web crawling is bandwidth intensive; ensure there is enough bandwidth to sustain high throughput
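A minimal connection-pooling sketch: idle connections are kept and handed back out instead of opening a new one per request. The connection objects here are placeholders supplied by a factory, not a real network client.

```python
from collections import deque

class ConnectionPool(object):
    """Reuse open connections instead of re-opening one per request."""

    def __init__(self, connect, max_size=10):
        self._connect = connect  # factory: () -> connection
        self._max_size = max_size
        self._idle = deque()

    def acquire(self):
        if self._idle:
            return self._idle.popleft()  # reuse an idle connection
        return self._connect()           # otherwise open a new one

    def release(self, conn):
        if len(self._idle) < self._max_size:
            self._idle.append(conn)  # keep it open for the next caller
        # else: drop the connection (a real pool would close it)

opened = []
pool = ConnectionPool(connect=lambda: opened.append(1) or object())
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
assert c1 is c2          # the connection was reused
assert len(opened) == 1  # only one real connection was opened
```
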
14. ../scaling_aws/README.md
15. https://github.com/donnemartin/system-design-primer#index-of-system-design-topics
16. https://github.com/donnemartin/system-design-primer#___domain-name-system
17. https://github.com/donnemartin/system-design-primer#load-balancer
18. https://github.com/donnemartin/system-design-primer#horizontal-scaling
19. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
20. https://github.com/donnemartin/system-design-primer#application-layer
21. https://github.com/donnemartin/system-design-primer#cache
22. https://github.com/donnemartin/system-design-primer#nosql
23. https://github.com/donnemartin/system-design-primer#consistency-patterns
24. https://github.com/donnemartin/system-design-primer#availability-patterns
25. https://en.wikipedia.org/wiki/Connection_pool
26. https://github.com/donnemartin/system-design-primer#user-datagram-protocol-udp
Additional topics to dive into, depending on the problem scope and time remaining.
• Read replicas27
• Federation28
• Sharding29
• Denormalization30
• SQL Tuning31
NoSQL
• Key-value store32
• Document store33
• Wide column store34
• Graph database35
• SQL vs NoSQL36
Caching
• Where to cache
• Client caching37
• CDN caching38
• Web server caching39
• Database caching40
• Application caching41
• What to cache
• Caching at the database query level42
• Caching at the object level43
• When to update the cache
• Cache-aside44
• Write-through45
• Write-behind (write-back)46
27. https://github.com/donnemartin/system-design-primer#master-slave-replication
28. https://github.com/donnemartin/system-design-primer#federation
29. https://github.com/donnemartin/system-design-primer#sharding
30. https://github.com/donnemartin/system-design-primer#denormalization
31. https://github.com/donnemartin/system-design-primer#sql-tuning
32. https://github.com/donnemartin/system-design-primer#key-value-store
33. https://github.com/donnemartin/system-design-primer#document-store
34. https://github.com/donnemartin/system-design-primer#wide-column-store
35. https://github.com/donnemartin/system-design-primer#graph-database
36. https://github.com/donnemartin/system-design-primer#sql-or-nosql
37. https://github.com/donnemartin/system-design-primer#client-caching
38. https://github.com/donnemartin/system-design-primer#cdn-caching
39. https://github.com/donnemartin/system-design-primer#web-server-caching
40. https://github.com/donnemartin/system-design-primer#database-caching
41. https://github.com/donnemartin/system-design-primer#application-caching
42. https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level
43. https://github.com/donnemartin/system-design-primer#caching-at-the-object-level
44. https://github.com/donnemartin/system-design-primer#cache-aside
45. https://github.com/donnemartin/system-design-primer#write-through
46. https://github.com/donnemartin/system-design-primer#write-behind-write-back
• Refresh ahead47
• Message queues48
• Task queues49
• Back pressure50
• Microservices51
Communications
• Discuss tradeoffs:
• External communication with clients - HTTP APIs following REST52
• Internal communications - RPC53
• Service discovery54
Security55
Latency numbers56
Ongoing
• Continue benchmarking and monitoring your system to address bottlenecks as they come up
• Scaling is an iterative process
47. https://github.com/donnemartin/system-design-primer#refresh-ahead
48. https://github.com/donnemartin/system-design-primer#message-queues
49. https://github.com/donnemartin/system-design-primer#task-queues
50. https://github.com/donnemartin/system-design-primer#back-pressure
51. https://github.com/donnemartin/system-design-primer#microservices
52. https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest
53. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
54. https://github.com/donnemartin/system-design-primer#service-discovery
55. https://github.com/donnemartin/system-design-primer#security
56. https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know
Design a system that scales to millions of users on AWS
Note: This document links directly to relevant areas found in the system design topics1 to avoid duplication. Refer to
the linked content for general talking points, tradeoffs, and alternatives.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss
assumptions.
Without an interviewer to address clarifying questions, we’ll define some use cases and constraints.
Use cases
Solving this problem takes an iterative approach of: 1) Benchmark/Load Test, 2) Profile for bottlenecks, 3) address
bottlenecks while evaluating alternatives and trade-offs, and 4) repeat, which is a good pattern for evolving basic designs
to scalable designs.
Unless you have a background in AWS or are applying for a position that requires AWS knowledge, AWS-specific
details are not a requirement. However, many of the principles discussed in this exercise can apply more
generally outside of the AWS ecosystem.
We’ll scope the problem to handle only the following use cases
State assumptions
Calculate usage
Clarify with your interviewer if you should run back-of-the-envelope usage calculations.
Goals
The constraints assume there is a need for relational data. We can start off using a MySQL Database on the single
box.
Trade-offs, alternatives, and additional details:
Use a DNS
Add a DNS such as Route 53 to map the ___domain to the instance’s public IP.
Trade-offs, alternatives, and additional details:
Users+
Figure 2: Scaled design of an AWS service to lighten load on a single box and allow for independent scaling
Assumptions
Our user count is starting to pick up and the load is increasing on our single box. Our Benchmarks/Load Tests
and Profiling are pointing to the MySQL Database taking up more and more memory and CPU resources, while
the user content is filling up disk space.
We’ve been able to address these issues with Vertical Scaling so far. Unfortunately, this has become quite expensive
and it doesn’t allow for independent scaling of the MySQL Database and Web Server.
7. https://github.com/donnemartin/system-design-primer#security
Step 4: Scale the design
Goals
• Lighten load on the single box and allow for independent scaling
• Disadvantages
• These changes would increase complexity and would require changes to the Web Server to point to the
Object Store and the MySQL Database
• Additional security measures must be taken to secure the new components
• AWS costs could also increase, but should be weighed with the costs of managing similar systems on your
own
• User files
• JS
• CSS
• Images
• Videos
• Create a public subnet for the single Web Server so it can send and receive traffic from the internet
• Create a private subnet for everything else, preventing outside access
• Only open ports from whitelisted IPs for each component
• These same patterns should be implemented for new components in the remainder of the exercise
Users++
Assumptions
Our Benchmarks/Load Tests and Profiling show that our single Web Server bottlenecks during peak hours,
resulting in slow responses and in some cases, downtime. As the service matures, we’d also like to move towards higher
availability and redundancy.
Goals
• The following goals attempt to address the scaling issues with the Web Server
• Based on the Benchmarks/Load Tests and Profiling, you might only need to implement one or two of
these techniques
• Use Horizontal Scaling9 to handle increasing loads and to address single points of failure
• Add a Load Balancer10 such as Amazon’s ELB or HAProxy
• ELB is highly available
• If you are configuring your own Load Balancer, setting up multiple servers in active-active11 or
active-passive12 in multiple availability zones will improve availability
• Terminate SSL on the Load Balancer to reduce computational load on backend servers and to simplify
certificate administration
• Use multiple Web Servers spread out over multiple availability zones
• Use multiple MySQL instances in Master-Slave Failover13 mode across multiple availability zones to
improve redundancy
• Separate out the Web Servers from the Application Servers14
• Scale and configure both layers independently
• Web Servers can run as a Reverse Proxy15
• For example, you can add Application Servers handling Read APIs while others handle Write APIs
• Move static (and some dynamic) content to a Content Delivery Network (CDN)16 such as CloudFront to
reduce load and latency
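Round-robin dispatch, the simplest policy an ELB/HAProxy-style Load Balancer applies across the Web Servers above, can be sketched as follows (the backend names are placeholders):

```python
import itertools

class RoundRobinBalancer(object):
    """Cycle requests across a fixed set of backend servers."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def route(self, request):
        backend = next(self._cycle)  # pick the next server in rotation
        return backend, request

lb = RoundRobinBalancer(['web1', 'web2'])
assert lb.route('GET /')[0] == 'web1'
assert lb.route('GET /')[0] == 'web2'
assert lb.route('GET /')[0] == 'web1'  # wraps around
```

Real load balancers layer health checks and weighting on top of this, so an unhealthy backend is skipped rather than blindly cycled to.
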
Users+++
Assumptions
Our Benchmarks/Load Tests and Profiling show that we are read-heavy (100:1 with writes) and our database is
suffering from poor performance from the high read requests.
8. https://github.com/donnemartin/system-design-primer#security
9. https://github.com/donnemartin/system-design-primer#horizontal-scaling
10. https://github.com/donnemartin/system-design-primer#load-balancer
11. https://github.com/donnemartin/system-design-primer#active-active
12. https://github.com/donnemartin/system-design-primer#active-passive
13. https://github.com/donnemartin/system-design-primer#master-slave-replication
14. https://github.com/donnemartin/system-design-primer#application-layer
15. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
16. https://github.com/donnemartin/system-design-primer#content-delivery-network
Design a system that scales to millions of users on AWS
Goals
• The following goals attempt to address the scaling issues with the MySQL Database
• Based on the Benchmarks/Load Tests and Profiling, you might only need to implement one or two of
these techniques
• Move the following data to a Memory Cache17 such as Elasticache to reduce load and latency:
• Frequently accessed content from MySQL
• First, try to configure the MySQL Database cache to see if that is sufficient to relieve the bottleneck
before implementing a Memory Cache
• Session data from the Web Servers
• The Web Servers become stateless, allowing for Autoscaling
• Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x
and from disk takes 80x longer.1
• Add MySQL Read Replicas18 to reduce load on the write master
• Add more Web Servers and Application Servers to improve responsiveness
• In addition to adding and scaling a Memory Cache, MySQL Read Replicas can also help relieve load on
the MySQL Write Master
• Add logic to Web Server to separate out writes and reads
• Add Load Balancers in front of MySQL Read Replicas (not pictured to reduce clutter)
• Most services are read-heavy vs write-heavy
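The read/write separation logic above can be sketched as follows. This is a minimal illustration, not a production implementation: `FakeConnection` stands in for real MySQL connections, and round-robin replica selection is only one of several possible strategies.

```python
import itertools

class FakeConnection:
    """Stand-in for a real MySQL connection, reporting which node served a call."""
    def __init__(self, name):
        self.name = name

    def execute(self, statement):
        return self.name   # a real connection would return result rows

class RoutingDB:
    """Send writes to the write master, round-robin reads across read replicas."""
    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def execute_write(self, statement):
        return self.master.execute(statement)

    def execute_read(self, statement):
        return next(self._replicas).execute(statement)

db = RoutingDB(FakeConnection('master'),
               [FakeConnection('replica-1'), FakeConnection('replica-2')])
```

In practice this routing can live in the Web Server or Application Server layer, or in a proxy such as ProxySQL.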
Users++++
Assumptions
Our Benchmarks/Load Tests and Profiling show that our traffic spikes during regular business hours in the U.S.
and drops significantly when users leave the office. We think we can cut costs by automatically spinning up and down
servers based on actual load. We’re a small shop so we’d like to automate as much of the DevOps as possible for
Autoscaling and for the general operations.
Goals
Add autoscaling
Users+++++
Assumptions
As the service continues to grow towards the figures outlined in the constraints, we iteratively run Benchmarks/Load
Tests and Profiling to uncover and address new bottlenecks.
Goals
• If our MySQL Database starts to grow too large, we might consider only storing a limited time period of data
in the database, while storing the rest in a data warehouse such as Redshift
• A data warehouse such as Redshift can comfortably handle the constraint of 1 TB of new content per month
• With 40,000 average read requests per second, read traffic for popular content can be addressed by scaling the
Memory Cache, which is also useful for handling the unevenly distributed traffic and traffic spikes
• The SQL Read Replicas might have trouble handling the cache misses; we’ll probably need to employ
additional SQL scaling patterns
• 400 average writes per second (with presumably significantly higher peaks) might be tough for a single SQL
Write Master-Slave, also pointing to a need for additional scaling techniques
• Federation20
• Sharding21
• Denormalization22
• SQL Tuning23
To further address the high read and write requests, we should also consider moving appropriate data to a NoSQL
Database24 such as DynamoDB.
We can further separate out our Application Servers25 to allow for independent scaling. Batch processes or computations that do not need to be done in real-time can be done Asynchronously26 with Queues and Workers:
• For example, in a photo service, the photo upload and the thumbnail creation can be separated:
• Client uploads photo
• Application Server puts a job in a Queue such as SQS
• The Worker Service on EC2 or Lambda pulls work off the Queue then:
• Creates a thumbnail
• Updates a Database
• Stores the thumbnail in the Object Store
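The photo upload flow above can be sketched with an in-process queue standing in for SQS and a thread standing in for the Worker Service. The `create_thumbnail` helper and the `results` dict are stand-ins for the real image processing, database update, and Object Store write:

```python
import queue
import threading

jobs = queue.Queue()   # stands in for SQS
results = {}           # stands in for the Database and Object Store

def create_thumbnail(photo_id):
    """Stand-in for the real thumbnail computation."""
    return 'thumb-%s' % photo_id

def worker():
    """Worker Service: pull jobs off the Queue and process them."""
    while True:
        photo_id = jobs.get()
        if photo_id is None:       # sentinel to stop the worker
            break
        results[photo_id] = create_thumbnail(photo_id)
        jobs.task_done()

def upload_photo(photo_id):
    """Application Server: enqueue the job and return to the client immediately."""
    jobs.put(photo_id)

threading.Thread(target=worker, daemon=True).start()
upload_photo('p1')
upload_photo('p2')
jobs.join()   # shown for the demo; in production the client would not block
```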
Additional topics to dive into, depending on the problem scope and time remaining.
• Read replicas27
• Federation28
• Sharding29
• Denormalization30
• SQL Tuning31
NoSQL
• Key-value store32
• Document store33
• Wide column store34
• Graph database35
20. https://github.com/donnemartin/system-design-primer#federation
21. https://github.com/donnemartin/system-design-primer#sharding
22. https://github.com/donnemartin/system-design-primer#denormalization
23. https://github.com/donnemartin/system-design-primer#sql-tuning
24. https://github.com/donnemartin/system-design-primer#nosql
25. https://github.com/donnemartin/system-design-primer#application-layer
26. https://github.com/donnemartin/system-design-primer#asynchronism
27. https://github.com/donnemartin/system-design-primer#master-slave-replication
28. https://github.com/donnemartin/system-design-primer#federation
29. https://github.com/donnemartin/system-design-primer#sharding
30. https://github.com/donnemartin/system-design-primer#denormalization
31. https://github.com/donnemartin/system-design-primer#sql-tuning
32. https://github.com/donnemartin/system-design-primer#key-value-store
33. https://github.com/donnemartin/system-design-primer#document-store
34. https://github.com/donnemartin/system-design-primer#wide-column-store
35. https://github.com/donnemartin/system-design-primer#graph-database
• SQL vs NoSQL36
Caching
• Where to cache
• Client caching37
• CDN caching38
• Web server caching39
• Database caching40
• Application caching41
• What to cache
• Caching at the database query level42
• Caching at the object level43
• When to update the cache
• Cache-aside44
• Write-through45
• Write-behind (write-back)46
• Refresh ahead47
• Message queues48
• Task queues49
• Back pressure50
• Microservices51
Communications
• Discuss tradeoffs:
• External communication with clients - HTTP APIs following REST52
• Internal communications - RPC53
• Service discovery54
36. https://github.com/donnemartin/system-design-primer#sql-or-nosql
37. https://github.com/donnemartin/system-design-primer#client-caching
38. https://github.com/donnemartin/system-design-primer#cdn-caching
39. https://github.com/donnemartin/system-design-primer#web-server-caching
40. https://github.com/donnemartin/system-design-primer#database-caching
41. https://github.com/donnemartin/system-design-primer#application-caching
42. https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level
43. https://github.com/donnemartin/system-design-primer#caching-at-the-object-level
44. https://github.com/donnemartin/system-design-primer#cache-aside
45. https://github.com/donnemartin/system-design-primer#write-through
46. https://github.com/donnemartin/system-design-primer#write-behind-write-back
47. https://github.com/donnemartin/system-design-primer#refresh-ahead
48. https://github.com/donnemartin/system-design-primer#message-queues
49. https://github.com/donnemartin/system-design-primer#task-queues
50. https://github.com/donnemartin/system-design-primer#back-pressure
51. https://github.com/donnemartin/system-design-primer#microservices
52. https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest
53. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
54. https://github.com/donnemartin/system-design-primer#service-discovery
Additional talking points
Security
Refer to the security section55.
Latency numbers
See Latency numbers every programmer should know56.
Ongoing
• Continue benchmarking and monitoring your system to address bottlenecks as they come up
• Scaling is an iterative process
55. https://github.com/donnemartin/system-design-primer#security
56. https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know
Design Pastebin.com (or Bit.ly)
Note: This document links directly to relevant areas found in the system design topics1 to avoid duplication. Refer to
the linked content for general talking points, tradeoffs, and alternatives.
Design Bit.ly is a similar question, except Pastebin requires storing the paste contents instead of the original
unshortened url.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss
assumptions.
Without an interviewer to address clarifying questions, we’ll define some use cases and constraints.
Use cases
We’ll scope the problem to handle only the following use cases
• Expiration
• Default setting does not expire
• Can optionally set a timed expiration
Out of scope
State assumptions
Calculate usage
Clarify with your interviewer if you should run back-of-the-envelope usage calculations.
Use case: User enters a block of text and gets a randomly generated link
We could use a relational database2 as a large hash table, mapping the generated url to a file server and path containing
the paste file.
Instead of managing a file server, we could use a managed Object Store such as Amazon S3 or a NoSQL document
store3.
As an alternative to a relational database acting as a large hash table, we could use a NoSQL key-value store4. We
should discuss the tradeoffs between choosing SQL or NoSQL5. The following discussion uses the relational database
approach.
• The Client sends a create paste request to the Web Server, running as a reverse proxy6
• The Web Server forwards the request to the Write API server
• The Write API server does the following:
• Generates a unique url
• Checks if the url is unique by looking at the SQL Database for a duplicate
• If the url is not unique, it generates another url
• If we supported a custom url, we could use the user-supplied url (also checking for a duplicate)
• Saves to the SQL Database pastes table
• Saves the paste data to the Object Store
• Returns the url
Clarify with your interviewer how much code you are expected to write.
The pastes table could have the following structure:
Setting the primary key to be based on the shortlink column creates an index7 that the database uses to enforce
uniqueness. We’ll create an additional index on created_at to speed up lookups (log-time instead of scanning the
entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds,
while reading from SSD takes 4x and from disk takes 80x longer.1
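A sketch of this structure using SQLite stands in for the real MySQL schema. The shortlink primary key and created_at index follow the discussion above; the remaining column names (such as paste_path pointing into the Object Store) are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE pastes (
        shortlink                    CHAR(7)      NOT NULL PRIMARY KEY,
        expiration_length_in_minutes INT          NOT NULL,
        created_at                   DATETIME     NOT NULL,
        paste_path                   VARCHAR(255) NOT NULL
    )
""")
# Additional index on created_at to speed up lookups.
conn.execute("CREATE INDEX idx_pastes_created_at ON pastes(created_at)")

conn.execute("INSERT INTO pastes VALUES (?, ?, ?, ?)",
             ('foobar1', 60, '2016-01-01 00:00:00', 'shard0/foobar1'))
row = conn.execute("SELECT paste_path FROM pastes WHERE shortlink = ?",
                   ('foobar1',)).fetchone()
```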
To generate the unique url, we could:
• Take the MD5 hash of the user’s ip_address and the timestamp
• Base 62 encode the MD5 hash10, which runs in O(k) time where k is the number of digits = 7
• Take the first 7 characters of the output, which results in 62^7 possible values and should be sufficient to handle
our constraint of 360 million shortlinks in 3 years:
url = base_encode(md5(ip_address+timestamp))[:URL_LENGTH]
Response:
{
"shortlink": "foobar"
}
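The url generation steps above can be sketched in Python. This is a minimal sketch assuming a standard Base 62 alphabet; the helper names `base_encode` and `generate_url` are illustrative:

```python
import hashlib

ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
URL_LENGTH = 7

def base_encode(num, alphabet=ALPHABET):
    """Encode a non-negative integer in Base 62, O(k) in the number of digits."""
    if num == 0:
        return alphabet[0]
    digits = []
    while num:
        num, rem = divmod(num, len(alphabet))
        digits.append(alphabet[rem])
    digits.reverse()
    return ''.join(digits)

def generate_url(ip_address, timestamp):
    """MD5 the ip_address+timestamp, Base 62 encode, keep the first 7 characters."""
    digest = hashlib.md5((ip_address + timestamp).encode()).hexdigest()
    return base_encode(int(digest, 16))[:URL_LENGTH]

url = generate_url('1.2.3.4', '2016-01-01 00:00:00')
```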
Use case: User enters a paste’s url and views the contents
REST API:
$ curl https://pastebin.com/api/v1/paste?shortlink=foobar
Response:
{
"paste_contents": "Hello World"
"created_at": "YYYY-MM-DD HH:MM:SS"
"expiration_length_in_minutes": "60"
}
10. http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener
11. https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest
12. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
Since realtime analytics are not a requirement, we could simply MapReduce the Web Server logs to generate hit
counts.
Clarify with your interviewer how much code you are expected to write.
class HitCounts(MRJob):

    def mapper(self, _, line):
        """Parse each log line, extract and transform relevant lines.

        Emit key value pairs of the form:

        (2016-01, url0), 1
        (2016-01, url0), 1
        (2016-01, url1), 1
        """
        url = self.extract_url(line)
        period = self.extract_year_month(line)
        yield (period, url), 1

    def reducer(self, key, values):
        """Sum values for each key.

        (2016-01, url0), 2
        (2016-01, url1), 1
        """
        yield key, sum(values)
To delete expired pastes, we could simply scan the SQL Database for all entries whose expiration timestamp is older
than the current timestamp. All expired entries would then be deleted (or marked as expired) from the table.
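A sketch of this cleanup job against SQLite; the expires_at column name and the convention that NULL means the paste never expires are assumptions for illustration:

```python
import sqlite3

def delete_expired_pastes(conn, now):
    """Delete all pastes whose expiration timestamp is older than `now`.

    A NULL expires_at means the paste never expires.
    """
    cur = conn.execute(
        "DELETE FROM pastes WHERE expires_at IS NOT NULL AND expires_at < ?",
        (now,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE pastes (shortlink TEXT PRIMARY KEY, expires_at TEXT)")
conn.executemany("INSERT INTO pastes VALUES (?, ?)",
                 [('aaaaaaa', '2016-01-01 00:00:00'),   # expired
                  ('bbbbbbb', None)])                   # never expires
deleted = delete_expired_pastes(conn, now='2016-06-01 00:00:00')
```

Such a job could run periodically (for example, via cron or a scheduled worker) rather than on every request.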
Important: Do not simply jump right into the final design from the initial design!
State you would do this iteratively: 1) Benchmark/Load Test, 2) Profile for bottlenecks, 3) address bottlenecks
while evaluating alternatives and trade-offs, and 4) repeat. See Design a system that scales to millions of users on
AWS13 as a sample on how to iteratively scale the initial design.
13. ../scaling_aws/README.md
Step 4: Scale the design
It’s important to discuss what bottlenecks you might encounter with the initial design and how you might address
each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers?
CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?
We’ll introduce some components to complete the design and to address scalability issues. Internal load balancers are
not shown to reduce clutter.
To avoid repeating discussions, refer to the following system design topics14 for main talking points, tradeoffs, and
alternatives:
• DNS15
• CDN16
• Load balancer17
• Horizontal scaling18
• Web server (reverse proxy)19
• API server (application layer)20
• Cache21
• Relational database management system (RDBMS)22
• SQL write master-slave failover23
• Master-slave replication24
• Consistency patterns25
• Availability patterns26
The Analytics Database could use a data warehousing solution such as Amazon Redshift or Google BigQuery.
An Object Store such as Amazon S3 can comfortably handle the constraint of 12.7 GB of new content per month.
To address the 40 average read requests per second (higher at peak), traffic for popular content should be handled
by the Memory Cache instead of the database. The Memory Cache is also useful for handling the unevenly
distributed traffic and traffic spikes. The SQL Read Replicas should be able to handle the cache misses, as long as
the replicas are not bogged down with replicating writes.
4 average paste writes per second (with higher at peak) should be doable for a single SQL Write Master-Slave.
Otherwise, we’ll need to employ additional SQL scaling patterns:
• Federation27
• Sharding28
• Denormalization29
• SQL Tuning30
Additional topics to dive into, depending on the problem scope and time remaining.
14. https://github.com/donnemartin/system-design-primer#index-of-system-design-topics
15. https://github.com/donnemartin/system-design-primer#___domain-name-system
16. https://github.com/donnemartin/system-design-primer#content-delivery-network
17. https://github.com/donnemartin/system-design-primer#load-balancer
18. https://github.com/donnemartin/system-design-primer#horizontal-scaling
19. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
20. https://github.com/donnemartin/system-design-primer#application-layer
21. https://github.com/donnemartin/system-design-primer#cache
22. https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms
23. https://github.com/donnemartin/system-design-primer#fail-over
24. https://github.com/donnemartin/system-design-primer#master-slave-replication
25. https://github.com/donnemartin/system-design-primer#consistency-patterns
26. https://github.com/donnemartin/system-design-primer#availability-patterns
27. https://github.com/donnemartin/system-design-primer#federation
28. https://github.com/donnemartin/system-design-primer#sharding
29. https://github.com/donnemartin/system-design-primer#denormalization
30. https://github.com/donnemartin/system-design-primer#sql-tuning
Additional talking points
NoSQL
• Key-value store31
• Document store32
• Wide column store33
• Graph database34
• SQL vs NoSQL35
Caching
• Where to cache
• Client caching36
• CDN caching37
• Web server caching38
• Database caching39
• Application caching40
• What to cache
• Caching at the database query level41
• Caching at the object level42
• When to update the cache
• Cache-aside43
• Write-through44
• Write-behind (write-back)45
• Refresh ahead46
• Message queues47
• Task queues48
• Back pressure49
• Microservices50
31. https://github.com/donnemartin/system-design-primer#key-value-store
32. https://github.com/donnemartin/system-design-primer#document-store
33. https://github.com/donnemartin/system-design-primer#wide-column-store
34. https://github.com/donnemartin/system-design-primer#graph-database
35. https://github.com/donnemartin/system-design-primer#sql-or-nosql
36. https://github.com/donnemartin/system-design-primer#client-caching
37. https://github.com/donnemartin/system-design-primer#cdn-caching
38. https://github.com/donnemartin/system-design-primer#web-server-caching
39. https://github.com/donnemartin/system-design-primer#database-caching
40. https://github.com/donnemartin/system-design-primer#application-caching
41. https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level
42. https://github.com/donnemartin/system-design-primer#caching-at-the-object-level
43. https://github.com/donnemartin/system-design-primer#cache-aside
44. https://github.com/donnemartin/system-design-primer#write-through
45. https://github.com/donnemartin/system-design-primer#write-behind-write-back
46. https://github.com/donnemartin/system-design-primer#refresh-ahead
47. https://github.com/donnemartin/system-design-primer#message-queues
48. https://github.com/donnemartin/system-design-primer#task-queues
49. https://github.com/donnemartin/system-design-primer#back-pressure
50. https://github.com/donnemartin/system-design-primer#microservices
Communications
• Discuss tradeoffs:
• External communication with clients - HTTP APIs following REST51
• Internal communications - RPC52
• Service discovery53
Security
Refer to the security section54.
Latency numbers
See Latency numbers every programmer should know55.
Ongoing
• Continue benchmarking and monitoring your system to address bottlenecks as they come up
• Scaling is an iterative process
51. https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest
52. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
53. https://github.com/donnemartin/system-design-primer#service-discovery
54. https://github.com/donnemartin/system-design-primer#security
55. https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know
Design Amazon’s sales rank by category feature
Note: This document links directly to relevant areas found in the system design topics1 to avoid duplication. Refer to
the linked content for general talking points, tradeoffs, and alternatives.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss
assumptions.
Without an interviewer to address clarifying questions, we’ll define some use cases and constraints.
Use cases
We’ll scope the problem to handle only the following use case
Out of scope
State assumptions
• 10 million products
• 1000 categories
• 1 billion transactions per month
• 100 billion read requests per month
• 100:1 read to write ratio
1. https://github.com/donnemartin/system-design-primer#index-of-system-design-topics
Calculate usage
Clarify with your interviewer if you should run back-of-the-envelope usage calculations.
Use case: Service calculates the past week’s most popular products by category
We could store the raw Sales API server log files on a managed Object Store such as Amazon S3, rather than
managing our own distributed file system.
Clarify with your interviewer how much code you are expected to write.
We’ll assume this is a sample log entry, tab delimited:
The Sales Rank Service could use MapReduce, using the Sales API server log files as input and writing the
results to an aggregate table sales_rank in a SQL Database. We should discuss the use cases and tradeoffs between
choosing SQL or NoSQL2 .
class SalesRanker(MRJob):

    def within_past_week(self, timestamp):
        """Return True if timestamp is within the past week, False otherwise."""
        ...

    def mapper(self, _, line):
        """Parse each log line, extract and transform relevant lines.

        Emit key value pairs of the form:

        (category1, product1), 2
        (category2, product1), 2
        (category2, product1), 1
        (category1, product2), 3
        (category2, product3), 7
        (category1, product4), 1
        """
        timestamp, product_id, category_id, quantity, total_price, seller_id, \
            buyer_id = line.split('\t')
        if self.within_past_week(timestamp):
            yield (category_id, product_id), quantity

    def reducer(self, key, values):
        """Sum values for each key.

        (category1, product1), 2
        (category2, product1), 3
        (category1, product2), 3
        (category2, product3), 7
        (category1, product4), 1
        """
        yield key, sum(values)

    def steps(self):
        """Run the map and reduce steps."""
        return [
            self.mr(mapper=self.mapper,
                    reducer=self.reducer),
            self.mr(mapper=self.mapper_sort,
                    reducer=self.reducer_identity),
        ]
The result would be the following sorted list, which we could insert into the sales_rank table:
We’ll create an index3 on id, category_id, and product_id to speed up lookups (log-time instead of scanning the
entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds,
while reading from SSD takes 4x and from disk takes 80x longer.1
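The sales_rank table and its index can be sketched with SQLite standing in for the real SQL Database; the exact column types are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE sales_rank (
        id          INTEGER PRIMARY KEY,
        category_id TEXT    NOT NULL,
        total_sold  INTEGER NOT NULL,
        product_id  TEXT    NOT NULL
    )
""")
# Index on category_id and product_id to speed up lookups.
conn.execute("CREATE INDEX idx_sales_rank ON sales_rank(category_id, product_id)")

conn.executemany(
    "INSERT INTO sales_rank (category_id, total_sold, product_id) VALUES (?, ?, ?)",
    [('1234', 100000, '50'), ('1234', 90000, '200'), ('5678', 10, '7')])
# Query the most popular products for a category, highest total_sold first.
top = conn.execute(
    "SELECT product_id, total_sold FROM sales_rank "
    "WHERE category_id = ? ORDER BY total_sold DESC LIMIT 10",
    ('1234',)).fetchall()
```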
Use case: User views the past week’s most popular products by category
• The Client sends a request to the Web Server, running as a reverse proxy4
• The Web Server forwards the request to the Read API server
• The Read API server reads from the SQL Database sales_rank table
$ curl https://amazon.com/api/v1/popular?category_id=1234
Response:
{
"id": "100",
"category_id": "1234",
"total_sold": "100000",
"product_id": "50",
},
{
"id": "53",
"category_id": "1234",
"total_sold": "90000",
"product_id": "200",
},
{
"id": "75",
"category_id": "1234",
"total_sold": "80000",
"product_id": "3",
},
Important: Do not simply jump right into the final design from the initial design!
State you would 1) Benchmark/Load Test, 2) Profile for bottlenecks, 3) address bottlenecks while evaluating
alternatives and trade-offs, and 4) repeat. See Design a system that scales to millions of users on AWS7 as a sample
on how to iteratively scale the initial design.
It’s important to discuss what bottlenecks you might encounter with the initial design and how you might address
each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers?
CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?
We’ll introduce some components to complete the design and to address scalability issues. Internal load balancers are
not shown to reduce clutter.
To avoid repeating discussions, refer to the following system design topics8 for main talking points, tradeoffs, and
alternatives:
• DNS9
• CDN10
• Load balancer11
• Horizontal scaling12
• Web server (reverse proxy)13
6. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
7. ../scaling_aws/README.md
8. https://github.com/donnemartin/system-design-primer#index-of-system-design-topics
9. https://github.com/donnemartin/system-design-primer#___domain-name-system
10. https://github.com/donnemartin/system-design-primer#content-delivery-network
11. https://github.com/donnemartin/system-design-primer#load-balancer
12. https://github.com/donnemartin/system-design-primer#horizontal-scaling
13. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
Step 4: Scale the design
The Analytics Database could use a data warehousing solution such as Amazon Redshift or Google BigQuery.
We might only want to store a limited time period of data in the database, while storing the rest in a data warehouse
or in an Object Store. An Object Store such as Amazon S3 can comfortably handle the constraint of 40 GB of
new content per month.
To address the 40,000 average read requests per second (higher at peak), traffic for popular content (and their sales
rank) should be handled by the Memory Cache instead of the database. The Memory Cache is also useful for
handling the unevenly distributed traffic and traffic spikes. With the large volume of reads, the SQL Read Replicas
might not be able to handle the cache misses. We’ll probably need to employ additional SQL scaling patterns.
400 average writes per second (higher at peak) might be tough for a single SQL Write Master-Slave, also pointing
to a need for additional scaling techniques.
• Federation21
• Sharding22
• Denormalization23
• SQL Tuning24
Additional topics to dive into, depending on the problem scope and time remaining.
NoSQL
• Key-value store25
• Document store26
• Wide column store27
• Graph database28
• SQL vs NoSQL29
14. https://github.com/donnemartin/system-design-primer#application-layer
15. https://github.com/donnemartin/system-design-primer#cache
16. https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms
17. https://github.com/donnemartin/system-design-primer#fail-over
18. https://github.com/donnemartin/system-design-primer#master-slave-replication
19. https://github.com/donnemartin/system-design-primer#consistency-patterns
20. https://github.com/donnemartin/system-design-primer#availability-patterns
21. https://github.com/donnemartin/system-design-primer#federation
22. https://github.com/donnemartin/system-design-primer#sharding
23. https://github.com/donnemartin/system-design-primer#denormalization
24. https://github.com/donnemartin/system-design-primer#sql-tuning
25. https://github.com/donnemartin/system-design-primer#key-value-store
26. https://github.com/donnemartin/system-design-primer#document-store
27. https://github.com/donnemartin/system-design-primer#wide-column-store
28. https://github.com/donnemartin/system-design-primer#graph-database
29. https://github.com/donnemartin/system-design-primer#sql-or-nosql
Additional talking points
Caching
• Where to cache
• Client caching30
• CDN caching31
• Web server caching32
• Database caching33
• Application caching34
• What to cache
• Caching at the database query level35
• Caching at the object level36
• When to update the cache
• Cache-aside37
• Write-through38
• Write-behind (write-back)39
• Refresh ahead40
• Message queues41
• Task queues42
• Back pressure43
• Microservices44
Communications
• Discuss tradeoffs:
• External communication with clients - HTTP APIs following REST45
• Internal communications - RPC46
• Service discovery47
Security
Latency numbers
Ongoing
• Continue benchmarking and monitoring your system to address bottlenecks as they come up
• Scaling is an iterative process
49. https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know
Design the Twitter timeline and search
Note: This document links directly to relevant areas found in the system design topics1 to avoid duplication. Refer to
the linked content for general talking points, tradeoffs, and alternatives.
Design the Facebook feed and Design Facebook search are similar questions.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss
assumptions.
Without an interviewer to address clarifying questions, we’ll define some use cases and constraints.
Use cases
We’ll scope the problem to handle only the following use cases
Out of scope
State assumptions
General
Timeline
Search
Calculate usage
Clarify with your interviewer if you should run back-of-the-envelope usage calculations.
Figure 1: High level design of the Twitter timeline and search (or Facebook feed and search)
We could store the user’s own tweets to populate the user timeline (activity from the user) in a relational database2 .
We should discuss the use cases and tradeoffs between choosing SQL or NoSQL3 .
Delivering tweets and building the home timeline (activity from people the user is following) is trickier. Fanning
out tweets to all followers (60 thousand tweets delivered on fanout per second) will overload a traditional relational
database4 . We’ll probably want to choose a data store with fast writes such as a NoSQL database or Memory
Cache. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x
and from disk takes 80x longer.1
• The Client posts a tweet to the Web Server, running as a reverse proxy5
• The Web Server forwards the request to the Write API server
• The Write API stores the tweet in the user’s timeline on a SQL database
• The Write API contacts the Fan Out Service, which does the following:
• Queries the User Graph Service to find the user’s followers stored in the Memory Cache
• Stores the tweet in the home timeline of the user’s followers in a Memory Cache
• O(n) operation: 1,000 followers = 1,000 lookups and inserts
• Stores the tweet in the Search Index Service to enable fast searching
• Stores media in the Object Store
• Uses the Notification Service to send out push notifications to followers:
• Uses a Queue (not pictured) to asynchronously send out notifications
Clarify with your interviewer how much code you are expected to write.
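The fanout step above can be sketched with plain dicts standing in for the User Graph Service and the Memory Cache; the names and the 800-entry timeline cap are assumptions for illustration:

```python
from collections import defaultdict, deque

TIMELINE_SIZE = 800   # keep only several hundred entries per home timeline

# Stand-ins for the User Graph Service and the Memory Cache.
followers = {'alice': ['bob', 'carol']}
home_timelines = defaultdict(lambda: deque(maxlen=TIMELINE_SIZE))

def fan_out(user_id, tweet_id):
    """O(n) fanout: one lookup and insert per follower of user_id."""
    for follower_id in followers.get(user_id, []):
        home_timelines[follower_id].appendleft((tweet_id, user_id))

fan_out('alice', 'tweet987')
```

The bounded deque mirrors how a Redis list would be trimmed (e.g. with LTRIM) to keep each home timeline small.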
If our Memory Cache is Redis, we could use a native Redis list with the following structure:
The new tweet would be placed in the Memory Cache, which populates the user’s home timeline (activity from
people the user is following).
Response:
{
"created_at": "Wed Sep 05 00:37:15 +0000 2012",
"status": "hello world!",
"tweet_id": "987",
"user_id": "123",
...
}
2. https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms
3. https://github.com/donnemartin/system-design-primer#sql-or-nosql
4. https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms
5. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
6. https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest
• Gets the timeline data stored in the Memory Cache, containing tweet ids and user ids - O(1)
• Queries the Tweet Info Service with a multiget8 to obtain additional info about the tweet ids - O(n)
• Queries the User Info Service with a multiget to obtain additional info about the user ids - O(n)
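The timeline read path above can be sketched as follows, with dicts standing in for the Memory Cache, Tweet Info Service, and User Info Service; all names and data are illustrative:

```python
# Stand-ins for the Memory Cache, Tweet Info Service, and User Info Service.
timeline_cache = {'123': [('987', '456'), ('988', '789')]}  # user -> [(tweet_id, user_id)]
tweet_info = {'987': {'status': 'foo'}, '988': {'status': 'bar'}}
user_info = {'456': {'name': 'alice'}, '789': {'name': 'bob'}}

def multiget(store, keys):
    """Batch lookup, like Redis MGET: one call for n keys instead of n calls."""
    return {key: store[key] for key in keys}

def read_home_timeline(user_id):
    entries = timeline_cache[user_id]                        # O(1) cache read
    tweets = multiget(tweet_info, [t for t, _ in entries])   # O(n) multiget
    users = multiget(user_info, [u for _, u in entries])     # O(n) multiget
    return [{'tweet_id': t, 'user_id': u,
             'status': tweets[t]['status'], 'user': users[u]['name']}
            for t, u in entries]

timeline = read_home_timeline('123')
```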
REST API:
$ curl https://twitter.com/api/v1/home_timeline?user_id=123
Response:
{
"user_id": "456",
"tweet_id": "123",
"status": "foo"
},
{
"user_id": "789",
"tweet_id": "456",
"status": "bar"
},
{
"user_id": "789",
"tweet_id": "579",
"status": "baz"
},
The REST API would be similar to the home timeline, except all tweets would come from the user as opposed to the
people the user is following.
7. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
8. http://redis.io/commands/mget
REST API:
$ curl https://twitter.com/api/v1/search?query=hello+world
The response would be similar to that of the home timeline, except for tweets matching the given query.
Important: Do not simply jump right into the final design from the initial design!
State you would 1) Benchmark/Load Test, 2) Profile for bottlenecks, 3) address bottlenecks while evaluating
alternatives and trade-offs, and 4) repeat. See Design a system that scales to millions of users on AWS11 as a sample
on how to iteratively scale the initial design.
It’s important to discuss what bottlenecks you might encounter with the initial design and how you might address
each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers?
CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?
We’ll introduce some components to complete the design and to address scalability issues. Internal load balancers are
not shown to reduce clutter.
To avoid repeating discussions, refer to the following system design topics12 for main talking points, tradeoffs, and
alternatives:
• DNS13
• CDN14
• Load balancer15
• Horizontal scaling16
• Web server (reverse proxy)17
• API server (application layer)18
9. https://lucene.apache.org/
10. https://github.com/donnemartin/system-design-primer#under-development
11. ../scaling_aws/README.md
12. https://github.com/donnemartin/system-design-primer#index-of-system-design-topics
13. https://github.com/donnemartin/system-design-primer#___domain-name-system
14. https://github.com/donnemartin/system-design-primer#content-delivery-network
15. https://github.com/donnemartin/system-design-primer#load-balancer
16. https://github.com/donnemartin/system-design-primer#horizontal-scaling
17. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
18. https://github.com/donnemartin/system-design-primer#application-layer
Step 4: Scale the design
Figure 2: Scaled design of the Twitter timeline and search (or Facebook feed and search)
Design the Twitter timeline and search
• Cache19
• Relational database management system (RDBMS)20
• SQL write master-slave failover21
• Master-slave replication22
• Consistency patterns23
• Availability patterns24
The Fanout Service is a potential bottleneck. Twitter users with millions of followers could take several minutes to
have their tweets go through the fanout process. This could lead to race conditions with @replies to the tweet, which
we could mitigate by re-ordering the tweets at serve time.
We could also avoid fanning out tweets from highly-followed users. Instead, we could search to find tweets for highly-
followed users, merge the search results with the user’s home timeline results, then re-order the tweets at serve time.
• Keep only several hundred tweets for each home timeline in the Memory Cache
• Keep only active users’ home timeline info in the Memory Cache
• If a user has not been active in the past 30 days, we could rebuild the timeline from the SQL
Database
• Query the User Graph Service to determine who the user is following
• Get the tweets from the SQL Database and add them to the Memory Cache
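The serve-time merge for highly-followed users can be sketched as follows. The service clients and their methods (`get_precomputed_timeline`, `followed_celebrities`, `recent_tweets_by`) are hypothetical stand-ins for the Memory Cache, User Graph Service, and Tweet Search Service:

```python
def build_home_timeline(user_id, cache, search, user_graph, max_tweets=800):
    """Merge the fanned-out timeline with tweets from highly-followed users.

    Assumed interfaces (not from the original design):
      - cache.get_precomputed_timeline(user_id) -> list of tweet dicts with
        'created_at' timestamps (fanout results, excluding celebrity authors)
      - user_graph.followed_celebrities(user_id) -> list of celebrity user ids
      - search.recent_tweets_by(author_ids) -> list of tweet dicts
    """
    timeline = cache.get_precomputed_timeline(user_id)
    celebrity_ids = user_graph.followed_celebrities(user_id)
    if celebrity_ids:
        # Celebrity tweets were never fanned out; fetch them at read time
        timeline.extend(search.recent_tweets_by(celebrity_ids))
    # Re-order at serve time instead of at fanout time
    timeline.sort(key=lambda tweet: tweet['created_at'], reverse=True)
    return timeline[:max_tweets]
```

This trades a cheaper write path (no fanout for celebrities) for a slightly more expensive read path (an extra fetch and a sort per timeline request).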
We’ll also want to address the bottleneck with the SQL Database.
Although the Memory Cache should reduce the load on the database, it is unlikely the SQL Read Replicas alone
would be enough to handle the cache misses. We’ll probably need to employ additional SQL scaling patterns.
The high volume of writes would overwhelm a single SQL Write Master-Slave, also pointing to a need for additional
scaling techniques.
• Federation25
• Sharding26
• Denormalization27
• SQL Tuning28
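As one illustration of sharding, tweets could be partitioned by user id so that all of a user's tweets live on a single shard; the shard count and naming scheme below are assumptions, not part of the original design:

```python
NUM_SHARDS = 16  # assumed shard count; a power of two simplifies resharding

def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user id to a shard; reading one user's tweets hits one database."""
    return user_id % num_shards

def shard_database_name(user_id):
    # Hypothetical naming scheme for the shard databases
    return 'tweets_shard_{}'.format(shard_for(user_id))
```

The tradeoff is that queries spanning many users (such as building a home timeline) must fan out across shards and merge results, which is one reason to keep the Memory Cache in front of the database.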
Additional topics to dive into, depending on the problem scope and time remaining.
19. https://github.com/donnemartin/system-design-primer#cache
20. https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms
21. https://github.com/donnemartin/system-design-primer#fail-over
22. https://github.com/donnemartin/system-design-primer#master-slave-replication
23. https://github.com/donnemartin/system-design-primer#consistency-patterns
24. https://github.com/donnemartin/system-design-primer#availability-patterns
25. https://github.com/donnemartin/system-design-primer#federation
26. https://github.com/donnemartin/system-design-primer#sharding
27. https://github.com/donnemartin/system-design-primer#denormalization
28. https://github.com/donnemartin/system-design-primer#sql-tuning
Additional talking points
NoSQL
• Key-value store29
• Document store30
• Wide column store31
• Graph database32
• SQL vs NoSQL33
Caching
• Where to cache
• Client caching34
• CDN caching35
• Web server caching36
• Database caching37
• Application caching38
• What to cache
• Caching at the database query level39
• Caching at the object level40
• When to update the cache
• Cache-aside41
• Write-through42
• Write-behind (write-back)43
• Refresh ahead44
• Message queues45
• Task queues46
• Back pressure47
• Microservices48
29. https://github.com/donnemartin/system-design-primer#key-value-store
30. https://github.com/donnemartin/system-design-primer#document-store
31. https://github.com/donnemartin/system-design-primer#wide-column-store
32. https://github.com/donnemartin/system-design-primer#graph-database
33. https://github.com/donnemartin/system-design-primer#sql-or-nosql
34. https://github.com/donnemartin/system-design-primer#client-caching
35. https://github.com/donnemartin/system-design-primer#cdn-caching
36. https://github.com/donnemartin/system-design-primer#web-server-caching
37. https://github.com/donnemartin/system-design-primer#database-caching
38. https://github.com/donnemartin/system-design-primer#application-caching
39. https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level
40. https://github.com/donnemartin/system-design-primer#caching-at-the-object-level
41. https://github.com/donnemartin/system-design-primer#cache-aside
42. https://github.com/donnemartin/system-design-primer#write-through
43. https://github.com/donnemartin/system-design-primer#write-behind-write-back
44. https://github.com/donnemartin/system-design-primer#refresh-ahead
45. https://github.com/donnemartin/system-design-primer#message-queues
46. https://github.com/donnemartin/system-design-primer#task-queues
47. https://github.com/donnemartin/system-design-primer#back-pressure
48. https://github.com/donnemartin/system-design-primer#microservices
Communications
• Discuss tradeoffs:
• External communication with clients - HTTP APIs following REST49
• Internal communications - RPC50
• Service discovery51
Security
Refer to the security section52.
Latency numbers
See Latency numbers every programmer should know53.
Ongoing
• Continue benchmarking and monitoring your system to address bottlenecks as they come up
• Scaling is an iterative process
49. https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest
50. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
51. https://github.com/donnemartin/system-design-primer#service-discovery
52. https://github.com/donnemartin/system-design-primer#security
53. https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know
Design Mint.com
Note: This document links directly to relevant areas found in the system design topics1 to avoid duplication. Refer to
the linked content for general talking points, tradeoffs, and alternatives.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss
assumptions.
Without an interviewer to address clarifying questions, we’ll define some use cases and constraints.
Use cases
We’ll scope the problem to handle only the following use cases
Out of scope
State assumptions
• Housing = $1,000
• Food = $200
• Gas = $100
• Sellers are used to determine transaction category
• 50,000 sellers
• Write-heavy, users make transactions daily, but few visit the site daily
Calculate usage
Clarify with your interviewer if you should run back-of-the-envelope usage calculations.
• user_id - 8 bytes
• created_at - 5 bytes
• seller - 32 bytes
• amount - 5 bytes
• Total: ~50 bytes
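With ~50 bytes per transaction, the figures used later in this chapter (250 GB of new content per month, ~2,000 average writes per second) follow from quick arithmetic, assuming ~5 billion transactions per month:

```python
bytes_per_transaction = 50
transactions_per_month = 5 * 10**9
seconds_per_month = 30 * 24 * 3600  # ~2.5 million

new_content_per_month_gb = (bytes_per_transaction * transactions_per_month) / 10**9
writes_per_second = transactions_per_month / seconds_per_month

print(new_content_per_month_gb)  # 250.0 -> 250 GB of new content per month
print(round(writes_per_second))  # 1929 -> ~2,000 average writes per second
```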
We could store info on the 10 million users in a relational database2 . We should discuss the use cases and tradeoffs
between choosing SQL or NoSQL3 .
• The Client sends a request to the Web Server, running as a reverse proxy4
• The Web Server forwards the request to the Accounts API server
• The Accounts API server updates the SQL Database accounts table with the newly entered account info
Clarify with your interviewer how much code you are expected to write.
The accounts table could have the following structure:
We’ll create an index5 on id, user_id, and created_at to speed up lookups (log-time instead of scanning the entire
table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while
reading from SSD takes 4x and from disk takes 80x longer.1
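A minimal sketch of the composite index with sqlite3; the accounts schema below is illustrative (column names are assumptions, not the actual table definition):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,   -- primary key is indexed automatically
        created_at TEXT NOT NULL,
        account_url TEXT NOT NULL,
        user_id INTEGER NOT NULL
    )""")
# Composite index to serve per-user lookups ordered by time in log-time
conn.execute(
    'CREATE INDEX idx_accounts_user_created ON accounts(user_id, created_at)')

plan = conn.execute(
    'EXPLAIN QUERY PLAN '
    'SELECT * FROM accounts WHERE user_id = ? ORDER BY created_at',
    (1,)).fetchall()
# The plan should reference idx_accounts_user_created rather than a full scan
print(plan)
```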
We’ll use a public REST API6 :
Data flow:
• Extracting transactions could take a while, so we'd probably want to do this asynchronously with a queue10 ,
although this introduces additional complexity
• The Transaction Extraction Service does the following:
• Pulls from the Queue and extracts transactions for the given account from the financial institution, storing
the results as raw log files in the Object Store
• Uses the Category Service to categorize each transaction
• Uses the Budget Service to calculate aggregate monthly spending by category
• The Budget Service uses the Notification Service to let users know if they are nearing or have
exceeded their budget
• Updates the SQL Database transactions table with categorized transactions
• Updates the SQL Database monthly_spending table with aggregate monthly spending by category
• Notifies the user the transactions have completed through the Notification Service:
• Uses a Queue (not pictured) to asynchronously send out notifications
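The asynchronous notification step can be sketched in-process with Python's queue module; a real deployment would use a message broker such as RabbitMQ or Amazon SQS instead, and the message format here is an assumption:

```python
import queue
import threading

notification_queue = queue.Queue()
sent = []  # stand-in for the Notification Service transport

def notification_worker():
    """Drain the queue and deliver notifications off the request path."""
    while True:
        message = notification_queue.get()
        if message is None:  # sentinel to shut the worker down
            break
        sent.append(message)  # a real worker would send email/SMS/push here
        notification_queue.task_done()

worker = threading.Thread(target=notification_worker)
worker.start()

# The Transaction Extraction Service enqueues and returns immediately
notification_queue.put({'user_id': 1, 'text': 'Transactions imported'})
notification_queue.put(None)
worker.join()
```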
Category service
For the Category Service, we can seed a seller-to-category dictionary with the most popular sellers. If we estimate
50,000 sellers and estimate each entry to take less than 255 bytes, the dictionary would only take about 12 MB of
memory.
Clarify with your interviewer how much code you are expected to write.
from enum import Enum


class DefaultCategories(Enum):

    HOUSING = 0
    FOOD = 1
    GAS = 2
    SHOPPING = 3
    ...

seller_category_map = {}
seller_category_map['Exxon'] = DefaultCategories.GAS
seller_category_map['Target'] = DefaultCategories.SHOPPING
...

10. https://github.com/donnemartin/system-design-primer#asynchronism
11. https://github.com/donnemartin/system-design-primer#use-good-indices
12. https://github.com/donnemartin/system-design-primer#use-good-indices
For sellers not initially seeded in the map, we could use a crowdsourcing effort by evaluating the manual category
overrides our users provide. We could use a heap to look up the top manual override per seller in O(1) time.
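A hedged sketch of the per-seller override heap with heapq (the vote counts and helper names are assumptions): pushing `(-votes, category)` keeps the most-voted override at index 0, so peeking is O(1):

```python
import heapq
from collections import defaultdict

# seller -> heap of (-votes, category); most-voted override sits at index 0
override_heaps = defaultdict(list)

def add_override_votes(seller, category, votes):
    heapq.heappush(override_heaps[seller], (-votes, category))

def top_override(seller):
    """Return the most-voted category override for a seller in O(1)."""
    heap = override_heaps[seller]
    if not heap:
        return None
    return heap[0][1]

add_override_votes('CornerStore', 'FOOD', 120)
add_override_votes('CornerStore', 'SHOPPING', 45)
```

A production version would also need to handle vote-count updates, for example by re-pushing entries and lazily discarding stale ones on pop.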
class Categorizer(object):

    def __init__(self, seller_category_map, seller_category_crowd_overrides_map):
        self.seller_category_map = seller_category_map
        self.seller_category_crowd_overrides_map = seller_category_crowd_overrides_map

    def categorize(self, transaction):
        if transaction.seller in self.seller_category_map:
            return self.seller_category_map[transaction.seller]
        elif transaction.seller in self.seller_category_crowd_overrides_map:
            self.seller_category_map[transaction.seller] = \
                self.seller_category_crowd_overrides_map[transaction.seller].peek_min()
            return self.seller_category_map[transaction.seller]
        return None
Transaction implementation:
class Transaction(object):

    def __init__(self, created_at, seller, amount):
        self.created_at = created_at
        self.seller = seller
        self.amount = amount
To start, we could use a generic budget template that allocates category amounts based on income tiers. Using this
approach, we would not have to store the 100 million budget items identified in the constraints, only those that the user
overrides. If a user overrides a budget category, we could store the override in the budget_overrides table.
class Budget(object):

    def __init__(self, income):
        self.income = income
        self.categories_to_budget_map = self.create_budget_template()

    def create_budget_template(self):
        return {
            DefaultCategories.HOUSING: self.income * .4,
            DefaultCategories.FOOD: self.income * .2,
            DefaultCategories.GAS: self.income * .1,
            DefaultCategories.SHOPPING: self.income * .2,
            # ... remaining categories
        }

    def override_category_budget(self, category, amount):
        self.categories_to_budget_map[category] = amount
Step 3: Design core components
For the Budget Service, we can potentially run SQL queries on the transactions table to generate the
monthly_spending aggregate table. The monthly_spending table would likely have far fewer rows than the total of
5 billion transactions, since users typically have many transactions per month.
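As a rough sketch of the aggregation query, using sqlite3 with an illustrative schema (the real transactions table would have more columns):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE transactions '
    '(user_id INT, period TEXT, category TEXT, amount REAL)')
conn.executemany('INSERT INTO transactions VALUES (?, ?, ?, ?)', [
    (1, '2016-01', 'gas', 50),
    (1, '2016-01', 'gas', 25),
    (1, '2016-01', 'shopping', 100),
])

# Roll transactions up into per-user, per-month, per-category totals
rows = conn.execute("""
    SELECT user_id, period, category, SUM(amount)
    FROM transactions
    GROUP BY user_id, period, category
    ORDER BY category""").fetchall()
print(rows)  # [(1, '2016-01', 'gas', 75.0), (1, '2016-01', 'shopping', 100.0)]
```

The result rows map directly onto the monthly_spending table described above.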
As an alternative, we can run MapReduce jobs on the raw transaction files to:
Running analyses on the transaction files could significantly reduce the load on the database.
We could call the Budget Service to re-run the analysis if the user updates a category.
Clarify with your interviewer how much code you are expected to write.
Sample log file format, tab delimited:

user_id   timestamp   seller   amount
MapReduce implementation:
from mrjob.job import MRJob


class SpendingByCategory(MRJob):

    def calc_current_year_month(self):
        """Return the current year and month."""
        ...

    def extract_year_month(self, timestamp):
        """Return the year and month portions of the timestamp."""
        ...

    def mapper(self, _, line):
        """Parse each log line, emitting ((user_id, period, category), amount)."""
        user_id, timestamp, seller, amount = line.split('\t')
        category = self.categorizer.categorize(seller)
        period = self.extract_year_month(timestamp)
        if period == self.current_year_month:
            yield (user_id, period, category), amount

    def reducer(self, key, values):
        """Sum the amounts for each (user_id, period, category) key."""
        yield key, sum(values)
Important: Do not simply jump right into the final design from the initial design!
State that you would: 1) benchmark/load test, 2) profile for bottlenecks, 3) address bottlenecks while evaluating
alternatives and trade-offs, and 4) repeat. See Design a system that scales to millions of users on AWS13 as a sample
of how to iteratively scale the initial design.
It’s important to discuss what bottlenecks you might encounter with the initial design and how you might address
each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers?
CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?
We’ll introduce some components to complete the design and to address scalability issues. Internal load balancers are
not shown to reduce clutter.
To avoid repeating discussions, refer to the following system design topics14 for main talking points, tradeoffs, and
alternatives:
• DNS15
• CDN16
• Load balancer17
• Horizontal scaling18
• Web server (reverse proxy)19
• API server (application layer)20
• Cache21
• Relational database management system (RDBMS)22
• SQL write master-slave failover23
• Master-slave replication24
• Asynchronism25
• Consistency patterns26
13. ../scaling_aws/README.md
14. https://github.com/donnemartin/system-design-primer#index-of-system-design-topics
15. https://github.com/donnemartin/system-design-primer#___domain-name-system
16. https://github.com/donnemartin/system-design-primer#content-delivery-network
17. https://github.com/donnemartin/system-design-primer#load-balancer
18. https://github.com/donnemartin/system-design-primer#horizontal-scaling
19. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
20. https://github.com/donnemartin/system-design-primer#application-layer
21. https://github.com/donnemartin/system-design-primer#cache
22. https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms
23. https://github.com/donnemartin/system-design-primer#fail-over
24. https://github.com/donnemartin/system-design-primer#master-slave-replication
25. https://github.com/donnemartin/system-design-primer#asynchronism
26. https://github.com/donnemartin/system-design-primer#consistency-patterns
• Availability patterns27
We’ll add an additional use case: User accesses summaries and transactions.
User sessions, aggregate stats by category, and recent transactions could be placed in a Memory Cache such as Redis
or Memcached.
• Static content can be served from the Object Store such as S3, which is cached on the CDN
Refer to When to update the cache28 for tradeoffs and alternatives. The approach above describes cache-aside29 .
Instead of keeping the monthly_spending aggregate table in the SQL Database, we could create a separate Analytics
Database using a data warehousing solution such as Amazon Redshift or Google BigQuery.
We might only want to store a month of transactions data in the database, while storing the rest in a data warehouse
or in an Object Store. An Object Store such as Amazon S3 can comfortably handle the constraint of 250 GB of
new content per month.
To address the 200 average read requests per second (higher at peak), traffic for popular content should be handled
by the Memory Cache instead of the database. The Memory Cache is also useful for handling the unevenly
distributed traffic and traffic spikes. The SQL Read Replicas should be able to handle the cache misses, as long as
the replicas are not bogged down with replicating writes.
2,000 average transaction writes per second (higher at peak) might be tough for a single SQL Write Master-Slave.
We might need to employ additional SQL scaling patterns:
• Federation30
• Sharding31
• Denormalization32
• SQL Tuning33
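Federation could split the databases by function: one illustration is routing each table to its functional database. The connection URLs and routing table below are hypothetical:

```python
# Hypothetical connection URLs; with federation, each functional area gets
# its own database, reducing read/write traffic to any single one.
DATABASES = {
    'accounts': 'postgresql://db-accounts/mint',
    'transactions': 'postgresql://db-transactions/mint',
    'budgets': 'postgresql://db-budgets/mint',
}

TABLE_ROUTES = {
    'accounts': 'accounts',
    'transactions': 'transactions',
    'monthly_spending': 'budgets',
    'budget_overrides': 'budgets',
}

def db_for(table):
    """Route a table name to the connection URL of its functional database."""
    return DATABASES[TABLE_ROUTES[table]]
```

The cost of federation is that joins across functional databases must be done in the application layer rather than in SQL.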
Additional topics to dive into, depending on the problem scope and time remaining.
27. https://github.com/donnemartin/system-design-primer#availability-patterns
28. https://github.com/donnemartin/system-design-primer#when-to-update-the-cache
29. https://github.com/donnemartin/system-design-primer#cache-aside
30. https://github.com/donnemartin/system-design-primer#federation
31. https://github.com/donnemartin/system-design-primer#sharding
32. https://github.com/donnemartin/system-design-primer#denormalization
33. https://github.com/donnemartin/system-design-primer#sql-tuning
Additional talking points
NoSQL
• Key-value store34
• Document store35
• Wide column store36
• Graph database37
• SQL vs NoSQL38
Caching
• Where to cache
• Client caching39
• CDN caching40
• Web server caching41
• Database caching42
• Application caching43
• What to cache
• Caching at the database query level44
• Caching at the object level45
• When to update the cache
• Cache-aside46
• Write-through47
• Write-behind (write-back)48
• Refresh ahead49
• Message queues50
• Task queues51
• Back pressure52
• Microservices53
34. https://github.com/donnemartin/system-design-primer#key-value-store
35. https://github.com/donnemartin/system-design-primer#document-store
36. https://github.com/donnemartin/system-design-primer#wide-column-store
37. https://github.com/donnemartin/system-design-primer#graph-database
38. https://github.com/donnemartin/system-design-primer#sql-or-nosql
39. https://github.com/donnemartin/system-design-primer#client-caching
40. https://github.com/donnemartin/system-design-primer#cdn-caching
41. https://github.com/donnemartin/system-design-primer#web-server-caching
42. https://github.com/donnemartin/system-design-primer#database-caching
43. https://github.com/donnemartin/system-design-primer#application-caching
44. https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level
45. https://github.com/donnemartin/system-design-primer#caching-at-the-object-level
46. https://github.com/donnemartin/system-design-primer#cache-aside
47. https://github.com/donnemartin/system-design-primer#write-through
48. https://github.com/donnemartin/system-design-primer#write-behind-write-back
49. https://github.com/donnemartin/system-design-primer#refresh-ahead
50. https://github.com/donnemartin/system-design-primer#message-queues
51. https://github.com/donnemartin/system-design-primer#task-queues
52. https://github.com/donnemartin/system-design-primer#back-pressure
53. https://github.com/donnemartin/system-design-primer#microservices
Communications
• Discuss tradeoffs:
• External communication with clients - HTTP APIs following REST54
• Internal communications - RPC55
• Service discovery56
Security
Refer to the security section57.
Latency numbers
See Latency numbers every programmer should know58.
Ongoing
• Continue benchmarking and monitoring your system to address bottlenecks as they come up
• Scaling is an iterative process
54. https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest
55. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
56. https://github.com/donnemartin/system-design-primer#service-discovery
57. https://github.com/donnemartin/system-design-primer#security
58. https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know
Design a key-value cache to save the results of the most
recent web server queries
Note: This document links directly to relevant areas found in the system design topics1 to avoid duplication. Refer to
the linked content for general talking points, tradeoffs, and alternatives.
Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss
assumptions.
Without an interviewer to address clarifying questions, we’ll define some use cases and constraints.
Use cases
We’ll scope the problem to handle only the following use cases
State assumptions
• 10 million users
• 10 billion queries per month
1. https://github.com/donnemartin/system-design-primer#index-of-system-design-topics
Calculate usage
Clarify with your interviewer if you should run back-of-the-envelope usage calculations.
Popular queries can be served from a Memory Cache such as Redis or Memcached to reduce read latency and to
avoid overloading the Reverse Index Service and Document Service. Reading 1 MB sequentially from memory
takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.1
Since the cache has limited capacity, we’ll use a least recently used (LRU) approach to expire older entries.
• The Client sends a request to the Web Server, running as a reverse proxy2
• The Web Server forwards the request to the Query API server
• The Query API server does the following:
• Parses the query
• Removes markup
• Breaks up the text into terms
• Fixes typos
• Normalizes capitalization
• Converts the query to use boolean operations
• Checks the Memory Cache for the content matching the query
• If there’s a hit in the Memory Cache, the Memory Cache does the following:
• Updates the cached entry’s position to the front of the LRU list
• Returns the cached contents
2. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
Step 3: Design core components
Figure 1: High-level design of a key-value cache to save the results of the most recent web server queries
Cache implementation
The cache can use a doubly-linked list: new items will be added to the head while items to expire will be removed
from the tail. We’ll use a hash table for fast lookups to each linked list node.
Clarify with your interviewer how much code you are expected to write.
Query API Server implementation:
class QueryApi(object):

    def __init__(self, memory_cache, reverse_index_service):
        self.memory_cache = memory_cache
        self.reverse_index_service = reverse_index_service

    def parse_query(self, query):
        """Remove markup, break text into terms, fix typos,
        normalize capitalization, convert to boolean operations."""
        ...

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.memory_cache.get(query)
        if results is None:
            # Cache miss: query the backing service, then populate the cache
            results = self.reverse_index_service.process_search(query)
            self.memory_cache.set(results, query)
        return results
Node implementation:
class Node(object):

    def __init__(self, query, results):
        self.query = query
        self.results = results
LinkedList implementation:
class LinkedList(object):

    def __init__(self):
        self.head = None
        self.tail = None

    def move_to_front(self, node):
        ...

    def append_to_front(self, node):
        ...

    def remove_from_tail(self):
        ...
Cache implementation:
class Cache(object):

    def __init__(self, MAX_SIZE):
        self.MAX_SIZE = MAX_SIZE
        self.size = 0
        self.lookup = {}  # key: query, value: node
        self.linked_list = LinkedList()

    def get(self, query):
        """Get the stored query result from the cache.

        Accessing a node updates its position to the front of the LRU list.
        """
        node = self.lookup.get(query)
        if node is None:
            return None
        self.linked_list.move_to_front(node)
        return node.results

    def set(self, results, query):
        """Set the result for the given query key in the cache.

        When updating an entry, updates its position to the front of the LRU list.
        If the entry is new and the cache is at capacity, removes the oldest entry
        before the new entry is added.
        """
        node = self.lookup.get(query)
        if node is not None:
            # Key exists in cache, update the value
            node.results = results
            self.linked_list.move_to_front(node)
        else:
            # Key does not exist in cache
            if self.size == self.MAX_SIZE:
                # Remove the oldest entry from the linked list and lookup
                self.lookup.pop(self.linked_list.tail.query, None)
                self.linked_list.remove_from_tail()
            else:
                self.size += 1
            # Add the new key and value
            new_node = Node(query, results)
            self.linked_list.append_to_front(new_node)
            self.lookup[query] = new_node
Cached results can become stale as the underlying data changes. The most straightforward way to handle this is to
set a max time that a cached entry can stay in the cache before it is updated, usually referred to as time to live (TTL).
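One way to sketch the TTL idea separately from the LRU code above (the 60-second TTL is an arbitrary assumption; a real implementation would fold the expiry timestamp into the cache Node):

```python
import time

TTL_SECONDS = 60  # assumed max staleness

cache = {}  # query -> (results, expires_at)

def cache_set(query, results, now=None):
    now = time.monotonic() if now is None else now
    cache[query] = (results, now + TTL_SECONDS)

def cache_get(query, now=None):
    """Return cached results, or None if missing or older than the TTL."""
    now = time.monotonic() if now is None else now
    entry = cache.get(query)
    if entry is None:
        return None
    results, expires_at = entry
    if now >= expires_at:
        del cache[query]  # expired: evict so the next set refreshes it
        return None
    return results
```

Injecting `now` keeps the expiry logic deterministic and easy to test; production code would simply rely on the `time.monotonic()` default.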
Refer to When to update the cache3 for tradeoffs and alternatives. The approach above describes cache-aside4 .
Important: Do not simply jump right into the final design from the initial design!
3. https://github.com/donnemartin/system-design-primer#when-to-update-the-cache
4. https://github.com/donnemartin/system-design-primer#cache-aside
State that you would: 1) benchmark/load test, 2) profile for bottlenecks, 3) address bottlenecks while evaluating
alternatives and trade-offs, and 4) repeat. See Design a system that scales to millions of users on AWS5 as a sample
of how to iteratively scale the initial design.
It’s important to discuss what bottlenecks you might encounter with the initial design and how you might address
each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers?
CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?
We’ll introduce some components to complete the design and to address scalability issues. Internal load balancers are
not shown to reduce clutter.
To avoid repeating discussions, refer to the following system design topics6 for main talking points, tradeoffs, and
alternatives:
• DNS7
• Load balancer8
• Horizontal scaling9
• Web server (reverse proxy)10
• API server (application layer)11
• Cache12
• Consistency patterns13
• Availability patterns14
To handle the heavy request load and the large amount of memory needed, we’ll scale horizontally. We have three
main options on how to store the data on our Memory Cache cluster:
• Each machine in the cache cluster has its own cache - Simple, although it will likely result in a low cache
hit rate.
• Each machine in the cache cluster has a copy of the cache - Simple, although it is an inefficient use of
memory.
• The cache is sharded15 across all machines in the cache cluster - More complex, although it is likely the
best option. We could use hashing to determine which machine could have the cached results of a query using
machine = hash(query). We’ll likely want to use consistent hashing16 .
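A minimal consistent-hash ring can be sketched as follows; the virtual-node count is an assumption, and `md5` is used only as a stable, well-distributed hash:

```python
import bisect
import hashlib

class HashRing(object):
    """Minimal consistent hash ring; vnodes smooths the key distribution."""

    def __init__(self, machines, vnodes=100):
        self._ring = []  # sorted list of (hash, machine)
        for machine in machines:
            for i in range(vnodes):
                self._ring.append((self._hash('%s-%d' % (machine, i)), machine))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def machine_for(self, query):
        """Return the machine whose ring position follows the query's hash."""
        index = bisect.bisect(self._keys, self._hash(query)) % len(self._keys)
        return self._ring[index][1]
```

Unlike `machine = hash(query) % n`, adding or removing a cache machine only remaps the keys adjacent to that machine's ring positions, so most cached entries stay warm.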
Additional topics to dive into, depending on the problem scope and time remaining.
5. ../scaling_aws/README.md
6. https://github.com/donnemartin/system-design-primer#index-of-system-design-topics
7. https://github.com/donnemartin/system-design-primer#___domain-name-system
8. https://github.com/donnemartin/system-design-primer#load-balancer
9. https://github.com/donnemartin/system-design-primer#horizontal-scaling
10. https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server
11. https://github.com/donnemartin/system-design-primer#application-layer
12. https://github.com/donnemartin/system-design-primer#cache
13. https://github.com/donnemartin/system-design-primer#consistency-patterns
14. https://github.com/donnemartin/system-design-primer#availability-patterns
15. https://github.com/donnemartin/system-design-primer#sharding
16. https://github.com/donnemartin/system-design-primer#under-development
• Read replicas17
• Federation18
• Sharding19
• Denormalization20
• SQL Tuning21
NoSQL
• Key-value store22
• Document store23
• Wide column store24
• Graph database25
• SQL vs NoSQL26
Caching
• Where to cache
• Client caching27
• CDN caching28
• Web server caching29
• Database caching30
• Application caching31
• What to cache
• Caching at the database query level32
• Caching at the object level33
• When to update the cache
• Cache-aside34
• Write-through35
• Write-behind (write-back)36
• Refresh ahead37
17. https://github.com/donnemartin/system-design-primer#master-slave-replication
18. https://github.com/donnemartin/system-design-primer#federation
19. https://github.com/donnemartin/system-design-primer#sharding
20. https://github.com/donnemartin/system-design-primer#denormalization
21. https://github.com/donnemartin/system-design-primer#sql-tuning
22. https://github.com/donnemartin/system-design-primer#key-value-store
23. https://github.com/donnemartin/system-design-primer#document-store
24. https://github.com/donnemartin/system-design-primer#wide-column-store
25. https://github.com/donnemartin/system-design-primer#graph-database
26. https://github.com/donnemartin/system-design-primer#sql-or-nosql
27. https://github.com/donnemartin/system-design-primer#client-caching
28. https://github.com/donnemartin/system-design-primer#cdn-caching
29. https://github.com/donnemartin/system-design-primer#web-server-caching
30. https://github.com/donnemartin/system-design-primer#database-caching
31. https://github.com/donnemartin/system-design-primer#application-caching
32. https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level
33. https://github.com/donnemartin/system-design-primer#caching-at-the-object-level
34. https://github.com/donnemartin/system-design-primer#cache-aside
35. https://github.com/donnemartin/system-design-primer#write-through
36. https://github.com/donnemartin/system-design-primer#write-behind-write-back
37. https://github.com/donnemartin/system-design-primer#refresh-ahead
• Message queues38
• Task queues39
• Back pressure40
• Microservices41
Communications
• Discuss tradeoffs:
• External communication with clients - HTTP APIs following REST42
• Internal communications - RPC43
• Service discovery44
Security
Refer to the security section45.
Latency numbers
See Latency numbers every programmer should know46.
Ongoing
• Continue benchmarking and monitoring your system to address bottlenecks as they come up
• Scaling is an iterative process
38. https://github.com/donnemartin/system-design-primer#message-queues
39. https://github.com/donnemartin/system-design-primer#task-queues
40. https://github.com/donnemartin/system-design-primer#back-pressure
41. https://github.com/donnemartin/system-design-primer#microservices
42. https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest
43. https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc
44. https://github.com/donnemartin/system-design-primer#service-discovery
45. https://github.com/donnemartin/system-design-primer#security
46. https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know