Ask HN: How do I handle a million hits a minute?
45 points by owkaye on Nov 12, 2008 | 63 comments
I've got a concept for an online service that will work properly only if the servers can receive and process a million + hits a minute for a minimum of one hour at a time. Each hit will post new data that needs to be updated in the online database, and the newly updated data will become part of the dynamically generated page returned to the visitor.

The pages returned will be very small so I don't think page size will be an issue. Instead I think that HTTP server (or perhaps database) speed limits may be exceeded here, or perhaps the dynamic page rendering engine will become the bottleneck.

???

My ultimate goal is to come up with a clear understanding of how to create a system that updates the database, generates the dynamic HTML pages, and serves the pages to visitors -- fast enough that they do not experience slowdowns during the hour that this heavy load will exist.

How do I go about learning how to do this? Have any of you done something like this in the past? This is way beyond my personal experience level. Thanks in advance for any suggestions you can provide.




First, relax.

Look, you've got ~4 GHz processors available, and you're worried about a million hits a minute. That's nearly 240k clock cycles per hit.

Next, you've got 60 +/- 10ms of latency between you and your clients (unless there are details here you aren't covering, but no matter). You don't really have to respond to users terribly fast, as lots of your responsiveness would be lost over jitter/lag.

A single Dell box could handle your load, if you wrote the whole thing in C on a modern quad-core desktop.

Ignore all the standard Apache/SQL/networking stuff until you know what you need. Get the core requirements working first, then add the load of the side-stuff second.

E.g. doing all the heavy lifting in a private set of CPU cores and then providing a shared-mem plugin to Apache may be enough for you. Save a CPU core to snapshot your data.

So, for advice:

1. Ignore database bullshit. You don't need it, it won't help. If you want a DB for other purposes, fine. A snapshot process writing to your DB is fine, just don't put it in the critical path.

2. Build a load simulator. A raw mode that just sends over the handful of bytes, and a cooked mode that bothers to printf a GET request (a minimal sketch follows after this list).

3. Start with a reasonable prototype, and work your way to something performant. Hell, you can probably do it in Java if you don't mind buying a few more CPU cores.

4. Integrate as you need for the rest of your requirements. For example, have another box serve the rest of your webapp, and dedicate a stripped down apache box with a custom module for this stuff.
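
One way that cooked mode could look - a minimal sketch assuming a blocking TCP socket and one HTTP/1.0 GET per hit; the host, port, and query string are placeholders, not anything specified in this thread:

    /* Minimal "cooked mode" load generator sketch (POSIX sockets).
       Host/port/path are assumptions; error handling is kept terse. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        const char *host = "127.0.0.1";   /* assumed test server */
        const int   port = 8080;
        const int   requests = 10000;

        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(port);
        inet_pton(AF_INET, host, &addr.sin_addr);

        char req[256], buf[1024];
        for (int i = 0; i < requests; i++) {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
                perror("connect");
                return 1;
            }
            /* "cooked mode": printf an actual GET request */
            int len = snprintf(req, sizeof req,
                               "GET /post?u=%05d&v=%d HTTP/1.0\r\nHost: %s\r\n\r\n",
                               i % 100000, i, host);
            write(fd, req, (size_t)len);
            while (read(fd, buf, sizeof buf) > 0)   /* drain the tiny response */
                ;
            close(fd);
        }
        return 0;
    }

The "raw mode" variant would just write a fixed handful of bytes instead of the formatted GET, and a real run would fork or thread many of these while recording latencies.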

In essence, I'm telling you to treat it as a very smallish HPC problem, instead of some sort of nightmare webapp problem. It fits better, and suddenly you have lots of people/knowledge/COTS equipment available to you.


There's no need to build your own load simulator when good ones already exist:

http://tsung.erlang-projects.org/


Thanks mleonhard, I've created a new thread to continue this discussion since this one has become way too disorganized, so please post future replies here:

http://news.ycombinator.com/item?id=362810


Also, if you're doing something game-ish, talk to me privately. Scalability of video games is my PhD topic...


Without much knowledge of the specifics it's hard to come up with a plausible solution, but here are some ideas:

* Push off as much of the processing as possible onto the client. Ideally, the browser has a JavaScript library that deals with all the rendering/updating/etc, and just passes stuff to the server as short commands for which it receives an affirmative or negative response, rather than rendering an entire page for each request. This might drastically reduce your CPU usage, depending on what else you do in that request.

* How live does the data have to be? Can it be a few seconds stale? If so, and if the data is similar for each user, cache the hell out of it as a bunch of static files served by nginx and the like, rather than dynamically serving any page at all.

* Is it possible to do any sharding so that each bunch of users gets its own db server? It's easier to serve 10x 100k hits a minute than 1x 1m hits a minute.

Without more specifics, it's hard to provide more specific advice though.


Theoretically possible, I guess. The current TPC-C record holder is at 6 million transactions per minute (mixed workload). But that was on some, er, serious hardware: http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=10...

I've personally seen 1000+ tps systems (60,000 tpm), but these were all big iron environments... I've seen commodity based (e.g. x86/Linux) environments up to about half that.

However - all of that is just guesswork - a lot of how you would go about this would depend on your data. Are there natural partitions in the data (time, user, geography, category)?

And even if your page is tiny - say 10 KB - that's, er, something like 1 Gbps sustained (my math might be a bit off)? That's a lot of bandwidth.


That's a lot of traffic. To handle that dynamically would probably cost you a pretty penny, even with really fast code (Apache/nginx/whatever modules in C, for instance). What are you trying to do, exactly?

My laptop can pump out a static page at about 4000 requests per second. A million requests per minute is on the order of 16,000 requests per second, so you'd probably need multiple servers running really fast code. Since you don't explain what you want to do with the data, it's hard to really say anything meaningful about how you want to handle it dynamically.

Do you really need to start off handling that much traffic?


Most of the responses here focused on the disk/data aspect. What about the actual network? IPv4/TCP has a minimum overhead of 40 bytes per packet, with a seven-packet overhead for the connection itself (three for making a connection, four for closing it). Add another packet for the actual data (with, say, a payload of only 20 bytes) and that's about 220 incoming bytes across 5 packets (two from the client for the connection, one for the data, and two for closing the connection---I'm concentrating on the incoming side as outgoing should be similar). Multiply 220 bytes by 10 to get a rough estimate in bits (close enough, it makes the math a bit easier, and if we're a bit high here, that's better than being too low), then multiply by the number of connections per second (16,667) and you get around 37 Mbps---a T3, basically.

Now, to handle a potential 100,000 users at the same time, that's 6 T3s (12 if you want to handle 200,000), and that's with only 20 bytes of actual data being shipped between client and server. Just the networking side is going to be expensive, so I hope the idea is a good one.

On the data storage side, I worked on a project (http://www.x-grey.com) where I had to potentially store a bunch of records. I ended up storing them all in memory in an array. When a new record came in, I would insert it into the proper spot. In theory, I was able to handle about 6,000 requests per second on a 2.6GHz machine (although I found it rather difficult to actually test it properly, which is why I say in theory---in practice it may be more or less, but for what I was doing, that was way more than good enough). So yes, I would recommend keeping everything in memory and avoiding the disk as much as possible. But then you have issues of coherency if you have multiple threads/machines that need to update this structure (in my case, that 6,000 is ONE CPU doing the work; I sidestepped the whole concurrency issue by ignoring it).
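
For what it's worth, that insert-into-the-proper-spot approach looks roughly like this in C - a sketch under an assumed record layout and capacity, where the insertion index found by the binary search is also the record's rank:

    /* Sketch: fixed-size records kept sorted in RAM; the slot a new record
       lands in is its rank. Capacity and field layout are assumptions. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define CAPACITY 1000000        /* demo size; an hour of real traffic is ~60M */

    typedef struct {
        uint32_t value;             /* the posted value we rank on */
        uint32_t user;              /* packed 5-char login cookie  */
        uint32_t seq;               /* arrival order               */
    } record;

    static record table[CAPACITY];
    static size_t count = 0;

    /* Insert r in descending-value order and return its rank (0 = top).
       O(log n) to find the slot, O(n) to shift the tail -- the shift is
       what a tree or skip list would remove at larger n. */
    static long insert_ranked(record r) {
        if (count == CAPACITY) return -1;
        size_t lo = 0, hi = count;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (table[mid].value >= r.value) lo = mid + 1;
            else                             hi = mid;
        }
        memmove(&table[lo + 1], &table[lo], (count - lo) * sizeof(record));
        table[lo] = r;
        count++;
        return (long)lo;
    }

    int main(void) {
        record a = { .value = 4200, .user = 1, .seq = 1 };
        record b = { .value = 9000, .user = 2, .seq = 2 };
        printf("first post rank:  %ld\n", insert_ranked(a));   /* 0 */
        printf("second post rank: %ld\n", insert_ranked(b));   /* 0, pushes a down */
        return 0;
    }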


About my network math---100,000 users per minute works out to 1,667 requests per second, so are you sure you need 1,000,000+ hits per minute? Or are those 100,000 users really sending 10 hits per minute each?


Consider using a fleet of front-end machines to accept TCP connections from clients. These machines can parse the HTTP envelope, forward the request to the database server, and generate the HTTP response.

There's a lot of useful advice on this topic in "On Designing and Deploying Internet-Scale Services" http://www.mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf


Here are my answers / comments to some of the recent posts (more replies to come). Thanks again to everyone for your help and please keep your suggestions coming:

to swombat: The clients will have javascript so short commands rather than entire pages can be passed. Thanks for this suggestion, implementing this alone may totally eliminate my web server issues.

to jwilliams: You said even if my page is tiny that's still 1Gbps sustained, so by implementing swombat's javascript solution I can (hopefully) pass only data, thereby avoiding the potential bandwidth issue.

to swombat: The data cannot be stale, the response MUST include the results of the database search that is performed immediately after the visitor's last post is appended to the database.

to andr and swombat: I don't know what 'sharding' or 'microsharding' is, but if the additional details I've recently posted are any help, maybe one or both of you can tell me if you think it is something that's possible, practical and/or appropriate to my needs?

to mseebach: Yes, every single write will affect every single subsequent read in a non-trivial manner. You said the hardware will cost a lot and I'm sure that's true, but I expect the money will become available if the system I've conceived can be built to function as I'm trying to describe it here.

to EVERYONE:

Given the information you all have provided prior to the time I wrote this post, I have further refined my approach to this problem as follows. Please tell me if this is the best approach in your opinions:

1- I will use JavaScript to eliminate page loads and pass only the required 5-10 characters of data from the visitor to the server each time. This should reduce or eliminate the need for multiple web servers and dramatically reduce bandwidth requirements.

2- I will find someone who knows C to write an app that receives the data from each browser, passes it to the database with an append command, issues another command to search/sort the database table, retrieves the results, and passes them back to the JavaScript in the visitor's browser.

If this appears to be the best approach so far, I still need to solve the database issue. If you have any new comments or suggestions that might help me find the database hardware or software I need, please post them here, thanks.


It seems you would be better off using some custom in-memory data structure rather than a classical ready-made database solution.

You do not want to sort the whole dataset after each new piece of data comes in. You want to insert each new entry right into its place (probably via some hashing).

If, of course, that is possible. There are too few details about what you want to do with the aggregate data.


Before you start writing code and making architecture decisions, you need to do some sanity testing. Can you even run 15,000 SQL inserts per second of the format you want on a single machine? If not, you may need to abandon using a database entirely. You can test this with a simple script and the results might invalidate your whole plan.

Also, I'm betting you don't actually need to support this kind of load to do what you want to do. In particular you can probably batch up data, since you are going to have latency issues anyways. You need a specific target for the latency you need, and then do some measurement figuring out how much batching you can do.

Why don't you explain your idea in more detail? The community can probably be more helpful then.


"new record is appended, a search is performed and the post field is sorted, then the relative position of the visitor's new database record is identified in the results (example 49,501 records from the top)."

It might be a good idea to try to twist things so that no sort is necessary - instead of appending and then sorting you should try to insert the data right at the spot it should be. If your constraints allow you to not use a database, you could consider a data structure that inserts data in such a way that it is always sorted.

Also, as an improvement to your point 1, you should put most of the JavaScript itself as well as images, etc somewhere else - with this your bandwidth will be used only for the dynamic data (which will be more than enough).

Finally, the best tool I know of for scalability is the queue. Try to reorganize what you are doing to arrive at an architecture where you put things on queues that multiple consumers will process.


Yes, eliminating the sorting is a better way to approach the problem I believe. I've always just 'sorted' to determine a new record's position in the list, but it seems there may be other (more efficient) ways to determine the record's position without sorting ...


Do all of the rankings need the same precision? Maybe you can get away with perfect accuracy for the top 1,000 rankings and then provide an approximation for anything >1,000?

You could use a list for the first 1,000 and then store the rest in a tree. Then have another process that walks the tree in-order and updates the ranking of each node.
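
A different route to the same "rank without sorting" goal, not what the parent suggests but in the same spirit: if the posted values can be mapped into a bounded integer range, a Fenwick (binary indexed) tree over value buckets gives both the insert and the rank query in O(log N). A small sketch, with the bucket range as an assumption:

    /* Sketch: rank-on-insert via a Fenwick tree over value buckets.
       Assumes post values fit (or can be scaled) into [1, N]. */
    #include <stdio.h>

    #define N 65536                  /* number of value buckets (assumption) */
    static long tree[N + 1];         /* Fenwick tree of per-bucket counts    */

    static void add(int value) {                 /* record one post of this value */
        for (int i = value; i <= N; i += i & -i)
            tree[i]++;
    }

    static long count_less_equal(int value) {    /* posts with value <= this one */
        long sum = 0;
        for (int i = value; i > 0; i -= i & -i)
            sum += tree[i];
        return sum;
    }

    int main(void) {
        int posts[] = { 300, 120, 999, 120, 4500 };
        long total = 0;
        for (int k = 0; k < 5; k++) {
            add(posts[k]);
            total++;
            /* rank from the top: posts that strictly beat this one, plus 1 */
            long rank = total - count_less_equal(posts[k]) + 1;
            printf("value %4d -> position %ld of %ld\n", posts[k], rank, total);
        }
        return 0;
    }

If the values don't bucket cleanly, a counted (order-statistics) balanced tree gives exact ranks at the same logarithmic cost; either way, nothing ever gets sorted.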


You could cache or replicate the database for reads. When a post comes in, insert it into the master database, then sleep for a preset amount of time, then have it start checking the local database copy to wait for it to show up, then run the query. Just use timestamp <= posters_timestamp to get the result set you want.

Adjust the sleep time based on how frequently the cache updates.


You really, really don't want to be using a relational database here. Hitting disks a million times per minute would be disastrous, if not for performance then for blocking IO, even with only row-level locking.

I'd look into memcached and write state to disk every once in a while to back up, kind of a reverse from what's usually done. Or, if that's really not enough, I'd look up Hadoop riding on GlusterFS. You're going to have synchrony and blocking problems no matter what, though, without really creative keying (BigTable style?). Both Facebook and Google have run up against these kinds of issues and have written and/or released source about it. Facebook has OSS'd Cassandra, their distributed database, and Google hasn't released much code but has papers out on their implementations of MapReduce (open source port->Hadoop, as suggested above) and BigTable (which the Hadoop team are in the process of cloning with HBase).


pass only the required 5-10 characters of data from the visitor to the server each time

It'll still (generally) need to be HTTP, so there will be a lot of other cruft around it. If you were to use Java or Flash you might be able to get around this.


Use a hierarchy of CDNs; play around with TTLs and/or IMS headers.


I've been looking at something similar (not on that scale admittedly) recently.

Either I would go with a generic cloud service to host on (which should scale to handle your hits), such as EC2 - but the cost could be astronomical.

Or you could go with something like 3tera & the custom cloud solutions and make your own server system: I'd say 4 or 5 servers behind a load balancer.

However, w/o knowing more about how the data will be stored I can't advise on how to keep the database under control. A million database writes per min is not going to be pretty - even a million reads per min is going to stress it... give us some info on your DB architecture and the process / requirements of those hits and someone will be able to advise more, I am sure :) (Edit: as a hint, you're probably after something like slave DBs and memcached - have a look at how Wikipedia handles their data)


The biggest issue, from what I can read, will be how well the data is partitioned. If every single write will affect every single subsequent read in a non-trivial manner, it's a really hard problem, and you're looking at really expensive hardware.

If you're able to partition the data, it's "just" a question of throwing enough servers at the problem.


Let me guess, you'll be capturing data about where people are browsing in real time?


Something of a campaign: supporters logged in and editing placards. An online drive mimicking 125,000 supporters who were present on the scene when Obama got elected.


The on-line nature would only make sense if it were for a highly dynamic behavior-based ad-serving thingie.

So, are you serving ads based on where people go?


Sounds more like game or game-like application. Maybe live betting during some sport event?


it's an auction based on a gaming concept, more details here:

http://news.ycombinator.com/item?id=362810


Your biggest problem by far is getting a million hits per minute. Getting a million hits at all would be a fairly rare accomplishment.


As the other guys said, it all depends on your situation and what you are trying to do. One thing to maybe look at is Erlang and related technologies: Mnesia, Yaws, etc. This is a comparison of Apache vs. Yaws under heavy load: http://www.sics.se/~joe/apachevsyaws.html


That chart is pretty misleading - it's basically comparing OS threads to event-based processing, which is really likely to be a win for the event-based system. It's also several years old at this point. Erlang is neat, but to be really, really fast, he probably wants to be working in C. Who knows though... the 'specs' are awfully vague. Erlang might be an interesting choice for some kinds of apps of this kind.


It is not misleading, as this is explained in the comments section below the chart. Anyway, it was just a suggestion for a possible direction to look at.


Maybe not 'misleading' but dubious. I mean, if you want to compare how many threads an OS can handle before it dies vs how many connections it can handle via an event-based system, I think the results are fairly well known at this point. It's not really highlighting anything great about Erlang or Yaws, nor pointing out any particular defect of Apache.


I'd try to build a microsharding system. Depending on the amount of data you want to store, go for 256 or 65k virtual shards and allocate a few shards on each DB server. A hash of the key will give you its shard and it's easy to keep a shard map in memory.

I'd advise going with BDB or whatever key-value store is fastest these days instead of MySQL. And an application-server layer to take care of the microsharding.

Also, try to put most of the work on application servers (things like joins, sorting, calculations etc), because those are much easier to scale than database nodes.

If possible, use in-memory logs instead of writing to the disk during the 1 hour and write the log to disk after the rush hour is over. Consider using a RAM-only database + UPSes, if your data is not that much.
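
A tiny sketch of the shard map described above - hash the key to one of 65,536 virtual shards, then look the shard up in an in-memory table that assigns it to a node; the hash choice and node count are placeholders:

    /* Sketch: key -> virtual shard -> physical node, per the microsharding idea. */
    #include <stdint.h>
    #include <stdio.h>

    #define VIRTUAL_SHARDS 65536
    #define DB_NODES       8                        /* assumed number of DB servers */

    static uint16_t shard_to_node[VIRTUAL_SHARDS];   /* the in-memory shard map */

    /* FNV-1a, just to have a concrete hash; any decent hash works. */
    static uint32_t fnv1a(const char *key) {
        uint32_t h = 2166136261u;
        while (*key) { h ^= (uint8_t)*key++; h *= 16777619u; }
        return h;
    }

    static void init_map(void) {      /* naive even allocation of shards to nodes */
        for (int s = 0; s < VIRTUAL_SHARDS; s++)
            shard_to_node[s] = (uint16_t)(s % DB_NODES);
    }

    int main(void) {
        init_map();
        const char *cookie = "k3j9q";                /* a hypothetical 5-char cookie */
        uint32_t shard = fnv1a(cookie) % VIRTUAL_SHARDS;
        printf("cookie %s -> shard %u -> node %u\n",
               cookie, shard, shard_to_node[shard]);
        return 0;
    }

Rebalancing later is then just repointing some entries in shard_to_node and copying those shards, rather than rehashing everything.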


After reading through all the other replies, I'm really interested in seeing suggestions on the search and sorting operation that needs to be done with every write.

I realize that if the query returns a huge number of records, sorting and fetching those records will be THE bottleneck.


Sorry for replying to the root, this thread is kind of a mess, with stuff scattered everywhere. Here's what I think:

What you're asking to do is very hard for anything beyond simple static page retrievals. Tiny mistakes in schema design (let alone architecture) could make or break your site.

Get expert architecture help:

If there's no way for you to start small and make mistakes early, your best bet is to find some knowledgeable people, say 3-5 that have professional or large open source scaling experience. Ask them to NDA if you must, but they won't be able to help you unless you can give them all of the details. Solicit their opinions for architecture, and try to reconcile the dissenting opinions about any given technology to see if anyone is off base. Proper architecture could take this from a seven figure hardware expenditure down to a five figure one.

Get commercial support for your hardware and software:

Once you have a rough idea how you'll do it, you'll want to engage good support from any vendors possible. When you're starting small you can just install Ubuntu, apt-get install mysql and get to work, but at millions of hits per hour you may well start to uncover unusual things involving bad hardware, hardware/kernel, or kernel/userland interaction -- likewise with any DB or other userland software. I would say expect to buy tier 1 hardware, an OS with a support contract, and support for any major software component as well. This isn't a very web 2.0 formula, but you're buying expertise that is focused on your solution.

Other things:

1) Sharding isn't a magic bullet but depending on how your data is laid out (specifically what data is not local to other data) it could totally save your bacon here. One box (or a standard three tier architecture) probably wouldn't even have a chance.

2) SSDs aren't ready yet, IMO. Due to the fact that failure is write cycle count related and not heat/time related, they're not suitable (at this time) for use in RAID systems. I've read reports of SSDs in RAIDs that die in such a tight timespan that you can't get the first one rebuilt before the second one dies.

Good luck. It sounds like a really cool project.


If your web application idea is super easy, maybe you should just throw it onto Google App Engine or something and let them handle it.


Wow, I was writing a reply to davidw's first post when lots of new posts arrived in this thread. Thanks to all of you! I haven't read any of your new posts yet but I will as soon as I submit this one:

The concept is a live interactive system with 100,000+ members online, logged in, and actively posting for about an hour at a pre-determined time of the day (or night). Each person may post 2-5 times a minute and this can add up to 1/2 million hits a minute, but I said I need a million hit per minute capacity because it is possible that 200,000 people might be participating in some of these events. This is a high-profile venture and "bogging down" would be an undesirable occurrence in the middle of an event.

Each post from a logged in member will contain a value which typically consists of no more than 5-10 characters. A cookie of 5 characters will accompany the posted data to uniquely identify the visitor. When the server receives this data it will append a new record to the database with the post value in one field, the login cookie in another, a date/time stamp in the third field, and an incremented value in the fourth field to identify the order in which each post is received. I think that's all the data that needs to be appended to the db in each new record.

(Note that this is my current concept of the database schema, if you folks know a better way to do it please tell me, thanks.)

After the new record is appended, a search is performed and the post field is sorted, then the relative position of the visitor's new database record is identified in the results (example 49,501 records from the top). This 'position' information is used by the script that dynamically generates the HTML, which is then passed off to one of the HTTP servers so it can return the page to the visitor's browser.

Since the returned data must be created dynamically based on the most recent database update, it appears that the database itself may be the limiting factor here. I can equip the database server with SSD drives to accelerate its performance but this still may not be fast enough. Plus there may be an issue with database size itself since it will grow extremely rapidly with 16,000 new records appended every second for an entire hour ...

I have not calculated the size of such a database after an hour of such appends, and I don't know whether existing SSD drives have the capacity needed to store the volume of data appended at this rate, or even if today's database systems can use multiple SSD drives if more capacity is needed. These are issues I may have to deal with if something else does not come up first that makes the whole concept look absolutely impossible.

And yes, I am absolutely going to need this capacity from the start. This is not a system I can "grow into", it must be capable of this performance from the very beginning. Assuming that such a system can be created its cost may not pose a problem since the concept will be very attractive to investors based on the huge revenues it can potentially generate. If there is no practical solution at this time I'll have to re-think the entire concept and try to come up with an alternative that works within today's server / database / page rendering limitations. I'm just hoping there's an existing solution so I don't have to change the concept because as-is it is unique and has unbelievably huge upside potential.


1) Skip SSD and keep the database in RAM. 2) You're not going to sort things that fast for every new post. 3) Think cheap HW that you can add as the system grows, not a single beastly system that's fine for 150,000 users but breaks at 200,000, etc.

My advice would be to give up on using a database for most of this and do everything you can in the web servers' RAM. Then spool the output to a disk / SSD array and/or a DB cluster.

Anyway, there is no way for people to read anywhere near this much data, so you can just keep track of the data that's important to them in an approximate fashion. AKA if you give them data that's accurate to +/- .2 seconds, that's close enough.

Finally, think about timestamps vs incrementing a counter, so you can have each system independent of all the others and then aggregate the data when you're done. (Merging 30 or so streams of timestamped data can be extremely fast.) And then add your post #.
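
The merge step in that last point is cheap; here is a sketch for two streams, assuming each box logs events already ordered by its own timestamp (with 30 streams you would pop from a small min-heap keyed on each stream's head timestamp, but the idea is the same):

    /* Sketch: merge two timestamp-ordered event streams into one global
       order, assigning the final post number as we go. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint64_t ts_us; uint32_t user; } event;

    static void merge(const event *a, size_t na, const event *b, size_t nb) {
        size_t i = 0, j = 0;
        uint64_t post_no = 0;
        while (i < na || j < nb) {
            const event *e;
            if (j >= nb || (i < na && a[i].ts_us <= b[j].ts_us))
                e = &a[i++];                   /* a's head is next (or b is done) */
            else
                e = &b[j++];
            printf("post #%llu  ts=%llu  user=%u\n",
                   (unsigned long long)++post_no,
                   (unsigned long long)e->ts_us, e->user);
        }
    }

    int main(void) {
        event box1[] = { {100, 1}, {250, 3}, {400, 1} };   /* from server 1 */
        event box2[] = { {120, 2}, {130, 5}, {500, 4} };   /* from server 2 */
        merge(box1, 3, box2, 3);
        return 0;
    }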


I agree. I've worked on a number of high volume systems (million+ client interactions/minute), and you don't want a conventional database. Either use a custom data structure to keep it in memory (even if it's across multiple machines) or, if you really want to use a database, use one that's designed for that kind of usage (think tickerplant databases, kx, etc.)

Figure out how to partition your data/algorithms so you can split it across multiple machines, don't concentrate on maximizing the performance of one machine, but rather of the "cloud" of machines.


Also, what processing is actually going on with the data? Just updating the data in storage seem like less than half the problem. You're going to need an efficient system for parallelizing whatever processing jobs need to be run.


From your description you don't seem to really use the db, so can't you roll your own specialized data structure? If each record is 20 bytes, 1/2 million per minute for one hour is about 1/2 gig, so you can keep everything in memory.


No scaling guru here, but did this quick calculation for fun (gotta love C-x-e):

    (/ (* 150000   ; users
          3.5      ; times per min
          (+ 5     ; cookie length
             7.5)) ; chars per post
       (expt 10 6)) ; bytes per MB

Evals to ~6.6. Double that (for a max capacity estimate) = 13.2. Unit is MB/min, which is ~0.22 MB/sec. One hour is just short of 800 MB of plain text data. A gig of DB seems reasonable (I don't know what extra overhead is associated with, say, a CHAR(15) field).

Like Retric said, a RAM-based database seems better.

PS sorry, I don't know how to format code here.


This doesn't really make sense. If you have 100,000+ members online, each posting at least every 30 seconds, you are going to have way more user-provided data than you can possibly show to everyone. So why do you need to know the exact relative position of the visitor's new database record?


I'm not showing them all that user-provided data, all I'm showing them is the position of their last post relative to the position of 'another post' in the database.

The 'other post' changes from time to time based on an algorithm that takes into account all the currently existing posts. This means I must determine the position of the 'other post' each time, and then relate it to the position of the member's new post. Once I know their relative positions I can return the correct positioning data to the member.


The details of what exactly is important for the purpose of determining the "other post" can significantly impact how you need to store the data and the performance implications of doing so. Does it depend on the time the request is made or just based on the data in the database? Can you determine approximately when that other post is likely to change or do you have to try to compute it on every request? Is the "other post" the same for all the users? For a subset of users? Is it unique for every user?


This logic sounds like the important part of your application, but I don't get it yet. Why does the user care about the position of their post relative to some other post? Could it be approximate instead? Also, does it have to be accurate up to the second, or would a slower update cycle work? I feel like maybe you could make some very small changes to the algorithm, and make it much more tractable on the backend.


I've created a (hopefully) more organized thread with some details this one does not have. Please go to this thread to continue this discussion, thanks:

http://news.ycombinator.com/item?id=362810


This is not a system I can "grow into", it must be capable of this performance from the very beginning.

I would suggest you load test extensively and make load testing a part of your development process from the get-go. Initially, I would load test to evaluate approaches and estimate hardware needs. As you develop, I would load test each version and ultimately simulate a real-world scenario before letting it loose. Rather than speculating, you can actually know what approaches work best, and have some guarantee the live system will work as required.


What exactly is the output?


Process queueing would be the best way to solve your problem. If you are processing a million hits a minute then you might need more than one comp/server. So if you are using databases, consider using Facebook's Cassandra project (which is a distributed storage system) for your database needs. This would suit you since it has process queueing too.


As far as the database goes, with decent hardware DB2 9.5 will have you covered [1]. And, should you become truly huge, it can exceed your requirements by a long shot, provided that some serious hardware is employed [2].

[1] http://antoniocangiano.com/2008/11/04/benchmarking-db2-purex... [2] http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=10...


$8,000,000 worth of RAM?

Serious hardware is perhaps an understatement.


NOTICE ! THIS THREAD IS BEING MOVED !!!

I'M GOING TO START A NEW THREAD TO CARRY ON WHERE THIS ONE LEAVES OFF SINCE THINGS HAVE BECOME SUCH A MESS IN THIS THREAD ... AND I'M SHOUTING TO MAKE SURE EVERYONE KNOWS ABOUT THIS SO THEY CAN FIND AND PARTICIPATE IN THE NEW, CLEANER AND BETTER ORGANIZED THREAD.

Thanks and sorry for shouting, here's the new thread URL:

http://news.ycombinator.com/item?id=362810



It all depends on how "lean and mean" your app and the data is. If you do everything in javascript up-front and use AJAX to move data back and forth, then you reduce the pure bandwidth except for the initial javascript load (which you could cache on Akamai or something).

Some math: Sustained (theoretical maximums) with a 1Gbps link gives you 1M hits per minute at 8053 bytes per hit max. However, you're going to get less than this in reality. Figure 80% of this (6442 bytes) is what you can expect on average. A Gigabit pipe is also big bucks.

You'll need > 1 web server to handle the CPU load to handle 1M hits per minute, so you'll need a load-balancer that can handle Gigabit (Major $$$ already - we're talking in the $50k-$100k just for a single load-balancer that can handle this kind of traffic and most enterprise load-balancers max-out at one million 'sessions' (open connections)), so you may need to address multiple load-balancers running in an active/active configuration and > 1Gbps pipe if you are moving > 8k per 'hit'. Keep in mind that there is connection time/bandwidth overhead and http packet overhead as well. If you could get away from the browser/http model and move to more of a client/server application, you could optimize better and make better use of your bandwidth. Just a thought. Switching to a UDP protocol could save you bandwidth and make things a little faster network-wise also.

To be speedy, your app will have to live without a traditional DB. Put the data in memory (easy to do with memcache across several machines) and make sure that your app is quick and avoid using anything else that would be sub-optimal. Use a lightweight web server, or just get your app to take the http requests itself. C is a good choice for squeezing as much as you can out of your CPU. Write key/value pairs to memcache as your DB, and you should end up with a 10ms turnaround with your app if there is not too much calculation to be done (no sorts, no big/nested loops, ...). Perhaps an apache module for your app would be in order here? Get a few machines maxxed-out with this configuration (need 6+ just as an initial guess, depending on the load-balancer and network latency to serve all of those requests in a second), and it's do-able. Buy reliable machines (DELL or the like) and reliable network equipment since you'll be pushing the hardware to their limit and they'll be running hot.

Instead of writing to a conventional database, you can write to a log file or syslog to another machine (1M writes a minute shouldn't be that bad) and reconstruct the data to a database when the heavy usage slows down (process the log files offline or something on a different machine).
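
That "log now, import later" idea is about as simple as storage gets - a sketch with an assumed record layout and file name, appending fixed-size records through a big stdio buffer and bulk-loading them into the real database after the rush hour:

    /* Sketch: append-only binary log; the DB import happens offline later. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint64_t ts_us;      /* timestamp                    */
        uint32_t seq;        /* arrival counter              */
        char     cookie[6];  /* 5-char login cookie + NUL    */
        char     value[11];  /* 5-10 char post value + NUL   */
    } log_rec;

    int main(void) {
        FILE *log = fopen("posts.bin", "ab");    /* hypothetical log file */
        if (!log) { perror("fopen"); return 1; }
        setvbuf(log, NULL, _IOFBF, 1 << 20);     /* flush in 1 MB chunks  */

        log_rec r = { .ts_us = 1226500000000000ull, .seq = 1,
                      .cookie = "k3j9q", .value = "4200" };
        if (fwrite(&r, sizeof r, 1, log) != 1) { perror("fwrite"); return 1; }

        fclose(log);                             /* flushes what's left   */
        return 0;
    }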

This is not a small task and will require some tight code and big $$$ up-front, but it's doable. Your co-location bill will be large.


Retric said if I provide data that's accurate +/- .2 seconds that's close enough, but he's wrong (sorry Retric, it's not your fault). The fact is, I must provide each member with the relative position of his post each time he makes one, and the only way I know to do this is to append the new record first, and then do a search and sort to determine where his new post appears in the sorted list.

Retric, you said I'm not going to be able to sort things that fast for every new post. Do you know this to be true? Is this an absolute fact or are you theorizing?

I'm not suggesting that you're wrong when you say this, but I certainly do not want to assume you're right just because you say so -- because unless there's another way to determine the position of the new post in a sorted list I may have no other choice but to abandon the existing approach, re-work the entire business plan, and try something else from a completely different angle. I prefer to NOT do this when the current concept is the best I've come up with.

What about other people's experiences? Do any of you know about data systems that can do an append, followed by a search and sort on one field, at the rate of 16,000 times a second? Or is there another way to determine the position of a record in a sorted list without sorting the db to create that sorted list?

How about if the database is kept entirely in RAM? I know this will speed things up a lot, but will sorts be fast enough for 16,000 a second without bogging down?

mseebach, you asked "What exactly is the output?" and I'm not sure if your question was directed to me but if it was, and if you're asking about the data that needs to be sent from the server back to the browser, then I think I might need to send only "one character" back to the Javascript. Previously I was going to send an entire HTML page but it seems the Javascript solution could be orders of magnitude more efficient. By using the Javascript approach I can send only 10 chars to the server and receive only one char back from the server. The Javascript can then convert that one char to the proper "relative position" value for display in the browser -- or so I hope.

bd, you said I might do better to build my own custom in-memory data structure rather than use a classical ready-made database solution. Maybe you're right. It would certainly reduce the costs over a super-expensive high-speed database solution, wouldn't it? I've never done such a thing before but maybe I can find someone who has. Any suggestions?

bd, you also said I do not want to sort the whole dataset after each new piece of data comes in, and you're right about that too. What I really need is to insert each piece of data in the proper pre-sorted position as it comes in. Then I never have to sort the data later because it's being sorted before or as it enters the db.

spinonethird, you said that from my description I'm not really using the db and maybe I can roll my own specialized data structure. You and bd seem to think alike, and with two of you in consensus I'm becoming more attracted to this idea, especially if it has the potential for faster performance than an off-the-shelf database program.

As far as the business details are concerned, I know some people (gsiener, rbanffy and perhaps others) are curious but I'm not at liberty to discuss the details at this time. When I eventually spill the beans it will probably be after the business launches so we can gain traction and market share before the competition invades our turf ... :)

One question before I end this post:

Is C the fastest or the best language to "roll your own" database solution in, or are other languages just as good or better? I'm not up-to-date on the advantages of many programming languages, so your thoughts and opinions on this issue might help me pick the best language for a project like this. Thanks.


Yes, I wanted to ask you what the output was, because, as others have pointed out, that's probably the determining factor in how you should structure the data.

If I understand correctly, when I post "A" and a short while later, I post "B", and in the meantime everyone else posts X posts, then at the time of posting B, I want to get a return of exactly X?

If that's the case, you should just keep a central counter, which is your critical region. Updating that 16,000 times a second in a blocking way should be possible. Sounds like something Erlang guys do to show off.

The rest of the app can be completely de-coupled. Once you've increased the counter, you save userid, timestamp and counter, and fetch the former post, and return the diff of the counter-values.

The only thing in this setup that's hard is the counter.
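
The counter itself is a one-liner if you keep it on a single box; a sketch with C11 atomics (C rather than Erlang is just an assumption here), where the returned ticket is the post's global sequence number and the response to a user is the difference between their current and previous tickets:

    /* Sketch: the one shared thing is a counter; everything else decouples. */
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    static _Atomic uint64_t post_counter = 0;

    /* Called once per incoming post; the return value is that post's
       global sequence number. */
    static uint64_t take_ticket(void) {
        return atomic_fetch_add(&post_counter, 1) + 1;   /* 1-based */
    }

    int main(void) {
        uint64_t a1 = take_ticket();       /* user A's first post  */
        uint64_t b1 = take_ticket();       /* someone else posts   */
        uint64_t a2 = take_ticket();       /* user A's second post */
        printf("posts between A's two posts: %llu\n",
               (unsigned long long)(a2 - a1 - 1));
        (void)b1;
        return 0;
    }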

One other thing I noticed: you're very insistent that everything should happen in exactly the correct order, and that not even +/- .2 seconds of deviation is acceptable. We're looking at around 16,000 transactions a second - that's 0.0625 ms per transaction, or about 1/1000 of a regular internet round trip (50 ms is very common for DSL and cable connections) - so the incoming requests will arrive in a fairly random order, within a window of about 1000 requests. Make sure that doesn't break your business model.


In my last long post I said Retric was wrong about providing data that's accurate only to +/- .2 seconds, but maybe I'm the one who is wrong. It is possible that I completely misinterpreted his meaning. I became aware of this possibility when I read lacker's post that said I can probably batch up data since I'm going to have latency issues anyways. He's right, I never thought of it this way.

The issue I'm imagining with batching is that I cannot send any data back to visitors until the next batch is processed -- because if I do they won't see how their last post has changed their positions in the hierarchy, and they will interpret this as a system error. In other words, every time a member receives a response from the server, that response must indicate his last post's position in the hierarchy.

Theoretically I could run a batch every second, and although this means holding up all the responses to requests that come in during the previous second, one second is also short enough that they will not interpret it as a problem with the system. On the other hand, 5 seconds in between batches would certainly be seen as a server problem especially by folks on high speed connections.


So your problem is not processing 1,000,000 requests per minute, but rather how to sort up to 60,000,000 items 1,000,000 times per minute? Now that could prove to be quite problematic. If your sorting algorithm only requires one processing instruction per stored item, and is perfectly parallelizable, and there's zero overhead in the map-reduce, you'll need 300 3 GHz cores to do it.


Hmm, n log n with n = 60,000,000 is ~1,560,000,000 ops per sort, before your coefficient. If you want that done at 16.67 kHz (1 million times/min), you'll need some work. You can do it, in certain cases, with certain hardware & good skills, but it's tricky.

I'm still not buying that the requirements are really that high.


> The issue I'm imagining with batching is that I cannot send any data back to visitors until the next batch is processed -- because if I do they won't see how their last post has changed their positions in the hierarchy, and they will interpret this as a system error. In other words, every time a member receives a response from the server, that response must indicate his last post's position in the hierarchy.

You're making this much harder than it has to be because you're assuming that users can see things that they can't.

Suppose that the system had two users, A and B, and they each issued two commands, A1 and A2 in that order from A and B1 and B2 from B. Suppose further that system responds to each command with something that it says is the global state. To keep things simple, I'll use the command, such as A2, to refer to the response as well. (In other words, A2 is the response that A gets to command A2.)

A knows that she issued A1 and A2 in that order, so the A2 response must reflect commands A1 and A2. However, the A2 response need not reflect B1 or B2 unless A has some way to know that B issued them before A2, regardless of when B1 and B2 were issued in relation to A2. In fact, it's probably okay for the A2 response to reflect command B2 and not command B1.

Now let's throw C into the mix, issuing C1 and C2 in that order. As with B, the A2 response need not reflect C1 or C2 even if they actually occurred before the A2 command. Moreover, even if the C1 and C2 commands both occurred before B1, the A2 response can reflect B2 and not C1 or C2.

The easiest thing to do is to periodically produce consistent views of the data. This process will take a while, so while it is happening, you'll have to use a past consistent view.

Three views, call them X, Y, and Z that the system updates should be adequate. At all times, one view will be "current", one will be being generated, and the third will be a source for stragglers.

Here's how the update process works. While X is being generated, from Z and all data received since Z's generation started, Y will be used to produce results to send to users. While Y is being generated, from X and all data received since X's generation started, X will be used to produce results to send to users. And similarly for Z (using Y).

Suppose that user A sends a command while Y is being generated from X. The response to A need only reflect X and commands that A sent since X was generated. As long as that can be finished before the system finishes generating Y and Z, everything is copacetic.
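
A rough sketch of that rotation, under plenty of assumptions: three snapshot slots, one published, one being rebuilt from the published one plus buffered commands, and the slot vacated last rotation left alone for in-flight readers. The snapshot contents here are a stand-in for whatever consistent view actually gets built:

    /* Sketch: three snapshot slots rotated by a single builder thread.
       A slot that stops being current is not overwritten again until two
       more rotations have happened, so stragglers can finish reading it. */
    #include <stdatomic.h>
    #include <stdio.h>

    #define SLOTS 3

    typedef struct {
        long generation;      /* which rebuild produced this view    */
        long total_posts;     /* whatever aggregate the view carries */
    } snapshot;

    static snapshot views[SLOTS];
    static _Atomic int current = 0;        /* index of the published view */

    /* Builder: make the next view from the current one plus new commands. */
    static void rebuild(long new_posts_since_last) {
        int cur  = atomic_load(&current);
        int next = (cur + 1) % SLOTS;      /* the slot being overwritten  */
        views[next].generation  = views[cur].generation + 1;
        views[next].total_posts = views[cur].total_posts + new_posts_since_last;
        atomic_store(&current, next);      /* publish; old slot lingers   */
    }

    /* Reader: grab whichever view is current and answer from it. */
    static void answer_request(void) {
        const snapshot *v = &views[atomic_load(&current)];
        printf("gen %ld: %ld posts\n", v->generation, v->total_posts);
    }

    int main(void) {
        rebuild(16000);    /* pretend one second of traffic arrived */
        answer_request();
        rebuild(15500);
        answer_request();
        return 0;
    }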


How much work can you do in the other 23 hours? Can you be asynchronously writing from an in-memory cache to a real database during that time? Or pre-computing anything? During that hour, how much (if any) data can you afford to lose in the event of a problem? How much data per transaction as we talking about? Do you need a relational database to generate your pages or could you live with a linked-list of structs (or a tree or whatever) until you can persist it off somewhere?

(FWIW I work on an RDBMS-based OLTP system that handles thousands of commits/sec without breaking a sweat, and a lot more at peak times)


lots of hardware

there are a few best practices to scaling, you can find those archived at places like highscalability etc...but in the end you just need hardware. hardware buys you fault tolerance, test environments, sharding, blah blah blah. if you are looking to meet this traffic on your own box or on a vm-hoster, forget it





