I've got a concept for an online service that will work properly only if the servers can receive and process a million + hits a minute for a minimum of one hour at a time. Each hit will post new data that needs to be updated in the online database, and the newly updated data will become part of the dynamically generated page returned to the visitor.
The pages returned will be very small so I don't think page size will be an issue. Instead I think that HTTP server (or perhaps database) speed limits may be exceeded here, or perhaps the dynamic page rendering engine will become the bottleneck.
My ultimate goal is to come up with a clear understanding of how to create a system that updates the database, generates the dynamic HTML pages, and serves the pages to visitors -- fast enough that they do not experience slowdowns during the hour that this heavy load will exist.
How do I go about learning how to do this? Have any of you done something like this in the past? This is way beyond my personal experience level. Thanks in advance for any suggestions you can provide.
Look, you've got ~4 GHz processors available, and you're worried about a million hits a minute. That's only about 16,700 hits a second, which leaves roughly 240k clock cycles per hit on a single core.
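To make that budget concrete, here is a rough back-of-the-envelope sketch. The function name `cycles_per_hit` is made up for illustration, and it assumes one core doing nothing but serving hits at a steady rate:

```python
def cycles_per_hit(clock_hz: int, hits_per_minute: int) -> int:
    """Clock cycles a single core can spend on each hit at a steady rate."""
    # cycles per minute divided by hits per minute
    return clock_hz * 60 // hits_per_minute


# 4 GHz core, one million hits a minute
print(cycles_per_hit(4_000_000_000, 1_000_000))  # 240000
```

With four cores the budget per hit is correspondingly larger, as long as the work actually spreads across them.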
Next, you've got 60 +/- 10ms of latency between you and your clients (unless there are details here you aren't covering, but no matter). You don't really have to respond to users terribly fast, as lots of your responsiveness would be lost over jitter/lag.
A single Dell box, say a modern quad-core desktop, could handle your load if you wrote the whole thing in C.
Ignore all the standard Apache/SQL/networking stuff until you know what you need. Get the core requirements working first, then add the load of the side stuff second.
E.g., doing all the heavy lifting on a private set of CPU cores and then providing a shared-memory plugin to Apache may be enough for you. Save a CPU core to snapshot your data.
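A minimal sketch of that shape, in Python only to show the idea (not the C/shared-memory version you'd want for real speed). Everything here is hypothetical: the `HotState` class, the key/value update format, and the JSON file standing in for wherever the snapshot goes.

```python
import html
import json
import threading


class HotState:
    """In-memory state updated on every hit (hypothetical stand-in for the real data)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def hit(self, key: str, value: str) -> str:
        """Apply one update and return the small page built from the new state."""
        with self._lock:
            self._data[key] = value
            count = len(self._data)
        # Render outside the lock; escape user-supplied text before embedding it.
        return (f"<html><body>{html.escape(key)}={html.escape(value)} "
                f"({count} keys)</body></html>")

    def snapshot(self) -> dict:
        """Copy the state under the lock so the slow write happens outside it."""
        with self._lock:
            return dict(self._data)


def snapshot_loop(state: HotState, path: str, interval_s: float,
                  stop: threading.Event) -> None:
    """Persist a copy of the state every interval_s seconds until stop is set."""
    while not stop.wait(interval_s):
        with open(path, "w") as f:
            json.dump(state.snapshot(), f)
```

You'd run `snapshot_loop` in its own thread (or, in the real thing, on its own core) with a `threading.Event` to shut it down, so the request path only ever touches memory and never waits on disk or a database.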
So, for advice:
1. Ignore database bullshit. You don't need it, it won't help. If you want a DB for other purposes, fine. A snapshot process writing to your DB is fine, just don't put it in the critical path.
2. Build a load simulator: a raw mode that just sends over the handful of bytes, and a cooked mode that bothers to printf a GET request.
3. Start with a reasonable prototype, and work your way to something performant. Hell, you can probably do it in Java if you don't mind buying a few more CPU cores.
4. Integrate as you need for the rest of your requirements. For example, have another box serve the rest of your webapp, and dedicate a stripped-down Apache box with a custom module for this stuff.
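For point 2, a load simulator with a raw and a cooked mode could start out something like the sketch below. The `key=value` wire format, the `/hit` path, and all function names are assumptions for illustration; use whatever your service actually expects.

```python
import socket
import time
from urllib.parse import urlencode


def raw_payload(key: str, value: str) -> bytes:
    """Raw mode: just the handful of bytes the update needs (hypothetical format)."""
    return f"{key}={value}\n".encode()


def cooked_payload(host: str, key: str, value: str, path: str = "/hit") -> bytes:
    """Cooked mode: the same update wrapped in a minimal HTTP/1.1 GET request."""
    query = urlencode({"key": key, "value": value})
    return (f"GET {path}?{query} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n").encode()


def send_hits(host: str, port: int, payloads) -> float:
    """Send each payload on its own TCP connection; return hits per second."""
    start = time.perf_counter()
    sent = 0
    for payload in payloads:
        with socket.create_connection((host, port), timeout=5) as conn:
            conn.sendall(payload)
            conn.recv(4096)  # wait for at least the start of the reply
        sent += 1
    elapsed = time.perf_counter() - start
    return sent / elapsed if elapsed > 0 else 0.0
```

Something like `send_hits("localhost", 8080, (raw_payload("k", str(i)) for i in range(1000)))` gives you a first number to compare against. A single sequential sender like this won't get anywhere near 16,700 hits a second by itself, so expect to run many of them in parallel (or rewrite it in C) once the prototype holds up.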
In essence, I'm telling you to treat it as a very smallish HPC problem, instead of some sort of nightmare webapp problem. It fits better, and suddenly you have lots of people/knowledge/COTS equipment available to you.