I went from 20-second response times to 250us response times (yes, almost a 100,000x speedup) in Django by doing the following:
1. Moved from a shared hosting server to my own custom server. This alone took my latencies from 20s (sleep wakeup latency) or 5s (awake response latency) down to about 0.5s-1s.
2. Use a streaming HTTP response to immediately send the HTTP header & initial HTML data before the database is even accessed (first sketch below)
3. Reduce the number of database lookups with select_related()/prefetch_related(); each cached page fragment should need at most one SQL query (second sketch below)
4. Don't even bother with the Django ORM and use Postgres prepared SQL statements directly.
5. Use database materialized views.
6. GZIP cached data & serve the GZIPed response directly (side benefit: effectively makes the cache 10x bigger)
7. Move from NGINX to H2O web server for HTTP/2 (awesome little server that does http/2 cache-aware server push. See https://h2o.examp1e.net)
8. Build a simple JavaScript single-page app framework
9. Use Postgres JSON data types for API calls
10. Run a separate Python logging process outside of the Django response cycle.
11. Further optimizations to speed up HTML (cut down repaints), CSS (GPU animations), JS, DNS, and TLS/SSL.
It's amazing the amount of crap & inefficiencies in web services. A CPU can do trillions of calculations a second... why should a web site take trillions of calculations to send a single page?
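For reference, here's a minimal sketch of #2, assuming a hypothetical Article model and view. Django's StreamingHttpResponse sends the header and each yielded chunk as soon as it's produced, so the browser gets bytes before any database work happens:

```python
from django.http import StreamingHttpResponse
from django.utils.html import escape

from myapp.models import Article  # hypothetical model


def article_view(request, article_id):
    def stream():
        # Goes out immediately, before the database is touched.
        yield "<!doctype html><html><head><title>Loading...</title></head><body>"
        # Only now hit the database; the user already has bytes on the wire.
        article = Article.objects.select_related("author").get(pk=article_id)
        yield f"<h1>{escape(article.title)}</h1><div>{escape(article.body)}</div>"
        yield "</body></html>"

    return StreamingHttpResponse(stream(), content_type="text/html")
```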
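And a sketch of the N+1 problem behind #3, with hypothetical Comment/Post models:

```python
from myapp.models import Comment, Post  # hypothetical models

# N+1: one query for the comments, then one more per comment for its author.
for comment in Comment.objects.all():
    print(comment.author.name)

# One JOINed query fetches comments and authors together.
for comment in Comment.objects.select_related("author"):
    print(comment.author.name)

# For reverse foreign keys / many-to-many, prefetch_related() batches
# everything into two queries instead of one per post.
for post in Post.objects.prefetch_related("comment_set"):
    for comment in post.comment_set.all():
        print(comment.text)
```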
That's interesting, but #1 seems like a bug in your shared hosting service. 20 seconds for a single request isn't even usable, and an unoptimized Django config doesn't start anywhere near that slow.
A $5 Linode and stock Django should be able to get you to 500-1000ms.
Also, I'm not sure how you're measuring 250 us, but I doubt it's a meaningful number, because in practice either of these two numbers will be greater than that:
1) The network latency to send a single packet from the user to your server. This is usually more like 10 ms, or 40x what you're quoting.
2) The time it takes the browser to render the page (e.g. loaded from the memory of a local server). Static HTML might render in 250 microseconds, but I doubt that anything with JavaScript or even CSS will.
In other words, I highly doubt you have 250 us end-user latency; it's basically impossible with the web and "normal conditions". You can choose to measure it in some weird way, but it doesn't reflect what your users are experiencing.
EDIT: I guess if you throw out the connection time, 250 us is possible. But I don't think it's meaningful to throw out the connection time -- you at least have to average it over many requests.
It was on a cheap Bluehost account. These shared hosts usually don't receive a lot of traffic, and ours often went to sleep, taking 20 seconds to wake up. The system wasn't scalable at all, and on the occasional high-traffic days when it mattered, the site would become fully inaccessible.
I'm measuring response times at the server, with no network latencies. My optimization process focused on reducing TTFB first, then on complete page generation in later steps.
Yeah, but I'm not sure why that's a meaningful metric. The user doesn't really care when the first byte is delivered -- they care when they can interact with the page (which is often before all bytes are delivered).
At this point why would you even need Django anymore?
A lot of these points jibe with my experience, especially #3. Django makes it too easy to accidentally create a view which makes 100s if not 1000s of db calls for no good reason.
Brotli takes a lot longer to compress than zlib. It doesn't remove the requirement for zlib anyway, which means I have to store both zlib and Brotli fragments in my Redis cache. And zlib has a nice Python interface.
Only with the default settings, which for brotli is the maximum compression level and for nginx's gzip isn't; brotli seems to win in general at comparable compression times or sizes: https://certsimple.com/blog/nginx-brotli
Anyway, compression times don't matter if you're compressing things ahead of time. E.g. nginx can serve static files pre-compressed with gzip_static / brotli_static. It's true that you might have to store both, but disk is cheap and improving latency and response time is hard, so I have no problem with that tradeoff. Or you could just have zlib encode on the fly for old IE, if space is truly a concern.
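To illustrate the ahead-of-time idea (assuming the third-party `brotli` package and made-up paths), you'd compress each asset once at deploy time and let the server pick the `.gz` or `.br` file via gzip_static/brotli_static:

```python
import gzip

import brotli  # third-party bindings: pip install brotli


def precompress(path):
    with open(path, "rb") as f:
        data = f.read()
    # Maximum settings are fine here: this runs once per deploy, not per request.
    with open(path + ".gz", "wb") as f:
        f.write(gzip.compress(data, compresslevel=9))
    with open(path + ".br", "wb") as f:
        f.write(brotli.compress(data, quality=11))


precompress("static/app.js")  # hypothetical asset
```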
Very positive. Works great and fast. Very simple to configure, like Nginx. Nginx also needs a paid license for server push, while H2O does it with a free license.
Likely moving from `text` or `json` to `jsonb` data type in Postgres:
> There are two JSON data types: json and jsonb. They accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.[1]
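A quick psycopg2 illustration (table/column names made up): jsonb is stored pre-parsed, so operators like @> and ->> run without reparsing the text, and a GIN index speeds up containment queries:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, payload jsonb)")
    # A GIN index makes @> (containment) lookups fast.
    cur.execute("CREATE INDEX IF NOT EXISTS events_payload ON events USING GIN (payload)")
    cur.execute("INSERT INTO events (payload) VALUES (%s)",
                ('{"type": "click", "x": 3}',))
    # ->> extracts a field as text; no reparsing of the stored document.
    cur.execute("SELECT payload->>'x' FROM events WHERE payload @> %s",
                ('{"type": "click"}',))
    print(cur.fetchall())
```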
Which gave the biggest performance boost? I’d wager #3. 20 seconds sounds pretty absurd unless you’re being really inefficient on your server or are moving around lots of data.
Definitely #3. That's when I went from amateurish 1s page generation times to a respectable and professional 50ms or so, just by cutting the number of SQL queries from hundreds down to a few. This also massively cut CPU usage.
Using materialized views cut the 50ms page generation down to about 10-15ms, by flattening the queries into a single flat table so there are no joins at request time.
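A sketch of that flattening, with made-up table names: the view precomputes the joins once, so the hot path reads one flat relation:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Precompute the page's JOINs into one flat table.
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS article_page AS
        SELECT a.id, a.title, a.body, u.name AS author_name
        FROM articles a JOIN users u ON u.id = a.author_id
    """)
    # The hot path is now a single-table lookup, no joins.
    cur.execute("SELECT * FROM article_page WHERE id = %s", (42,))
    print(cur.fetchone())
    # Refresh on a schedule (or after writes) to pick up new data.
    cur.execute("REFRESH MATERIALIZED VIEW article_page")
```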
Using prepared statements cut the 10-15ms page generation down to 5ms. Each Postgres query has a setup process that involves about 7ms of overhead, and prepared statements eliminate all of it. You won't notice this overhead when your queries take 50-100ms each, but when your queries are only 10ms, it matters.
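For example (schema hypothetical), via psycopg2: PREPARE parses and plans once per connection, and each EXECUTE skips that setup. Since prepared statements live per connection, this assumes persistent connections (e.g. Django's CONN_MAX_AGE):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
cur = conn.cursor()

# Parse + plan once, at connection setup.
cur.execute("PREPARE get_article (int) AS "
            "SELECT title, body FROM articles WHERE id = $1")

# Per-request: execution only, no parse/plan overhead.
cur.execute("EXECUTE get_article (%s)", (42,))
print(cur.fetchone())
```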
Getting rid of gzip in page generation by serving cached gzip brought page generation times down to the sub-millisecond range for cache-hit fragments.
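A sketch of that final step, assuming Redis and made-up key/helper names: compress once at cache-fill time and hand the stored bytes straight back with Content-Encoding: gzip, so the request path does no compression work at all:

```python
import gzip

import redis
from django.http import HttpResponse

r = redis.Redis()


def cache_fragment(key, html):
    # Compress once when the fragment is cached, not on every request.
    r.set(key, gzip.compress(html.encode("utf-8")))


def serve_fragment(request, key):
    blob = r.get(key)  # already-gzipped bytes, served as-is
    # Assumes the client sent Accept-Encoding: gzip (virtually all do).
    response = HttpResponse(blob, content_type="text/html")
    response["Content-Encoding"] = "gzip"
    return response
```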