On demand is easier precisely because having a huge library in all data centers is relatively cheap. In practice you just have a cache, colocated with ISPs, that pulls from your origin servers. Your users are likely all watching different things, so you can easily avoid hot spots by sharding on the content type. Once the in-demand content is in the cache, it's relatively easy to serve.
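A rough sketch of what I mean by a cache that pulls from origin, assuming a simple LRU eviction policy. The class name, the `fetch_from_origin` callable and the `capacity` parameter are all made up for illustration; this is not how any particular CDN implements it.

```python
from collections import OrderedDict


class PullThroughCache:
    """Toy LRU cache that contacts the origin only on a miss (illustrative)."""

    def __init__(self, fetch_from_origin, capacity):
        # fetch_from_origin is a hypothetical callable: key -> content
        self.fetch_from_origin = fetch_from_origin
        self.capacity = capacity
        self.store = OrderedDict()
        self.origin_fetches = 0  # how often we had to go back to origin

    def get(self, key):
        if key in self.store:
            # Hit: serve locally and mark the entry as most recently used.
            self.store.move_to_end(key)
            return self.store[key]
        # Miss: pull from origin, then keep a copy for later requests.
        self.origin_fetches += 1
        value = self.fetch_from_origin(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            # Evict the least recently used entry.
            self.store.popitem(last=False)
        return value
```

Once a title is in `store`, repeat requests never touch the origin, which is why popular on-demand content is cheap to serve.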
Live content is harder because it can't really be cached, nor, due to TLS, can you really serve everyone the same stream. I think the hardest problem to solve is provisioning. If you are expecting 1 million users, and 700,000 of them get routed to a single server, that server will begin to struggle. This can happen in a couple of different ways. For example, an ISP that isn't normally a large consumer suddenly overloads its edge server. Even though your DC can handle the traffic just fine, the links between your DC and the ISP begin to suffer, and since the event is live, you can't just wait until the cache is filled downstream.
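To put rough numbers on the link problem, here is a back-of-the-envelope sketch. The 5 Mbps per-stream bitrate and the 100 Gbps link are assumptions I picked for illustration, and it assumes the worst case where every viewer's stream crosses the same DC-to-ISP link.

```python
def required_gbps(viewers, bitrate_mbps):
    """Aggregate bandwidth needed if every viewer pulls their own stream."""
    return viewers * bitrate_mbps / 1000


def utilization(viewers, bitrate_mbps, link_gbps):
    """Demand as a multiple of link capacity (above 1.0 means oversubscribed)."""
    return required_gbps(viewers, bitrate_mbps) / link_gbps


# Assumed figures: 700,000 viewers at 5 Mbps over a single 100 Gbps link.
print(required_gbps(700_000, 5))     # 3500.0 Gbps of demand
print(utilization(700_000, 5, 100))  # 35.0 times the link capacity
```

Under those assumptions the link is oversubscribed 35 times over, no matter how healthy the DC itself is.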
isn't it a tree of cache servers? as the origin sends the frames, they're cached.
and as load grows the tree has to grow too, and when it cannot, you resort to degrading bitrate, and ultimately to load shedding, to keep the viewers happy?
and it seems Netflix opted to forgo the last one to avoid the bad PR of a "we are over capacity" error message, and instead actually let it burn, no?
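roughly the policy i have in mind, as a toy sketch: step down the bitrate ladder first, and only shed viewers once even the lowest rung doesn't fit. the function name, the ladder values and the capacity figure are invented for illustration, i have no idea what Netflix actually runs.

```python
def plan_delivery(viewers, capacity_mbps, ladder_mbps):
    """Return (bitrate_mbps, admitted_viewers, shed_viewers).

    Degrade bitrate before shedding anyone: pick the highest rung of the
    ladder that fits everyone within capacity. If no rung fits, serve the
    lowest rung to as many viewers as capacity allows and shed the rest.
    """
    for bitrate in sorted(ladder_mbps, reverse=True):
        if viewers * bitrate <= capacity_mbps:
            return bitrate, viewers, 0
    lowest = min(ladder_mbps)
    admitted = int(capacity_mbps // lowest)
    return lowest, admitted, viewers - admitted


# Hypothetical ladder of 8, 5 and 2 Mbps on a 1000 Mbps node.
print(plan_delivery(100, 1000, [8, 5, 2]))   # (8, 100, 0): everyone at top quality
print(plan_delivery(300, 1000, [8, 5, 2]))   # (2, 300, 0): degraded, nobody shed
print(plan_delivery(1000, 1000, [8, 5, 2]))  # (2, 500, 500): half the viewers shed
```

"let it burn" would be skipping the last branch and admitting everyone anyway, so all 1000 viewers get a stream that stalls.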
When I say "cached", I mean that the PoP server can serve content without contacting the origin server. (The PoP can't serve content it does not have.)
>and it seems Netflix opted to forgo the last one to avoid the bad PR of a "we are over capacity" error message, and instead actually let it burn, no?
Anything other than 100% uptime is bad PR for Netflix.