Another way to recap the paper is this: run small clusters at your PoPs, aggregate the results in those clusters, and replicate them back to an upstream cluster with eventual consistency. The PoP clusters throttle down their replication under high load. They can also be scaled up at specific, predictable times, which saves money while still absorbing traffic spikes.
When you have a flood of writes, sometimes those writes are identical or nearly identical, or the data in each write barely changes. It makes no sense to flood those writes upstream: since the data is barely changing, the upstream cluster probably doesn't need it that urgently. Throttling lets you simply replicate the changes upstream later, with eventual consistency.
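To make that concrete, here is a minimal Go sketch of the coalesce-and-throttle idea, not the paper's actual implementation: writes at a PoP are merged locally per key, and a replication loop flushes the merged batch upstream on an interval that stretches when the PoP is busy. The EdgeBuffer and ReplicateLoop names, the last-write-wins merge, and the load thresholds are all assumptions made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// EdgeBuffer coalesces writes at a PoP: repeated writes to the same key
// overwrite each other locally, so only the latest value is shipped upstream.
// (Hypothetical type; last-write-wins is just the simplest possible merge.)
type EdgeBuffer struct {
	mu      sync.Mutex
	pending map[string]string // key -> latest value since the last flush
	writes  int               // writes since the last flush, used as a load signal
}

func NewEdgeBuffer() *EdgeBuffer {
	return &EdgeBuffer{pending: make(map[string]string)}
}

// Write records a client write locally; nothing goes upstream yet.
func (b *EdgeBuffer) Write(key, value string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending[key] = value
	b.writes++
}

// Flush ships the coalesced batch upstream and reports how busy the PoP was.
func (b *EdgeBuffer) Flush(upstream func(map[string]string)) int {
	b.mu.Lock()
	batch, load := b.pending, b.writes
	b.pending = make(map[string]string)
	b.writes = 0
	b.mu.Unlock()
	if len(batch) > 0 {
		upstream(batch) // eventual consistency: upstream sees the merged state later
	}
	return load
}

// ReplicateLoop throttles replication under load: the busier the PoP was during
// the last interval, the longer it waits before replicating again.
func ReplicateLoop(b *EdgeBuffer, upstream func(map[string]string), stop <-chan struct{}) {
	interval := time.Second
	for {
		select {
		case <-stop:
			b.Flush(upstream) // final flush on shutdown
			return
		case <-time.After(interval):
			switch load := b.Flush(upstream); {
			case load > 1000: // heavy spike: back off hard
				interval = 10 * time.Second
			case load > 100: // moderate load: back off a little
				interval = 3 * time.Second
			default:
				interval = time.Second
			}
		}
	}
}

func main() {
	buf := NewEdgeBuffer()
	stop := make(chan struct{})
	upstream := func(batch map[string]string) {
		fmt.Printf("replicating %d coalesced key(s) upstream\n", len(batch))
	}
	go ReplicateLoop(buf, upstream, stop)

	// Simulate a burst of near-identical edits to the same wiki page:
	// 500 writes collapse into a single upstream replication.
	for i := 0; i < 500; i++ {
		buf.Write("wiki/red-wedding", fmt.Sprintf("revision %d", i))
	}
	time.Sleep(2 * time.Second)
	close(stop)
	time.Sleep(100 * time.Millisecond) // let the final flush run
}
```

In this sketch the 500 near-identical edits collapse into one key, so only a single merged value crosses the PoP-to-upstream link, and the replication interval grows while the spike lasts.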
The Red Wedding Problem: a huge spike in read/write traffic. The paper's example is users viewing and editing the Game of Thrones wiki in the hours before, during, and after an episode (it also gives real-time sports commentary on reddit as an example).
Edge computing is a method of optimizing cloud computing systems by performing data processing at the edge of the network, near the source of the data. This reduces the communications bandwidth needed between sensors and the central data center by performing analytics and knowledge generation at or near the source of the data.