
Humans aren't perfect. Open source projects have missed unescaped spaces in directory paths that caused the deletion of /usr (Bumblebee), video games have forgotten to check the cwd and deleted vital Windows boot files (Eve Online), and operating systems have forgotten to check passwords (macOS).

Given this was 2010, I wouldn't be surprised if they had a less mature development process that didn't use things people take for granted today, like linters and pre-merge code reviews.

If you assume humans are perfect, maybe 95 times out of 100 it works out and 4 out of 100 it's a small oops, but 1 in 100 you get something embarrassing like this.




I was like "hold on, 2010 wasn't that long ago!" but thinking about it now, things like GitHub and its code review flow via pull requests only came onto my radar around 2012, and even then it took a long time before it was widely adopted.

Around that time I was in the middle of the Java ecosystem though, which had tons of tooling for validation, linting, verification, books full of best practices, etc.

Still a far cry from what is normal these days, but we did do things like unit tests, end-to-end tests and code reviews back then. But that was enterprise; I suspect the SF web companies weren't as enterprisey as the bank I worked for at the time.

It's ironic though; the old-fashioned companies I've worked for adopted a lot of the more fast-and-loose-feeling practices from SF: think extreme programming, agile, taking a chance on rewriting the whole back-end in a new language, things like that.


Completely disagree with the "humans are not perfect" approach. This was not a one-opportunity-only event. Every test case they did not run, every load simulation they did not perform, every chaos test they decided to overlook was an example of humans failing repeatedly. Massive websites had been running since the early nineties, and robust software practices have been a thing since the sixties.


Agreed, but this was not a solid company in the first place. Rather, Digg was (according to TFA) a failing startup losing money hand over fist that was launching v4 as a Hail Mary. They were so hard up that no hardware was available for hosting the new environment, so they started reusing v3 servers for v4 while the migration was ongoing.

They bet it all on this v4 thing, and lost.


Funny you should mention that there was no hardware available. Digg had a second cage full of servers in VA as part of a failed attempt to go multi-DC. If that project had ever worked out there would have been 50% more capacity. Those servers and network gear just sat there unused until they finally got liquidated for pennies on the dollar.


> Humans aren't perfect.

Humans are good at designing processes and procedures that compensate for their lack of perfection. In this case that capability apparently was not used.

Anyone can make a mistake and that's fine, but a company would be expected to have processes and enough eyes so that a bunch of other people would need to make the same mistake before it made its way into prod, at which point the likelihood of a mistake becomes really low...


I don't know Python, but I have a feeling things like this weren't as well known back in 2010, and the reason they're so well known now is from people doing them at sites like Digg and then telling everyone it's a really bad idea. There are so many things that I know now for programming that were pretty much unknown 10 years ago because no one had really encountered them in a large-scale setup.


Nah, this has been very well known for decades and is very easy to spot.


Yeah, this is covered in pretty much every style guide, tutorial, and reference, and it's an easy screener in an interview. I'm also surprised anyone using Python as more than a toy experiment had this issue, especially in such a critical service.


I've worked on a lot of Python as a hobbyist and student, much of it in real-world use - probably at least 100kloc - and I've never encountered this. Where is this information? I'm worried that I'm missing some other "obvious" things. I've of course run into Python's shallow vs. deep copies, but I don't remember default values being shared between invocations.
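For anyone who, like this commenter, hasn't run into it: the gotcha the thread is circling around is that Python evaluates a function's default argument values once, at definition time, so a mutable default (list, dict, set) is shared across every call that omits the argument. A minimal sketch with hypothetical names, not Digg's actual code:

```python
def add_item(item, bucket=[]):
    # The default list is created once, when `def` executes.
    # Every call that omits `bucket` appends to that same list.
    bucket.append(item)
    return bucket

first = add_item("a")
second = add_item("b")
print(second)  # ['a', 'b'] -- state leaked between invocations

# The idiomatic fix: use None as a sentinel and build a fresh
# list inside the body, so each call gets its own.
def add_item_fixed(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(add_item_fixed("x"))  # ['x']
print(add_item_fixed("y"))  # ['y'] -- no sharing
```

In a long-running web process this is especially nasty, because the shared default accumulates data for the lifetime of the worker rather than being reset per request.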


If you do a search for "python common gotchas" it's almost certain to come up, usually pretty prominently.


Still... did no one with even moderate Python experience even glance at this very important endpoint at any stage during those 4 weeks? Like I say, this is the kind of thing that jumps out of the screen for an experienced developer.


In 2010? You'd be surprised, before the widespread adoption of git, at how many places the process was individual developers committing changes unsupervised. I'm not sure we even disagree that that's the problem - you feel a second person should have looked at it, I feel the process should require that. A code review might be something where they'd sit down with a selected piece of code they felt was risky, on a biweekly basis or something.


In 2010, access to prod over ftp:// might have still been a thing, even at Digg.


Forget code review. This is one of their most important endpoints. They had severe issues for 4 weeks. Did no one even glance over the code and spot this glaring screw-up?


Considering the context, most people were probably overloaded with work and a lot of changes must have been made with little to no oversight.



