> Our API server was a Python Tornado service... and one of the most frequently ...

BoppreH · on June 6, 2023

Sure it's a known footgun, but I wouldn't be so harsh without knowing more. It could be the result of a series of only slightly bad decisions.

A junior developer creates a function with default of [] instead of (), but otherwise no mutations:

    def get_user(unchecked_ids=[]):
        ids = get_valid_ids(unchecked_ids)
        if not ids: ids = [current_user_id]
        return query_users(ids)

Alice introduces a mutation in a place that is currently safe.

    fa55099 - 45 minutes ago - Alice - Avoid creating unnecessary new list
     def get_user(unchecked_ids=[]):
         ids = get_valid_ids(unchecked_ids)
    -    if not ids: ids = [current_user_id]
    +    if not ids: ids.append(current_user_id)
         return query_users(ids)

Meanwhile Bob simplifies the code in a separate branch.

    f4e704c - 30 minutes ago - Bob - Fix caller who was sending invalid ids
    + def get_user(ids=[]):
    - def get_user(unchecked_ids=[]):
    -     ids = get_valid_ids(unchecked_ids)
          if not ids: ids = [current_user_id]
          return query_users(ids)

Then Charlie does something else on the branch, tries to merge it into master, and Git auto-resolves the conflict because there's no overlap between the changes.

    def get_user(ids=[]):
        if not ids: ids.append(current_user_id)
        return query_users(ids)

And now everyone sees the data of a random user, and your foot is missing.

Is it good code? No. Good version control hygiene? Also no. Should you crucify the developer who made this mistake? Of course not, especially once you add the boilerplate chaff that was omitted here. That's why it's called a footgun, it's easy to misuse.

Macha · on June 6, 2023

Humans aren't perfect. Open source projects have missed unescaped spaces in directory paths that caused the deletion of /usr (Bumblebee), video games have forgotten to check the cwd and deleted vital windows boot files (Eve Online) and operating systems have forgotten to check passwords (MacOS).

Given this was 2010 I wouldn't be surprised if they had a less mature development process that doesn't use things people take for granted today, like linters, and pre-merge code reviews.

If you assume perfection of humans, maybe 95/100 times it works out and 4/100 it's a small oops, but 1/100 times you get something embarrassing like this

Cthulhu_ · on June 6, 2023

I was like "hold on, 2010 wasn't that long ago!" but thinking about it now, things like Github and its code review flow via merge requests only came into my radar at about 2012, and even then it took a long time before it was widely adopted.

Around that time I was in the middle of the Java ecosystem though, which had tons of tooling for validation, linting, verification, books full of best practices, etc.

Still a far cry from what is normal these days, but we did do things like unit tests, end-to-end tests and code reviews back then. But, that was enterprise, the SF web companies I think weren't as enterprisey as the bank I worked for at the time.

It's ironic though; the old fashioned companies I've worked for adopted a lot of the more fast-and-loose-feeling practices from SF, think extreme programming, agile, taking a chance on rewriting the whole back-end to a new language, things like that.

belter · on June 6, 2023

Completely disagree with the aproach humans are not perfect. This was not a one opportunity only event. Every test case they did not run, every load simulation they did not perform, every chaos test they decided to overlook was an example of humans failing repeatedly. Massive websites were running since the early nineties...And robust software practices are a thing of the sixties.

WJW · on June 6, 2023

Agreed, but this was not a solid company in the first place. Rather, Digg was (according to TFA) a failing startup losing money hand over fist that was launching v4 as a hail mary. They were so hard up that no hardware was available for hosting the new environment, so they started reusing v3 servers for v4 while the migration was ongoing.

They bet it all on this v4 thing, and lost.

dronf23 · on June 13, 2023

Funny you should mention that there was no hardware available. Digg had a second cage full of servers in VA as part of a failed attempt to go multi-DC. If that project had ever worked out there would have been 50% more capacity. Those servers and network gear just sat there unused until they finally got liquidated for pennies on the dollar.

throwaway290 · on June 6, 2023

> Humans aren't perfect.

Humans are good at designing processes and procedures that compensate for their lack of perfection. In this case that capability was not used apparently.

Anyone can make a mistake and that's fine, but a company would be expected to have processes and enough eyes to make sure that a bunch of other people need to make the same mistake before it makes its way into prod at which point the likelihood of mistake becomes really low...

that_guy_iain · on June 6, 2023

Not knowing Python, but I have a feeling things like this weren't as well known back in 2010 and the reason they're so well known is from people doing it at sites like Digg and then telling everyone that's a really bad idea. There are so many things that I know now for programming that were pretty much unknown 10-years ago because no one had really encountered them in a large scale setup.

biorach · on June 6, 2023

Nah, this has been very well known since decades and is very easy to spot.

ubercore · on June 6, 2023

Yeah this is covered in pretty much every style guide, tutorial, and reference. And is an easy screener in an interview. I'm also surprised anyone using Python as more than a toy experiment had this issue, especially in such a critical service.

_qxau · on June 6, 2023

I've worked on a lot of Python as a hobbyist and student, much of it in real world use - probably at least 100kloc - and I've never encountered this. Where is this information? I'm worried that I'm missing some other "obvious" things. I've of course run into python's shallow vs deep copies, but I don't remember default values being shared between invocations.

ubercore · on June 8, 2023

If you do a search for "python common gotchas" it's almost certain to come up, usually pretty prominently.

biorach · on June 6, 2023

Still... did no one with even moderate Python experience even glance at this very important endpoint at any stage during those 4 weeks? Like I say, this is the kind of thing that jumps out of the screen for an experienced developer.

Macha · on June 6, 2023

In 2010? You'd be surprised before widespread adoption of git at how many places the process was individual developers committing changes unsupervised. I'm not sure we even disagree that's the problem - you feel a second person should have looked at it, I feel the process should require that. A code review might be something that they'd sit down with a selected piece of code that they felt was risky on a biweekly basis or something.

godzillabrennus · on June 6, 2023

I’m 2010 access to prod over ftp:// might have still been a thing even at digg.

biorach · on June 6, 2023

Forget code review. This is one of their most important endpoints. They had severe issues for 4 weeks. Did no one not even glance over the code and spot this glaring screw-up?

0xAFFFF · on June 6, 2023

Considering the context, most people were probably overloaded with work and a lot of changes must have been made with little to no oversight.

Ndymium · on June 6, 2023

Having done this same mistake myself, it's just such an easy mistake to make. Most likely you're writing two or three programming languages in the same project, and the others work differently from Python. So it's very easy to miss an errant `foo={}` in a function definition when reviewing a commit of several hundred lines. Especially as the only indicator of something being wrong is the lowest priority warning PyCharm has, meaning the background colour `{}` is only slightly different.

These days I'd set up a linter that enforces these things do not happen, but it's easy to be smart after the fact. You say "once you've been trained by bitter experience it's very easy to spot", which I feel is true. You just need that bitter experience first, and for me it came in the form of a production issue.

ohgodplsno · on June 6, 2023

Forget about production, how did this make it into python? This is so unbelievably stupid it makes PHP look like a sensible language.

faefox · on June 6, 2023

This is the question people should be asking rather than trying to identify which Digg employee should be retroactively thrown under the bus. Just a completely insane design decision.

silvestrov · on June 6, 2023

This is one of those things that are so bad it warrants breaking backwards compability, possibly by releasing a python v4.

faefox · on June 6, 2023

If they start today they could complete the migration by the mid-2040s!

fantasizr · on June 6, 2023

It's one of the top python posts on S/O, every python developer finds their way to this discovery https://stackoverflow.com/questions/1132941/least-astonishme...

contravariant · on June 6, 2023

It's an interesting decision to be sure. Then again it's not really that arcane, the question is simply whether you calculate function defaults once or on every invocation. Python made the choice to run it once.

What I don't get here is why they didn't simply use 'None' instead. If a parameter is optional then 'None' makes way more sense than an empty list.

quickthrower2 · on June 6, 2023

It is perfectly sensible… if your language is guaranteed immutable everywhere

paulddraper · on June 6, 2023

I suspect because it's easier to implement defaults as values than as expressions.

The scoping of list comprehensions in Python 2 was far less justifiable, IMO.

geenat · on June 6, 2023

This. Should have been taken out in Python 3, but nah, still present even today.

Waterluvian · on June 6, 2023

It feels like they had someone developing this service that hasn’t done this kind of thing before. Which is a pretty startupy thing to do. I’ve been there before, even with that exact footgun. Though a much smaller feature and testing revealed it.

mrguyorama · on June 6, 2023

You can work in python for years and never know this. How many devs know the entire spec of their working language by heart?

I've been bitten by this exact thing. It's not strictly obvious IMO. Though it didn't make it through testing.

paulddraper · on June 6, 2023

Footguns are still footguns.