We've all been there. Shit happens. That's what backup is for.
OT: It's probably bad form to publicly blame someone for it, even if he did it. Suffice it to say, "we screwed up but are on our way to recovery." It's better to follow the practice of praising in public and discussing problems in private.
I worked on a team that had a list of "breakfastable offences" -- violating these rules meant you had to bring in breakfast for the whole team (donuts, bagels, whatever). One of them was "throwing someone under the bus." In conversations with anyone outside the team, you weren't allowed to single out a person as responsible for any particular bug/error/etc.
Granted, this is pretty vague (depending on how many "administrators" the company has), but it's still too specific for me.
I like the idea of "breakfastable offenses." I'm curious if you had remote teammates. If so, was there a workaround?
It reminds me of my SCUBA training. The head instructor had a list of offenses. If you committed an offense (e.g. leaving goggles on your forehead after surfacing), he'd say, "6-pack," obligating you to bring a 6-pack of beer. ;-)
We had no remote teammates. Not sure how we would have handled that :-)
Some of the other offenses:
* Breaking the build (unit tests only) and then leaving for the day without fixing it or reverting your change
* Walking away without logging out of your machine (very security-conscious business)
* Not completing an assigned code review within a week of being assigned (within reason -- if the code review was enormous, or you were overly busy with an enormous project of your own, don't worry about it)
At a previous company, we had a culture of sending out prank emails from open machines.
I once sent out an email from a co-founder's account which said that he was fed up with the crappy codebase and was hiring a new team to rewrite it from scratch, and that the other (non-tech) co-founder was to take over the existing tech team. No one took it seriously (the non-tech co-founder helped me draft the email, and the other one laughed while reading it after the fact), but there was a board member in the mailing group who thought it was serious and started sending panicky emails to the co-founders.
That was how we generally accomplished that enforcement -- see an unlocked machine, write an email to the team list saying "I'm bringing breakfast tomorrow!"
Sorry, that seems cancerous to me. It breeds entire departments that are unaccountable for the work they do because nobody is safe to speak out against it.
If someone fucks up you should be able to point it out - and they should be immediately forgiven. It's only when they show that they repeatedly take no care in their work and cause the same problems over and over that they should be fired - and those people should be fired instead of being the anchors around the necks of everyone else, dragging us down to drown at the bottom of oceans of day-to-day misery.
The team as a whole was still accountable. The tech lead/PM were still accountable. Internally, members were accountable.
Certainly, if there was a specific problem, it could be raised to management. But generally, if we were in a weekly customer-attended meeting, or dealing with a bug discovered after a production release, no individual could be singled out as responsible for a particular blunder.
On a public relations note, though: I think a case could be made that it was important to give some specifics about who is to blame. Consider the alternative:
"We discovered the production database had been deleted but we are now working diligently to restore it"
How are people -- both non-technical and the HN crowd -- not supposed to suspect that this is a result of an external malicious hack?
The organization can take responsibility for the issue. "During a system update, we mistakenly deleted a production database. We are restoring it and shoring up our disaster recovery plan."
That's very different from "During a system update, Dave mistakenly deleted a production database." In an organization with 5 or 10 people, "During a system update, our administrator mistakenly deleted a production database," is still identifying.
Like I said, I'm not sure it's an issue in this particular case. I don't personally know anything about the site in question.
I guess it could be "we accidentally deleted the production database." But at that point they would just be euphemizing - clearly someone pulled the trigger. If they were naming the person, that would be pretty terrible on their part. But they're not. It seems perfectly fine to my eyes.
The person might be under the gun from someone else to fix the prior problem, and made a mistake under pressure. Singling out the admin just absolves management from any blame. And it shows the lack of leadership and the lack of willingness to take on responsibility for the organization.
And the problem is clearly an organizational problem. There's no clear backup and restore procedure. Restores have probably never been tested. There's no failover. There's no disaster recovery. Even if it's there, it has not been fire-drilled periodically. There's no clear access procedure protecting the production servers. There are no definite steps spelled out in advance for addressing production problems. There's no rollback procedure. There's no review. There's no approval process.
As I mentioned above, it's not so much the administrator's fault as it is the VP of engineering's. There was clearly no disaster recovery plan, which is unacceptable.