Amazon was Down

gkoberger · on April 22, 2013

A while ago, someone claiming to have worked at Amazon said that downtime doesn't really affect things as much as you'd think. He said most people simply just come back later.

https://news.ycombinator.com/item?id=5147461

[Edit: That being said, there's also the statistic that every 100ms of latency costs Amazon 1%. Imagine what 20+ minutes of "latency" would do. https://news.ycombinator.com/item?id=273900]
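Taken literally, that statistic can't be stretched to cover an outage. Here is a back-of-the-envelope sketch that assumes, purely for illustration, that the 1%-per-100ms figure keeps scaling linearly:

```python
def linear_latency_loss_pct(latency_ms: float, pct_per_100ms: float = 1.0) -> float:
    """Naive linear extrapolation: percent of sales lost for a given added latency.
    Assumes the loss grows in a straight line, which the original statistic does not claim."""
    return latency_ms / 100.0 * pct_per_100ms

twenty_minutes_ms = 20 * 60 * 1000  # 1,200,000 ms of "latency"
print(linear_latency_loss_pct(twenty_minutes_ms))  # 12000.0
```

A 12,000% loss is meaningless. That suggests the relationship can't stay linear, and that a full outage behaves differently from extra latency.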

coldtea · on April 22, 2013

>A while ago, someone claiming to have worked at Amazon said that downtime doesn't really affect things as much as you'd think. He said most people simply just come back later.

Brick and mortar stores found that out ages ago. They were closed for large parts of the day/night and the customers just came back the next day.

If people didn't leave Tumblr and Twitter, with their constant massive outages (at some point in their life), then why would they leave Amazon, a huge established player, over an outage of a few hours?

freehunter · on April 22, 2013

Funny enough, I've gone to a brick and mortar store to find it was closed, then went back home and bought the item on Amazon. I wanted it right then, but since I would have to wait until the next day anyway I just ordered it online.

Contrast that with going to Amazon and finding their site is down or performing poorly; I have never gone to the store to buy something instead. If I was already going to be ordering it online, I was already resigned to waiting a day or two for it to arrive.

ams6110 · on April 22, 2013

I doubt many would leave Amazon permanently because of a service outage, but it probably does cost them something in impulse purchases, if the impulsive desire for the item passes during the downtime.

ryusage · on April 22, 2013

No, you would think that, but if you read the comment referenced above, what they actually found is that outages seem to have little effect on revenues. That's what makes it so surprising. The implication seems to be that their customers don't actually spend a whole lot on impulse buys.

philwelch · on April 22, 2013

Tumblr and Twitter have orders of magnitude fewer users than Amazon.

coldtea · on April 22, 2013

Seeing that Twitter has 500 million users, an order of magnitude more would be in the range of 5 billion, and a second order of magnitude would be around 50 billion.
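Expressed as arithmetic, the number of orders of magnitude between two counts is the base-10 logarithm of their ratio. A quick sketch using the 500 million figure from the comment:

```python
import math

def orders_of_magnitude(a: float, b: float) -> float:
    """How many factors of ten separate b from a (log10 of the ratio b / a)."""
    return math.log10(b / a)

twitter_users = 500_000_000
print(orders_of_magnitude(twitter_users, 5_000_000_000))   # 1.0
print(orders_of_magnitude(twitter_users, 50_000_000_000))  # 2.0
```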

So I seriously doubt Amazon has "orders of magnitude more users" than Twitter. For Tumblr, maybe, but I doubt that too (maybe just one order of magnitude for it).

Besides, all that's orthogonal to my point. Unless you mean that Twitter and Tumblr have fewer users than Amazon because people left those services due to outages.

philwelch · on April 22, 2013

Twitter has much better uptime than it used to, too.

When you get in the hundreds of millions of users, you've run out of early adopters and end up with users who are a little more demanding about uptime.

jonny_eh · on April 22, 2013

I imagine the difference between latency and downtime is that latency tends to occur every time you visit, while by definition, downtime is more rare. In other words, latency provides a bad experience, while downtime provides no experience.

SatvikBeri · on April 22, 2013

The Oatmeal actually has a great comparison of how people react to latency vs how they react to downtime: http://theoatmeal.com/comics/no_internet

ankitml · on April 22, 2013

The latency vs. cost curve will not be linear. Beyond some level of latency, further increases won't affect cost much.
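One way to picture such a curve is a saturating model, where losses climb quickly at first and then level off. This is a hypothetical sketch. The exponential shape, the 50% ceiling, and the 5-second scale are illustrative assumptions, chosen only so that the initial slope matches the 1%-per-100ms figure quoted earlier in the thread.

```python
import math

def saturating_loss_pct(latency_ms: float, max_loss_pct: float, scale_ms: float) -> float:
    """Hypothetical saturating model: loss rises roughly linearly for small latencies,
    then levels off toward max_loss_pct. Parameters are illustrative, not measured."""
    return max_loss_pct * (1.0 - math.exp(-latency_ms / scale_ms))

# Illustrative parameters: a 50% ceiling and a 5,000 ms scale give an initial slope of
# 50 / 5000 = 0.01% per ms, i.e. about 1% per 100 ms.
for ms in (100, 1_000, 10_000, 100_000):
    print(f"{ms:>7} ms -> {saturating_loss_pct(ms, 50.0, 5_000.0):.2f}% lost")
```

In this model the marginal loss per extra millisecond keeps shrinking. Past a few multiples of `scale_ms`, the curve is essentially flat.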

troels · on April 22, 2013

Also, latency (a constant lack of resources) and downtime (an extraordinary lack of resources) are two very different things. I wouldn't be surprised if some downtime had little impact on sales, whereas latency has a lot.

ankitml · on April 22, 2013

True! Latency is also ___location-dependent, so network conditions and the health of other external 'resources' also matter, whereas downtime is only a server issue. So they are two very different things.

wdr1 · on April 22, 2013

This matches my experience. I went online last night to buy a replacement garden hose and was surprised Amazon was down. After confirming it wasn't just me (thanks, isitdownforeveryoneorjustme.com), I gave up and ordered it this morning.

geuis · on April 22, 2013

This is just a small complaint coming many hours after this link was posted. You linked to perhaps the most top-level URL Amazon has, for a temporary outage. This means a couple of things. 1) Hours later, the outage is over and I'm just hitting the home page, with no specific information about what you were reporting. 2) As far as I know, this specific link is now no longer available for other stories. That may not matter in the long run, but it bears mentioning.

austenallred · on April 22, 2013

At an estimated loss of $31,000 per minute http://news.cnet.com/8301-10784_3-9962010-7.html?tag=nefd.to... I'm blown away by how often Amazon goes down. That certainly, in my mind, doesn't bode well for the AWS brand.

brokentone · on April 22, 2013

I can't imagine that's a real loss, likely only deferred sales. What are you going to do if you can't buy your thing at Amazon? Drive somewhere? I think you'll try again later.

Dividing income by time doesn't necessarily give you the loss, especially since this seems to have no weighting for time of day or season. I doubt an outage right now has anywhere near the same effect it would have during lunch break two weeks before Christmas.
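A slightly less naive estimate would weight the per-minute figure by how busy that time slot is and discount purchases that are merely deferred. Here is a minimal sketch. The traffic weights and the deferred fraction below are made-up illustrative values, not Amazon data.

```python
def estimated_outage_loss(avg_revenue_per_min: float,
                          outage_minutes: float,
                          traffic_weight: float,
                          deferred_fraction: float) -> float:
    """Rough outage cost estimate (hypothetical model).

    avg_revenue_per_min: yearly revenue spread evenly over every minute (the naive figure)
    traffic_weight: how busy this time slot is relative to average (1.0 = average)
    deferred_fraction: share of would-be purchases that customers simply make later
    """
    if not 0.0 <= deferred_fraction <= 1.0:
        raise ValueError("deferred_fraction must be between 0 and 1")
    naive = avg_revenue_per_min * outage_minutes * traffic_weight
    return naive * (1.0 - deferred_fraction)

naive_per_minute = 31_000  # the estimated per-minute figure cited above
quiet_evening = estimated_outage_loss(naive_per_minute, 20, traffic_weight=0.5, deferred_fraction=0.9)
holiday_lunch = estimated_outage_loss(naive_per_minute, 20, traffic_weight=3.0, deferred_fraction=0.9)
print(f"quiet evening: ${quiet_evening:,.0f}, pre-Christmas lunch: ${holiday_lunch:,.0f}")
```

The hard part in practice is estimating `deferred_fraction`, which is exactly what this thread is arguing about.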

gry · on April 22, 2013

Heroku is up: http://cl.ly/image/0B0U1K3Z342R.

Conversely, when AWS had issues, Amazon.com was not impacted.

Amazon.com != AWS. I'm curious to know when AWS or Amazon.com innovations impact each other, or which one leads. I'd rather it be Amazon.com.

ceol · on April 22, 2013

I think the gp was saying that the brand (as in, the perception of AWS) will suffer, not the actual services.

Anyone with an ounce of server knowledge would know it's impossible to keep a website up for 100% of the time, so downtime at Amazon is understandable, but maybe the average Joe Manager is deciding between Rackspace and AWS and happens to visit amazon.com during this downtime. "If Amazon can't even keep their bread-and-butter running, how can I trust them with something like AWS?" he might say.
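For a sense of scale, each extra "nine" of availability cuts the allowed downtime tenfold. A quick sketch, ignoring leap years:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

def allowed_downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by a given availability percentage."""
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)

for pct in (99.0, 99.9, 99.99, 99.999, 100.0):
    print(f"{pct}% -> {allowed_downtime_minutes_per_year(pct):.1f} min/year")
```

Even at 99.999%, the budget is about 5.3 minutes a year, so a single 10-minute outage would exceed it.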

scottbruin · on April 24, 2013

> Anyone with an ounce of server knowledge would know it's impossible to keep a website up for 100% of the time

As far as I know, Google has 100% uptime, so it's not impossible. It may not be 100% for every geographical ___location, but that's partly because of things Google can neither control nor make redundant.

pkfrank · on April 22, 2013

Even if Amazon.com != AWS, it is still bad for the Amazon Brand which encompasses AWS.

If they can't keep their own server up, how can you trust them with yours?

An unfair argument, perhaps, but one that impacts them all the same.

olalonde · on April 22, 2013

Apparently, Amazon.com switched to AWS in 2011 http://www.quora.com/Amazon/Does-Amazon-com-use-Amazon-AWS.

ankushnarula · on April 22, 2013

I was at the AWS Summit in NYC last week and Werner claims that they only completed the transition of retail to AWS fairly recently: http://www.youtube.com/watch?v=oo1W92Teqx4

kamkazemoose · on April 22, 2013

Does Amazon really go down that often? Is there any data on how often, and for how long, Amazon goes down? I wonder how it compares to other sites that get the same amount of traffic.

what_ever · on April 22, 2013

I think it's the second time this year - http://money.cnn.com/2013/01/31/technology/amazon-down/index...

mbesto · on April 22, 2013

http://en.wikipedia.org/wiki/Availability_heuristic

austenallred · on April 22, 2013

It probably doesn't, but you notice every time it does.

It went down once in the last couple weeks as well, if I remember right.

philh · on April 22, 2013

FWIW, I did not notice that.

crucifiction · on April 22, 2013

It does not go down that often, and when it does, it's measured in minutes. This latest outage was maybe ~10 minutes total...

coldtea · on April 22, 2013

Is it a real cost (and how can you know that?) or just a naive interpolation, sales_per_hour * hours_outage?

Most of those purchases surely get made later, though perhaps they lose some impulse buys.

jrosenblatt · on April 22, 2013

$31K in '08. What is it today?

michaelrbock · on April 22, 2013

A very quick calculation (using AMZN's $61b net sales in 2012) yields about $116k per minute.
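That figure comes from the same even-spread division the thread questions: annual sales divided by the number of minutes in a year. As a sketch:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def revenue_per_minute(annual_sales: float) -> float:
    """Average revenue per minute, assuming sales are spread evenly across the year."""
    return annual_sales / MINUTES_PER_YEAR

print(f"${revenue_per_minute(61e9):,.0f} per minute")  # $116,058 per minute
```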

robryan · on April 22, 2013

One thing that would add to the cost is the large amount of pay-per-click advertising they buy that ends up leading to a page that's down.

alpb · on April 22, 2013

Amazon.com retail website does not run on AWS.

lubos · on April 22, 2013

The Amazon.com website has actually run on AWS for the last 2-3 years.

nevir · on April 22, 2013

It sure didn't when I was there three years ago, but maybe they moved quickly.

crucifiction · on April 22, 2013

Not true...

InclinedPlane · on April 22, 2013

It runs 100% on AWS.

Not all of amazon runs on AWS though, since they use a service oriented architecture, but many of the services also run on AWS.

nevir · on April 22, 2013

Judging from alpb's post, they either currently work there or did in the past. "Retail website" is the key phrase; it's used frequently inside Amazon.

cybernoodles · on April 22, 2013

AWS is working at unprecedented scale and is definitely pushing the envelope. This sort of stuff is inevitable and I don't think it's inherently a bad thing.

podperson · on April 22, 2013

Somewhat off-topic: my (limited) experience with Amazon Prime video suggests it's significantly less reliable than Netflix or iTunes (neither of which are stupendously reliable, but I'd say Netflix is by far the most reliable of the three). Hulu might actually be worse than Amazon Prime.

notimetorelax · on April 22, 2013

I don't know if you saw this posted on HN, but Netflix tests its system really well. They use the so-called Chaos Monkey [1], which shuts down random servers on a whim. This lets them detect and get rid of dependencies, i.e. tolerate failures in other parts of the system.

[1] http://techblog.netflix.com/2011/07/netflix-simian-army.html
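To illustrate the idea, here is a toy in-memory simulation. It is not Netflix's code; the real Chaos Monkey terminates actual cloud instances. The `Cluster` class and its methods are hypothetical. Random instances get killed, the test checks whether requests still succeed, and dead instances are replaced.

```python
import random

class Cluster:
    """Toy load-balanced pool: a request succeeds as long as one instance is alive."""

    def __init__(self, size: int):
        self.instances = {f"i-{n}": True for n in range(size)}  # id -> healthy?
        self._next_id = size

    def handle_request(self) -> bool:
        return any(self.instances.values())

    def kill(self, instance_id: str) -> None:
        self.instances[instance_id] = False

    def replace_dead(self) -> None:
        # Swap each dead instance for a fresh one, as an auto-scaling group might.
        for dead in [i for i, ok in self.instances.items() if not ok]:
            del self.instances[dead]
            self.instances[f"i-{self._next_id}"] = True
            self._next_id += 1

def chaos_monkey(cluster: Cluster, rounds: int, rng: random.Random) -> int:
    """Kill one random live instance per round; return how many requests failed."""
    failures = 0
    for _ in range(rounds):
        victim = rng.choice([i for i, ok in cluster.instances.items() if ok])
        cluster.kill(victim)
        if not cluster.handle_request():
            failures += 1
        cluster.replace_dead()
    return failures

print(chaos_monkey(Cluster(3), 100, random.Random(42)))  # redundant pool survives
print(chaos_monkey(Cluster(1), 100, random.Random(42)))  # single instance fails every round
```

The single-instance run is the "dependency" the technique is meant to expose. In a real system, the interesting failures are the ones nobody predicted.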

jdrobins2000 · on April 22, 2013

Wish I had thought of chaos monkey, much cooler than whatever I called my version of it.

A few years ago I built an automated test system in perl, complete with message bus and message listener container for running tasks on various servers. One of the automated tests I wrote had a component that would periodically (at random intervals) kill processes, unmount shared filesystems, offline interfaces, etc. to cause failovers, to verify that all processes and resources were failed over, and all tasks were reassigned to other nodes and no jobs were dropped or stalled.

It is really the only way to ensure you've covered your bases: beating the shit out of your system repeatedly. It uncovered a bunch of big holes and some very obscure ones too, and once we got those fixed it ran pretty much flawlessly.
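The setup described was Perl with a message bus. The Python sketch below simulates only the core check, using hypothetical node and job names. Jobs are spread over nodes, random nodes are killed, and orphaned jobs move to the survivors. After every failure, the test asserts that no job was dropped or left on a dead node.

```python
import random

def failover_test(nodes: list[str], jobs: list[str], kills: int,
                  rng: random.Random) -> dict[str, str]:
    """Assign jobs round-robin, kill random nodes, reassign orphaned jobs to survivors.
    Returns the final job -> node assignment."""
    alive = set(nodes)
    assignment = {job: nodes[i % len(nodes)] for i, job in enumerate(jobs)}
    for _ in range(kills):
        if len(alive) <= 1:
            break  # keep at least one node so there is somewhere to fail over to
        victim = rng.choice(sorted(alive))
        alive.remove(victim)
        survivors = sorted(alive)
        orphaned = [job for job, node in assignment.items() if node == victim]
        for i, job in enumerate(orphaned):
            assignment[job] = survivors[i % len(survivors)]
        # Invariants checked after every failure:
        assert set(assignment) == set(jobs), "a job was dropped"
        assert all(node in alive for node in assignment.values()), "a job is stuck on a dead node"
    return assignment

rng = random.Random(7)  # fixed seed so a failing run can be reproduced
final = failover_test(["a", "b", "c", "d"], [f"job{n}" for n in range(20)], kills=3, rng=rng)
print(sorted(set(final.values())))  # a single surviving node holds every job
```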

smackfu · on April 22, 2013

The interesting problem is that the underlying AWS system seems to come up with more and more interesting failure modes, due to system complexity, that testing could never catch. For example, Netflix had a major outage on Christmas Eve 2012.

http://techblog.netflix.com/2012/12/a-closer-look-at-christm...

mistermumble · on April 22, 2013

Isn't Netflix hosted on AWS?

svedlin · on April 22, 2013

Netflix is indeed running on AWS. Some details about their back-end here:

http://techblog.netflix.com/2012/12/aws-reinvent-was-awesome...

jherrick · on April 22, 2013

They also constantly kill machines that participate in key load-balanced activities (and replace them with fresh instances of the image).

incision · on April 22, 2013

My experience is the opposite and I use both services regularly. At least one fairly recent event corroborates this [1].

1: http://gigaom.com/2012/12/25/christmas-eve-aws-outage-stings...

rdl · on April 22, 2013

Wow. For something this big, I'll bet it's a networking issue.

I wonder if they lose money for a brief outage, or if people just delay their purchases. I seem to remember them graphing this somewhere.

ketralnis · on April 22, 2013

Do we really need a front-page post every time a well-known site has a hiccup? It's bad enough getting it every time github does. What are you hoping for here? A thread full of me-toos?

lucb1e · on April 22, 2013

Had downtime yesterday evening (12 hours ago) in the Netherlands as well. People from Germany were able to load the website (the .com version; .de worked at all times), and after two hours I was able to as well. When I tried to add something to my cart, though, it returned the same 500 error, so that was still down the last time I checked (about 10 hours ago). I'm not sure if or when this was resolved.

I didn't submit this as a story because I didn't think anyone would care, given the recent call not to post downtimes. Given the #1 spot the story has now, it seems I should have. So do people care or not?

argsv · on April 22, 2013

http://www.amazon.ca/ is OK.

I get "Http/1.1 Service Unavailable" on the first two requests.

I got 500 with the message "We're very sorry, but we're having trouble doing what you just asked us to do. Please give us another chance--click the Back button on your browser and try your request again. Or start from the beginning on our homepage. "
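The error page's own advice (go back and try again) amounts to a manual retry. An automated client would usually retry server errors with exponential backoff. Here is a minimal sketch. `fetch_with_retry` is a hypothetical helper, and `fetch` stands in for whatever function makes the request and returns its status code.

```python
import time
from typing import Callable

def fetch_with_retry(fetch: Callable[[], int], attempts: int = 4,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch() (which returns an HTTP status code) until the result is not a 5xx,
    doubling the wait after each server error. Returns the last status seen."""
    status = fetch()
    for attempt in range(1, attempts):
        if status < 500:
            break
        sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
        status = fetch()
    return status
```

Injecting `fetch` and `sleep` keeps the sketch independent of any particular HTTP library and makes it testable without a network.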

ck2 · on April 22, 2013

Comes up for me http://www.amazon.com/gp/cart/view.html

knowtheory · on April 22, 2013

aws != amazon.

It's perfectly possible for AWS to keep running just fine, while Amazon the website bursts into flames.

Osiris · on April 22, 2013

Does Amazon not run their website on AWS? I assumed (incorrectly, apparently) that AWS was originally built to allow Amazon to scale their own services. Is it really a separate product that they don't use themselves?

simonster · on April 22, 2013

Just because the servers themselves are up doesn't mean that the software running on the servers is up (or that it's capable of handling the load).

I just got my shopping cart to load, but it took quite a long time. Maybe they're getting DOSed.

datasage · on April 22, 2013

Just because Amazon is down doesn't mean the infrastructure is the reason. They did build AWS out of the technology they used to build Amazon, but it's unclear whether they use it directly or run an isolated set of services.

spartango · on April 22, 2013

Amazon uses AWS to host Amazon.com (retail), although you can imagine that they dodge the billing structure and have quite a number of resources dedicated away from the main AWS fleets.

knowtheory · on April 22, 2013

Originally, that's true: Amazon didn't run on AWS, afaik. But I believe they do now.

Nevertheless, they still have application architecture which sits above the aws substrate. It's perfectly feasible for them to have seriously fucked up a deployment that runs on top of AWS, which may be functioning just fine (and at least all of my services running out of us-east seem to be up and running).

mathrawka · on April 22, 2013

Just because a site is on AWS does not mean it cannot go down for its own reasons. There are more failures possible than infrastructure.

brokentone · on April 22, 2013

I thought the same. Werner Vogels, their CTO, said at the NYC cloud event that they moved amazon.com to it in 2010, and amazon.com international in 2011.

liveaxle · on April 22, 2013

AFAIK the "Amazon runs on AWS" meme was originally a line of pure marketing tripe.

frehpt · on April 22, 2013

Incorrect, Amazon.com runs on AWS: https://news.ycombinator.com/item?id=5588012

nevir · on April 22, 2013

AWS tech maybe, but you can be pretty sure that the data centers (or at least the networks) are almost completely segregated.

crucifiction · on April 23, 2013

Nope, it's all mixed, apart from software segregation.

nieksand · on April 22, 2013

Umm. If AWS is working fine, why should the status page show anything of note?

frehpt · on April 22, 2013

Exactly...

level09 · on April 22, 2013

http://i.imgur.com/E8vAzKp.jpg

simonster · on April 22, 2013

I doubt that works when the site is up either. Many servers these days reject pings.
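If ICMP is filtered, a TCP connection to the web port is a more telling reachability check. Here is a minimal sketch using Python's standard `socket` module (`tcp_reachable` is a hypothetical helper name):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, etc.
        return False

# Example: tcp_reachable("www.amazon.com", 443)
```

A successful connection only proves that something is listening. A server returning 500 or 503 errors would still pass, so an HTTP-level check is needed to tell whether the site actually works.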

davorak · on April 22, 2013

Doing fine here, both amazon.com and the AWS console.

rattray · on April 22, 2013

here=where?

michaelburk · on April 22, 2013

I got a 500 error on amazon.com. Now just a timeout.

seanp2k2 · on April 22, 2013

Can't get into AWS management console either.

ritchiea · on April 22, 2013

I can't get to the management console or status.aws.amazon.com

monksy · on April 22, 2013

Sounds like someone who is on call is going to have a bad night. Amazon.de and .co.uk are up.

mattbillenstein · on April 22, 2013

There are two types of websites, those that have suffered downtime, and those that will.

ivabz · on April 22, 2013

Seems like only the US market got goosebumps. UK looks fine and up.

michaelrbock · on April 22, 2013

And it's back up for me.

itomatik · on April 22, 2013

The AWS console seems fine to me. Amazon.com is down for sure.

andrewryno · on April 22, 2013

Yeah same here. Amazon is throwing a "Http/1.1 Service Unavailable" but AWS console is fine.

magnacartic · on April 22, 2013

Just successfully launched an instance, but I can't get my Prime video!

DallaRosa · on April 22, 2013

Amazon.com is back up

danielovichdk · on April 22, 2013

Use Windows Azure!

rapcal · on April 22, 2013

Back online

gsibble · on April 22, 2013

Browsing the site is nearly impossible at the moment.

tquai · on April 22, 2013

It's a lesson in overengineering. At this point my $5 Pentium 3 server has better uptime than Amazon.

potatolicious · on April 22, 2013

Your $5 Pentium 3 server isn't the largest retail website on the internet making $61 billion a year.

Having seen a lot of the code that Amazon runs on, and having seen first-hand the scale that it runs on, I'll say this: it's not perfect, but it's remarkably well-engineered, and a hell of a lot better than most snarky HNers could do.

hhw · on April 22, 2013

But that's the point. Most people don't need anything that well-engineered. Compared to more traditional hosting solutions from quality providers, AWS has terrible uptime, at a much higher cost for the same amount of resources. Two VPSes from two different providers in a simple failover configuration with an anycast DNS solution would be simpler, cheaper, and much more reliable.
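The reliability argument rests on a standard formula. If each setup is up with probability a_i and failures are independent, the pair is down only when both are down at once. Here is a minimal sketch. The 99.5% figures are hypothetical, not measured availability for any provider.

```python
def parallel_availability(*availabilities: float) -> float:
    """Availability of a setup that stays up while at least one component is up,
    assuming independent failures and instantaneous failover."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a  # probability that every component is down at once
    return 1.0 - p_all_down

# Two hypothetical providers at 99.5% each:
print(parallel_availability(0.995, 0.995))  # about 0.999975
```

The independence assumption is doing a lot of work here. Shared dependencies (the DNS provider itself, a common upstream network) make failures correlated, and DNS-based failover takes time to kick in (health checks, cached records), so real-world numbers would be lower.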

hhw · on April 22, 2013

Wow, apparently that last comment really hit a nerve, as several people decided to downvote it, but not a single person actually refuted any of what I said. I was under the impression that downvotes were more to be used against trolling or flamebaiting, and not just opinions that people disagreed with. Considering everything I said is quite easy to verify as being true, this downvoting just strikes me as kind of intellectually dishonest. I expected better from HN.

tquai · on April 22, 2013

Yeah, I think there's a misunderstanding somewhere. Some people think I believe a $5 computer could handle amazon.com's traffic, which is clearly preposterous.

I know that almost all of my downtime comes from when I overengineer things. And I don't need to "patch my kernel" because my OS doesn't have kernel holes once a week. Linux isn't the only Unix OS out there.

Today, a lot of sysadmins believe that "LAMP" is a synonym for webserver, and consequently there are a bunch of webservers serving static content on a machine with way too many moving parts. Complexity is bad.

"Things should be made as simple as possible, but not any simpler." -- Albert Einstein

potatolicious · on April 22, 2013

I think the downvotes are because your post is somewhat off topic.

OP responded to "Amazon.com is down" with "this is a lesson in over-engineering" - which it isn't, because Amazon.com is most certainly not overengineered for its purpose (I've seen the code with my own two eyes).

Your response is "not everyone needs extensively engineered systems", which is true, but is a non sequitur from the previous posts.

vineel · on April 22, 2013

Except your Pentium 3 server doesn't have to handle over 100 million unique visitors per quarter. :P

setrofim_ · on April 22, 2013

Unless your server needs to handle comparable amounts of traffic, it's not the same thing.

tquai · on April 22, 2013

It doesn't. My server is appropriately engineered to its task.

frehpt · on April 22, 2013

Is that the sound of the power supply going on your Pentium 3...

...or your internet connection

You never patch or reboot your magic box either?

illuminate · on April 22, 2013

How is your hooptie server at all comparable to the largest online retailer?

When very little changes and very little happens, uptime's a lot easier to accrue.