A while ago, someone claiming to have worked at Amazon said that downtime doesn't really affect things as much as you'd think. He said most people simply just come back later.
[Edit: That being said, there's also the statistic that every 100ms of latency costs Amazon 1%. Imagine what 20+ minutes of "latency" would do. https://news.ycombinator.com/item?id=273900]
>A while ago, someone claiming to have worked at Amazon said that downtime doesn't really affect things as much as you'd think. He said most people simply just come back later.
Brick and mortar stores found that out ages ago. They were closed for large parts of the day/night and the customers just came back the next day.
If people didn't leave Tumblr and Twitter, with their constant massive outages (at some point in their life), then why would they leave Amazon, a huge established player, over an outage of a few hours?
Funny enough, I've gone to a brick and mortar store to find it was closed, then went back home and bought the item on Amazon. I wanted it right then, but since I would have to wait until the next day anyway I just ordered it online.
Contrast that with going to Amazon and finding their site is down or performing poorly; I have never gone to the store to buy something instead. If I was already going to be ordering it online, I was already resigned to waiting a day or two for it to arrive.
I doubt many would leave Amazon permanently because of a service outage, but it probably does cost them something in impulse purchases, if the impulsive desire for the item passes during the downtime.
No, you would think that, but actually if you read the comment referenced above, what they actually found is that outages seem to have little effect on revenues. That's why it's so surprising, really. The implication seems to be that their customers don't actually spend a whole lot on impulse buys.
Seeing that Twitter has 500 million users, an order of magnitude more would be in the range of 5 billion, and a second order of magnitude would be around 50 billion.
So I seriously doubt Amazon has "orders of magnitude more users" than Twitter. For Tumblr, maybe, but I doubt that too (maybe just one order of magnitude for it).
Besides, all that's orthogonal to my point. Unless you mean that the reason Twitter and Tumblr have fewer users than Amazon is that people left those services due to outages.
Twitter has much better uptime than it used to, too.
When you get in the hundreds of millions of users, you've run out of early adopters and end up with users who are a little more demanding about uptime.
I imagine the difference between latency and downtime is that latency tends to occur every time you visit, while by definition, downtime is more rare. In other words, latency provides a bad experience, while downtime provides no experience.
Also, latency (a constant lack of resources) and down time (an extraordinary lack of resources) are two very different things. I wouldn't be surprised if some down time had little impact on sales, whereas latency has a lot.
True! Latency is also location dependent, so with latency the network and the health of other external 'resources' also matter, whereas downtime is only a server issue. So they are two very different things.
This is true by my experience. I went online last night to buy a replacement garden hose. Was surprised Amazon was down. After confirming it wasn't just me (thanks isitdownforeveryoneorjustme.com), I gave up and ordered it this morning.
This is just a small complaint coming many hours after this link was posted. You linked to perhaps the most top-level URL Amazon has available for a temporary outage. This means a couple of things. 1) Hours later, the outage is over and I'm just hitting the home page, with no specific information about what you were reporting. 2) This specific link is, as far as I know, now no longer available for other stories. That may not matter in the long run, but it bears mentioning.
I can't imagine that's a real loss, likely only deferred sales. What are you going to do if you can't buy your thing at Amazon? Drive somewhere? I think you'll try again later.
Dividing income by time doesn't necessarily give you the loss, especially since this seems to have no weighting for time of day and season. I doubt an outage right now has anywhere near the same effect it would have during lunch break two weeks before Christmas.
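To make the difference concrete, here is a rough sketch comparing the naive "revenue divided by time" estimate with one that accounts for time of day, season, and deferred sales. The $61 billion figure is the one quoted elsewhere in this thread; `traffic_weight` and `deferred_fraction` are hypothetical parameters made up for illustration, not measured values.

```python
# Hypothetical back-of-the-envelope model; none of these weights are real data.
ANNUAL_REVENUE = 61e9              # dollars per year, figure quoted in this thread
MINUTES_PER_YEAR = 365 * 24 * 60


def naive_loss(outage_minutes: float) -> float:
    """Spread revenue evenly over every minute of the year."""
    return ANNUAL_REVENUE / MINUTES_PER_YEAR * outage_minutes


def weighted_loss(outage_minutes: float,
                  traffic_weight: float,
                  deferred_fraction: float) -> float:
    """Adjust the naive estimate.

    traffic_weight: sales rate during the outage relative to the yearly
        average (assumed, e.g. below 1 at night, above 1 before Christmas).
    deferred_fraction: share of customers who simply come back later
        (assumed, between 0 and 1).
    """
    return naive_loss(outage_minutes) * traffic_weight * (1.0 - deferred_fraction)
```

With these numbers the naive estimate for a 20-minute outage is roughly $2.3 million, but if, say, 90% of buyers come back later, the adjusted figure is a small fraction of that even at a busy time.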
I think the gp was saying that the brand— as in, the perception of AWS— will suffer, not the actual services.
Anyone with an ounce of server knowledge would know it's impossible to keep a website up for 100% of the time, so downtime at Amazon is understandable, but maybe the average Joe Manager is deciding between Rackspace and AWS and happens to visit amazon.com during this downtime. "If Amazon can't even keep their bread-and-butter running, how can I trust them with something like AWS?" he might say.
> Anyone with an ounce of server knowledge would know it's impossible to keep a website up for 100% of the time
As far as I know, Google has 100% uptime, so it's not impossible. It may not be 100% for every geographical location, but that's partly because of things Google can neither control nor make redundant.
I was at the AWS Summit in NYC last week and Werner claims that they only completed the transition of retail to AWS fairly recently: http://www.youtube.com/watch?v=oo1W92Teqx4
Does Amazon really go down that often? Is there any data how often/ for how long Amazon does go down? I wonder how it compares to other sites that get the same amount of traffic.
One thing that would contribute to extra cost is a large amount of advertising that they are paying per click that ends up leading to a page being down.
Judging from alpb's post, they either currently work there, or did in the past. "retail website" being the key phrase used frequently internally to Amazon.
AWS is working at unprecedented scale and is definitely pushing the envelope. This sort of stuff is inevitable and I don't think it's inherently a bad thing.
Somewhat off-topic: my (limited) experience with Amazon Prime video suggests it's significantly less reliable than Netflix or iTunes (neither of which are stupendously reliable, but I'd say Netflix is by far the most reliable of the three). Hulu might actually be worse than Amazon Prime.
I don't know if you saw this posted on HN, but Netflix test their system really well. They use so-called chaos monkey [1] that shuts down random servers on a whim. This allows them to detect and get rid of dependencies, i.e. tolerate failures in other parts of the system.
Wish I had thought of chaos monkey, much cooler than whatever I called my version of it.
A few years ago I built an automated test system in Perl, complete with a message bus and a message listener container for running tasks on various servers. One of the automated tests I wrote had a component that would periodically (at random intervals) kill processes, unmount shared filesystems, offline interfaces, etc. to cause failovers, and then verify that all processes and resources were failed over, all tasks were reassigned to other nodes, and no jobs were dropped or stalled.
It is really the only way to ensure you've covered your bases - beating the shit out of your system repetitively. It uncovered a bunch of big holes and some very obscure ones too, and once we got those fixed it ran pretty much flawlessly.
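For anyone who wants to picture the idea, here is a toy sketch of that kind of test in Python. It is not Netflix's Chaos Monkey or the Perl system described above; the `Cluster` class, its round-robin reassignment, and `chaos_run` are all hypothetical names invented for this illustration. It kills a random node each round and checks that no job is left on a dead node.

```python
import random


class Cluster:
    """Toy model: each job is assigned to exactly one live node."""

    def __init__(self, nodes, jobs):
        self.live = set(nodes)
        self.assignment = {job: None for job in jobs}
        self.rebalance()

    def rebalance(self):
        """Move any job whose node is missing or dead onto a live node."""
        if not self.live:
            raise RuntimeError("no live nodes left")
        live = sorted(self.live)
        for i, job in enumerate(sorted(self.assignment)):
            if self.assignment[job] not in self.live:
                self.assignment[job] = live[i % len(live)]

    def kill(self, node):
        """Simulate a node failure followed by failover."""
        self.live.discard(node)
        self.rebalance()

    def dropped_jobs(self):
        """Jobs still pointing at a node that is no longer alive."""
        return [j for j, n in self.assignment.items() if n not in self.live]


def chaos_run(cluster, rounds, rng):
    """Kill a random live node each round; fail loudly if a job is dropped."""
    for _ in range(rounds):
        if len(cluster.live) <= 1:
            break  # always keep one survivor in this toy model
        victim = rng.choice(sorted(cluster.live))
        cluster.kill(victim)
        assert not cluster.dropped_jobs(), f"jobs dropped after killing {victim}"
    return cluster
```

A real version would kill actual processes or network interfaces and poll the system for recovery instead of calling `rebalance()` directly, but the shape of the loop (break something at random, then assert on the invariants) is the same.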
The interesting problem is that the underlying AWS system seems to come up with more and more interesting failure modes due to system complexity, that the testing could never catch. Like Netflix had a major outage recently on Christmas Eve 2012.
Do we really need a front-page post every time a well-known site has a hiccup? It's bad enough getting it every time github does. What are you hoping for here? A thread full of me-toos?
Had downtime yesterday evening (12 hours ago) as well in the Netherlands. People from Germany were able to load the website (the .com version; .de worked at all times), and after two hours I was able to as well. Upon trying to add something to my cart it returned the same error 500 though, so that was still down the last time I checked (about 10 hours ago). I'm not sure if or when this was resolved.
I didn't submit this as story because I didn't think anyone would care, given the recent call not to post downtimes. Given the #1 spot the story has now, it seems I should have. So do people care or not?
I get HTTP/1.1 Service Unavailable on the first two requests.
I got 500 with the message "We're very sorry, but we're having trouble doing what you just asked us to do. Please give us another chance--click the Back button on your browser and try your request again. Or start from the beginning on our homepage."
Does Amazon not run their website on AWS? I assumed (incorrectly, apparently) that AWS was originally built to allow Amazon to scale their own services. Is it really a separate product that they don't use themselves?
Just because Amazon is down doesn't mean the infrastructure is the reason. They did build AWS out of the technology they used to build Amazon, but it's unclear whether they use it directly or use an isolated set of services.
Amazon uses AWS to host Amazon.com (retail), although you can imagine that they dodge the billing structure and have quite a number of resources dedicated away from the main AWS fleets.
Originally, that's true, Amazon didn't run on AWS afaik. But I believe they do now.
Nevertheless, they still have application architecture which sits above the AWS substrate. It's perfectly feasible for them to have seriously fucked up a deployment that runs on top of AWS, which may be functioning just fine (and at least all of my services running out of us-east seem to be up and running).
I thought the same. Werner Vogels, their CTO, said at the NYC cloud event that they moved amazon.com to it in 2010, and amazon.com international in 2011.
Your $5 Pentium 3 server isn't the largest retail website on the internet making $61 billion a year.
Having seen a lot of the code that Amazon runs on, and having seen first-hand the scale that it runs on, I'll say this: it's not perfect, but it's remarkably well-engineered, and a hell of a lot better than most snarky HNers could do.
But that's the point. Most people don't need anything that well-engineered. Compared to more traditional hosting solutions from quality providers, AWS has terrible uptime and at a much higher cost for the same amount of resources. Two VPS'es from two different providers in a simple failover configuration with an anycast DNS solution would be simpler, cheaper, and much more reliable.
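As a rough sketch of the arithmetic behind a two-provider failover setup: if the providers fail independently (a big assumption), the chance that both are down at once is the product of their individual downtime probabilities. The 99% figures in the usage note are hypothetical, not measurements of any real host, and the function names are made up for this example.

```python
def combined_availability(availabilities):
    """Probability that at least one provider is up.

    Assumes failures are independent, which ignores shared causes such as
    the same bad deploy or a DNS problem hitting both hosts.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


def downtime_hours_per_year(availability):
    """Expected hours of downtime per (non-leap) year at this availability."""
    return (1.0 - availability) * 365 * 24
```

For example, `combined_availability([0.99, 0.99])` is about 0.9999 under the independence assumption. This leaves out the failover mechanism itself (health checks, DNS propagation), which has its own failure modes.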
Wow, apparently that last comment really hit a nerve, as several people decided to downvote it, but not a single person actually refuted any of what I said. I was under the impression that downvotes were more to be used against trolling or flamebaiting, and not just opinions that people disagreed with. Considering everything I said is quite easy to verify as being true, this downvoting just strikes me as kind of intellectually dishonest. I expected better from HN.
Yeah, I think there's a misunderstanding somewhere. Some people think I believe a $5 computer could handle amazon.com's traffic, which is clearly preposterous.
I know that almost all of my downtime comes from when I overengineer things. And I don't need to "patch my kernel" because my OS doesn't have kernel holes once a week. Linux isn't the only Unix OS out there.
Today, a lot of sysadmins believe that "LAMP" is a synonym for webserver, and consequently there are a bunch of webservers serving static content on a machine with way too many moving parts. Complexity is bad.
"Things should be made as simple as possible, but not any simpler." -- Albert Einstein
I think the downvotes are because your post is somewhat off topic.
OP responded to "Amazon.com is down" with "this is a lesson in over-engineering" - which it isn't, because Amazon.com is most certainly not overengineered for its purpose (I've seen the code with my own two eyes).
Your response is "not everyone needs extensively engineered systems", which is true, but is a non sequitur from the previous posts.
https://news.ycombinator.com/item?id=5147461