How The Massive (AWS Outage) Internet Fail Yesterday Could Have Been Avoided.
- Mar 01, 2017
- Bobby DeVeaux
A lot of the reports talk about Amazon Web Services (AWS) and its S3 service failing at approx 17:45 GMT on 28th Feb. However, very few (if any) cover the fact that half the Internet died because most websites are not prepared for such an outage, even though the technology to prevent it is available.
Given the outage was only in 1 region (us-east-1, covering all 3 availability zones), the list of websites that suffered the outage is hugely surprising, especially when you consider how big some of these sites are.
To list a few: Docker's Registry Hub, Trello, Travis CI, GitHub and GitLab, Quora, Medium, Signal, Slack, Imgur, Twitch.tv, Razer, heaps of publications that stored images and other media in S3, Adobe's cloud, Zendesk, Heroku, Coursera, Bitbucket, Autodesk's cloud, Twilio, Mailchimp, Citrix, Expedia, Flipboard, and Yahoo! Mail (which you probably shouldn't be using anyway). Readers also reported that Zoom.us and some Salesforce.com services were having problems, as were Xero, SiriusXM, and Strava. (Thanks to https://www.theregister.co.uk/2017/03/01/aws_s3_outage/ for the list.)
Not only that, but apparently many expensive IoT cameras and ovens couldn’t communicate with Amazon S3 either (cool that they exist, though).
But how can this happen?! Amazon offers so many failover options (multiple regions and availability zones) that this shouldn’t happen, at least not to the largest websites, right?