A major Amazon Web Services (AWS) infrastructure outage is underway, disrupting dozens of key online services around the world. The epicentre of the problems is in the US-EAST-1 region (Virginia, USA), one of the most important nodes of the global cloud. Amazon has confirmed the problems and is actively working to resolve them.
According to the official AWS status page, the US-EAST-1 region is experiencing “an increased number of errors and delays”. The company said it has identified a potential cause in the API of the DynamoDB service, a popular NoSQL database.
Global reach and local impact
The impact of the outage is being felt globally. Large numbers of users are reporting problems with services such as:
- Communication and business tools: Zoom, Slack, Canva, Ring
- Entertainment platforms: Fortnite, Roblox, Snapchat, Duolingo, Reddit
- Other services: Shutterstock and even McDonald’s apps
Downdetector reported that this morning alone it recorded more than 4 million reports of network problems worldwide, compared with a typical daily average of 1.8 million. At its peak (at around 9:52am PT), there were almost 6,000 reports about AWS alone in the US.
The problems are also affecting Poland. Downdetector has logged reports concerning, among others, the Polish Post Office. mBank has officially acknowledged that, due to the outage, some transactions may not appear in customers' transaction history. The bank says it is working to restore full functionality.
AWS response and diagnosis of the problem
Amazon engineers “immediately sprang into action” after detecting the anomaly. In a recent announcement, the company said it had identified problems with the DynamoDB API and was “working on multiple parallel paths to accelerate the remediation of the outage”.
It is worth noting that not all of the internet is affected. Google’s services and Meta’s platforms (Facebook, Instagram) are operating stably. Elon Musk also confirmed that X is not affected, which suggests a different infrastructure architecture or no reliance on the affected services in the US-EAST-1 region.
Experts quoted by The Independent point out that there is nothing to suggest the failure was the result of a cyber-attack. The most common causes of such incidents are human error, a faulty update or a physical hardware problem in the data centre. Work to stabilise services is ongoing.
