Incident affecting Google Cloud Networking
Customers experienced a cloud networking disruption from 04:28 AM - 04:50 AM US/Pacific
Incident began at 2022-09-22 04:28 and ended at 2022-09-22 04:50 (all times are US/Pacific).
Previously affected location(s)
Iowa (us-central1), South Carolina (us-east1), Oregon (us-west1)
3 Oct 2022 12:35 PDT
On Thursday, 22 September 2022, Google Cloud experienced a traffic disruption in the wide-area network connecting the us-east1 and us-central1 cloud regions. Inter-region traffic in Google Cloud, and Internet-to-Google Cloud traffic, may have been disrupted if it transited this network path. We are aware of potential impact in several Cloud regions including asia-east1, asia-northeast1, asia-southeast1, australia-southeast1, europe-west1, europe-west2, europe-west3, europe-west4, northamerica-northeast1, us-central1, us-east1, us-east4, us-west1, us-west2, and us-west4, as well as to Google Workspace, with a total duration of 22 minutes.
The traffic disruption in Google's wide-area network was triggered by brief failures in fiber-optic cables, in the presence of a pre-existing failure nearby in the network.
These brief failures occurred progressively across an 18-minute period on Thursday, 22 September 2022, from 04:28 to 04:46 US/Pacific. Each event required a rerouting of traffic, extending the impact to 04:50.
The pre-existing failure occurred on Tuesday, 20 September 2022 at 22:10 US/Pacific, and was still under repair at the time of the second failure on Thursday, 22 September 2022.
Google's interregional backbone is designed with multiple levels of redundancy and is provisioned to reroute Cloud traffic with minimal disruption under all common failure scenarios. In this case, the backbone was designed with appropriate redundancy to survive this dual-failure scenario, but traffic in the affected regions experienced longer rerouting delays. Traffic flowing over other network links experienced disruption as rerouted traffic sought alternate, less congested backup paths.
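The dual-failure behavior described above can be illustrated with a minimal sketch on a toy topology. The node names, links, and fewest-hops metric here are hypothetical and do not represent Google's actual backbone or routing software. The sketch only shows that a redundant graph can stay connected after two link failures while traffic is pushed onto a longer path.

```python
from collections import deque

# Hypothetical toy topology (NOT Google's real network): undirected links.
LINKS = {
    ("east", "central"),                                          # direct path
    ("east", "mid-a"), ("mid-a", "central"),                      # first backup
    ("east", "mid-b"), ("mid-b", "mid-c"), ("mid-c", "central"),  # second backup
}

def shortest_path(links, src, dst):
    """Breadth-first search; returns a fewest-hop path as a list, or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def hops(path):
    """Number of links traversed by a path."""
    return len(path) - 1

# Assumed failure sequence: a pre-existing failure, then a second one nearby.
pre_existing = {("east", "central")}
second_failure = {("east", "mid-a")}

healthy = shortest_path(LINKS, "east", "central")
one_failure = shortest_path(LINKS - pre_existing, "east", "central")
two_failures = shortest_path(LINKS - pre_existing - second_failure, "east", "central")

print(hops(healthy), hops(one_failure), hops(two_failures))
```

In this toy graph the destination stays reachable after both failures, but each failure forces a reroute onto a longer path, which loosely mirrors the statement that capacity was sufficient while rerouting itself caused the disruption.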
Remediation and Prevention
Google's network reacted automatically to the 04:28 to 04:46 events, rerouting within our design goals, and had fully recovered by 04:50.
Our network controls software automatically removed the impacted links from service for our engineers to investigate, since unreliable paths cause more short-term impact than failed paths. There was no shortage of capacity at any time; all disruptions were caused by rerouting.
The probability and impact of these scenarios are exhaustively modeled to ensure such double failures occur very infrequently and do not exceed long-term (yearly and multi-year) availability targets.
Google is committed to preventing a repeat of this issue in the future and is completing the following actions:
Detailed Description of Impact
On 22 September 2022, between 04:28 and 04:50 US/Pacific, the following services (among others) may have been impacted for some customers in these cloud regions, unless otherwise noted: asia-east1, asia-northeast1, asia-southeast1, australia-southeast1, europe-west1, europe-west2, europe-west3, europe-west4, northamerica-northeast1, us-central1, us-east1, us-east4, us-west1, us-west2, and us-west4.
Google Compute Engine
Affected Google Compute Engine customers may have experienced increased latency and packet loss between Compute Engine instances in affected regions.
Google Cloud Bigtable
A small percentage of customers may have experienced errors in API calls from 04:29 through 04:51 in asia-east1, us-central1, us-south1, us-west1, and us-west4.
Affected Google Chat customers would have experienced errors when accessing, creating, or responding to chats from 04:28 through 04:51.
Google Voice users might have experienced some of their actions failing during the impact window due to an internal error. This includes actions such as sending SMS, placing and receiving calls, and loading call history. Web, Android, and iOS users were affected.
Google Cloud Storage
A small percentage of Google Cloud Storage customers may have experienced errors in requests to GCS buckets in asia-south2, asia-southeast2, us-west1 and us-west2.
Google Cloud Load Balancing
Affected Google Cloud Load Balancing customers may have experienced increased HTTP 5XX errors. Globally, around 3 million queries were served with a 5XX response during the two outage windows; us-west1, asia-east1, asia-south1, asia-south2, and asia-southeast1 saw most of the failing queries.
22 Sep 2022 14:22 PDT
Mini Incident Report
We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support .
(All Times US/Pacific)
Incident Start: 22 September 2022 04:30
Incident End: 22 September 2022 04:38
Duration: 8 minutes
Incident Start: 22 September 2022 04:48
Incident End: 22 September 2022 04:58
Duration: 10 minutes
Affected Services and Features:
Google Cloud Networking
Regions/Zones: us-central1, us-east1, us-west1
Customers using Google Cloud Networking experienced a network traffic disruption in the us-central1, us-east1, and us-west1 regions on 22 September 2022 for 8 minutes starting at 04:30 US/Pacific and for 10 minutes starting at 04:48 US/Pacific (total duration of 18 minutes). From preliminary analysis, the root cause of the issue was identified as failures of a high fraction of transport links between the affected regions.
Some customers using Cloud Networking experienced severe traffic disruption during both occurrences of the incident. Some Cloud customers communicating outside the affected regions (including to the Internet) would have seen two periods of disruption: ~8 minutes starting at 04:30 AM and ~10 minutes starting at 04:48 AM US/Pacific.
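As a quick check of the arithmetic in this preliminary report, the following sketch recomputes the window durations from the start and end times listed above. The variable and function names are illustrative only, and the times are treated as naive US/Pacific wall-clock values.

```python
from datetime import datetime

# Windows as listed in the preliminary (mini) incident report, US/Pacific.
WINDOWS = [
    ("2022-09-22 04:30", "2022-09-22 04:38"),
    ("2022-09-22 04:48", "2022-09-22 04:58"),
]

def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM' timestamp into a naive datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

def minutes_between(start, end):
    """Whole minutes elapsed between two timestamps."""
    return int((parse(end) - parse(start)).total_seconds() // 60)

durations = [minutes_between(start, end) for start, end in WINDOWS]
total_disrupted = sum(durations)
end_to_end = minutes_between(WINDOWS[0][0], WINDOWS[-1][1])

print(durations, total_disrupted, end_to_end)  # prints: [8, 10] 18 28
```

The 18-minute figure counts only the two disrupted windows. The end-to-end span from 04:30 to 04:58 is 28 minutes because it includes the 10-minute gap between them.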
22 Sep 2022 05:45 PDT
The issue with Google Cloud Networking has been resolved for all affected users as of Thursday, 2022-09-22 05:23 US/Pacific.
We thank you for your patience while we worked on resolving the issue.
22 Sep 2022 05:30 PDT
Summary: Customers experienced a cloud networking disruption from 04:30 AM - 04:58 AM US/Pacific
Description: Customers might have experienced a cloud networking disruption from 04:30 AM - 04:58 AM US/Pacific as a result of an issue on the physical network.
We believe the network connectivity is currently stable.
We will provide an update by Thursday, 2022-09-22 06:45 US/Pacific with current details.
Diagnosis: All cloud customers communicating outside the region (including to the internet) would have seen two periods of disruption: ~8 minutes starting at 04:30 AM and ~10 minutes starting at 04:48 AM US/Pacific.