Incident affecting Google Cloud Networking, Hybrid Connectivity
Global: Cloud Networking faced severe packet loss
Incident began at 2022-05-20 13:47 and ended at 2022-05-20 14:07 (all times are US/Pacific).
Previously affected location(s)
Taiwan (asia-east1), Hong Kong (asia-east2), Tokyo (asia-northeast1), Osaka (asia-northeast2), Seoul (asia-northeast3), Mumbai (asia-south1), Delhi (asia-south2), Singapore (asia-southeast1), Jakarta (asia-southeast2), Sydney (australia-southeast1), Melbourne (australia-southeast2), Warsaw (europe-central2), Finland (europe-north1), Belgium (europe-west1), London (europe-west2), Frankfurt (europe-west3), Netherlands (europe-west4), Zurich (europe-west6), Montréal (northamerica-northeast1), Toronto (northamerica-northeast2), São Paulo (southamerica-east1), Santiago (southamerica-west1), Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Oregon (us-west1), Los Angeles (us-west2), Salt Lake City (us-west3), Las Vegas (us-west4)
2 Jun 2022, 12:11 PDT
On Friday, 20 May 2022 at 13:47 US/Pacific, Google Cloud Networking experienced intermittent packet loss for traffic between multiple cloud regions for a duration of 20 minutes. The issue was identified and mitigated automatically by 14:07 US/Pacific.
We understand this issue has affected our valued customers and users, and we apologize to those who were affected.
Google’s production backbone is a global network that enables connectivity for all user-facing traffic via Points of Presence (POPs) or internet exchanges.
A failure of a component on a fiber path from one of the central US gateway campuses in Google’s production backbone led to a decrease in available network bandwidth between the gateway and multiple edge locations, causing packet loss while the backbone automatically moved traffic onto remaining paths.
The network topology in this region is being augmented, and the second-best path has not completed its augmentation. This meant some traffic needed to reroute onto the third-best path, which led to an extended traffic migration period. This disruption was more severe than we had anticipated, and is the subject of remediation actions below.
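The fallback behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Path` class, the `pick_fallback` helper, and all capacity and demand figures are hypothetical and are not taken from Google's backbone, which in practice splits traffic across many paths rather than choosing a single one.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    rank: int              # 1 = best path, 2 = second-best, and so on
    capacity_gbps: float   # usable capacity (hypothetical figure)
    healthy: bool = True


def pick_fallback(paths, demand_gbps):
    """Return the best-ranked healthy path that can carry the demand, or None."""
    for path in sorted(paths, key=lambda p: p.rank):
        if path.healthy and path.capacity_gbps >= demand_gbps:
            return path
    return None


# Hypothetical topology during augmentation: the second-best path
# has not yet been brought up to its intended capacity.
paths = [
    Path("primary", 1, 400.0, healthy=False),  # the failed fiber path
    Path("second-best", 2, 100.0),             # augmentation not finished
    Path("third-best", 3, 400.0),
]

chosen = pick_fallback(paths, demand_gbps=300.0)
print(chosen.name)
```

In this model the demand skips the under-provisioned second-best path and lands on the third-best one, which mirrors the longer traffic migration described in the report.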
REMEDIATION AND PREVENTION:
Google’s automated repair mechanisms detected the decrease in available network bandwidth on Friday, 20 May 2022 at 13:47 US/Pacific and automatically routed the traffic through alternate links. The traffic rerouting completed on Friday, 20 May 2022 at 14:06 US/Pacific, mitigating the issue.
While our automated mechanisms worked as intended and recovered the traffic without manual intervention, we recognize that this event impacted our customers.
We have been working on optimizing our global network to minimize the time spent automatically reconfiguring around failures like this (known as "convergence time"). While we have made progress, efforts to improve are ongoing. We continue to ensure that the current technology is optimally configured to minimize the frequency and severity of these issues.
In this network region, we intend to complete the augmentation by 11 July 2022, which will return the network to the intended topology where any single fiber path failure can be rerouted quickly onto the second-best path.
We will also build automatic analysis to ensure that the network topology during augmentation always supports fast convergence.
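One way such an analysis might look is a single-failure audit that checks, for every path, whether the next surviving path could absorb the traffic alone. The following is a minimal sketch, not Google's tooling: the function names, the preference-ordered capacity list, and the numbers are all assumptions made for illustration.

```python
def fallback_rank(capacities, failed_index, demand_gbps):
    """Return the 1-based rank, among surviving paths, of the first path
    able to carry the demand, or None if no surviving path can."""
    survivors = [c for i, c in enumerate(capacities) if i != failed_index]
    for rank, capacity in enumerate(survivors, start=1):
        if capacity >= demand_gbps:
            return rank
    return None


def slow_convergence_risks(capacities, demand_gbps):
    """Return indices of paths whose failure would NOT be absorbed by the
    next-best surviving path (a proxy here for slow convergence)."""
    return [
        i for i in range(len(capacities))
        if fallback_rank(capacities, i, demand_gbps) != 1
    ]


# Capacities in Gbps, listed in preference order (hypothetical values).
during_augmentation = [400.0, 100.0, 400.0]
after_augmentation = [400.0, 400.0, 400.0]

print(slow_convergence_risks(during_augmentation, demand_gbps=300.0))
print(slow_convergence_risks(after_augmentation, demand_gbps=300.0))
```

Running a check of this kind against each intermediate topology of a planned augmentation would flag the steps at which a single failure forces traffic past the second-best path.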
Google is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.
DETAILED DESCRIPTION OF IMPACT:
23 May 2022, 14:33 PDT
Mini Incident Report
We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note that this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case at https://cloud.google.com/support or by following the help article at https://support.google.com/a/answer/1047213.
(All Times US/Pacific)
Incident Start: 20 May 2022 13:47
Incident End: 20 May 2022 14:07
Duration: 20 minutes
Affected Services and Features:
Regions/Zones: Multiple regions
Google Cloud Networking experienced intermittent packet loss for some transit traffic, affecting inter-region connectivity for multiple cloud regions for 20 minutes. From preliminary analysis, the root cause is a hardware issue on an optical (amplification) component affecting Google's user-facing backbone. The issue was identified and mitigated by our automated repair mechanism without manual intervention.
20 May 2022, 14:52 PDT
The issue with Google Cloud Networking has been resolved for all affected users as of Friday, 2022-05-20 14:07 US/Pacific.
Affected customers would have experienced high latency, timeouts and errors.
We thank you for your patience while we worked on resolving the issue.
20 May 2022, 14:44 PDT
Summary: Global: Cloud Networking faced severe packet loss
Description: We've received a report of an issue with Google Cloud Networking as of Friday, 2022-05-20 13:47 US/Pacific.
We will provide more information by Friday, 2022-05-20 14:45 US/Pacific.
Diagnosis: Customers may have encountered elevated latency and errors.
Workaround: None at this time.