Service Health
Incident affecting Google Cloud Networking, Google Compute Engine, Google Cloud VMware Engine, Google Cloud SQL, Google Kubernetes Engine
Customers may experience traffic loss across multiple products with requests destined to and from us-west2
Incident began at 2021-05-04 15:35 and ended at 2021-05-04 21:08 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 6 May 2021 | 12:12 PDT | The following is the incident report for the networking outage that occurred on May 4th, 2021. (All Times US/Pacific) Incident Start: 2021-05-04 15:35 Incident End: 2021-05-04 21:08 Duration: 5 hours, 33 minutes Affected Services: Google Cloud Networking, Google Compute Engine (GCE), Google Cloud VMware Engine, Cloud SQL and Google Kubernetes Engine (GKE) Features: Cloud VPN, Cloud Interconnect, Google Private Access Regions/Zones: us-west2 Description: Google Cloud Platform experienced an outage affecting network traffic in region us-west2 for a duration of 5 hours and 33 minutes. This impacted Internet and Cloud Interconnect connectivity to/from us-west2, including traffic between GCE VMs in the region and Internet endpoints, VM-to-VM traffic over Public IPs, External Network Load Balancing, Cloud VPN Classic (non-HA), and Cloud Interconnect. Cloud VPN HA was not impacted. Root cause and mitigation: The root cause was a rollout that changed some internal network settings on the machines that handle internet routing to Cloud Services. Machines that received the change were unable to receive network programming information. With the change in place, new TCP connections were established successfully, but some packets sent between the Maglev[1] Control and Data planes were dropped. Maglevs route traffic from public IPs and interconnects to various endpoints such as Cloud VPN tasks, individual instances, and groups of instances. When a Maglev task first starts, it must be programmed before it can start routing traffic. As independent Maglev Control and Dataplane rollouts restarted tasks, their long-standing TCP connections were reset, and the newly established connections were unable to exchange programming messages. The issue was mitigated by rolling back the configuration change once the root cause was identified. [1] https://research.google/pubs/pub44824 Customer Impact: Additional Details: |
| 4 May 2021 | 21:50 PDT | The issue with Cloud Networking has been resolved for all affected users as of approximately Tuesday, 2021-05-04 21:15 US/Pacific. Customers affected by this issue observed traffic loss and were unable to reach VPN or Interconnect gateways from and to resources in us-west2 between 2021-05-04 17:37 and 21:15 US/Pacific. The following products were impacted: Google Compute Engine/Google Kubernetes Engine (any resources or products using these products may also be impacted): May experience high traffic loss and connection errors from and to us-west2 over public IP. Internal IP traffic should continue to work as normal. Cloud Interconnect/Cloud VPN: May be unable to reach the gateway and may see high traffic loss. Google Private Access: May see high packet loss. Google Cloud VMware Engine: Some instances may have entered a 'down' state. We thank you for your patience while we worked on resolving the issue. |
| 4 May 2021 | 20:36 PDT | Summary: Customers may experience traffic loss across multiple products with requests destined to and from us-west2 Description: Our engineering team continues to investigate this issue. Affected customers will see traffic loss and may be unable to reach VPN or Interconnect gateways from and to resources in us-west2, beginning at Tuesday, 2021-05-04 17:37 US/Pacific. The following products are currently impacted: Google Compute Engine/Google Kubernetes Engine - May see errors with connections to and from us-west2 over public IP. Internal IP traffic should continue to work as normal. Cloud Interconnect/Cloud VPN - May see some session disconnects and high traffic loss. Google Private Access - May see high packet loss. We will provide an update by Tuesday, 2021-05-04 23:30 US/Pacific with current details. Diagnosis: None at this time. Workaround: None at this time. |
| 4 May 2021 | 19:03 PDT | Summary: Customers may experience traffic loss across multiple products with requests destined to and from us-west2 Description: We are experiencing an intermittent issue with Cloud Networking beginning at Tuesday, 2021-05-04 17:37 US/Pacific. The following products are currently impacted: Google Compute Engine - May see errors with connections to and from us-west2 over public IP. Internal IP traffic should continue to work as normal. Cloud Interconnect - May see some session disconnects. Cloud VPN - May see some session disconnects. Our engineering team continues to investigate the issue. We will provide an update by Tuesday, 2021-05-04 20:30 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: None at this time. Workaround: None at this time. |
- All times are US/Pacific
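
The failure mode described under "Root cause and mitigation" can be illustrated with a toy model. The sketch below is not Google's implementation: the `DataplaneTask` and `ControlPlane` classes, their fields, and the `simulate` function are hypothetical names invented for illustration. It assumes that tasks programmed before the change keep routing until they are restarted, and it simplifies "dropped some packets" to "every programming message is lost" while the bad configuration is active.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataplaneTask:
    """Hypothetical stand-in for a dataplane task that needs programming."""
    name: str
    connected: bool = False   # TCP connection to the control plane is up
    programmed: bool = False  # routing state has been received

    def restart(self) -> None:
        # A restart resets the long-standing connection and discards state.
        self.connected = False
        self.programmed = False

    def can_route(self) -> bool:
        # A task only routes traffic once it has been programmed.
        return self.programmed


class ControlPlane:
    """Hypothetical control plane that programs dataplane tasks."""

    def __init__(self, bad_config: bool = False) -> None:
        self.bad_config = bad_config

    def connect_and_program(self, task: DataplaneTask) -> None:
        # The TCP connection itself is established in both cases.
        task.connected = True
        # Assumption: with the bad config, programming messages never arrive.
        if not self.bad_config:
            task.programmed = True


def routable(tasks: List[DataplaneTask]) -> int:
    return sum(1 for t in tasks if t.can_route())


def simulate(n_tasks: int = 3) -> List[int]:
    """Return the number of routable tasks after each step of the scenario."""
    control = ControlPlane()
    tasks = [DataplaneTask(f"task-{i}") for i in range(n_tasks)]
    for t in tasks:
        control.connect_and_program(t)
    history = [routable(tasks)]          # healthy baseline

    control.bad_config = True            # the change is rolled out
    history.append(routable(tasks))      # existing tasks are still programmed

    for t in tasks:                      # unrelated rollouts restart tasks
        t.restart()
        control.connect_and_program(t)   # connects, but is never programmed
        history.append(routable(tasks))

    control.bad_config = False           # mitigation: roll back the change
    for t in tasks:
        control.connect_and_program(t)
    history.append(routable(tasks))
    return history


if __name__ == "__main__":
    print(simulate())
```

In this model, `simulate()` reports the number of routable tasks after each step: every task still routes immediately after the change, capacity falls by one with each restart, and it recovers once the change is rolled back and the tasks are reprogrammed.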