Google Cloud Status Dashboard
This page provides status information on the services that are part of Google Cloud Platform. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit cloud.google.com.
Google Cloud Networking Incident #18019
We are investigating an issue with Google Cloud Networking in the zone europe-west1-b. We will provide more information by Wednesday, 2018-12-19 07:00 US/Pacific.
Incident began at 2018-12-19 06:00 and ended at 2018-12-19 06:39 (all times are US/Pacific).
Date | Time | Description | |
---|---|---|---|
Dec 21, 2018 | 14:49 | ISSUE SUMMARY On Wednesday 19 December 2018 multiple GCP services in europe-west1-b experienced a disruption for a duration of 34 minutes. Several GCP services were impacted: GCE, Monitoring, Cloud Console, GAE Admin API, Task Queues, Cloud Spanner, Cloud SQL, GKE, Cloud Bigtable, and Redis. GCP services in all other zones remained unaffected. This service disruption was caused by an erroneous trigger leading to a switch re-installation during upgrades to two control plane network (CPN) switches impacting a portion of europe-west1-b. Most impacted GCP services in the zone recovered within a few minutes after the issue was mitigated. We understand that these services are critical to our customers and sincerely apologize for the disruption caused by this incident. To prevent the issue from recurring we are fixing our repair workflows to catch such errors before serving traffic. DETAILED DESCRIPTION OF IMPACT On Wednesday 19 December 2018 from 05:53 to 06:27 US/Pacific, multiple GCP services in europe-west1-b experienced disruption due to a network outage in one of Google’s data centers. The following Google Cloud Services in europe-west1-b were impacted: GCE instance creation, GCE networking, Cloud VPN, Cloud Interconnect, Stackdriver Monitoring API, Cloud Console, App Engine Admin API, App Engine Task Queues, Cloud Spanner, Cloud SQL, GKE, Cloud Bigtable, and Cloud Memorystore for Redis. Most of these services suffered a brief disruption during the duration of the incident and recovered when the issue was mitigated.
ROOT CAUSE As part of a program to upgrade network switches in control plane networks across Google’s data center, two control plane network (CPN) switches supporting a single CPN were scheduled to undergo upgrades. On December 17, the first switch was upgraded and was back online the same day. The issue triggered on December 19 when the second switch was due to be upgraded. During the upgrade of the second switch, a reinstallation was erroneously triggered on the first switch, causing it to go offline for a short period of time. Having both switches down partitioned the network supporting a portion of europe-west1-b. Due to this isolation, the zone was left partially functional. REMEDIATION AND PREVENTION The issue was mitigated at 06:27 US/Pacific when reinstallation of the first switch in the CPN completed. To prevent the issue from recurring we are changing the switch upgrade workflow to prevent erroneous triggers. The trigger inadvertently caused the switch to re-install before any CPN switch is deemed healthy to serve traffic. We are also adding additional checks to make sure upgraded devices are in full functional state before they are deemed healthy to start serving. We will also be improving our automation to catch offline peer devices sooner and help prevent related issues. |
|
ISSUE SUMMARY On Wednesday 19 December 2018 multiple GCP services in europe-west1-b experienced a disruption for a duration of 34 minutes. Several GCP services were impacted: GCE, Monitoring, Cloud Console, GAE Admin API, Task Queues, Cloud Spanner, Cloud SQL, GKE, Cloud Bigtable, and Redis. GCP services in all other zones remained unaffected. This service disruption was caused by an erroneous trigger leading to a switch re-installation during upgrades to two control plane network (CPN) switches impacting a portion of europe-west1-b. Most impacted GCP services in the zone recovered within a few minutes after the issue was mitigated. We understand that these services are critical to our customers and sincerely apologize for the disruption caused by this incident. To prevent the issue from recurring we are fixing our repair workflows to catch such errors before serving traffic. DETAILED DESCRIPTION OF IMPACT On Wednesday 19 December 2018 from 05:53 to 06:27 US/Pacific, multiple GCP services in europe-west1-b experienced disruption due to a network outage in one of Google’s data centers. The following Google Cloud Services in europe-west1-b were impacted: GCE instance creation, GCE networking, Cloud VPN, Cloud Interconnect, Stackdriver Monitoring API, Cloud Console, App Engine Admin API, App Engine Task Queues, Cloud Spanner, Cloud SQL, GKE, Cloud Bigtable, and Cloud Memorystore for Redis. Most of these services suffered a brief disruption during the duration of the incident and recovered when the issue was mitigated.
ROOT CAUSE As part of a program to upgrade network switches in control plane networks across Google’s data center, two control plane network (CPN) switches supporting a single CPN were scheduled to undergo upgrades. On December 17, the first switch was upgraded and was back online the same day. The issue triggered on December 19 when the second switch was due to be upgraded. During the upgrade of the second switch, a reinstallation was erroneously triggered on the first switch, causing it to go offline for a short period of time. Having both switches down partitioned the network supporting a portion of europe-west1-b. Due to this isolation, the zone was left partially functional. REMEDIATION AND PREVENTION The issue was mitigated at 06:27 US/Pacific when reinstallation of the first switch in the CPN completed. To prevent the issue from recurring we are changing the switch upgrade workflow to prevent erroneous triggers. The trigger inadvertently caused the switch to re-install before any CPN switch is deemed healthy to serve traffic. We are also adding additional checks to make sure upgraded devices are in full functional state before they are deemed healthy to start serving. We will also be improving our automation to catch offline peer devices sooner and help prevent related issues. |
|||
Dec 19, 2018 | 06:39 | The Google Cloud Networking issue in zone europe-west1-b has been resolved. No further updates will be provided here. |
|
The Google Cloud Networking issue in zone europe-west1-b has been resolved. No further updates will be provided here. |
|||
Dec 19, 2018 | 06:29 | We are investigating an issue with Google Cloud Networking in the zone europe-west1-b. We will provide more information by Wednesday, 2018-12-19 07:00 US/Pacific. |
|
We are investigating an issue with Google Cloud Networking in the zone europe-west1-b. We will provide more information by Wednesday, 2018-12-19 07:00 US/Pacific. |
|||
Dec 19, 2018 | 06:29 | We are investigating an issue with Google Cloud Networking in the zone europe-west1-b. We will provide more information by Wednesday, 2018-12-19 07:00 US/Pacific. |
|
We are investigating an issue with Google Cloud Networking in the zone europe-west1-b. We will provide more information by Wednesday, 2018-12-19 07:00 US/Pacific. |