Google Cloud Status Dashboard
This page provides status information on the services that are part of Google Cloud Platform. Check back here to view the current status of the services listed below. For additional information on these services, please visit cloud.google.com.
Google Compute Engine Incident #16019
Hurricane Matthew may impact GCP services in us-east1
Incident began at 2016-10-07 17:00 (all times are US/Pacific).
Date | Time | Description
---|---|---
Oct 07, 2016 | 17:58 | The Google Cloud Platform team is keeping a close watch on the path of Hurricane Matthew. The National Hurricane Center 3-day forecast indicates that the storm is tracking within 200 miles of the datacenters housing the GCP region us-east1. We do not anticipate any specific service interruptions. Our datacenter is designed to withstand a direct hit from a hurricane more powerful than Matthew without disruption, and we maintain triple-redundant, diverse-path backbone networking precisely to be resilient to extreme events. We have staff on site and plan to run services normally. Despite all of the above, there is statistically an increased risk of a region-level utility grid or other infrastructure disruption, which may result in a GCP service interruption. If we anticipate a service interruption – for example, if the regional grid loses power and our datacenter is operating on generator – our protocol is to share specific updates with our customers at least 12 hours in advance.
Jan 22, 2015 | 06:04 | SUMMARY: On Tuesday 20 January 2015, for a duration of 103 minutes, Google Compute Engine experienced an issue with outbound network connectivity to certain IP blocks which were used for other Google services. Compute Engine instances were not able to connect to IPs in the affected blocks, meaning that other Google services, including Cloud SQL and BigQuery, were unavailable to some Compute Engine users. We apologize for this disruption in connectivity, and are taking immediate steps to prevent its recurrence. DETAILED DESCRIPTION OF IMPACT: On Tuesday 20 January 2015 from 17:27 to 19:10 PST, the Google Compute Engine routing layer dropped connections to 33% of Google IP blocks. Compute Engine instances were impacted if they attempted to connect to a Google service (including Cloud SQL and BigQuery) and were given an IP in one of the affected blocks by Google DNS servers. 21% of GCE instances, distributed across all Compute Engine zones, attempted connections to the affected blocks during the incident. ROOT CAUSE: At 17:27 a configuration change previously initiated by Google engineers was rolled out to Google Compute Engine routers by an automated system. This change was intended to disable routing traffic to certain IP blocks, but due to a manual misconfiguration, the configured netblocks were too large and included addresses used by Google services in production. Due to an unrelated process error, this configuration change was rolled out to the Compute Engine routing system before automated tests were run. Once the changes were rolled out, any traffic from a Compute Engine instance destined to Google services in the incorrectly labeled blocks would be erroneously dropped. REMEDIATION AND PREVENTION: Automated network monitoring systems alerted Google engineers to the issue at 17:31. Google engineers identified incorrectly labeled netblocks as the root cause at 18:35, and began deploying a fix at 18:58. As the labels were corrected by the fix, the system began to recover; full recovery was complete at 19:10. Google engineers are correcting the process used to push this network configuration to the Compute Engine routing system to utilize the existing automated tests. Additionally, the process to label netblocks will now include peer review, and the tool used to generate these changes is being improved to reduce the risk of errors. Finally, Google engineers are revising the process by which changes are rolled out to Compute Engine routing infrastructure to ensure changes are rolled out gradually across all zones.
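The root cause in the January 2015 postmortem above was a "disable routing" netblock drawn too broadly, so that it also covered address space still serving production traffic. The remediation calls for peer review and automated tests before such changes reach the routing layer. The sketch below is illustrative only and is not Google's tooling: it assumes a hypothetical list of production service prefixes (using documentation IP ranges) and shows the kind of pre-rollout overlap check that would reject an overly broad proposed block before it could be pushed.

```python
# Illustrative pre-rollout check (not Google's actual system): reject a proposed
# "disable routing" netblock list if any entry overlaps address space that
# production services still depend on.
import ipaddress

# Hypothetical production prefixes; real ranges would come from an authoritative
# inventory. RFC 5737 documentation ranges are used here as stand-ins.
PRODUCTION_SERVICE_PREFIXES = [
    ipaddress.ip_network("203.0.113.0/26"),   # e.g. an API frontend range
    ipaddress.ip_network("198.51.100.0/25"),  # e.g. a storage frontend range
]

def find_overlaps(proposed_blocks):
    """Return (proposed, production) pairs where a block slated for disabling
    overlaps a prefix still used by production services."""
    overlaps = []
    for raw in proposed_blocks:
        block = ipaddress.ip_network(raw)
        for prod in PRODUCTION_SERVICE_PREFIXES:
            if block.overlaps(prod):
                overlaps.append((block, prod))
    return overlaps

if __name__ == "__main__":
    # A /24 intended to cover only unused space, but drawn too large:
    # it also swallows the /26 that is still serving production traffic.
    proposed = ["203.0.113.0/24"]
    bad = find_overlaps(proposed)
    if bad:
        for block, prod in bad:
            print(f"REJECT: {block} overlaps production prefix {prod}")
    else:
        print("OK: no overlap with production prefixes; proceed with gradual rollout")
```

Run as a gate before the automated rollout step, a check like this would have flagged the too-large netblock at review time rather than after traffic began dropping; gradual, zone-by-zone rollout then limits the blast radius of anything the check misses.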