Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Google Cloud SQL

We are currently investigating connectivity issues with Cloud SQL instances. We will post a new update in 30 minutes.

Incident began at 2015-01-20 17:31 and ended at 2015-01-20 19:05 (all times are US/Pacific).

Date Time Description
22 Jan 2015 06:01 PST

SUMMARY: On Tuesday 20 January 2015, for a duration of 103 minutes, Google Compute Engine experienced an issue with outbound network connectivity to certain IP blocks which were used for other Google services. Compute Engine instances were not able to connect to IPs in the affected blocks, meaning that other Google services, including Cloud SQL and BigQuery, were unavailable to some Compute Engine users. We apologize for this disruption in connectivity, and are taking immediate steps to prevent its recurrence.

DETAILED DESCRIPTION OF IMPACT: On Tuesday 20 January 2015 from 17:27 to 19:10 PST, the Google Compute Engine routing layer dropped connections to 33% of Google IP blocks. Compute Engine instances were impacted if they attempted to connect to a Google service (including Cloud SQL and BigQuery) and were given an IP in one of the affected blocks by Google DNS servers. 21% of Compute Engine instances, distributed across all Compute Engine zones, attempted connections to the affected blocks during the incident.

ROOT CAUSE: At 17:27, a configuration change previously initiated by Google engineers was rolled out to Google Compute Engine routers by an automated system. This change was intended to disable routing of traffic to certain IP blocks, but due to a manual misconfiguration, the configured netblocks were too large and included addresses used by Google services in production. Due to an unrelated process error, this configuration change was rolled out to the Compute Engine routing system before automated tests were run. Once the changes were rolled out, any traffic from a Compute Engine instance destined for Google services in the incorrectly labeled blocks was erroneously dropped.
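To illustrate the failure mode, the following sketch shows the kind of pre-rollout check that could catch an overbroad netblock: it flags production addresses that fall inside the configured block but outside the intended one. All netblocks and addresses here are hypothetical examples; the actual ranges involved were not published.

```python
import ipaddress

# Hypothetical ranges for illustration only.
intended_block = ipaddress.ip_network("203.0.113.0/24")    # block the change meant to disable
configured_block = ipaddress.ip_network("203.0.112.0/20")  # too-large block actually configured

# Hypothetical addresses standing in for production Google service IPs.
production_ips = [
    ipaddress.ip_address("203.0.115.10"),
    ipaddress.ip_address("203.0.118.25"),
]

# Flag production addresses that the overbroad block would sweep in:
# traffic to these would be erroneously dropped.
overbroad = [
    ip for ip in production_ips
    if ip in configured_block and ip not in intended_block
]

for ip in overbroad:
    print(f"{ip} is in {configured_block} but not {intended_block}: would be dropped")
```

A check of this shape, run as part of the automated tests, would reject the configuration before it reached the routers.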

REMEDIATION AND PREVENTION: Automated network monitoring systems alerted Google engineers of the issue at 17:31. Google engineers identified incorrectly labeled netblocks as the root cause at 18:35, and began deploying a fix at 18:58. As the labels were corrected by the fix, the system began to recover; full recovery was complete at 19:10.

Google engineers are correcting the process used to push this network configuration to the Compute Engine routing system to utilize the existing automated tests. Additionally, the process to label netblocks will now include peer review, and the tool used to generate these changes is being improved to reduce the risk of errors. Finally, Google engineers are revising the process by which changes are rolled out to Compute Engine routing infrastructure to ensure changes are rolled out gradually across all zones.

20 Jan 2015 19:29 PST

The fix was fully rolled out as of Tuesday, 2015-01-20 07:13 PM US/Pacific time.

We apologize for any issues this may have caused you or your users, and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are constantly working to improve the reliability of our systems.

20 Jan 2015 18:49 PST

We have identified the component causing the issue and a fix is being rolled out.