Google Cloud Service Health


This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

For incidents related to Google Security Products, visit https://status.cloud.google.com/security.


Incident affecting Google Cloud Storage

On 2014-10-22 from 14:45 to 15:35 PDT, Google Cloud Storage users saw an increase in HTTP 5xx error rates. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are continuously improving our systems. We will provide a more detailed analysis of this incident once we have completed our internal investigation.

Incident began at 2014-10-22 14:30 and ended at 2014-10-22 15:32 (all times are US/Pacific).


23 Oct 2014, 15:41 PDT

SUMMARY: On Wednesday 22 October 2014, some users of Google BigQuery, Google Cloud Storage and the Google Developers Console experienced elevated error rates for a period of 62 minutes. We apologize if your service or application was affected. We take these incidents extremely seriously and are taking immediate steps to ensure that this type of incident does not happen again.

DETAILED DESCRIPTION OF IMPACT: On Wednesday 22 October 2014 from 14:30 to 15:32 PDT the following Google Cloud Platform services experienced elevated error rates: the Developers Console experienced a 96% error rate; BigQuery experienced a 45% error rate; and Cloud Storage experienced a 6.0% error rate for the XML API and a 3.4% error rate for the JSON API.

ROOT CAUSE: The incident occurred when Google’s internal project metadata store experienced elevated latency, which was caused by an internal system writing to the metadata store at a higher rate than normal. BigQuery, Cloud Storage and the Developers Console frequently need to read project metadata in order to handle requests.

REMEDIATION AND PREVENTION: Google engineers detected the incident at 14:36. We identified the root cause as the project metadata store at 14:58. At 15:06, we disabled the component of the metadata store that was responsible for the increased latency. The rollout of this fix was completed by 15:32.
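The timeline above can be turned into elapsed-time figures with a few lines of Python. This is only an illustrative sketch: the timestamps come from the report, while the event labels and the `minutes_since_start` helper are names chosen here for the example.

```python
from datetime import datetime

# Timestamps (US/Pacific) taken from the incident report.
# The labels are illustrative, not official terminology.
DAY = "2014-10-22"
EVENTS = {
    "incident began": "14:30",
    "detected": "14:36",
    "root cause identified": "14:58",
    "component disabled": "15:06",
    "rollout complete": "15:32",
}


def minutes_since_start(events, day=DAY, start_key="incident began"):
    """Return whole minutes elapsed from the start event to each event."""
    parsed = {
        name: datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M")
        for name, clock in events.items()
    }
    start = parsed[start_key]
    return {
        name: int((ts - start).total_seconds() // 60)
        for name, ts in parsed.items()
    }


elapsed = minutes_since_start(EVENTS)
for name, mins in elapsed.items():
    print(f"{name:>22}: +{mins} min")
```

Read this way, detection came 6 minutes after the incident began, the root cause was identified at +28 minutes, the component was disabled at +36 minutes, and the rollout finished at +62 minutes, which matches the 62-minute duration given in the summary.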

To prevent problems of this nature from recurring, we will fix the performance issue in the metadata store that caused its latency to increase under higher load. For Cloud Storage and BigQuery, we will improve caching of project metadata so that a failure of the metadata store has less impact on the service. We will also improve monitoring to enable earlier detection and faster diagnosis of problems with the metadata store.
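The report does not say how the improved caching will be implemented. As one possible illustration of the general idea, the sketch below shows a common "serve stale on error" pattern: a read-through cache that falls back to the last known value when the backing store fails or times out. Every name here (`StaleOnErrorCache`, `fake_fetch`, `demo-project`, the metadata fields) is hypothetical and does not reflect Google's actual systems.

```python
import time


class StaleOnErrorCache:
    """Read-through cache that serves the last known value if a refresh fails.

    Hypothetical sketch of a 'serve stale on error' policy; not the
    implementation used by any Google service.
    """

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch        # callable: key -> value; may raise
        self._ttl = ttl_seconds    # how long an entry counts as fresh
        self._clock = clock        # injectable clock, useful for tests
        self._entries = {}         # key -> (value, fetched_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # fresh hit: no call to the store
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]    # store failed: serve the stale value
            raise                  # nothing cached: surface the failure
        self._entries[key] = (value, now)
        return value


# Simulated metadata store that can be switched into a failing state.
store_state = {"healthy": True}


def fake_fetch(project_id):
    if not store_state["healthy"]:
        raise TimeoutError("metadata store too slow")
    return {"project_id": project_id, "billing_enabled": True}


fake_now = [0.0]
cache = StaleOnErrorCache(fake_fetch, ttl_seconds=60.0, clock=lambda: fake_now[0])

first = cache.get("demo-project")    # populates the cache
store_state["healthy"] = False       # the store starts failing
fake_now[0] = 120.0                  # the cached entry is now past its TTL
second = cache.get("demo-project")   # refresh fails, stale value is served
```

The trade-off in this pattern is that requests may be served with outdated metadata during an outage, and projects that were never cached still fail, so it reduces the impact of a store failure rather than eliminating it.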
