Google Cloud Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Google Cloud Storage

Elevated error rates for the JSON and XML APIs on 2015-01-28

Incident began at 2015-01-28 17:01 and ended at 2015-01-28 17:27 (all times are US/Pacific).

30 Jan 2015, 09:54 PST

SUMMARY:

On Wednesday 28 January 2015, some API calls to BigQuery and Cloud Storage returned errors for a duration of 26 minutes. We apologize if your service or application was affected. We are working hard to avoid a recurrence of this type of issue.

DETAILED DESCRIPTION OF IMPACT:

On Wednesday 28 January 2015 from 17:01 to 17:27 PST, some API calls to BigQuery and Cloud Storage returned errors. The error rate for BigQuery was 11% during the period of the incident. The error rate for the Cloud Storage XML API was 6%. The error rate for the Cloud Storage JSON API was 12%. The Developers Console returned HTTP 500 errors for 41% of requests for a period of 11 minutes, from 17:01 to 17:12.

ROOT CAUSE:

The incident resulted from the release of a bad configuration for an internal service, which caused processes to crash. Normally, Google “canaries” new releases by upgrading a small number of servers and checking for problems before releasing the change everywhere. In this case, the canary process failed to operate correctly, so the bad configuration was not caught before release and a large number of processes crashed.
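As an illustration of the canary approach described above, the following is a minimal sketch, not Google's actual release tooling. The function name `canary_rollout`, its callback parameters, and the default `canary_fraction` are all hypothetical and chosen only for this example.

```python
from typing import Callable, Sequence


def canary_rollout(
    servers: Sequence[str],
    apply_config: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    canary_fraction: float = 0.05,  # hypothetical default, not from the report
) -> bool:
    """Apply a change to a small canary subset first; continue only if it stays healthy.

    Returns True if the change reached every server, False if it was rolled back.
    """
    if not servers:
        return True

    # Always canary at least one server.
    canary_count = max(1, int(len(servers) * canary_fraction))
    canaries, rest = servers[:canary_count], servers[canary_count:]

    for server in canaries:
        apply_config(server)

    # Fail closed: any unhealthy canary stops the rollout and reverts the canaries.
    if not all(is_healthy(server) for server in canaries):
        for server in canaries:
            rollback(server)
        return False

    # Canaries look healthy, so release the change to the remaining servers.
    for server in rest:
        apply_config(server)
    return True
```

The key design choice in this sketch is that the health check gates the wider release: if the check itself is skipped or reports healthy incorrectly, the change reaches every server, which is the kind of failure the report describes.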

REMEDIATION AND PREVENTION:

Automated monitoring detected the issue and alerted our engineers at 17:02, one minute after the start of the incident. We began rolling back the release at 17:16. The rollback was complete by 17:27.
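The intervals implied by this timeline can be checked with a short calculation. This is only a sketch: the timestamps come from the report, while the helper `minutes_between` and the variable names are introduced here for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end; both are same-zone (US/Pacific) timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


# Timestamps as stated in the incident report.
incident_start = "2015-01-28 17:01"
alerted = "2015-01-28 17:02"
rollback_started = "2015-01-28 17:16"
rollback_complete = "2015-01-28 17:27"

time_to_detect = minutes_between(incident_start, alerted)                 # 1
alert_to_rollback = minutes_between(alerted, rollback_started)            # 14
rollback_duration = minutes_between(rollback_started, rollback_complete)  # 11
total_duration = minutes_between(incident_start, rollback_complete)       # 26
```

The total of 26 minutes matches the duration given in the summary.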

We are taking several actions to prevent a future recurrence of this type of incident. We have identified the issue that caused processes to crash, and are fixing the issue that caused the canary process to fail.

28 Jan 2015, 18:56 PST

We experienced an issue with Cloud Storage beginning at 17:00 US/Pacific on Wednesday 28 January 2015. Error rates for the JSON and XML APIs were elevated, and some API calls received HTTP 500 errors. The problem was resolved as of 17:30 US/Pacific. We apologize for any issues this may have caused you or your users, and thank you for your patience and continued support. System reliability is a top priority at Google, and we are constantly working to improve the reliability of our systems.