Service Health
Incident affecting Google Cloud Storage
Elevated error rates for the JSON and XML APIs on 2015-01-28
Incident began at 2015-01-28 17:01 and ended at 2015-01-28 17:27 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 30 Jan 2015 | 09:54 PST | SUMMARY: On Wednesday 28 January 2015, some API calls to BigQuery and Cloud Storage returned errors for 26 minutes. We apologize if your service or application was affected. We are working hard to avoid a recurrence of this type of issue. DETAILED DESCRIPTION OF IMPACT: On Wednesday 28 January 2015 from 17:01 to 17:27 PST, some API calls to BigQuery and Cloud Storage returned errors. During the incident, the error rate was 11% for BigQuery, 6% for the Cloud Storage XML API, and 12% for the Cloud Storage JSON API. The Developers Console returned HTTP 500 errors for 41% of requests for 11 minutes, from 17:01 to 17:12. ROOT CAUSE: The incident resulted from the release of a bad configuration for an internal service, which caused processes to crash. Normally, Google “canaries” new releases by upgrading a small number of servers and looking for problems before releasing the change everywhere. In this case, the canary process failed to operate correctly, so the bad configuration reached a large number of processes and caused them to crash. REMEDIATION AND PREVENTION: Automated monitoring detected the issue and alerted our engineers at 17:02, one minute after the start of the incident. We began rolling back the release at 17:16. The rollback was complete by 17:27. We are taking several actions to prevent a recurrence of this type of incident. We have identified the issue that caused processes to crash, and we are fixing the issue that caused the canary process to fail. |
| 28 Jan 2015 | 18:56 PST | We experienced an issue with Cloud Storage beginning on Wednesday 28 January 2015 at 17:00 US/Pacific: error rates for the JSON and XML APIs were elevated, and some API calls received HTTP 500 errors. The problem was resolved as of 17:30 US/Pacific. We apologize for any issues this may have caused you or your users, and we thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are constantly working to improve the reliability of our systems. |
- All times are US/Pacific
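
The canary process described under ROOT CAUSE can be illustrated with a minimal sketch. This is not Google's release tooling. The function `canary_rollout`, the helper `_safe_check`, the callbacks `deploy`, `is_healthy`, and `rollback`, and the `canary_fraction` parameter are all hypothetical names chosen for illustration. The sketch upgrades a small subset of servers, checks their health, and continues to the rest of the fleet only if every canary passes.

```python
from typing import Callable, Sequence


def _safe_check(is_healthy: Callable[[str], bool], server: str) -> bool:
    """Run a health check, treating any exception as 'unhealthy' (fail closed)."""
    try:
        return bool(is_healthy(server))
    except Exception:
        return False


def canary_rollout(
    servers: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    canary_fraction: float = 0.05,  # hypothetical default, not from the report
) -> bool:
    """Deploy to a small canary subset first, then to the remaining servers.

    Returns True if the release reached every server, or False if a canary
    was unhealthy and the canaries were rolled back.
    """
    if not servers:
        return True

    # Always pick at least one canary, even for very small fleets.
    canary_count = max(1, int(len(servers) * canary_fraction))
    canaries, rest = servers[:canary_count], servers[canary_count:]

    for server in canaries:
        deploy(server)

    # Stop before the wider release if any canary looks unhealthy.
    if any(not _safe_check(is_healthy, s) for s in canaries):
        for server in canaries:
            rollback(server)
        return False

    for server in rest:
        deploy(server)
    return True
```

In this sketch a health check that itself fails counts as an unhealthy canary, so a broken check blocks the wider release rather than silently allowing it. That is a design choice of the example, not a description of the fix referred to in the report.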