Service Health
Incident affecting Google App Engine
Google API issues beginning on January 28, 2015
Incident began at 2015-01-28 17:01 and ended at 2015-01-28 17:27 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 30 Jan 2015 | 09:53 PST | SUMMARY: On Wednesday 28 January 2015, some API calls to BigQuery and Cloud Storage returned errors for 26 minutes. We apologize if your service or application was affected. We are working hard to avoid a recurrence of this type of issue. DETAILED DESCRIPTION OF IMPACT: On Wednesday 28 January 2015 from 17:01 to 17:27 PST, some API calls to BigQuery and Cloud Storage returned errors. During the incident, the error rate was 11% for BigQuery, 6% for the Cloud Storage XML API, and 12% for the Cloud Storage JSON API. The Developers Console returned HTTP 500 errors for 41% of requests for 11 minutes, from 17:01 to 17:12. ROOT CAUSE: The incident was caused by the release of a bad configuration for an internal service, which caused processes to crash. Normally, Google “canaries” new releases by upgrading a small number of servers and looking for problems before releasing the change everywhere. In this case, the canary process failed to operate correctly, and a large number of processes crashed. REMEDIATION AND PREVENTION: Automated monitoring detected the issue and alerted our engineers at 17:02, one minute after the start of the incident. We began rolling back the release at 17:16, and the rollback was complete by 17:27. We are taking several actions to prevent a recurrence of this type of incident. We have identified the cause of the process crashes and are fixing the defect that caused the canary process to fail. |
| 28 Jan 2015 | 18:40 PST | The problem with Google Cloud APIs was resolved as of Wednesday, 2015-01-28 17:28 (all times are US/Pacific). We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better. We will provide a more detailed analysis of this incident once we have completed our internal investigation. |
- All times are US/Pacific
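The root-cause section describes canarying: upgrade a small number of servers, look for problems, and only then release the change everywhere. The sketch below illustrates that gate in general terms. It is not Google's release system; the `Server` class, the `canary_rollout` function, the 5% canary fraction, and the health check are all hypothetical, and the "bad configuration" is simulated as any server running version `v2` being unhealthy. The incident shows why the check step matters: if the canary stage does not run correctly, nothing stops a bad configuration from reaching the whole fleet.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Server:
    # Hypothetical server record: a name and the config version it runs.
    name: str
    config_version: str


def canary_rollout(
    servers: List[Server],
    new_version: str,
    is_healthy: Callable[[Server], bool],
    canary_fraction: float = 0.05,
) -> bool:
    """Hypothetical canary gate.

    Upgrades a small fraction of servers first, checks their health, and
    either releases to the rest of the fleet or rolls the canaries back.
    Returns True if every server received the new version, False if the
    release was stopped at the canary stage.
    """
    if not servers:
        return True
    # Always canary at least one server, even for a tiny fleet.
    canary_count = max(1, int(len(servers) * canary_fraction))
    canaries = servers[:canary_count]
    previous = {s.name: s.config_version for s in canaries}

    # Stage 1: upgrade only the canaries.
    for s in canaries:
        s.config_version = new_version

    # Stage 2: if any canary is unhealthy, restore its old version and stop.
    if not all(is_healthy(s) for s in canaries):
        for s in canaries:
            s.config_version = previous[s.name]
        return False

    # Stage 3: the canaries look healthy, so release everywhere else.
    for s in servers[canary_count:]:
        s.config_version = new_version
    return True


def crashes_on_v2(server: Server) -> bool:
    # Simulated health check: the "bad configuration" v2 crashes processes.
    return server.config_version != "v2"


if __name__ == "__main__":
    fleet = [Server(f"srv-{i}", "v1") for i in range(100)]
    released = canary_rollout(fleet, "v2", crashes_on_v2)
    print(released)  # False: the canaries failed, so the release stopped
    print(sum(s.config_version == "v2" for s in fleet))  # 0: canaries rolled back
```

With a working gate, the bad version touches only the 5 canary servers and is then rolled back; the other 95 servers never run it.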