Service Health
Incident affecting Google Cloud Storage
On 2014-10-22 from 14:45 to 15:35 PDT, Google Cloud Storage users saw an increase in HTTP 5xx error rates. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better. We will provide a more detailed analysis of this incident once we have completed our internal investigation.
Incident began at 2014-10-22 14:30 and ended at 2014-10-22 15:32 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 23 Oct 2014 | 15:41 PDT | SUMMARY: On Wednesday 22 October 2014, some users of Google BigQuery, Google Cloud Storage and the Google Developers Console experienced elevated error rates for 62 minutes. We apologize if your service or application was affected. We take these incidents extremely seriously and are taking immediate steps to ensure that this type of incident does not happen again. DETAILED DESCRIPTION OF IMPACT: On Wednesday 22 October 2014 from 14:30 to 15:32 PDT, the following Google Cloud Platform services experienced elevated error rates: the Developers Console experienced a 96% error rate; BigQuery experienced a 45% error rate; and Cloud Storage experienced a 6.0% error rate for the XML API and a 3.4% error rate for the JSON API. ROOT CAUSE: Google’s internal project metadata store experienced elevated latency because an internal system was writing to it at a higher rate than normal. BigQuery, Cloud Storage and the Developers Console frequently need to read project metadata in order to handle requests. REMEDIATION AND PREVENTION: Google engineers detected the incident at 14:36. At 14:58, we identified the project metadata store as the root cause. At 15:06, we disabled the component of the metadata store that was responsible for the increased latency. The rollout of this fix was completed by 15:32. To prevent similar problems, we will fix the performance issue in the metadata store that caused its latency to increase under higher load. For Cloud Storage and BigQuery, we will improve caching of project metadata so that a failure of the metadata store has less impact on the service. We will also improve monitoring so that problems with the metadata store are detected earlier and diagnosed faster. |
- All times are US/Pacific
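
The remediation above mentions caching project metadata so that a failure of the metadata store has less impact on dependent services. As an illustration only, one common way to get that property is a cache that falls back to its last known value when a refresh fails. The sketch below is not Google's implementation; the class name `StaleOnErrorCache`, the `fetch` callback, and the `ttl_seconds` parameter are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Hypothetical cache that serves the last known value if a refresh fails."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # reads metadata from the backing store
        self._ttl = ttl_seconds      # how long an entry counts as fresh
        self._clock = clock          # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, project_id: str) -> dict:
        now = self._clock()
        entry = self._entries.get(project_id)

        # Fresh entry: answer without touching the backing store.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]

        try:
            value = self._fetch(project_id)
        except Exception:
            # Store is failing or timing out: prefer stale data over an error.
            if entry is not None:
                return entry[1]
            raise  # nothing cached for this project, so the error propagates

        self._entries[project_id] = (now, value)
        return value
```

With this design, requests for projects that are already cached keep succeeding while the store is unhealthy, at the cost of possibly serving outdated metadata; requests for projects never seen before still fail. In practice the `fetch` callback would also need a timeout so that a slow store is treated as a failure rather than stalling the caller.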