Service Health
Incident affecting Google Cloud Storage
High error rate of requests to Google Cloud Storage
Incident began at 2015-08-08 20:22 and ended at 2015-08-08 22:50 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 10 Aug 2015 | 23:10 PDT | SUMMARY: On Saturday 8 August 2015, Google Cloud Storage served an elevated error rate for 139 minutes. If your service or application was affected, we apologize. We have taken an initial set of actions to prevent recurrence of the problem, and a larger set of changes, which will provide defense in depth, is currently under review by the engineering teams. DETAILED DESCRIPTION OF IMPACT: On Saturday 8 August 2015 from 20:21 to 22:40 PDT, Google Cloud Storage returned a high rate of error responses to queries. The average error rate during this time was 28.4%, with an initial peak of 47% at 20:27. Error levels then gradually decreased, with intermediate periods of normal operation from 21:46-21:54 and 22:04-22:10. Usage was equally affected across the Google Cloud Storage XML and JSON APIs. ROOT CAUSE: The elevated GCS error rate was induced by a large increase in traffic compared to normal levels. The traffic surge was exacerbated by retries from software clients receiving errors. The GCS errors were principally served to the sources that were generating the unusual traffic levels, but a fraction of the errors were served to other users as well. REMEDIATION AND PREVENTION: Google engineers were alerted to the elevated error rate by automated monitoring, and took steps both to reduce the impact and to increase capacity, in order to limit the duration and severity of the incident for GCS users. In parallel, Google’s support team contacted the system owners who were generating the bulk of the unexpected traffic, and helped them reduce their demand. The combination of these two actions resolved the incident. To prevent a potential future recurrence, Google’s engineering team has made or is making a number of changes to GCS, including: |
| 8 Aug 2015 | 23:03 PDT | The issue with Google Cloud Storage should be resolved for all affected projects as of 22:50 US/Pacific. We will conduct an internal investigation of this issue and make appropriate improvements to our systems to prevent or minimize future recurrence. We will provide a more detailed analysis of this incident once we have completed our internal investigation. |
| 8 Aug 2015 | 22:29 PDT | We are still investigating the issue with Google Cloud Storage. Current data indicates that the error rate is improving for most projects affected by this issue. We will provide another status update by 23:00 US/Pacific with current details. |
| 8 Aug 2015 | 22:03 PDT | We are investigating reports of an issue with Google Cloud Storage. We will provide more information by 22:30 US/Pacific. |
- All times are US/Pacific
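
The root cause above notes that retries from clients receiving errors made the traffic surge worse. A common client-side way to limit that kind of retry amplification is truncated exponential backoff with jitter. The sketch below is illustrative only: it is not described in this report as part of Google's remediation, and the names `TransientError`, `backoff_delays`, and `call_with_backoff`, along with the default values for `max_retries`, `base`, and `cap`, are hypothetical choices made for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (e.g. an HTTP 429 or 5xx)."""


def backoff_delays(max_retries=5, base=1.0, cap=32.0, rng=random.random):
    """Yield one randomized wait (in seconds) per retry."""
    for attempt in range(max_retries):
        # The ceiling doubles on each attempt but never exceeds `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": wait a random fraction of the ceiling so that
        # clients that failed together do not all retry together.
        yield rng() * ceiling


def call_with_backoff(request, max_retries=5, base=1.0, cap=32.0,
                      sleep=time.sleep):
    """Call `request()`; on TransientError, wait and retry."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return request()
        except TransientError:
            sleep(delay)
    # Final attempt: if it still fails, the error propagates to the caller.
    return request()
```

A caller would wrap a single storage request in a zero-argument function and pass it to `call_with_backoff`, raising `TransientError` only for responses that are safe to retry. Bounding both the number of retries and the maximum wait keeps a failing client from adding unbounded load to a service that is already overloaded.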