Service Health
Incident affecting Google App Engine
Task queue delays in dispatching tasks. Files API errors creating files.
Incident began at 2014-09-29 19:30 and ended at 2014-09-30 09:00 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 1 Oct 2014 | 13:55 PDT | SUMMARY: On Monday 29 September 2014, some Google App Engine applications using the Task Queue API experienced a decrease in the dispatch rate for tasks for a period of 2 hours and 28 minutes. In addition, on Monday 29 September and Tuesday 30 September 2014, some App Engine applications experienced errors when creating files using the Files API for a period of 11 hours and 2 minutes. We hold ourselves to a high standard, and we failed to meet that standard. We are taking action to ensure that incidents like this do not happen in the future. DETAILED DESCRIPTION OF IMPACT: From Monday 29 September 2014 19:30 to 21:58 PDT, 29% of App Engine applications using the Task Queue API in US datacenters experienced a decrease in the dispatch rate for tasks. During the incident, tasks were dispatched at 78% of the rate seen during the previous day at the same time. From Monday 29 September 21:58 until Tuesday 30 September 09:00, 27% of App Engine applications using the Files API in US datacenters experienced errors when creating files. The error rate for affected applications during this period was 95%. ROOT CAUSE: Both the task queue dispatch issue and Files API issue were ultimately caused by a failure in the storage layer in one US datacenter. Initially, the impact of the storage layer issue was limited to a drop in the task queue dispatch rate. We later determined that its impact would become more severe. We therefore redirected all App Engine traffic to other datacenters. This change exposed a latent misconfiguration in the Files API, which caused affected applications to experience errors when creating files. REMEDIATION AND PREVENTION: The App Engine support team received the first customer report of a drop in the task queue dispatch rate at 20:31. To resolve this issue, our engineers moved task queue operations for affected applications to other datacenters at 21:58. At 22:54, our engineers moved all App Engine traffic away from the affected datacenter, which led to the Files API errors. Our engineers diagnosed and fixed the Files API issue at 07:41. The fix was fully rolled out to all affected customers by 09:00. For customers using the Files API, which is now deprecated (http://googlecloudplatform.blogspot.com/2013/06/google-app-engine-181-released.html), we recommend that you migrate your code to use the Cloud Storage client library instead: https://cloud.google.com/appengine/docs/java/googlecloudstorageclient/ https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/ Our support team will contact customers that make significant use of the Files API and provide help to move their code to a fully supported solution. |
| 29 Sep 2014 | 23:04 PDT | The reduced processing rate for Google App Engine Task Queue was fully resolved as of Monday, 2014-09-29 22:13 US/Pacific. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better. |
| 29 Sep 2014 | 23:02 PDT | We're investigating an issue with Google App Engine Task Queue beginning at Monday, 2014-09-29 19:30 US/Pacific. We will provide more information within the next 30 minutes. |
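
The two durations quoted in the summary (2 hours 28 minutes and 11 hours 2 minutes) follow from the timestamps given in the detailed description of impact. A minimal Python sketch of that arithmetic is shown here. It assumes naive local timestamps are sufficient, since no daylight-saving change falls inside the incident window; the helper name `duration` is hypothetical and not part of any App Engine API.

```python
from datetime import datetime

# Timestamp format used for the values copied from the report (US/Pacific).
FMT = "%Y-%m-%d %H:%M"


def duration(start, end):
    """Return (hours, minutes) elapsed between two local timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


# Task Queue dispatch slowdown: 19:30 to 21:58 on 29 September.
task_queue = duration("2014-09-29 19:30", "2014-09-29 21:58")

# Files API errors: 21:58 on 29 September to 09:00 on 30 September.
files_api = duration("2014-09-29 21:58", "2014-09-30 09:00")

print(task_queue)  # (2, 28)
print(files_api)   # (11, 2)
```

Together the two windows cover the full incident period stated at the top of the report, from 19:30 on 29 September to 09:00 on 30 September.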