Service Health
Incident affecting Google App Engine
We have noticed an issue affecting networking for Google App Engine. The issue started at 2015/01/20 18:24 (all times are US/Pacific) and was resolved as of 2015/01/20 18:42. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are continuously improving our systems.
Incident began at 2015-01-20 18:24 and ended at 2015-01-20 19:08 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 21 Jan 2015 | 21:10 PST | SUMMARY: On Tuesday 20 January 2015, some Google App Engine applications experienced elevated rates of HTTP 500 errors for a total of 11 minutes. We apologize if you were affected by this incident. We are working hard to prevent incidents like this from recurring in the future. DETAILED DESCRIPTION OF IMPACT: On Tuesday 20 January 2015, some Google App Engine applications experienced elevated rates of HTTP 500 errors during the following time intervals: 18:24 - 18:27, 18:36 - 18:41, and 19:06 - 19:08 (all times PST). The issue affected 13% of applications and caused 3% of requests to App Engine to receive HTTP 500 errors during the 11 minutes of the incident. ROOT CAUSE: The issue was caused by an error in the software-defined networking control system that manages network traffic between Google datacenters. The system incorrectly determined that the network capacity available to App Engine applications in one datacenter had dropped. REMEDIATION AND PREVENTION: Our engineers received an automated alert for the issue at 18:42. At 18:55, we redirected some traffic away from the affected datacenter. The system returned to stability at 19:08. To prevent a recurrence of this issue, we will disable the malfunctioning subsystem until both a fix for the immediate malfunction and additional defense-in-depth protections have been deployed. |