Google Cloud Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Google App Engine

We're investigating an issue with Google App Engine that occurred Thursday 2014-06-05 between 12:00 PM and 1:00 PM. We will provide more information shortly.

Incident began at 2014-06-05 11:57 and ended at 2014-06-05 12:40 (all times are US/Pacific).
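As a quick check on the timestamps above, the incident duration can be computed directly. This is a minimal sketch; the helper name `incident_minutes` is hypothetical, and naive datetimes are used on the assumption that both times are in the same zone (US/Pacific), so the UTC offset cancels out.

```python
from datetime import datetime

def incident_minutes(start: str, end: str) -> int:
    """Return whole minutes between two 'YYYY-MM-DD HH:MM' timestamps.

    Both timestamps are assumed to be in the same time zone.
    """
    fmt = "%Y-%m-%d %H:%M"
    began = datetime.strptime(start, fmt)
    ended = datetime.strptime(end, fmt)
    return int((ended - began).total_seconds() // 60)

# Start and end times as listed for this incident.
print(incident_minutes("2014-06-05 11:57", "2014-06-05 12:40"))  # prints 43
```

The result agrees with the 43 minutes quoted in the incident summary.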

9 Jul 2014, 09:52 PDT

SUMMARY: On Thursday 5 June 2014, some Google App Engine applications had elevated latency for a period of 43 minutes. If your service or application was affected, we apologize — this is not the level of quality and reliability we strive to offer you, and we have taken and are taking immediate steps to improve the platform’s performance and availability.

DETAILED DESCRIPTION OF IMPACT: On Thursday 5 June 2014 from 11:57 AM to 12:40 PM US/Pacific, some App Engine applications using the High Replication Datastore in the US had elevated latency. 39% of applications were affected. Aggregated over the duration of the incident, the median latency for affected applications was 56% higher than earlier in the day. In addition, the URL Fetch service had elevated latency and errors, and Datastore operations had elevated latency.

ROOT CAUSE: The incident was triggered by issues in the storage layer in one datacenter starting at 11:30 AM. Google engineers redirected traffic to another datacenter at 11:57 AM. This caused a short-term spike in load in that datacenter because requests for the moved applications were slower due to cold caches. In addition, Memcache had been flushed, and the increase in Datastore queries needed to re-populate it caused a spike in network traffic that led to packet loss. The packet loss caused the slow Datastore operations as well as the URL Fetch errors and elevated latency.
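The cold-cache effect described in the root cause can be illustrated with a small cache-aside simulation. This is only a sketch under stated assumptions: the function `backend_reads`, the key names, and the workload of 1,000 requests over 100 keys are all hypothetical and do not model App Engine, Memcache, or the Datastore themselves.

```python
def backend_reads(requests, cache):
    """Serve requests cache-aside; return how many fell through to the backend."""
    reads = 0
    for key in requests:
        if key not in cache:
            reads += 1       # cache miss: this request queries the backing store
            cache.add(key)   # re-populate the cache for later requests
    return reads

# Hypothetical workload: 1,000 requests cycling over 100 hot keys.
requests = [f"key-{i % 100}" for i in range(1000)]

# Warm cache: every hot key is already present.
warm = backend_reads(requests, cache={f"key-{i}" for i in range(100)})
# Cold cache: everything was flushed, so each key misses once.
cold = backend_reads(requests, cache=set())

print(warm, cold)  # prints: 0 100
```

In this toy workload all 100 misses occur in the first 100 requests after the flush, which mirrors the short-term spike in backend queries described above.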

REMEDIATION AND PREVENTION: After the initial traffic redirect, Google engineers re-balanced traffic among US datacenters to spread out the load. To prevent a recurrence, we will be spreading App Engine’s load among more datacenters, so that an issue in a single datacenter has less impact on latency. We are also developing a method of reducing the fraction of Memcache content flushed in most cases when an application moves to a new datacenter.
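A rough way to see why spreading load among more datacenters helps: if traffic is split evenly across N datacenters and one is drained, each remaining datacenter absorbs an extra 1/(N-1) of its usual load. The sketch below works through that arithmetic. The even split is an assumption made for illustration, the function name is hypothetical, and the report does not state how many datacenters are involved.

```python
from fractions import Fraction

def extra_load_per_survivor(datacenters: int) -> Fraction:
    """Relative load increase on each remaining datacenter when one of
    `datacenters` equally loaded sites is drained and its traffic is
    split evenly among the rest."""
    if datacenters < 2:
        raise ValueError("need at least two datacenters to fail over")
    return Fraction(1, datacenters - 1)

# Hypothetical datacenter counts, chosen only to show the trend.
for n in (2, 3, 5):
    print(n, f"{float(extra_load_per_survivor(n)):.0%}")
# prints:
# 2 100%
# 3 50%
# 5 25%
```

Under this assumption, the load added to each surviving datacenter shrinks as the number of datacenters grows.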

9 Jul 2014, 09:51 PDT

The problem with Google App Engine was resolved as of Thursday, 2014-06-05 12:45 PM US/Pacific. Some applications experienced increased latency serving requests and when using the Datastore API and the URL Fetch API. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.

9 Jul 2014, 09:51 PDT

We're investigating an issue with Google App Engine that occurred Thursday 2014-06-05 between 12:00 PM and 1:00 PM. We will provide more information shortly.

All times are US/Pacific