Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Google App Engine

Datastore Indexing Issue

Incident began at 2015-04-10 11:31 and ended at 2015-04-10 14:00 (all times are US/Pacific).

Date Time Description
15 Apr 2015 12:25 PDT

SUMMARY:

On Friday 10 April 2015, attempts to create or update Datastore indexes failed for some Google App Engine applications for a duration of 148 minutes. In addition, a number of applications retrieved stale data using eventually consistent read operations for an unexpectedly long period. If your service or application was affected, we apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

DETAILED DESCRIPTION OF IMPACT:

On Friday 10 April 2015 from 11:30 to 13:58 PDT, 331 requests to create or update the definition of Datastore composite indexes across 21 applications failed to complete. In addition, about 34% of applications retrieved stale data using eventually consistent QUERY or GET operations [1]. Unlike strongly consistent reads, eventually consistent read operations are expected to return stale data for a brief period. During this incident, however, stale data was returned for longer than is typically observed during normal operations. There was no impact on strongly consistent operations.

During the recovery phase of this incident about 7% of Google App Engine applications experienced elevated latency on PUT operations for 15 minutes.
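
The difference between the two read modes described above can be illustrated with a toy model. This is a minimal sketch and not a description of how Datastore is implemented: the `Replica` and `ToyStore` classes and their methods are hypothetical names invented for this example. It assumes one replica applies writes immediately while a second replica applies them only when it catches up, so an eventually consistent read served by the lagging replica returns stale data until then.

```python
class Replica:
    """Toy key-value replica (hypothetical; not Datastore's real design)."""

    def __init__(self):
        self.data = {}
        self.pending = []  # writes received but not yet applied

    def enqueue(self, key, value):
        self.pending.append((key, value))

    def apply_pending(self):
        # Apply queued writes in arrival order, then empty the queue.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class ToyStore:
    """Two replicas: one always current, one that lags behind."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()

    def put(self, key, value):
        # The primary applies the write at once; the secondary only queues it.
        self.primary.data[key] = value
        self.secondary.enqueue(key, value)

    def get(self, key, strong):
        # Strong reads go to the current replica, eventual reads to the lagging one.
        replica = self.primary if strong else self.secondary
        return replica.data.get(key)


store = ToyStore()
store.put("greeting", "v1")
store.secondary.apply_pending()   # secondary is caught up at "v1"
store.put("greeting", "v2")       # secondary has not applied this yet

strong_read = store.get("greeting", strong=True)      # "v2"
stale_read = store.get("greeting", strong=False)      # "v1" (stale)

store.secondary.apply_pending()   # secondary catches up
caught_up_read = store.get("greeting", strong=False)  # "v2"
```

In this model, the longer the lagging replica goes without applying its queue, the longer eventually consistent reads remain stale, while strongly consistent reads are unaffected.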

ROOT CAUSE:

During a planned maintenance activity, undertaken to create a new Datastore replica to accommodate organic growth, incorrectly configured automation created an unnecessarily large table in the new replica. This resulted in exhaustion of resources allocated to Datastore and write failures to this replica. Once the underlying problem was resolved, a high volume of writes was routed to the new replica, resulting in elevated latency for write operations.

REMEDIATION AND PREVENTION:

At 00:30 PDT on Friday 10th April 2015, an automated alert on resource depletion was sent out to Google Engineers. However, this alert was suppressed, as is normal practice when undertaking this type of maintenance activity. At 11:30 PDT, quota allocated to the replica was exhausted. Google Engineers were notified by internal teams at 12:53 PDT of problems with Datastore indexes. At 13:26 PDT, Google Engineers deleted the problematic large table and started the procedure to reserve additional quota for this storage replica. This took effect at 13:35 PDT and the replica started receiving write requests immediately, which caused a brief increase in latency. Normal operation was restored at 13:58 PDT.
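
The intervals between the events in this timeline can be checked with a short calculation. The sketch below is illustrative only: the event labels and the `minutes_between` helper are names chosen for this example, and the timestamps are copied from the paragraph above (all PDT on the same day, so no timezone handling is needed).

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps taken from the timeline above (all PDT).
timeline = {
    "alert_sent": "2015-04-10 00:30",
    "quota_exhausted": "2015-04-10 11:30",
    "engineers_notified": "2015-04-10 12:53",
    "table_deleted": "2015-04-10 13:26",
    "quota_effective": "2015-04-10 13:35",
    "restored": "2015-04-10 13:58",
}


def minutes_between(start, end):
    """Return whole minutes elapsed between two named timeline events."""
    delta = datetime.strptime(timeline[end], FMT) - datetime.strptime(timeline[start], FMT)
    return int(delta.total_seconds() // 60)


impact_minutes = minutes_between("quota_exhausted", "restored")             # 148
detection_minutes = minutes_between("quota_exhausted", "engineers_notified")  # 83
suppressed_lead_minutes = minutes_between("alert_sent", "quota_exhausted")  # 660
```

The 148-minute result matches the duration given in the summary, and the 660-minute gap shows how much warning the suppressed alert provided before quota was exhausted.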

To prevent similar incidents in future, we are modifying our maintenance procedures to avoid suppression of the appropriate alerts, and to ensure that this large table is created under close monitoring.

[1] Details on eventual and strong consistency on Google Cloud Datastore: https://cloud.google.com/developers/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/#h.tf76fya5nqk8

10 Apr 2015 14:15 PDT

The problem with Google App Engine Datastore was resolved as of Friday 2015-04-10 14:00 (US/Pacific). We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better. We will provide a more detailed analysis of this incident once we have completed our internal investigation.

10 Apr 2015 13:45 PDT

We are currently experiencing an issue with Google App Engine Datastore. Some applications' Datastore indexes are not updating. For everyone who is affected, we apologize for any inconvenience you may be experiencing. Google Engineers have identified the cause and are currently working on multiple resolution strategies. We will provide an update by Friday 2015-04-10 14:45 (US/Pacific) with current details.

10 Apr 2015 13:09 PDT

We're investigating an issue with Google App Engine Datastore beginning on Friday 2015-04-10 at 12:30 (all times are in US/Pacific). We will provide more information within one hour.