Google Cloud Status Dashboard

This page provides status information on the services that are part of Google Cloud Platform. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit cloud.google.com.

Google App Engine Incident #15005

Elevated latency in deploying applications to App Engine

Incident began at 2015-02-24 00:05 and ended at 2015-02-24 01:47 (all times are US/Pacific).

Date Time Description
Feb 25, 2015 09:47

SUMMARY:

On Monday 23 February and Tuesday 24 February 2015, some attempts to deploy new versions of App Engine applications failed for an aggregate period of 356 minutes. We realize that you depend on this service and we apologize if you were affected by this incident. We are taking steps to ensure that incidents of this nature will not happen again.

DETAILED DESCRIPTION OF IMPACT:

On Monday 23 February 2015, 60% of new version deployment requests failed between 11:00 and 14:15 PST. On Tuesday 24 February 2015, 80% of new version deployment requests failed from 00:05 to 01:47 PST. After that, the rate of deployment failures decreased linearly until the incident ended at 02:46.
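The 356-minute aggregate quoted in the summary can be cross-checked against the two failure windows given above. The following is a minimal sketch of that arithmetic. The `WINDOWS` list and function names are illustrative, and the times are treated as naive local (US/Pacific) timestamps.

```python
from datetime import datetime

# Failure windows as stated in this report (US/Pacific, naive local times).
WINDOWS = [
    ("2015-02-23 11:00", "2015-02-23 14:15"),  # first incident window
    ("2015-02-24 00:05", "2015-02-24 02:46"),  # second window, including the linear recovery
]

def window_minutes(start, end):
    """Length of one failure window in whole minutes."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

def aggregate_minutes(windows):
    """Sum of all window lengths in minutes."""
    return sum(window_minutes(start, end) for start, end in windows)

print(aggregate_minutes(WINDOWS))  # 195 + 161 = 356
```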

ROOT CAUSE:

The App Engine 1.9.18 release required an update to the settings for all applications across the global serving infrastructure. The propagation of this change is handled by Google’s internal Pub/Sub infrastructure. Posting an update for every one of App Engine’s large number of applications resulted in throttling of messages by this infrastructure. As a result, deployment messages were blocked behind these updates, resulting in timeouts.
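The blocking effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the report does not describe how Google's internal Pub/Sub infrastructure queues or throttles messages, so the single FIFO queue, the fixed per-tick delivery rate, and every name and number below are hypothetical.

```python
from collections import deque

def simulate(bulk_updates, rate_per_tick, deploy_arrival_tick, timeout_ticks):
    """Toy model: one deployment message queued behind a burst of settings updates.

    Assumes a single FIFO queue drained at a fixed rate (the "throttle").
    Returns (delivery_tick, timed_out) for the deployment message.
    """
    if rate_per_tick < 1:
        raise ValueError("rate_per_tick must be at least 1")

    # The bulk settings update posts one message per application at tick 0.
    queue = deque(("settings", 0) for _ in range(bulk_updates))
    tick = 0
    deploy_enqueued = False
    while True:
        # The deployment message joins the back of the queue when it arrives.
        if not deploy_enqueued and tick >= deploy_arrival_tick:
            queue.append(("deploy", tick))
            deploy_enqueued = True
        # The throttle allows at most rate_per_tick deliveries per tick.
        for _ in range(rate_per_tick):
            if not queue:
                break
            kind, enqueued_at = queue.popleft()
            if kind == "deploy":
                waited = tick - enqueued_at
                return tick, waited > timeout_ticks
        tick += 1

# Low throttle limit: the deployment waits behind the backlog and times out.
print(simulate(bulk_updates=1000, rate_per_tick=10, deploy_arrival_tick=1, timeout_ticks=30))
# Higher throttle limit: the backlog drains quickly and the deployment succeeds.
print(simulate(bulk_updates=1000, rate_per_tick=100, deploy_arrival_tick=1, timeout_ticks=30))
```

In this model, raising the per-tick rate shortens the wait for the deployment message, which mirrors the remediation of increasing the throttle limit. A separate queue or priority for deployment messages would be another way to avoid the blocking, though the report does not say whether that was considered.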

REMEDIATION AND PREVENTION:

App Engine began to update application settings on Monday 23 February at 10:47. The deployment failures began at 11:00. Our engineers detected the problem at 13:06 and we began to investigate the root cause. The incident resolved itself at 14:15.

We then made a further update on Monday 23 February at 23:42. The deployment failures began on Tuesday 24 February at 00:05. Our engineers detected the issue at 01:08, and diagnosed the root cause at 01:11. The Pub/Sub infrastructure's throttle limit was increased at 01:47 and the incident ended at 02:46.

We have now increased App Engine’s throttle limits for the Pub/Sub infrastructure and added an alert to our monitoring systems that will be immediately triggered if this type of event recurs.

Feb 24, 2015 03:24

Starting on Tuesday, 2015-02-24 00:00, Google App Engine showed increased latency when deploying applications, possibly leading to timeouts. This incident with Google App Engine deployment was resolved as of Tuesday, 2015-02-24 01:54 (all times are in US/Pacific). We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.
