Google Cloud Status Dashboard
Incident affecting Google App Engine
App Engine seeing elevated error rates
Incident began at 2018-02-15 11:42 and ended at 2018-02-15 12:44 (all times are US/Pacific).
22 Feb 2018, 07:13 PST
On Thursday 15 February 2018, specific Google Cloud Platform services experienced elevated errors and latency for a period of 62 minutes from 11:42 to 12:44 PST. The following services were impacted:
Cloud Datastore experienced a 4% error rate for get calls and an 88% error rate for put calls.
App Engine's serving infrastructure, which is responsible for routing requests to instances, experienced a 45% error rate; most of these errors were timeouts.
App Engine Task Queues would not accept new transactional tasks, and also would not accept new tasks in regions outside us-central1 and europe-west1. Tasks continued to be dispatched during the event but saw start delays of 0-30 minutes; additionally, a fraction of tasks executed with errors due to the aforementioned Cloud Datastore and App Engine performance issues.
App Engine Memcache calls experienced a 5% error rate.
App Engine Admin API write calls failed during the incident, causing unsuccessful application deployments. App Engine Admin API read calls experienced a 13% error rate.
App Engine Search API index writes failed during the incident, though search queries did not experience elevated errors.
Stackdriver Logging experienced delays exporting logs to systems including Cloud Console Logs Viewer, BigQuery and Cloud Pub/Sub. Stackdriver Logging retries on failure so no logs were lost during the incident. Logs-based Metrics failed to post some points during the incident.
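The report attributes the absence of log loss to Stackdriver Logging retrying on failure. As a rough illustration of that general pattern only (not Google's implementation, whose details this report does not describe), the sketch below retries a failing operation with capped exponential backoff and jitter. The function name `retry_with_backoff` and all of its parameters are hypothetical and chosen for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is exhausted.

    Hypothetical helper for illustration; all names and defaults are assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # The backoff ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time in [0, ceiling] so that many
            # clients recovering at once do not retry in lockstep.
            sleep(random.uniform(0, ceiling))

    # Not reachable: the loop either returns or re-raises.
    raise RuntimeError("unreachable")
```

A caller might wrap a write that can fail transiently, for example `retry_with_backoff(lambda: export_batch(batch))`, where `export_batch` is a placeholder for whatever delivery call the application makes. Retrying preserves data only if the caller keeps the unsent batch until the call succeeds, and only if the operation is safe to repeat.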
We apologize for the impact of this outage on your application or service. For Google Cloud Platform customers who rely on the products which were part of this event, the impact was substantial and we recognize that it caused significant disruption for those customers. We are conducting a detailed post-mortem to ensure that all the root and contributing causes of this event are understood and addressed promptly.
15 Feb 2018, 14:04 PST
The issue with App Engine has been resolved for all affected projects as of 12:44 US/Pacific. We will conduct an internal investigation of this issue and make appropriate improvements to our systems to help prevent or minimize future recurrence.
We will provide a more detailed analysis of this incident once we have completed our internal investigation.
15 Feb 2018, 13:04 PST
We're seeing widespread improvement in error rates across many, if not most, regions since approximately 12:40 PST. We're continuing to investigate and will provide another update by 13:30 PST.
15 Feb 2018, 12:41 PST
We are investigating an issue with App Engine. We will provide more information by 13:00 US/Pacific.