Google Cloud Status

This page provides status information on the services that are part of the Google Cloud Platform. Check back here to view the current status of the services listed below. For additional information on these services, please visit cloud.google.com.

Google App Engine Incident #15011

High logging service error rate

Incident began at 2015-04-17 16:02 and ended at 2015-04-17 16:56 (all times are US/Pacific).

Date Time Description
Apr 24, 2015 05:46

SUMMARY:

On Friday 17 April 2015, the Google App Engine Logs API experienced intermittent failures and reduced throughput for read requests for a duration of 54 minutes. If your service or application was affected, we apologize — this is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

DETAILED DESCRIPTION OF IMPACT:

On Friday 17 April 2015 from 16:02 to 16:56 PDT, 3% of read requests to the Logs API failed and there was a 96% drop in throughput. The problem affected 16% of applications that rely on this API to export logs. In this time window, users experienced intermittent timeouts while attempting to view application logs on App Engine Admin Console or Google Cloud Developers console.

ROOT CAUSE:

Hotspotting in the App Engine Logs API's storage subsystem caused a number of storage nodes to fail. This eventually resulted in resource depletion and request failures.

REMEDIATION AND PREVENTION:

At 16:05 on Friday 17 April 2015, an automated alert on depletion of available resources for the Logs API was sent to Google engineers. To resolve the immediate problem, they started redirecting traffic away from the affected storage layer. The service started recovering at 16:51 and normal operation was restored at 16:56.

To prevent similar incidents in the future, we are implementing changes to reallocate resources consumed by high-use individual nodes of the storage layer backing the Logs API.

Apr 17, 2015 17:48

Apologies - the date in the previous post was incorrect. The resolution time was Friday, April 17th, 2015 at 17:00 PDT.

Apr 17, 2015 17:45

The problem with the Google App Engine Logging service was resolved as of Friday April 18th, 2015 at 17:00 PDT. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.

Apr 17, 2015 16:48

We're investigating an issue with Google App Engine Logging service beginning at 2015-04-17 16:00 (all times are in US/Pacific). We will provide more information within one hour.
