Google Cloud Status Dashboard

This page provides status information on the services that are part of Google Cloud Platform. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit cloud.google.com.

Google App Engine Incident #15025

Authentication issues with App Engine

Incident began at 2015-12-07 22:00 and ended at 2015-12-08 13:25 (all times are US/Pacific).

Date Time Description
Dec 16, 2015 07:40

SUMMARY:

On Monday 7 December 2015, 1.29% of Google App Engine applications received errors when issuing authenticated calls to Google APIs over a period of 17 hours and 3 minutes. During a 45-minute period, authenticated calls to Google APIs from outside of App Engine also received errors, with the error rate peaking at 12%. We apologise for the impact of this issue on you and your service. We consider service degradation of this level and duration to be very serious and we are planning many changes to prevent this occurring again in the future.

DETAILED DESCRIPTION OF IMPACT:

Between Monday 7 December 2015 20:09 PST and Tuesday 8 December 2015 13:12, 1.29% of Google App Engine applications using service accounts received error 401 "Access Denied" for all requests to Google APIs requiring authentication. Unauthenticated API calls were not affected. Different applications experienced impact at different times, with few applications being affected for the full duration of the incident.

In addition, between 23:05 and 23:50, an average of 7% of all requests to Google Cloud APIs failed or timed out, peaking briefly at 12%. Outside of this time only API calls from App Engine were affected.

ROOT CAUSE:

Google engineers have recently carried out a migration of the Google Accounts system to a new storage backend, which included copying API authentication service credentials data and redirecting API calls to the new backend.

To complete this migration, credentials were scheduled to be deleted from the previous storage backend. This process started at 20:09 PST on Monday 7 December 2015. Due to a software bug, the API authentication service continued to look up some credentials, including those used by Google App Engine service accounts, in the old storage backend. As these credentials were progressively deleted, their corresponding service accounts could no longer be authenticated.

The impact increased as more credentials were deleted and some Google App Engine applications started to issue a high volume of retry requests. At 23:05, the retry volume exceeded the regional capacity of the API authentication service, causing 1.3% of all authenticated API calls to fail or timeout, including Google APIs called from outside Google App Engine. At 23:30 the API authentication service exceeded its global capacity, causing up to 12% of all authenticated API calls to fail until 23:50, when the overload issue was resolved.

REMEDIATION AND PREVENTION:

At 23:50 PST on Monday 8 December, Google engineers blocked certain authentication credentials that were known to be failing, preventing retries on these credentials from overloading the API authentication service.

On Tuesday 9 December 08:52 PST, the deletion process was halted, having removed 2.3% of credentials, preventing further applications from being affected. At 10:08, Google engineers identified the root cause for the misdirected credentials lookup. After thorough testing, a fix was rolled out globally, resolving the issue for all affected Google App Engine applications by 13:12.

Google has conducted a far-reaching review of the issue's root causes and contributory factors, leading to numerous prevention and mitigation actions in the following areas: — Google engineers have deployed monitoring for additional infrastructure signals to detect and analyse similar issues more quickly. — Google engineers have improved internal tools to extend auditing and logging and automatically advise relevant teams on potentially risky data operations. — Additional rate limiting and caching features will be added to the API authentication service, increasing its resilience to load spikes. — Google’s development guidelines are being reviewed and updated to improve the handling of service or backend migrations, including a grace period of disabling access to old data locations before fully decommissioning them.

Our customers rely on us to provide a superior service and we regret we did not live up to expectations in this case. We apologize again for the inconvenience this caused you and your users.

SUMMARY:

On Monday 7 December 2015, 1.29% of Google App Engine applications received errors when issuing authenticated calls to Google APIs over a period of 17 hours and 3 minutes. During a 45-minute period, authenticated calls to Google APIs from outside of App Engine also received errors, with the error rate peaking at 12%. We apologise for the impact of this issue on you and your service. We consider service degradation of this level and duration to be very serious and we are planning many changes to prevent this occurring again in the future.

DETAILED DESCRIPTION OF IMPACT:

Between Monday 7 December 2015 20:09 PST and Tuesday 8 December 2015 13:12, 1.29% of Google App Engine applications using service accounts received error 401 "Access Denied" for all requests to Google APIs requiring authentication. Unauthenticated API calls were not affected. Different applications experienced impact at different times, with few applications being affected for the full duration of the incident.

In addition, between 23:05 and 23:50, an average of 7% of all requests to Google Cloud APIs failed or timed out, peaking briefly at 12%. Outside of this time only API calls from App Engine were affected.

ROOT CAUSE:

Google engineers have recently carried out a migration of the Google Accounts system to a new storage backend, which included copying API authentication service credentials data and redirecting API calls to the new backend.

To complete this migration, credentials were scheduled to be deleted from the previous storage backend. This process started at 20:09 PST on Monday 7 December 2015. Due to a software bug, the API authentication service continued to look up some credentials, including those used by Google App Engine service accounts, in the old storage backend. As these credentials were progressively deleted, their corresponding service accounts could no longer be authenticated.

The impact increased as more credentials were deleted and some Google App Engine applications started to issue a high volume of retry requests. At 23:05, the retry volume exceeded the regional capacity of the API authentication service, causing 1.3% of all authenticated API calls to fail or timeout, including Google APIs called from outside Google App Engine. At 23:30 the API authentication service exceeded its global capacity, causing up to 12% of all authenticated API calls to fail until 23:50, when the overload issue was resolved.

REMEDIATION AND PREVENTION:

At 23:50 PST on Monday 8 December, Google engineers blocked certain authentication credentials that were known to be failing, preventing retries on these credentials from overloading the API authentication service.

On Tuesday 9 December 08:52 PST, the deletion process was halted, having removed 2.3% of credentials, preventing further applications from being affected. At 10:08, Google engineers identified the root cause for the misdirected credentials lookup. After thorough testing, a fix was rolled out globally, resolving the issue for all affected Google App Engine applications by 13:12.

Google has conducted a far-reaching review of the issue's root causes and contributory factors, leading to numerous prevention and mitigation actions in the following areas: — Google engineers have deployed monitoring for additional infrastructure signals to detect and analyse similar issues more quickly. — Google engineers have improved internal tools to extend auditing and logging and automatically advise relevant teams on potentially risky data operations. — Additional rate limiting and caching features will be added to the API authentication service, increasing its resilience to load spikes. — Google’s development guidelines are being reviewed and updated to improve the handling of service or backend migrations, including a grace period of disabling access to old data locations before fully decommissioning them.

Our customers rely on us to provide a superior service and we regret we did not live up to expectations in this case. We apologize again for the inconvenience this caused you and your users.

Dec 08, 2015 13:29

The issue with App Engine applications accessing Google APIs should have been resolved for all affected customers as of 13:15 US/Pacific. We will conduct an internal investigation of this issue and make appropriate improvements to our systems to prevent or minimize future recurrence. We will provide a more detailed analysis of this incident once we have completed our internal investigation.

The issue with App Engine applications accessing Google APIs should have been resolved for all affected customers as of 13:15 US/Pacific. We will conduct an internal investigation of this issue and make appropriate improvements to our systems to prevent or minimize future recurrence. We will provide a more detailed analysis of this incident once we have completed our internal investigation.

Dec 08, 2015 12:31

We believe the issue is resolved for most customers. A new update will be provided by 2015-12-08 13:30 US/Pacific with current details.

We believe the issue is resolved for most customers. A new update will be provided by 2015-12-08 13:30 US/Pacific with current details.

Dec 08, 2015 11:30

We’re investigating elevated error rates for some Google Cloud Platform users. We believe these errors are affecting between 2-5 percent of Google App Engine (GAE) applications. We are working directly with the customers who are affected to restore full operation in their application as quickly as possible, and apologize for any inconvenience. We will provide another status update by 2015-12-08 12:30 US/Pacific with current details.

We’re investigating elevated error rates for some Google Cloud Platform users. We believe these errors are affecting between 2-5 percent of Google App Engine (GAE) applications. We are working directly with the customers who are affected to restore full operation in their application as quickly as possible, and apologize for any inconvenience. We will provide another status update by 2015-12-08 12:30 US/Pacific with current details.

Dec 08, 2015 10:30

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 11:30 US/Pacific with current details.

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 11:30 US/Pacific with current details.

Dec 08, 2015 09:30

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 10:30 US/Pacific with current details.

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 10:30 US/Pacific with current details.

Dec 08, 2015 08:26

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 09:30 US/Pacific with current details.

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 09:30 US/Pacific with current details.

Dec 08, 2015 07:26

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 08:30 US/Pacific with current details.

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 08:30 US/Pacific with current details.

Dec 08, 2015 06:29

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 07:30 US/Pacific with current details.

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 07:30 US/Pacific with current details.

Dec 08, 2015 05:26

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 06:30 US/Pacific with current details.

We are still investigating the issue with App Engine applications accessing Google APIs. We will provide another status update by 2015-12-08 06:30 US/Pacific with current details.

Dec 08, 2015 04:19

Despite actions taken to mitigate the problem, a significant number of App Engine applications have continued to experience errors while accessing Google APIs. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 05:30 US/Pacific with current details.

Despite actions taken to mitigate the problem, a significant number of App Engine applications have continued to experience errors while accessing Google APIs. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 05:30 US/Pacific with current details.

Dec 08, 2015 03:20

The issue with App Engine applications accessing Google APIs should have been resolved for the majority of projects and we expect a full resolution in the near future. We will provide another status update by 08:00 US/Pacific with current details.

The issue with App Engine applications accessing Google APIs should have been resolved for the majority of projects and we expect a full resolution in the near future. We will provide another status update by 08:00 US/Pacific with current details.

Dec 08, 2015 02:19

We are experiencing an issue with App Engine applications accessing Google APIs beginning at Monday, 2015-12-07 22:00 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 03:30 US/Pacific with current details.

We are experiencing an issue with App Engine applications accessing Google APIs beginning at Monday, 2015-12-07 22:00 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 03:30 US/Pacific with current details.

Dec 08, 2015 01:19

We are experiencing an issue with App Engine applications accessing Google APIs beginning at Monday, 2015-12-07 22:00 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 02:30 US/Pacific with current details.

We are experiencing an issue with App Engine applications accessing Google APIs beginning at Monday, 2015-12-07 22:00 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 02:30 US/Pacific with current details.

Dec 08, 2015 00:23

We are experiencing an issue with App Engine applications accessing Google APIs beginning at Monday, 2015-12-07 22:00 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 01:20 US/Pacific with current details.

We are experiencing an issue with App Engine applications accessing Google APIs beginning at Monday, 2015-12-07 22:00 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 01:20 US/Pacific with current details.

Dec 07, 2015 23:55

"We are experiencing an issue with App Engine applications accessing Google APIs beginning at Monday, 2015-12-07 22:00 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 00:20 US/Pacific with current details."

"We are experiencing an issue with App Engine applications accessing Google APIs beginning at Monday, 2015-12-07 22:00 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide an update by 2015-12-08 00:20 US/Pacific with current details."

Dec 07, 2015 23:39

We are investigating reports of an issue with App Engine applications accessing Google APIs. We will provide more information by 23:50 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message.

We are investigating reports of an issue with App Engine applications accessing Google APIs. We will provide more information by 23:50 US/Pacific. Affected APIs may return a "401 Invalid Credentials" error message.