Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Google App Engine

Authentication issues with Google App Engine

Incident began at 2015-03-05 07:33 and ended at 2015-03-05 08:27 (all times are US/Pacific).

Date Time Description
6 Mar 2015 14:00 PST

SUMMARY:

On Thursday 5 March 2015, for a duration of 84 minutes, Google App Engine applications that accessed some Google APIs over HTTP experienced elevated error rates. We apologize for any impact this incident had on your service or application, and have made immediate changes to prevent this issue from recurring.

DETAILED DESCRIPTION OF IMPACT:

On Thursday 5 March 2015, from 07:04 AM to 08:28 AM, some Google App Engine applications making calls to other Google APIs via HTTP experienced elevated error rates. During the incident, the global error rate for all API calls remained under 1%, and in total, the outage affected 2% of applications that were active during the incident. The effect on those applications was significant: requests to issue OAuth tokens experienced an error rate of over 85%. In addition, the HTTP APIs to googleapis.com/storage and googleapis.com/gmail experienced error rates between 50% and 60%. Other googleapis.com endpoints were affected with error rates of 10% to 20%.

ROOT CAUSE:

A component in Google’s shared HTTP load balancing fabric experienced a non-malicious increase in traffic, exceeding its provisioned capacity. This triggered an automatic DoS protection which shunted a portion of the incoming traffic to a CAPTCHA. The unexpected response caused some clients to issue automated retries, exacerbating the problem.
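
The retry amplification described above is commonly limited on the client side with capped exponential backoff plus random jitter, so that failing clients spread their retries out instead of resending in lockstep. The following is a minimal sketch of that general technique, not a description of how any Google client library behaves. The names `backoff_delays`, `call_with_backoff`, and `send`, the set of retryable status codes, and the default base and cap values are all assumptions made for illustration.

```python
import random
import time


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=None):
    """Yield one randomized delay (in seconds) per retry attempt.

    The upper bound doubles on each attempt and is capped at `cap`;
    the actual delay is drawn uniformly between 0 and that bound.
    """
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)


def call_with_backoff(send, max_retries=5,
                      retryable=(429, 500, 502, 503, 504),
                      sleep=time.sleep, rng=None):
    """Call `send()` and retry with jittered backoff on retryable statuses.

    `send` is a hypothetical zero-argument callable that performs one
    request and returns an integer HTTP status code. Any status outside
    `retryable` (for example an unexpected 302 or 403) is returned to the
    caller immediately rather than retried.
    """
    status = send()
    for delay in backoff_delays(max_retries, rng=rng):
        if status not in retryable:
            break
        sleep(delay)
        status = send()
    return status
```

With these defaults a client makes at most six requests in total (one initial attempt plus five retries), and each wait before a retry is bounded by 0.5, 1, 2, 4, and 8 seconds respectively. Which status codes are safe to retry depends on the API being called, so the `retryable` tuple here should be treated as a placeholder.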

REMEDIATION AND PREVENTION:

Google engineers were alerted to the issue by automated monitoring at 07:02, as the load balancing system detected excess traffic and attempted to automatically mitigate it. At 07:46, Google engineers enabled standby load balancing capacity to rectify the issue. From 08:15 to 08:40, Google engineers continued to provision additional resources in the load balancing fabric in order to serve the increased traffic. During this period, at 08:28, Google engineers determined that sufficient capacity was in place to serve both regular and retry traffic, and instructed the load balancing system to cease mitigation and resume normal traffic serving. This action marked the end of the event.

To prevent this issue from recurring, Google engineers are comprehensively re-examining the affected load balancing fabric to ensure it is and remains correctly provisioned. Additionally, Google engineers are improving monitoring rules to provide an early warning of capacity shortfall. Finally, Google engineers are examining the services that depend on this load balancing system, and will move some services to a separate pool of more easily scalable load balancers where appropriate.
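
An early-warning rule for capacity shortfall, as mentioned above, can be as simple as comparing observed load against provisioned capacity and alerting before the two meet. The sketch below illustrates that idea only; it does not reflect Google's actual monitoring rules. The function name `capacity_status`, the queries-per-second inputs, and the 80% warning threshold are assumptions chosen for illustration.

```python
def capacity_status(observed_qps, provisioned_qps, warn_ratio=0.8):
    """Classify load relative to provisioned capacity.

    Returns "ok" below the warning threshold, "warning" once utilization
    reaches `warn_ratio`, and "over_capacity" once observed load meets or
    exceeds what is provisioned. All thresholds are illustrative.
    """
    if provisioned_qps <= 0:
        raise ValueError("provisioned_qps must be positive")
    utilization = observed_qps / provisioned_qps
    if utilization >= 1.0:
        return "over_capacity"
    if utilization >= warn_ratio:
        return "warning"
    return "ok"
```

For example, with a hypothetical 10,000 queries per second provisioned, a reading of 8,500 would fall in the "warning" band, leaving time to add capacity before automatic mitigation would otherwise be triggered.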

5 Mar 2015 15:01 PST

At 7:04 AM PST Google systems began returning errors for approximately 20% of requests from App Engine to many Google Cloud Platform APIs. The error rate peaked around 50% at 7:50 and remained at that level until the incident was resolved at 8:26. Many users observed this issue as a failure of the authentication service. We will post a complete incident report following our internal investigation.

5 Mar 2015 08:56 PST

The problem with authentication on Google App Engine and the Google APIs was resolved as of Thursday, 2015-03-05 08:27 (all times are in US/Pacific). We apologize to our customers for the inconvenience, and we thank you for your patience and continued support.

We will provide a more detailed analysis of this incident once we have completed our internal investigation.

5 Mar 2015 08:46 PST

The issue with Google App Engine and Google APIs authentication is resolved for most applications as of 08:28 US/Pacific. Our engineers continue to monitor the situation to ensure that service is fully restored and stable.

We will provide more information by 09:15 US/Pacific.

5 Mar 2015 08:27 PST

We are investigating an issue with authentication on Google App Engine beginning at Thursday, 2015-03-05 07:32 (all times are in US/Pacific).

Affected applications are responding with an HTTP 302 on user login, or with a 403 error when connecting to Google APIs.

We will provide more information by 09:00 US/Pacific time.

5 Mar 2015 07:52 PST

We are investigating an issue with App Engine authentication services. We will post an update shortly.