Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Cloud Firestore, Google App Engine, Google Cloud Functions

Increased latency and error rates observed on Google App Engine, Cloud Firestore, and Google Cloud Functions gen 1.

Incident began at 2024-09-18 12:34 and ended at 2024-09-18 15:30 (all times are US/Pacific).

Previously affected location(s)

  • Taiwan (asia-east1)
  • Osaka (asia-northeast2)
  • Seoul (asia-northeast3)
  • Mumbai (asia-south1)
  • Singapore (asia-southeast1)
  • Jakarta (asia-southeast2)
  • Sydney (australia-southeast1)
  • Warsaw (europe-central2)
  • London (europe-west2)
  • Frankfurt (europe-west3)
  • Zurich (europe-west6)
  • São Paulo (southamerica-east1)
  • Iowa (us-central1)
  • South Carolina (us-east1)
  • Northern Virginia (us-east4)
  • Salt Lake City (us-west3)

Date Time Description
23 Sep 2024 07:11 PDT

Incident Report

Summary

On Wednesday, 18 September 2024, Google App Engine, Cloud Firestore, and Google Cloud Run functions (1st gen) experienced increased latency and error rates for 2 hours and 56 minutes in multiple regions. In some regions, customers experienced a complete service outage lasting between 5 and 67 minutes. The issue began on 18 September 2024 at 12:34 US/Pacific and was fully resolved on 18 September 2024 at 15:30 US/Pacific.

To our customers who were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

Root Cause

The root cause was newly implemented automation code that created a bad traffic routing policy. This policy incorrectly directed our traffic routing control plane to mark all clusters as unavailable to serve traffic for App Engine, Google Cloud Run functions (1st gen)*, and dependent services. Google engineers intervened before the policy was rolled out to all clusters, so the outage was partial rather than complete.

Remediation and Prevention

Google engineers were alerted to the issue via internal production monitoring on 18 September 2024 at 13:01 US/Pacific, shortly after customers began experiencing the impact. Engineering teams identified the automation that caused the impact and terminated it at 13:46. However, customer impact was not mitigated until 15:30, after engineers manually directed traffic back to the affected clusters.

Google is committed to preventing a repeat of the issue in the future and is completing the following actions:

  • We have removed the automation that caused the outage as a short-term measure
  • We are working to implement a more efficient and well-tested traffic routing maintenance process
  • We will implement safeguards in the automation pipeline to prevent recurrence of this issue

Detailed Description of Impact

On Wednesday, 18 September 2024, from 12:34 US/Pacific to 15:30 US/Pacific, Google App Engine, Google Cloud Run functions (1st gen)*, and Cloud Firestore experienced elevated error rates and increased latency. Customers reported high latency and 5xx errors with the message “Request was aborted after waiting too long to attempt to service your request.” Customers also experienced elevated cold starts during this time.

In 13 regions, customers experienced a complete service outage lasting between 5 and 67 minutes:

  1. asia-east2
  2. asia-northeast2
  3. asia-northeast3
  4. asia-south1
  5. asia-southeast1
  6. asia-southeast2
  7. europe-west2
  8. europe-west3
  9. europe-west6
  10. northamerica-northeast1
  11. southamerica-east1
  12. us-central2
  13. us-west1

In 11 other regions, customers may have observed elevated error rates:

  1. asia-east1
  2. asia-northeast1
  3. australia-southeast1
  4. europe-central2
  5. europe-west1
  6. us-central1
  7. us-east1
  8. us-east4
  9. us-west2
  10. us-west3
  11. us-west4

*Cloud Run and Cloud Run functions (gen2) were not affected.

18 Sep 2024 21:36 PDT

Mini Incident Report

We apologize for the inconvenience this service outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support

(All Times US/Pacific)

Incident Start 18 September, 2024, 13:01

Incident End 18 September, 2024, 15:30

Duration 2 hours, 29 minutes

Affected Services and Features

  • Firestore
  • App Engine
  • Google Cloud Functions Gen 1

Regions/Zones

Global

Description

Google App Engine, Google Cloud Functions Gen1, and Firestore experienced elevated error rates and increased latency for a period of 2 hours, 29 minutes. Based on our preliminary analysis, the root cause of the issue was identified as newly implemented automation code that created a bad traffic routing policy. This policy incorrectly directed our traffic routing control plane to mark all clusters as unavailable to serve traffic for App Engine and dependent services. Google engineers intervened before the policy was rolled out to all clusters, so the outage was partial rather than complete.

Google engineers have identified the automation that was responsible for this change and have terminated it until appropriate safeguards are put in place. The impact was mitigated by manually directing the traffic back to the affected clusters. There is no risk of a recurrence of this outage at the moment.

Google will complete a full Incident Report (IR) in the following days that will provide a full root cause analysis.

Customer Impact

  • Customers experienced elevated latency and error rates for Google App Engine, Google Cloud Functions Gen1 and Firestore services.
  • Customers in some regions experienced a complete service outage for Google App Engine, Google Cloud Functions Gen1 and Firestore services.

18 Sep 2024 15:50 PDT

The issue with Google App Engine, Google Cloud Functions, and Cloud Firestore has been resolved for all affected users as of Wednesday, 2024-09-18 15:30 US/Pacific.

We will publish an analysis of this incident once we have completed our internal investigation.

We thank you for your patience while we worked on resolving the issue.

18 Sep 2024 15:19 PDT

Summary: Increased latency and error rates observed on Google App Engine, Cloud Firestore, and Google Cloud Functions gen 1.

Description: Mitigation has been successfully applied by our engineering team. We are currently monitoring our environment to ensure stability.

We will provide more information by Wednesday, 2024-09-18 16:00 US/Pacific.

Diagnosis: Affected users may encounter elevated latency or an elevated error rate for the impacted products.

Workaround: None at this time.

18 Sep 2024 14:57 PDT

Summary: Increased latency and error rates observed on Google App Engine and Google Cloud Functions gen 1.

Description: Mitigation work is currently underway by our engineering team. Based on the investigation thus far, our engineers have identified that Cloud Run is not currently impacted.

We do not have an ETA for mitigation at this point.

We will provide more information by Wednesday, 2024-09-18 16:00 US/Pacific.

Diagnosis: Affected users may encounter elevated latency or an elevated error rate for the impacted products.

Workaround: None at this time.