Service Health


Incident affecting Speech-to-Text, Dialogflow CX, Google Cloud Support, Dialogflow ES, Cloud Machine Learning, Contact Center Insights

Dialogflow ES, Dialogflow CX, Speech-to-Text, and Google Cloud Support customers are experiencing elevated 500 error rates and service unavailability.

Incident began at 2023-10-02 12:30 and ended at 2023-10-02 14:30 (all times are US/Pacific).

Previously affected location(s)

  • Tokyo (asia-northeast1)
  • Mumbai (asia-south1)
  • Singapore (asia-southeast1)
  • Sydney (australia-southeast1)
  • Multi-region: europe
  • Belgium (europe-west1)
  • London (europe-west2)
  • Frankfurt (europe-west3)
  • Global
  • Montréal (northamerica-northeast1)
  • Multi-region: us
  • Iowa (us-central1)
  • South Carolina (us-east1)
  • Oregon (us-west1)

6 Oct 2023 14:54 PDT

Incident Report

Summary

On Monday, 2 October 2023, Dialogflow - Essentials (ES) and Customer Experience (CX), Cloud Speech-to-Text v1 API, Contact Center AI Insights - Analysis and Summarization, Cloud Billing Support, and Workspace Support experienced elevated error rates and service unavailability for two hours. To our customers whose business was impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

Root Cause

Dialogflow is used to analyze both text and audio inputs from customers. For audio inputs, Dialogflow has a critical dependency on Cloud Speech-to-Text service as this service transcribes audio to text.

A configuration issue caused our systems to treat a dependency of Cloud Speech-to-Text (STT) as down, and therefore to conclude that Cloud STT could not serve traffic. This propagated through multiple systems with dependencies on each other. The system that was down was not in fact needed to serve requests, but the configuration caused traffic to be deflected anyway.

This occurred because that unused dependency was scheduled for turndown on 2 October 2023. When this service was turned down globally instead of through a staged rollout, it triggered the configuration dependency and caused the outage.

It was a case of follow-on impact: Internal speech serving service > Cloud Speech-to-Text > Dialogflow > Billing Support / Workspace Support.

Remediation and Prevention

Google engineers were paged by an automated alert on 2 October at 12:32 US/Pacific and immediately started an investigation. Once the nature and scope of the issue became clear, they quickly rolled out a change to unblock customers. The issue was mitigated by marking Dialogflow’s dependency on Cloud Speech as optional, as well as marking Cloud Speech’s internal dependency as optional, allowing both services to recover.
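The cascade and its mitigation can be illustrated with a small model. This is a minimal sketch under stated assumptions, not a description of Google's internal systems: the service names, the `Service` class, and the `can_serve` function are all hypothetical, and the only behavior modelled is "a service is treated as unable to serve if any dependency marked critical cannot serve".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Service:
    """Hypothetical service record used only for this illustration."""
    name: str
    up: bool = True  # whether the service itself is running
    # Maps dependency name -> True if the dependency is marked critical.
    deps: Dict[str, bool] = field(default_factory=dict)


def can_serve(name: str, services: Dict[str, Service]) -> bool:
    """A service can serve if it is up and every critical dependency can serve."""
    svc = services[name]
    if not svc.up:
        return False
    return all(
        can_serve(dep, services)
        for dep, critical in svc.deps.items()
        if critical
    )


# Assumed names, modelled on the chain described in the report.
services = {
    "internal-speech-serving": Service("internal-speech-serving"),
    "speech-to-text": Service("speech-to-text", deps={"internal-speech-serving": True}),
    "dialogflow": Service("dialogflow", deps={"speech-to-text": True}),
    "billing-support": Service("billing-support", deps={"dialogflow": True}),
}

# The unused internal service is turned down everywhere at once.
services["internal-speech-serving"].up = False
outage = {name: can_serve(name, services) for name in services}

# Mitigation: mark both dependencies as optional.
services["speech-to-text"].deps["internal-speech-serving"] = False
services["dialogflow"].deps["speech-to-text"] = False
recovered = {name: can_serve(name, services) for name in services}
```

In this model every service in the chain is reported as unable to serve after the turndown, even though only the first one is actually down. Once the two dependencies are marked optional, the three downstream services are reported as able to serve again while the turned-down service stays down.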

Google is committed to preventing a repeat of this issue in the future and is taking the following actions:

  • Enhance the turndown process to remove dependencies on redundant systems.
  • Augment testing of configuration changes prior to deployment in customer environments.
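As an illustration of the kind of safeguard these actions point toward, the sketch below checks for remaining critical dependents before a turndown and then proceeds one region at a time, halting on the first failed health check. It is a hedged example only: the configuration shape, the function names `critical_dependents` and `staged_turndown`, and the health-check callback are assumptions made for illustration and do not describe Google's actual turndown process.

```python
from typing import Callable, Dict, List

# Hypothetical config shape: service -> {dependency: is_critical}
DependencyConfig = Dict[str, Dict[str, bool]]


def critical_dependents(target: str, config: DependencyConfig) -> List[str]:
    """Return services that still declare a critical dependency on target."""
    return sorted(svc for svc, deps in config.items() if deps.get(target, False))


def staged_turndown(
    target: str,
    config: DependencyConfig,
    stages: List[str],
    health_check: Callable[[str], bool],
) -> List[str]:
    """Turn target down stage by stage; return the stages that were processed.

    Refuses to start while any service marks target as critical, and stops
    after the first stage whose health check fails.
    """
    blockers = critical_dependents(target, config)
    if blockers:
        raise RuntimeError(f"{target} is still a critical dependency of: {blockers}")
    processed: List[str] = []
    for region in stages:
        processed.append(region)  # placeholder for the real turndown step
        if not health_check(region):
            break  # halt the rollout instead of continuing globally
    return processed


config: DependencyConfig = {"speech-to-text": {"internal-speech-serving": True}}
blockers = critical_dependents("internal-speech-serving", config)

# After the dependency is marked optional, the turndown is allowed to start.
config["speech-to-text"]["internal-speech-serving"] = False
stages = ["asia-northeast1", "europe-west1", "us-central1"]
processed = staged_turndown(
    "internal-speech-serving",
    config,
    stages,
    health_check=lambda region: region != "europe-west1",  # simulated failure
)
```

Here the rollout stops after the second stage because the simulated health check fails there, so the third region is never touched.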

Detailed Description of Impact

On Monday, 2 October 2023, from 12:30 to 14:30 US/Pacific, Dialogflow ES, Dialogflow CX, Cloud Speech-to-Text, Contact Center AI Insights, Google Cloud Billing support, and Workspace support experienced elevated error rates and service unavailability.

Dialogflow ES and CX:

  • Customers experienced UNAVAILABLE errors and service unavailability across all regions. 100% of Dialogflow traffic was affected for the duration of the incident.

Cloud Speech-to-Text v1 API:

  • Customers experienced UNAVAILABLE errors and service unavailability across all regions. 100% of Speech-to-Text v1 API traffic was affected for the duration of the incident. Speech-to-Text v2 API was not affected.

Contact Center AI Insights:

  • Contact Center AI Insights analysis experienced significantly elevated latencies, causing requests to fail. More than 90% of traffic was affected. The affected features include summarization, create analysis, delete analysis, and bulk analysis.

Cloud Billing Support:

  • Customers were unable to get billing help through the “billing assistant” for the duration of the issue. 100% of customers who tried would have received an error that prevented them from receiving any support.

Workspace Support:

  • Workspace customers were unable to create cases from admin.google.com. Other support channels like support.cloud.google.com were unaffected by the issue. 100% of customers who tried would have received an error that prevented them from receiving any support.
3 Oct 2023 11:29 PDT

Mini Incident Report

We apologize for the inconvenience this service outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support or to Google Workspace Support using help article https://support.google.com/a/answer/1047213.

(All Times US/Pacific)

Incident Start: 02 October 2023 12:30

Incident End: 02 October 2023 14:30

Duration: 2 hours

Affected Services and Features:

  • Dialogflow - Essentials (ES) and Customer Experience (CX)
  • Cloud Speech-to-Text v1 API
  • Contact Center AI Insights - Analysis and Summarization
  • Cloud Billing Support
  • Workspace Support

Regions/Zones: All regions including multi-regions and Global location

Description:

Dialogflow ES, Dialogflow CX, Cloud Speech-to-Text, Contact Center AI Insights, Google Cloud Billing support, and Workspace support experienced elevated error rates and service unavailability for 2 hours. From preliminary analysis, the root cause of the issue is a change to one of the services they depend on.

Dialogflow is used to analyze both text and audio inputs from customers. For audio inputs, Dialogflow has a critical dependency on Cloud Speech-to-Text service as this service transcribes audio to text.

A configuration issue caused our systems to think that a dependency was down and thus could not serve traffic. This propagated through multiple systems with dependencies on each other. The system that was down was not in fact needed to serve requests, but the configuration caused traffic to be deflected anyway.

This occurred because that unused dependency was scheduled for turndown on 02 October 2023. When this service was turned down, it triggered the configuration dependency and caused the outage.

It was a case of cascading failures: Internal speech serving service > Cloud Speech-to-Text > Dialogflow > Billing Support / Workspace Support.

We are reviewing our dependency configurations to remove this type of dependency and prevent this kind of outage in the future. We are also reviewing our policies and processes around service turndowns to 1) confirm that they were followed and 2) update them to prevent this kind of issue in the future.

The incident was mitigated by marking Dialogflow's dependency on Cloud Speech as optional, as well as marking Cloud Speech's internal dependency as optional, to allow both services to recover.

Google will complete a full Incident Report in the following days that will provide a detailed root cause.

Customer Impact:

Dialogflow ES and CX:

  • Customers experienced elevated rates of UNAVAILABLE errors and service unavailability across all regions.

Cloud Speech-to-Text v1 API:

  • Customers experienced elevated rates of UNAVAILABLE errors and service unavailability across all regions.

Contact Center AI Insights:

  • Contact Center AI Insights analysis experienced significantly elevated latencies, causing requests to fail. The affected features include summarization, create analysis, delete analysis, and bulk analysis.

Cloud Billing Support:

  • Customers were unable to get billing help through the “billing assistant” for the duration of the issue.

Workspace Support:

  • Workspace customers were unable to create cases from admin.google.com. Other support channels like support.cloud.google.com were unaffected by the issue.

2 Oct 2023 15:19 PDT

The issue with Dialogflow CX, Dialogflow ES, Google Cloud Billing Assistant, Speech-to-Text has been resolved for all affected users as of Monday, 2023-10-02 15:00 US/Pacific.

From preliminary analysis, we believe the root cause of the issue is a change to a dependent service. The change has been rolled back and we are taking additional steps to prevent recurrence of the issue.

During the incident, customers experienced elevated error rates and service unavailability for Dialogflow ES, Dialogflow CX, and Speech-to-Text. Customers who were attempting to get billing help from the “billing assistant” were unable to get assistance.

We thank you for your patience while we worked on resolving the issue.

Google engineers are working on a detailed root cause analysis. A detailed incident report will be provided upon completion of our investigation.

2 Oct 2023 14:53 PDT

Summary: Dialogflow ES, Dialogflow CX, Speech-to-Text, and Google Cloud Support customers are experiencing elevated 500 error rates and service unavailability.

Description: Our engineering team completed the rollout of a mitigation, and internal monitoring shows a significant decrease in the error rate.

We believe the issue with Dialogflow CX, Dialogflow ES, Speech-to-Text, and Google Cloud Support is mitigated for most customers at this point.

Our engineers are closely monitoring the error rate to ensure full issue resolution.

We will provide an update by Monday, 2023-10-02 15:25 US/Pacific with latest details.

Diagnosis:

  • Customers are experiencing elevated 500 errors for Dialogflow ES and CX
  • Speech-to-Text customers are experiencing service unavailability.
  • Google Cloud customers seeking billing help from the "billing assistant" are affected.

Workaround: None at this time.

2 Oct 2023 14:41 PDT

Summary: Dialogflow ES and CX customers are experiencing elevated 500 error rates.

Description: Our engineering team completed the rollout of a mitigation, and internal monitoring shows a decrease in the error rate. We believe the issue with Dialogflow CX, Dialogflow ES, and Speech-to-Text is mitigated at this point.

Our engineers are closely monitoring the error rate to ensure full issue resolution.

We will provide an update by Monday, 2023-10-02 15:15 US/Pacific with latest details.

Diagnosis:

  • Customers are experiencing elevated 500 errors for Dialogflow ES and CX
  • Speech-to-Text customers are experiencing service unavailability.

Workaround: None at this time.

2 Oct 2023 14:08 PDT

Summary: Dialogflow ES and CX customers are experiencing elevated 500 error rates.

Description: Mitigation work is currently underway by our engineering team.

The mitigation is expected to complete by Monday, 2023-10-02 15:10 US/Pacific.

We will provide more information by Monday, 2023-10-02 15:15 US/Pacific.

Diagnosis:

  • Customers are experiencing elevated 500 errors for Dialogflow ES and CX
  • Speech-to-Text customers are experiencing service unavailability.

Workaround: None at this time.

2 Oct 2023 13:52 PDT

Summary: Dialogflow ES and CX customers are experiencing elevated 500 error rates.

Description: We are experiencing an issue with Dialogflow CX, Dialogflow ES, and Speech-to-Text.

Our engineering team continues to investigate the issue.

We will provide an update by Monday, 2023-10-02 14:30 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis:

  • Customers are experiencing elevated 500 errors for Dialogflow ES and CX
  • Speech-to-Text customers are experiencing service unavailability.

Workaround: None at this time.

2 Oct 2023 13:15 PDT

Summary: Dialogflow ES and CX customers are experiencing elevated 500 error rates.

Description: We are experiencing an issue with Dialogflow ES and Dialogflow CX.

Our engineering team continues to investigate the issue.

We will provide an update by Monday, 2023-10-02 14:45 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: Customers are experiencing elevated 500 errors for both Dialogflow ES and CX

Workaround: None at this time.