Service Health

Incident affecting Google BigQuery

We've received a report of an issue with Google BigQuery.

Incident began at 2023-11-20 20:00 and ended at 2023-11-21 05:50 (all times are US/Pacific).

Previously affected location(s)

Multi-region: eu

Date Time Description
23 Nov 2023 18:48 PST

Incident Report

Summary

On Monday, 20 November, Google BigQuery’s backend capacity management system experienced an issue that prevented existing capacity from being correctly distributed in the EU multi-region for a period of 9 hours and 50 minutes.

During this period, BigQuery customers in the EU multi-region were unable to purchase new slots, and customers who were using the BigQuery autoscaler were unable to scale up. To our BigQuery customers whose autoscaling functionality was impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

Root Cause

The process for approving slots is managed by the backend capacity management system, which combines information about the current placement of BigQuery customers, capacities, and reservation sizes taken from an internal database in order to determine current available capacity.

The root cause was an issue with our capacity management service that caused an invalid data entry to be written to this database. As a result of this invalid entry, the backend capacity management system incorrectly indicated that no slot capacity was available in the region. Because of this, customers were unable to purchase slots or autoscale reservations in the EU multi-region.
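The failure mode described above can be illustrated with a minimal sketch. The report does not describe the internal data model or the nature of the invalid entry, so everything here (the `ReservationEntry` type, the `available_slots` function, and the numbers) is hypothetical and only shows how a single bad record can make a naive capacity calculation report zero availability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReservationEntry:
    """Hypothetical database row: one customer's committed slots."""
    customer: str
    slots: int


def available_slots(total_capacity: int, entries: list[ReservationEntry]) -> int:
    """Naive calculation: total capacity minus all committed slots, floored at zero."""
    committed = sum(entry.slots for entry in entries)
    return max(0, total_capacity - committed)


# Assumed example data, not taken from the incident.
entries = [ReservationEntry("customer-a", 500), ReservationEntry("customer-b", 1_000)]
healthy = available_slots(10_000, entries)

# A single invalid entry (here, an absurdly large slot count) consumes all capacity on paper.
corrupted = entries + [ReservationEntry("customer-c", 10**9)]
broken = available_slots(10_000, corrupted)

print(healthy, broken)
```

In this sketch the physical capacity never changes; only the computed view of it does, which is consistent with the report's statement that existing capacity could not be correctly distributed.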

Remediation and Prevention

Google engineers were alerted to the issue through customer cases on 21 November at 03:40 US/Pacific. They immediately began mitigating the impact by restarting the process responsible for backend capacity distribution; however, the issue resurfaced soon afterward.

The incident was fully mitigated at 05:50 US/Pacific when the invalid data in the reservations database was identified and fixed.

Google is committed to preventing a repeat of this issue in the future. In addition to fixing the issue that caused the original invalid data entry, Google is completing the following actions:

  • Increased monitoring to alert engineering teams when autoscaling experiences latency.
  • Improved our monitoring and tracking capabilities to detect problems in reservation baseline adjustment and autoscaling.
  • Improved how the backend capacity management component handles invalid entries, to reduce errors and incorrect computations.
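As an illustration of the last action item, the following sketch shows one way a capacity calculation could tolerate invalid entries: validate each record, exclude the ones that fail, and surface them for alerting instead of letting them distort the result. This is an assumption about the general approach, not Google's implementation; `ReservationEntry`, `is_valid`, and `available_slots_defensive` are hypothetical names, and the validity rule is an assumed example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReservationEntry:
    """Hypothetical database row: one customer's committed slots."""
    customer: str
    slots: int


def is_valid(entry: ReservationEntry, total_capacity: int) -> bool:
    """Assumed rule: a slot count must be non-negative and not exceed total capacity."""
    return 0 <= entry.slots <= total_capacity


def available_slots_defensive(
    total_capacity: int, entries: list[ReservationEntry]
) -> tuple[int, list[ReservationEntry]]:
    """Return available capacity computed from valid entries, plus the rejected entries."""
    valid = [e for e in entries if is_valid(e, total_capacity)]
    rejected = [e for e in entries if not is_valid(e, total_capacity)]
    committed = sum(e.slots for e in valid)
    return max(0, total_capacity - committed), rejected


# Assumed example data, not taken from the incident.
entries = [
    ReservationEntry("customer-a", 500),
    ReservationEntry("customer-b", 1_000),
    ReservationEntry("customer-c", 10**9),  # invalid: exceeds total capacity
]
free, rejected = available_slots_defensive(10_000, entries)
print(free, [e.customer for e in rejected])
```

Returning the rejected entries alongside the result keeps the bad data visible to monitoring, so the invalid record can be investigated rather than silently ignored.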

Detailed Description of Impact

For a period of 9 hours and 50 minutes, starting at 20:13 US/Pacific on 20 November, customers in the EU multi-region:

  • Were unable to create new reservations or upsize existing reservations.
  • Were unable to have their autoscale reservation scale up.

21 Nov 2023 11:34 PST

Mini Incident Report

We apologize for the inconvenience this service disruption has caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support. (All Times US/Pacific)

Incident Start: 20 November 2023, 20:00

Incident End: 21 November 2023, 05:50

Duration: 9 hours, 50 minutes

Affected Services and Features:

Google BigQuery - Workload Management

Regions/Zones: EU multi-region

Description:

Google BigQuery users experienced issues with slot autoscaling and creating new reservations for a duration of 9 hours, 50 minutes.

From preliminary analysis, the issue was due to a backend capacity management system error that prevented existing capacity from being correctly distributed in the EU multi-region. Our engineering team has mitigated the issue and taken additional steps to avoid recurrence.

Google will complete a full incident report (IR) in the following days that will provide a full root cause.

Customer Impact:

  • BigQuery users in EU multi-region were unable to scale up slots with autoscaling and were unable to create new reservations.

21 Nov 2023 06:00 PST

The issue with Google BigQuery has been resolved for all affected users as of Tuesday, 2023-11-21 05:34 US/Pacific.

The issue was due to a backend capacity management system error that prevented existing capacity from being correctly distributed.

We thank you for your patience while we worked on resolving the issue.

21 Nov 2023 05:24 PST

Summary: We've received a report of an issue with Google BigQuery.

Description: Mitigation work is currently underway by our engineering team.

We do not have an ETA for mitigation at this point.

We will provide more information by Tuesday, 2023-11-21 06:33 US/Pacific.

Diagnosis: Reservations cannot be created or scaled up by autoscaling.

Workaround: None at this time.

21 Nov 2023 04:40 PST

Summary: We've received a report of an issue with Google BigQuery.

Description: Our engineering team has determined that in-depth investigation is required and is currently considering alternative mitigation options.

We will provide an update by Tuesday, 2023-11-21 05:32 US/Pacific with current details.

Diagnosis: Reservations cannot be created or scaled up by autoscaling.

Workaround: None at this time.

21 Nov 2023 04:06 PST

Summary: We've received a report of an issue with Google BigQuery.

Description: Mitigation work is currently underway by our engineering team.

We do not have an ETA for mitigation at this point.

We will provide more information by Tuesday, 2023-11-21 04:49 US/Pacific.

Diagnosis: Reservations cannot be created or scaled up by autoscaling.

Workaround: None at this time.

21 Nov 2023 03:39 PST

Summary: We've received a report of an issue with Google BigQuery.

Description: We are experiencing an issue with Google BigQuery.

Our engineering team continues to investigate the issue.

We will provide an update by Tuesday, 2023-11-21 04:30 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: Reservations cannot be created or scaled up by autoscaling.

Workaround: Creating an on-demand reservation might work.

21 Nov 2023 03:28 PST

Summary: We've received a report of an issue with Google BigQuery.

Description: We are experiencing an issue with Google BigQuery.

Our engineering team continues to investigate the issue.

We will provide an update by Tuesday, 2023-11-21 04:30 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: Reservations cannot be created or scaled up by autoscaling.

Workaround: Creating an on-demand reservation might work.