Service Health
Incident affecting Google BigQuery
We've received a report of an issue with Google BigQuery.
Incident began at 2023-11-20 20:00 and ended at 2023-11-21 05:50 (all times are US/Pacific).
Previously affected location(s)
Multi-region: eu
Date | Time | Description | |
---|---|---|---|
| 23 Nov 2023 | 18:48 PST | Incident Report. Summary: On Monday, 20 November, Google BigQuery’s backend capacity management system experienced an issue that prevented existing capacity from being correctly distributed in the EU multi-region for a period of 9 hours and 50 minutes. During this period, BigQuery customers in the EU multi-region were unable to purchase new slots, and customers who were using the BigQuery autoscaler were unable to scale up. To our BigQuery customers whose autoscaling functionality was impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability. Root Cause: The process for approving slots is managed by the backend capacity management system, which combines information about the current placement of BigQuery customers, capacities, and reservation sizes taken from an internal database in order to determine current available capacity. The root cause was an issue with our capacity management service that caused an invalid data entry to be written to this database. As a result of this invalid entry, the backend capacity management system incorrectly indicated that there was no available slot capacity in the region. Because of this, customers were unable to purchase slots and autoscale reservations in the EU multi-region. Remediation and Prevention: Google engineers were alerted to the issue on 21 November at 03:40 US/Pacific from customer cases. Google engineers immediately started working on mitigating the impact by restarting the process that is responsible for backend capacity distribution; however, the issue resurfaced soon afterward. The incident was fully mitigated at 05:50 US/Pacific when the invalid data in the reservations database was identified and fixed. Google is committed to preventing a repeat of this issue in the future.
In addition to fixing the issue that caused the original invalid data entry, Google is completing the following actions:
Detailed Description of Impact: For a period of 9 hours and 50 minutes, starting at 20:13 US/Pacific on 20 November, customers in the EU multi-region:
|
| 21 Nov 2023 | 11:34 PST | Mini Incident Report. We apologize for the inconvenience this service disruption has caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support. (All Times US/Pacific) Incident Start: 20 November 2023, 20:00. Incident End: 21 November 2023, 05:50. Duration: 9 hours, 50 minutes. Affected Services and Features: Google BigQuery - Workload Management. Regions/Zones: EU multi-region. Description: Google BigQuery users experienced issues with slot autoscaling and creating new reservations for a duration of 9 hours, 50 minutes. From preliminary analysis, the issue was due to a backend capacity management system error that prevented existing capacity from being correctly distributed in the EU multi-region. Our engineering team has mitigated the issue and taken additional steps to avoid recurrence. Google will complete a full IR in the following days that will provide a full root cause. Customer Impact:
|
| 21 Nov 2023 | 06:00 PST | The issue with Google BigQuery has been resolved for all affected users as of Tuesday, 2023-11-21 05:34 US/Pacific. The issue was due to a backend capacity management system error that prevented existing capacity from being correctly distributed. We thank you for your patience while we worked on resolving the issue. |
| 21 Nov 2023 | 05:24 PST | Summary: We've received a report of an issue with Google BigQuery. Description: Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Tuesday, 2023-11-21 06:33 US/Pacific. Diagnosis: Reservations cannot be created or increased by autoscaling. Workaround: None at this time. |
| 21 Nov 2023 | 04:40 PST | Summary: We've received a report of an issue with Google BigQuery. Description: Our engineering team has determined that in-depth investigation is required and is currently considering alternative mitigation options. We will provide an update by Tuesday, 2023-11-21 05:32 US/Pacific with current details. Diagnosis: Reservations cannot be created or increased by autoscaling. Workaround: None at this time. |
| 21 Nov 2023 | 04:06 PST | Summary: We've received a report of an issue with Google BigQuery. Description: Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Tuesday, 2023-11-21 04:49 US/Pacific. Diagnosis: Reservations cannot be created or increased by autoscaling. Workaround: None at this time. |
| 21 Nov 2023 | 03:39 PST | Summary: We've received a report of an issue with Google BigQuery. Description: We are experiencing an issue with Google BigQuery. Our engineering team continues to investigate the issue. We will provide an update by Tuesday, 2023-11-21 04:30 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: Reservations cannot be created or increased by autoscaling. Workaround: Creating an on-demand reservation might work. |
| 21 Nov 2023 | 03:28 PST | Summary: We've received a report of an issue with Google BigQuery. Description: We are experiencing an issue with Google BigQuery. Our engineering team continues to investigate the issue. We will provide an update by Tuesday, 2023-11-21 04:30 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: Reservations cannot be created or increased by autoscaling. Workaround: Creating an on-demand reservation might work. |
- All times are US/Pacific
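
The root cause described in the incident report (a single invalid database entry making the capacity system report no available slots) can be illustrated with a toy model. This is only a sketch under stated assumptions: the report does not say what the invalid entry looked like or how Google's system computes capacity, so the `Reservation` record, the subtraction-based accounting, and the `is_valid` check below are all hypothetical and are not BigQuery APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical database record; field names are illustrative only."""
    name: str
    slots: int


def available_slots(total_capacity: int, reservations: list) -> int:
    """Naive accounting: trust every record and clamp the result at zero."""
    committed = sum(r.slots for r in reservations)
    return max(0, total_capacity - committed)


def is_valid(r: Reservation, total_capacity: int) -> bool:
    """Assumed sanity check: a reservation must fit within regional capacity."""
    return 0 <= r.slots <= total_capacity


def available_slots_validated(total_capacity: int, reservations: list):
    """Same accounting, but records failing the sanity check are set aside."""
    valid = [r for r in reservations if is_valid(r, total_capacity)]
    rejected = [r for r in reservations if not is_valid(r, total_capacity)]
    return available_slots(total_capacity, valid), rejected


if __name__ == "__main__":
    records = [
        Reservation("team-a", 2_000),
        Reservation("team-b", 3_000),
        Reservation("bad-entry", 10**9),  # assumed form of an invalid entry
    ]
    # One bad record makes the naive calculation report zero capacity.
    print(available_slots(10_000, records))
    # With validation, the bad record is isolated and capacity is still reported.
    print(available_slots_validated(10_000, records))
```

In this model, restarting the process that runs `available_slots` would not help, because the bad record is re-read from the database on every run. That matches the report's account that the restart did not hold and that mitigation required fixing the data itself.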