Service Health
Incident affecting Cloud Logging, Google BigQuery, Operations
Issue with BigQuery Streaming API and Cloud Logging in US Multiregion
Incident began at 2024-06-10 18:07 and ended at 2024-06-10 20:09 (all times are US/Pacific).
Previously affected location(s)
Multi-region: us
| Date | Time | Description |
| --- | --- | --- |
| 26 Jun 2024 | 15:08 PDT | Incident Report. Summary: Between Thursday, 30 May 2024 and Thursday, 13 June 2024 at 23:45 US/Pacific, BigQuery customers experienced elevated query latencies in the us and eu multi-regions. As a result, operations requiring compute resources (queries, loads, extracts) would also have experienced resource contention during the same period, and some customers with queries or operations relying on the BigQuery Autoscaler would have been unable to scale up to the desired limits. Additionally, between 16:46 and 20:00 US/Pacific on Monday, 10 June 2024, BigQuery experienced elevated errors across multiple APIs (Jobs, Query, Load, Extract, and Storage Read and Write APIs) in the us multi-region. Root Cause: BigQuery uses a distributed shuffle infrastructure to execute the large and complex joins, aggregations, and analytic operations needed for query execution. Shuffle storage is a tiered architecture that optimizes for storing data in memory, but flushes to SSD and then HDD as backing stores as aggregate needs increase. The incident was caused by a combination of factors.
Diverting network traffic away from an affected zone is a mitigation usually taken while determining the root cause of a problem. However, an operator error resulted in a reduction of capacity in multiple zones simultaneously on Monday, 10 June 2024 at 16:46 US/Pacific. This led to elevated errors across BigQuery APIs until 20:00 US/Pacific the same day, when the impact of the excessive traffic redirection was mitigated. Remediation and Prevention:
We apologize for the length and severity of this incident. We are taking immediate steps to prevent a recurrence and improve reliability in the future.
Detailed Description of Impact. BigQuery:
Cloud Logging: Ingestion delays to analytics buckets with location global. Logs Explorer queries were not affected. |
| 11 Jun 2024 | 09:11 PDT | Mini Incident ReportWe apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support . (All Times US/Pacific) Incident Start: 10 June 2024 16:45 Incident End: 10 June 2024 20:00 Duration: 3 hours and 15 minutes Affected Services and Features: Google BigQuery and Google Cloud Logging Regions/Zones: Multi-regions: US Description: BigQuery experienced elevated errors across multiple APIs in the US Multi-region due to the concurrent mitigation of simultaneous degradations, which impacted the BigQuery projects hosted in two degraded clusters. Google will complete a full Incident Report in the following days that will provide a full root cause. Customer Impact: BigQuery: A subset of customers would have experienced HTTP 500 errors while executing calls to insertAll and Storage Write APIs in the US Multi-region. Additionally, some customers may have experienced system errors using the Jobs and Query API. Cloud Logging: Cloud Logging customers using Log Analytics faced log ingestion delays of up to 2 hours for their analytics buckets if they ingested logs via Cloud regions us-central1 or us-central2. Logs Explorer queries in the Google Cloud console were not impacted. (Illustrative sketches of a client-side retry for the streaming insert errors and an ingestion-lag check for analytics buckets appear after the table below.) |
| 10 Jun 2024 | 20:09 PDT | The issue with Cloud Logging and Google BigQuery has been resolved for all affected users as of Monday, 2024-06-10 20:00 US/Pacific. We will publish an analysis of this incident once we have completed our internal investigation. We thank you for your patience while we worked on resolving the issue. |
| 10 Jun 2024 | 19:06 PDT | Summary: Issue with BigQuery Streaming API and Cloud Logging in US Multiregion Description: Google engineers are working on mitigating the problem, and we are observing the error rate for the impacted APIs dropping. We are closely monitoring the progress of the mitigation. We will provide more information by Monday, 2024-06-10 20:30 US/Pacific. Diagnosis: Google BigQuery: Customers impacted by this issue may see 500 errors while executing BigQuery statements. Cloud Logging: Ingestion delays to analytics buckets with location global. Logs Explorer queries are not affected. Workaround: None at this time. |
| 10 Jun 2024 | 18:36 PDT | Summary: Issue with BigQuery Streaming API and Cloud Logging Description: We are experiencing an issue with the Google BigQuery Streaming API and Cloud Logging beginning on Monday, 2024-06-10 16:46 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Monday, 2024-06-10 18:59 US/Pacific with current details. Diagnosis: Google BigQuery: Customers impacted by this issue may see 500 errors while executing BigQuery statements. Cloud Logging: Ingestion delays to analytics buckets with location global. Logs Explorer queries are not affected. Workaround: None at this time. |
- All times are US/Pacific
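The updates above report HTTP 500 responses from the BigQuery streaming insert (insertAll) and Storage Write paths, and the report itself listed no workaround. The hedged sketch below only illustrates how a caller using the Python google-cloud-bigquery client might surface and retry such transient server errors; the project, dataset, table, and row contents are hypothetical.

```python
import time

from google.api_core.exceptions import InternalServerError, ServiceUnavailable
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # hypothetical table
rows = [{"event_name": "example", "value": 1}]  # hypothetical row schema

# Retry transient 5xx errors from the streaming insert (insertAll) path
# with simple exponential backoff.
for attempt in range(5):
    try:
        errors = client.insert_rows_json(table_id, rows)
        if errors:
            # Per-row failures are returned in the response body, not raised.
            print(f"Per-row insert errors: {errors}")
        break
    except (InternalServerError, ServiceUnavailable) as exc:
        wait = 2 ** attempt
        print(f"Transient error ({exc.code}); retrying in {wait}s")
        time.sleep(wait)
else:
    raise RuntimeError("Streaming insert kept failing after retries")
```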
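For the Cloud Logging impact, ingestion delays to Log Analytics-enabled buckets would show up as stale data in the bucket's linked BigQuery dataset. A minimal sketch for estimating how far behind ingestion is, assuming a hypothetical linked dataset named my_linked_dataset that exposes the standard _AllLogs view:

```python
from google.cloud import bigquery

# Hypothetical project and linked-dataset names; the analytics bucket must
# have a linked BigQuery dataset for this query to work.
client = bigquery.Client(project="my-project")
query = """
    SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(timestamp), MINUTE) AS lag_minutes
    FROM `my-project.my_linked_dataset._AllLogs`
    WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 HOUR)
"""
row = next(iter(client.query(query).result()))
print(f"Approximate ingestion lag: {row.lag_minutes} minutes")
```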