Service Health
Incident affecting Google BigQuery
BigQuery observing high import/export job latencies in several regions
Incident began at 2022-07-14 19:30 and ended at 2022-07-15 15:02 (all times are US/Pacific).
Date | Time | Description | |
---|---|---|---|
| 27 Jul 2022 | 17:46 PDT | INCIDENT REPORT
Summary: Google Cloud Networking experienced reduced capacity for lower-priority traffic such as batch, streaming and transfer operations from 19:30 US/Pacific on Thursday, 14 July 2022, through 15:02 US/Pacific on Friday, 15 July 2022. High-priority user-facing traffic was not affected. This service disruption resulted from an issue encountered during a combination of repair work and a routine network software upgrade rollout. Due to the nature of the disruption and resilience capabilities of Google Cloud products, the impacted regions and individual impact windows varied substantially. To our customers whose businesses were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s availability.
Root Cause: The root cause was identified as an issue with a new control plane configuration rollout, which reduced capacity for traffic classified as low priority on Google’s internal backbone network connecting data centers. The capacity reduction itself slowed mitigation efforts, and engineering teams needed longer than usual to safely undo the configuration change. During the period of the rollout and subsequent rollback, constrained traffic in certain cloud zones affected the performance of some Cloud services.
Remediation and Prevention: At approximately 02:00 US/Pacific on Friday, 15 July, as an in-progress rollout expanded to more regions, Google engineers observed performance degradation in Cloud Networking due to reduced capacity. The engineering team then started an investigation into the cause. At 03:50 US/Pacific, Google engineers pushed the first mitigation attempt to halt the ongoing rollout. While this effort succeeded in pausing any new actions, those already in progress continued, which further reduced network capacity.
Subsequently, the engineering team shifted their mitigation efforts toward a global rollback of the problematic configuration. The first rollback attempt, a configuration push made at 08:40 US/Pacific, was not successfully applied to all nodes, due in part to the reduced network performance. Google engineers worked through alternate mitigations, and by 12:40 US/Pacific the configuration had been updated correctly, which mitigated the majority of the impact. By 15:02 US/Pacific on 15 July 2022, services for all customers had been restored. The Google Cloud Service Health Dashboard was updated to reflect this.
Google is committed to preventing future recurrence, and we are taking the following actions:
Detection:
Prevention:
Mitigation:
Detailed Description of Impact: Google Cloud Networking
Google Cloud Storage (GCS)
Google BigQuery
Note
|
| 18 Jul 2022 | 15:31 PDT | This is a preliminary Incident Report (Mini-IR). A Full Incident Report with additional details is being prepared and will be posted at a later date. We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.
(All Times US/Pacific)
Incident Start: 14 July 2022 19:30
Incident End: 15 July 2022 15:02
Duration: 19 hours, 32 minutes
Affected Services: Google Cloud Networking, Google BigQuery, Google Cloud Storage (GCS)
Regions/Zones: Global
Description: Google Cloud Networking experienced reduced availability globally for a period of 19 hours and 32 minutes. Because of the nature of the outage and the resilience capabilities of GCP products, the impacted regions and individual impact windows may vary within the network impact window. From preliminary analysis, the root cause was an issue with a new control plane configuration rollout.
Customer Impact:
Google Cloud Storage
Google BigQuery
|
| 15 Jul 2022 | 14:17 PDT | The issue with Google BigQuery has been resolved for all affected users as of Friday, 2022-07-15 14:12 US/Pacific. We thank you for your patience while we worked on resolving the issue. |
| 15 Jul 2022 | 14:02 PDT | Summary: BigQuery observing high import/export job latencies in several regions Description: We are experiencing an issue with Google BigQuery beginning at Friday, 2022-07-15 05:00 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Friday, 2022-07-15 15:05 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: BigQuery is observing high import/export job latencies (slowness) in several regions, which may ultimately result in errors. Workaround: None at this time. |
| 15 Jul 2022 | 13:40 PDT | Summary: BigQuery observing high import/export job latencies in several regions Description: We are experiencing an issue with Google BigQuery beginning at Friday, 2022-07-15 05:00 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Friday, 2022-07-15 15:00 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: BigQuery is observing high import/export job latencies (slowness) in several regions, which may ultimately result in errors. Workaround: None at this time. |
- All times are US/Pacific
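
The stated duration and the spacing of the mitigation steps can be checked with simple date arithmetic. The sketch below is illustrative only: the timestamps are copied from the reports above, while the milestone labels and the `hours_minutes` helper are hypothetical names introduced here. It assumes naive datetimes are adequate, since every time is US/Pacific and no daylight-saving change falls inside the window; the 02:00 entry is given in the report as approximate.

```python
from datetime import datetime

# Incident start, as stated in the reports (US/Pacific wall-clock time).
start = datetime(2022, 7, 14, 19, 30)

# Milestone timestamps from the 27 Jul incident report.
# The labels are illustrative; the 02:00 time is "approximately" per the report.
milestones = {
    "degradation observed": datetime(2022, 7, 15, 2, 0),
    "rollout halted": datetime(2022, 7, 15, 3, 50),
    "first rollback push": datetime(2022, 7, 15, 8, 40),
    "configuration corrected": datetime(2022, 7, 15, 12, 40),
    "all services restored": datetime(2022, 7, 15, 15, 2),
}


def hours_minutes(delta):
    """Split a timedelta into whole hours and remaining minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


# Elapsed time from incident start to each milestone.
elapsed = {label: hours_minutes(t - start) for label, t in milestones.items()}

for label, (hours, minutes) in elapsed.items():
    print(f"{label}: {hours}h {minutes:02d}m after start")
```

The last entry reproduces the 19 hours, 32 minutes given in the preliminary report. The first entry shows that degradation was observed roughly six and a half hours after the stated incident start.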