Service Health
Incident affecting Cloud Build, Cloud Developer Tools, Cloud Machine Learning, Google Cloud Dataflow, Google Cloud Dataproc, Google Cloud Pub/Sub, Google Compute Engine, Google Kubernetes Engine, Persistent Disk, Vertex AI Batch Prediction
Multiple GCP services impacted in the europe-west3-c zone
Incident began at 2024-10-23 18:22 and ended at 2024-10-24 02:01 (all times are US/Pacific).
Previously affected location(s)
Frankfurt (europe-west3)
Date | Time | Description | |
---|---|---|---|
| 31 Oct 2024 | 08:50 PDT | Incident Report. Summary: On Wednesday, 23 October 2024, a power failure occurred in a single data center within the europe-west3 region. This failure degraded the building’s cooling infrastructure, leading to a partial shutdown of the europe-west3-c zone to avoid thermal damage and causing Virtual Machines (VMs) to go offline. The event duration was 7 hours and 39 minutes and impacted various Google Cloud services in the affected zone. To our Google Cloud customers whose businesses were impacted during this outage, we sincerely apologize. This is not the level of quality and reliability we strive to offer, and we are taking immediate steps to improve the platform’s performance and resilience. Root Cause: On 23 October 2024, at 18:22 US/Pacific time, an electrical arc flash occurred in one of the europe-west3-c zone's power distribution units, resulting in a partial power outage. This incident also affected the cooling infrastructure, leading to a rise in ambient temperature. To prevent damage, some IT equipment at the facility was shut down, causing Virtual Machines (VMs) in the datacenter to go offline and impacting multiple cloud services in the zone. Remediation and Prevention: Google engineers were alerted to VM failures in the europe-west3-c zone at 18:39 US/Pacific on 23 October 2024 and immediately launched an investigation. Upon understanding the issue's nature and scope, engineers took precautionary measures to ensure equipment safety by shutting it down and diverting traffic away from the affected infrastructure at 20:43 US/Pacific. Power was manually restored at 21:44 US/Pacific by transferring the load away from the failed components. Cloud traffic was gradually reintroduced to the datacenter at 00:30 US/Pacific on 24 October 2024. Full restoration of all cloud services in the affected zone was completed by 02:09 US/Pacific. We apologize for the length and severity of this incident.
We are taking immediate steps to prevent a recurrence and improve reliability. To ensure continued high availability, Google is pursuing the following actions:
Detailed Description of Impact: Google Compute Engine (GCE) and Persistent Disk (PD): Additionally, a small percentage of Persistent Disk volumes in europe-west3-c were unavailable from 23 October 2024 18:24 to 24 October 2024 00:58 US/Pacific. Google Cloud Pub/Sub: Google Cloud Dataflow: Dataproc: Cloud Build: Google Kubernetes Engine (GKE): Vertex AI Batch Prediction: Cloud SQL: |
| 24 Oct 2024 | 10:49 PDT | Mini Incident Report: We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support. (All Times US/Pacific) Incident Start: 23 October 2024 18:30 Incident End: 24 October 2024 02:09 Duration: 7 hours, 39 minutes Affected Services and Features:
Regions/Zones: Region europe-west3 / Zone europe-west3-c Description: Multiple Google Cloud products were impacted in the europe-west3-c zone for a duration of 7 hours, 39 minutes. From preliminary analysis, the root cause was a power failure and cooling issue that led to a fraction of the zone being powered down, degrading services. Google engineers implemented a fix to return the datacenter to full operation, which mitigated the issue. Google will complete a full Incident Report (IR) in the following days that will provide a full root cause. Customer Impact:
|
| 24 Oct 2024 | 02:14 PDT | The issue with Google Cloud Pub/Sub, Google Compute Engine, Persistent Disk, Google Cloud Dataflow, Google Cloud Dataproc, Google Kubernetes Engine, Cloud Build, Vertex AI Batch Prediction has been resolved for all affected users as of Thursday, 2024-10-24 02:09 US/Pacific. We will publish an analysis of this incident once we have completed our internal investigation. We thank you for your patience while we worked on resolving the issue. |
| 24 Oct 2024 | 01:25 PDT | Summary: Multiple GCP services impacted in the europe-west3-c zone Description: We are experiencing an issue with multiple GCP services including Google Compute Engine, Persistent Disk, Google Cloud Dataflow in the europe-west3-c zone due to a power and cooling issue. Mitigation work is still underway by our engineering team and we do not have an ETA at the moment. We will provide more information by Thursday, 2024-10-24 02:30 US/Pacific. Diagnosis: The majority of the service impact is now limited to the zonal level. Vertex AI Batch Prediction continues to be impacted at the regional level. Services impacted include: Google Compute Engine: The loss of power has led to capacity failure in the region. Customers may experience: A percentage of Virtual Machines (VMs) being terminated and not available until power is restored. A percentage of VMs may have lost access to their Persistent Disk and may be crashlooping. A percentage of regional Persistent Disks may be running in a degraded state. The incident is affecting the Compute API in the following ways: Creation of new VMs or disks in europe-west3-c may fail. A percentage of customers attempting to consume VM reservations will be unable to do so. A percentage of customers who would like to delete their previously running VMs in europe-west3 can delete VMs via the console or GCE APIs. However, there may be a delay in processing these deletions. All deletions will be fully processed when issues in europe-west3-c are resolved. Google Kubernetes Engine: The Google Kubernetes Engine nodes in the impacted location may be inaccessible and creation of new nodes may fail. Google Cloud Dataflow: Some existing batch jobs may experience delays when scaling workers. In addition, streaming jobs may not be progressing or scaling up workers. Google Cloud Dataproc: While the existing clusters are not impacted, creation of new clusters may fail.
Cloud Build: Builds in Custom Worker pools take a long time to start. Vertex AI Batch Prediction: A Vertex batch prediction job may fail with an error, "Unable to prepare an infrastructure for serving within time". Google Cloud Pub/Sub: There is no ongoing impact for the users at the moment. Workaround:
|
| 24 Oct 2024 | 00:26 PDT | Summary: Multiple GCP services impacted in the europe-west3-c zone Description: We are experiencing an issue with multiple GCP services including Google Compute Engine, Persistent Disk, Google Cloud Dataflow in the europe-west3-c zone due to a power and cooling issue. Mitigation work is still underway by our engineering team and we do not have an ETA at the moment. We will provide more information by Thursday, 2024-10-24 01:30 US/Pacific. Diagnosis: The majority of the service impact is now limited to the zonal level. Vertex AI Batch Prediction continues to be impacted at the regional level. Services impacted include: Google Compute Engine: The loss of power has led to capacity failure in the region. Customers may experience: A percentage of Virtual Machines (VMs) being terminated and not available until power is restored. A percentage of VMs may have lost access to their Persistent Disk and may be crashlooping. A percentage of regional Persistent Disks may be running in a degraded state. The incident is affecting the Compute API in the following ways: Creation of new VMs or disks in europe-west3-c may fail. A percentage of customers attempting to consume VM reservations will be unable to do so. A percentage of customers who would like to delete their previously running VMs in europe-west3 can delete VMs via the console or GCE APIs. However, there may be a delay in processing these deletions. All deletions will be fully processed when issues in europe-west3-c are resolved. Google Kubernetes Engine: The Google Kubernetes Engine nodes in the impacted location may be inaccessible and creation of new nodes may fail. Google Cloud Dataflow: Some existing batch jobs may experience delays when scaling workers. In addition, streaming jobs may not be progressing or scaling up workers. Google Cloud Dataproc: While the existing clusters are not impacted, creation of new clusters may fail.
Cloud Build: Builds in Custom Worker pools take a long time to start. Vertex AI Batch Prediction: A Vertex batch prediction job may fail with an error, "Unable to prepare an infrastructure for serving within time". Google Cloud Pub/Sub: There is no ongoing impact for the users at the moment. Workaround:
|
| 23 Oct 2024 | 23:55 PDT | Summary: Multiple GCP services impacted in the europe-west3-c zone Description: We are experiencing an issue with multiple GCP services including Google Compute Engine, Persistent Disk, Google Cloud Dataflow in the europe-west3-c zone due to a power and cooling issue. Mitigation work is still underway by our engineering team and we do not have an ETA at the moment. We will provide more information by Thursday, 2024-10-24 01:30 US/Pacific. Diagnosis: The impact is now determined to be back at the zonal level. Regional level impact is mitigated at the moment. Multiple services are impacted in the europe-west3-c zone: Google Compute Engine: The loss of power has led to capacity failure in the region. Customers may experience: A percentage of Virtual Machines (VMs) being terminated and not available until power is restored. A percentage of VMs may have lost access to their Persistent Disk and may be crashlooping. A percentage of regional Persistent Disks may be running in a degraded state. The incident is affecting the Compute API in the following ways: Creation of new VMs or disks in europe-west3-c may fail. A percentage of customers attempting to consume VM reservations will be unable to do so. A percentage of customers who would like to delete their previously running VMs in europe-west3 can delete VMs via the console or GCE APIs. However, there may be a delay in processing these deletions. All deletions will be fully processed when issues in europe-west3-c are resolved. Google Kubernetes Engine: The Google Kubernetes Engine nodes in the impacted location may be inaccessible and creation of new nodes may fail. Google Cloud Dataflow: Some existing batch jobs may experience delays when scaling workers. In addition, streaming jobs may not be progressing or scaling up workers. Google Cloud Dataproc: While the existing clusters are not impacted, creation of new clusters may fail.
Cloud Build: Builds in Custom Worker pools take a long time to start. Google Cloud Pub/Sub: There is no ongoing impact for the users at the moment. Workaround: 1. If you are impacted, please migrate the workload or operations from the europe-west3-c zone to other available zones or regions.
|
| 23 Oct 2024 | 23:14 PDT | Summary: Multiple GCP services impacted in the europe-west3 region Description: We are experiencing an issue with multiple GCP services including Google Compute Engine, Persistent Disk, Google Cloud Dataflow in the europe-west3 region due to a power and cooling issue. Mitigation work is still underway by our engineering team and we do not have an ETA at the moment. We will provide more information by Thursday, 2024-10-24 01:00 US/Pacific. Diagnosis: Multiple services are impacted in the europe-west3 region: Google Compute Engine: The loss of power has led to capacity failure in the region. Customers may experience: A percentage of Virtual Machines (VMs) being terminated and not available until power is restored. A percentage of VMs may have lost access to their Persistent Disk and may be crashlooping. A percentage of regional Persistent Disks may be running in a degraded state. The incident is affecting the Compute API in the following ways: Creation of new VMs or disks in europe-west3 may fail. A percentage of customers attempting to consume VM reservations will be unable to do so. A percentage of customers who would like to delete their previously running VMs in europe-west3 can delete VMs via the console or GCE APIs. However, there may be a delay in processing these deletions. All deletions will be fully processed when issues in europe-west3 are resolved. Google Kubernetes Engine: The Google Kubernetes Engine nodes in the impacted location may be inaccessible and creation of new nodes may fail. Google Cloud Dataflow: Some existing batch jobs may experience delays when scaling workers. In addition, streaming jobs may not be progressing or scaling up workers. Google Cloud Dataproc: While the existing clusters are not impacted, creation of new clusters may fail. Google Cloud Pub/Sub: There is no ongoing impact for the users at the moment. Workaround:
|
| 23 Oct 2024 | 22:59 PDT | Summary: Multiple GCP services impacted in europe-west3-c zone Description: We are experiencing an issue with multiple GCP services including Google Compute Engine, Persistent Disk, Google Cloud Dataflow in the europe-west3-c zone due to a power and cooling issue. Mitigation work is still underway by our engineering team and we do not have an ETA at the moment. We will provide more information by Thursday, 2024-10-24 01:00 US/Pacific. Diagnosis: Multiple services are impacted in europe-west3-c: Google Compute Engine: The loss of power has led to capacity failure in the zone. Customers may experience: A percentage of Virtual Machines (VMs) being terminated and not available until power is restored. A percentage of VMs may have lost access to their Persistent Disk and may be crashlooping. A percentage of regional Persistent Disks may be running in a degraded state. The incident is affecting the Compute API in the following ways: Creation of new VMs or disks in europe-west3-c may fail. A percentage of customers attempting to consume VM reservations will be unable to do so. A percentage of customers who would like to delete their previously running VMs in europe-west3-c can delete VMs via the console or GCE APIs. However, there may be a delay in processing these deletions. All deletions will be fully processed when issues in europe-west3-c are resolved. Google Kubernetes Engine: The Google Kubernetes Engine nodes in the impacted location may be inaccessible and creation of new nodes may fail. Google Cloud Dataflow: Some existing batch jobs may experience delays when scaling workers. In addition, streaming jobs may not be progressing or scaling up workers. Google Cloud Dataproc: While the existing clusters are not impacted, creation of new clusters may fail. Google Cloud Pub/Sub: There is no ongoing impact for the users at the moment. Workaround:
|
| 23 Oct 2024 | 21:59 PDT | Summary: Multiple GCP services impacted in europe-west3-c zone Description: We are experiencing an issue with Google Cloud Pub/Sub, Google Compute Engine, Persistent Disk, Google Cloud Dataflow. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2024-10-23 23:00 US/Pacific with current details. Diagnosis: Multiple services are impacted in europe-west3-c: Google Compute Engine: Impacted users may observe VM creation failing and some instances may not be available for operations in this zone. Google Kubernetes Engine: The Google Kubernetes Engine nodes in the impacted location might be inaccessible. Also, creation of new nodes may fail. Persistent Disk: The persistent disk instances might be unreachable for operations. Google Cloud Dataflow: Some existing batch jobs may experience delays when scaling workers. Also, streaming jobs may not be progressing or scaling up workers. Google Cloud Dataproc: While the existing clusters are not impacted, creating new clusters may fail. Google Cloud Pub/Sub: There is no ongoing impact for the users at the moment. Workaround: If you are impacted, please migrate the workload or operations from the europe-west3-c zone to other available zones or regions. |
| 23 Oct 2024 | 21:27 PDT | Summary: Multiple GCP services impacted in europe-west3-c zone Description: We are experiencing an issue with Google Cloud Pub/Sub, Google Compute Engine, Persistent Disk, Google Cloud Dataflow. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2024-10-23 23:00 US/Pacific with current details. Diagnosis: Multiple services are impacted in europe-west3-c: Google Compute Engine: Impacted users may observe VM creation failing and some instances may not be available for operations in this zone. Persistent Disk: The persistent disk instances might be unreachable for operations. Google Cloud Dataflow: While the existing clusters are not impacted, creating new clusters may fail. Google Cloud Dataproc: While the existing clusters are not impacted, creating new clusters may fail. Google Cloud Pub/Sub: There is no ongoing impact for the users at the moment. Workaround: If you are impacted, please migrate the workload or operations from the europe-west3-c zone to other available zones or regions. |
| 23 Oct 2024 | 20:32 PDT | Summary: Multiple GCP services impacted in europe-west3-c zone Description: We are experiencing an issue with Google Cloud Pub/Sub, Google Compute Engine, Persistent Disk, Google Cloud Dataflow. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2024-10-23 21:30 US/Pacific with current details. Diagnosis: Multiple services are impacted in europe-west3-c: Google Compute Engine: Impacted users may observe VM creation failing and some instances may not be available for operations in this zone. Persistent Disk: The persistent disk instances might be unreachable for operations. Google Cloud Dataflow: While the existing clusters are not impacted, creating new clusters may fail. Google Cloud Dataproc: While the existing clusters are not impacted, creating new clusters may fail. Workaround: None at this time. |
| 23 Oct 2024 | 19:56 PDT | Summary: Multiple GCP services impacted in europe-west3-c zone Description: We are experiencing an issue with Google Cloud Pub/Sub, Google Compute Engine, Persistent Disk beginning at Wednesday, 2024-10-23 18:24 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2024-10-23 20:30 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: None at this time. Workaround: None at this time. |
- All times are US/Pacific
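
The timestamps above are all given in US/Pacific, which can be awkward to line up against logs kept in UTC. The following is a minimal sketch, not part of the original report, that converts an incident window to UTC and computes its length. It assumes the whole incident fell within Pacific Daylight Time (UTC-7), as the "PDT" labels in the update table suggest; the helper names `pacific` and `window` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: every timestamp in this incident is Pacific Daylight Time (UTC-7),
# matching the "PDT" labels in the update table.
PDT = timezone(timedelta(hours=-7), "PDT")


def pacific(text: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' string as a PDT-aware timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M").replace(tzinfo=PDT)


def window(start_text: str, end_text: str):
    """Return (start in UTC, end in UTC, duration) for a US/Pacific window."""
    start, end = pacific(start_text), pacific(end_text)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc), end - start


if __name__ == "__main__":
    # Window stated in the page header.
    header = window("2024-10-23 18:22", "2024-10-24 02:01")
    # Window stated in the Mini Incident Report.
    mini = window("2024-10-23 18:30", "2024-10-24 02:09")
    for label, (start_utc, end_utc, duration) in (("header", header), ("mini report", mini)):
        print(f"{label}: {start_utc:%Y-%m-%d %H:%M} to {end_utc:%Y-%m-%d %H:%M} UTC, {duration}")
```

Both stated windows work out to the same length of 7 hours and 39 minutes, even though their start and end times differ by eight minutes.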