Service Health
Incident affecting Apigee, Apigee Edge Public Cloud, Batch, Cloud Filestore, Cloud Firestore, Cloud Key Management Service, Cloud NAT, Cloud Run, Google BigQuery, Google Cloud Dataflow, Google Cloud Dataproc, Google Cloud Deploy, Google Cloud Networking, Google Cloud Pub/Sub, Google Cloud SQL, Google Cloud Storage, Google Compute Engine, Google Kubernetes Engine, Identity and Access Management, Persistent Disk, Resource Manager API, Virtual Private Cloud (VPC)
All the impacted GCP products in australia-southeast2 have recovered.
Incident began at 2024-10-29 16:21 and ended at 2024-10-29 19:34 (all times are US/Pacific).
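The overall duration can be checked from the start and end times above. The following is a minimal sketch, assuming US/Pacific was on daylight time (PDT, UTC-7) on that date, as the "PDT" labels in the update table suggest; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US/Pacific was on PDT (UTC-7) on 2024-10-29,
# matching the "PDT" labels in the update table.
PDT = timezone(timedelta(hours=-7), name="PDT")

start = datetime(2024, 10, 29, 16, 21, tzinfo=PDT)
end = datetime(2024, 10, 29, 19, 34, tzinfo=PDT)

# Split the elapsed time into whole hours and minutes.
duration = end - start
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"Duration: {hours} hours, {minutes} minutes")
print("Start (UTC):", start.astimezone(timezone.utc).isoformat())
print("End (UTC):  ", end.astimezone(timezone.utc).isoformat())
```

Converting to UTC first makes it easier to line these times up with logs or dashboards recorded in other time zones.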
Previously affected location(s)
Melbourne (australia-southeast2)
| Date | Time | Description |
|---|---|---|
| 30 Oct 2024 | 12:30 PDT | Mini Incident Report: We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support . (All Times US/Pacific) Incident Start: 29 October 2024 16:21. Incident End: 29 October 2024 19:34. Duration: 3 hours, 13 minutes. Affected Services and Features: Apigee, Apigee Edge Public Cloud, Batch, Cloud Filestore, Cloud Firestore, Cloud Key Management Service, Cloud NAT, Cloud Run, Google BigQuery, Google Cloud Dataflow, Google Cloud Dataproc, Google Cloud Networking, Google Cloud Pub/Sub, Google Cloud SQL, Google Cloud Storage, Google Compute Engine, Google Kubernetes Engine, Identity and Access Management, Persistent Disk, Resource Manager API, Virtual Private Cloud (VPC). Regions/Zones: australia-southeast2. Description: Multiple Google Cloud products experienced service disruptions of varying impact and duration, with the longest lasting 2 hours, 3 minutes in the australia-southeast2 region. From preliminary analysis, the root cause of the issue was a power interruption that caused network and optical infrastructure to reboot in a subset of the sites in the australia-southeast2 region. Google will complete a full Incident Report in the following days that will provide a detailed root cause. Customer Impact: Apigee - Impacted users experienced issues with pod scheduling from 16:40 to 19:34. Apigee Edge Public Cloud - Impacted users observed increased runtime latency and an increase in 5XX errors for runtime from 16:21 to 17:46. Batch - New batch jobs that were created in the australia-southeast2 region remained in SCHEDULED status and did not progress. Customers had the option to switch to different regions. Cloud Filestore - Instance creation/deletion operations failed in the region. Instances with virtual machines in these clusters were not reachable from other regions. Regional instances, depending on the placement of the majority of the virtual machines, went into lockdown. Cloud Firestore - Customers in the australia-southeast2 region experienced limited to no availability from 16:21 to 16:44. Cloud Key Management Service - KMS was not reachable in the australia-southeast2 region. Approximately 60% of the traffic was lost during the impacted period. Cloud NAT - Customers experienced loss of connectivity in the australia-southeast2 region. Cloud Run - Impacted users experienced dropped requests and high error rates from 16:20 to 17:14. Google BigQuery - Impacted users observed delays or errors in handling requests from 16:20 to 16:55. Google Cloud Dataflow - Impacted users had limited to no availability in the australia-southeast2 region from 16:21 to 17:40. Google Cloud Dataproc - Impacted users observed that all Dataproc services were down from 17:13 to 19:10. Google Cloud Networking - Impacted users experienced up to 100% of requests being dropped from 16:21 to 16:43. Google Cloud Pub/Sub - Cloud Pub/Sub had limited to no availability in the australia-southeast2 region from 16:21 to 17:06. Google Cloud SQL - Multiple instances of Cloud SQL were unavailable between 16:23 and 17:52. Google Cloud Storage - Impacted users/projects experienced request timeout or unavailable errors between 16:21 and 17:12. Google Compute Engine - Impacted users observed GCE operations such as compute.instances.insert fail from 16:40 to 19:34. Google Kubernetes Engine - Impacted users were unable to make changes to their workloads on GKE clusters from 16:20 to 17:30. Identity and Access Management - Impacted users experienced loss of traffic and visible latency/unavailability within the region from 16:28 to 17:14. Persistent Disk - Customers experienced high latency of up to several minutes for I/O operations in the australia-southeast2 region. As a workaround, customers could restore snapshots in another region or use asynchronous failover replicas outside the impacted region. Virtual Private Cloud (VPC) - Impacted users experienced a network outage in and out of the impacted region from 16:35 to 18:41. Resource Manager API - Impacted users experienced loss of traffic and visible latency/unavailability within the region from 16:28 to 17:14. |
| 29 Oct 2024 | 19:29 PDT | The issue with Google Cloud Networking, Cloud Run, Identity and Access Management, Resource Manager API, Persistent Disk, Virtual Private Cloud (VPC), Google Cloud Dataflow, Apigee, Google Compute Engine, Google Cloud SQL, Apigee Edge Public Cloud, Google Cloud Storage, Google BigQuery, Google Kubernetes Engine, Cloud Dataproc has been resolved for all affected users as of Tuesday, 2024-10-29 18:24 US/Pacific. We thank you for your patience while we worked on resolving the issue. |
| 29 Oct 2024 | 18:47 PDT | Summary: Some GCP products in australia-southeast2 may have intermittent network connectivity. Description: We are experiencing an issue with Google Cloud Networking, Cloud Run, Resource Manager API, Google Cloud Dataflow, Google Cloud SQL, Google Cloud Storage, Google Kubernetes Engine, Google Compute Engine beginning at Tuesday, 2024-10-29 16:28 US/Pacific. Based on internal metrics, the following products have seen recovery. Our engineering team continues to investigate the issue. We will provide an update by Tuesday, 2024-10-29 19:30 US/Pacific with current details. Diagnosis: Customers impacted by this may see issues with new deployments. Workaround: None at this time. |
| 29 Oct 2024 | 17:59 PDT | Summary: Multiple GCP products impacted in australia-southeast2 with intermittent network connectivity. Description: We are experiencing an issue with Google Cloud Networking, Cloud Run, Resource Manager API, Virtual Private Cloud (VPC), Google Cloud Dataflow, Google Compute Engine, Google Cloud SQL, Google Cloud Storage, Google BigQuery beginning at Tuesday, 2024-10-29 16:28 US/Pacific. Based on internal metrics, the following products have seen recovery. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2024-10-30 00:41 US/Pacific with current details. Diagnosis: Customers impacted by this may see issues with new deployments. Workaround: None at this time. |
| 29 Oct 2024 | 17:29 PDT | Summary: Multiple GCP products impacted in australia-southeast2. Description: We are experiencing an issue with Google Cloud Networking, Cloud Run, Identity and Access Management, Resource Manager API, Persistent Disk, Virtual Private Cloud (VPC), Google Cloud Dataflow, Google Compute Engine beginning at Tuesday, 2024-10-29 16:21 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Tuesday, 2024-10-29 18:00 US/Pacific with current details. Diagnosis: Customers impacted by this may see issues with new deployments. Workaround: None at this time. |
| 29 Oct 2024 | 17:17 PDT | Summary: Multiple GCP products impacted in australia-southeast2. Description: We are experiencing an issue with FEATURE beginning at Tuesday, 2024-10-29 16:21 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Tuesday, 2024-10-29 18:00 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: Customers impacted by this may see issues with new deployments. Workaround: None at this time. |
- All times are US/Pacific
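
The per-service impact windows in the Customer Impact list of the Mini Incident Report can be compared more easily once they are turned into durations. The following is a rough sketch that copies only the services with an explicit start and end time; the `WINDOWS` table and the `window_minutes` helper are illustrative names, not part of any Google tooling.

```python
from datetime import datetime

# Impact windows copied from the Customer Impact list
# (US/Pacific, 29 October 2024). Services without explicit
# start and end times (Batch, Cloud Filestore, Cloud KMS,
# Cloud NAT, Persistent Disk) are left out.
WINDOWS = {
    "Apigee": ("16:40", "19:34"),
    "Apigee Edge Public Cloud": ("16:21", "17:46"),
    "Cloud Firestore": ("16:21", "16:44"),
    "Cloud Run": ("16:20", "17:14"),
    "Google BigQuery": ("16:20", "16:55"),
    "Google Cloud Dataflow": ("16:21", "17:40"),
    "Google Cloud Dataproc": ("17:13", "19:10"),
    "Google Cloud Networking": ("16:21", "16:43"),
    "Google Cloud Pub/Sub": ("16:21", "17:06"),
    "Google Cloud SQL": ("16:23", "17:52"),
    "Google Cloud Storage": ("16:21", "17:12"),
    "Google Compute Engine": ("16:40", "19:34"),
    "Google Kubernetes Engine": ("16:20", "17:30"),
    "Identity and Access Management": ("16:28", "17:14"),
    "Virtual Private Cloud (VPC)": ("16:35", "18:41"),
    "Resource Manager API": ("16:28", "17:14"),
}


def window_minutes(start: str, end: str) -> int:
    """Return the length of a same-day HH:MM window in minutes."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


durations = {name: window_minutes(*w) for name, w in WINDOWS.items()}

# Print services from longest to shortest listed impact window.
for name, minutes in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {minutes // 60}h {minutes % 60:02d}m")
```

The helper assumes every window starts and ends on the same calendar day, which holds for all of the windows listed in this report.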