Service Health
Incident affecting Apigee, Cloud Filestore, Cloud Logging, Google BigQuery, Google Cloud Bigtable, Google Cloud Dataflow, Google Cloud Networking, Google Cloud Pub/Sub, Google Cloud SQL, Google Compute Engine, Operations, Persistent Disk, Virtual Private Cloud (VPC)
Multiple services impacted in australia-southeast1.
Incident began at 2024-05-08 19:00 and ended at 2024-05-08 22:28 (all times are US/Pacific).
Previously affected location(s)
Sydney (australia-southeast1)
| Date | Time | Description |
|---|---|---|
| 21 May 2024 | 15:51 PDT | Incident Report
Summary
On Wednesday, 8 May 2024, multiple Google Cloud services experienced a partial service outage in australia-southeast1-a for varying durations of up to 2 hours and 55 minutes. The full list of impacted products and services is detailed below. To our Google Cloud customers whose businesses were impacted during this outage, we sincerely apologize. This is not the level of quality and reliability we strive to offer you.
Root Cause
On 8 May, at 18:44 US/Pacific, a public utility power issue resulted in an undervoltage condition followed by a power loss that affected a portion of Google’s third-party data center in Sydney. As a result, the operating current exceeded the trip settings of the automatic transfer switch (ATS) units. ATS units have trip settings to protect the load from electrical faults, and they are configured in pairs to provide a redundant power path to the critical load. In this case, both ATS units feeding the affected rows exceeded their trip settings due to overcurrent. Further investigation determined that the ATS units were configured with trip settings that did not conform to the site design.
Remediation and Prevention
Google engineers were alerted to the outage via internal monitoring on Wednesday, 8 May at 18:55 US/Pacific and immediately started an investigation. On-site data center operations were engaged at 19:00 US/Pacific, and the scope of the power loss was confirmed at 19:22 US/Pacific. On-site engineers restored power to the affected rows at 19:00 US/Pacific by manually closing the breakers for both ATS units. On Wednesday, 8 May at 19:42 US/Pacific, network connectivity for the affected racks began recovering. All services had recovered by 21:55 US/Pacific, with the exception of a very small percentage of Persistent Disk devices that required manual intervention.
On Thursday, 9 May at 07:47 US/Pacific, the public utility power issue was resolved. All power had been switched back to utility feeds by 09:26 US/Pacific on Thursday, 9 May. Google is committed to preventing a repeat of this issue and is completing the following actions:
Detailed Description of Impact
On Wednesday, 8 May, from 18:45 to 21:40 US/Pacific, multiple Google Cloud services experienced a partial service outage in the australia-southeast1-a zone.
Persistent Disk:
Google Cloud Dataflow:
Google Cloud Pub/Sub:
Google BigQuery:
Google Compute Engine:
Cloud Filestore:
Virtual Private Cloud (VPC):
Cloud SQL:
Cloud Logging:
Cloud Bigtable:
Cloud Apigee:
Memorystore for Redis:
Dialogflow:
Google Kubernetes Engine:
|
| 9 May 2024 | 10:58 PDT | Mini Incident Report
We apologize for any inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note that this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.
(All Times US/Pacific)
Incident Start: 8 May 2024, 18:45
Incident End: 8 May 2024, 21:40
Duration: 2 hrs 55 minutes
Affected Services and Features:
Regions/Zones: australia-southeast1
Description: Multiple Google Cloud products experienced service disruptions of varying impact and duration in the australia-southeast1 region, the longest lasting 2 hours and 55 minutes. From preliminary analysis, the root cause of this incident is currently believed to be an unplanned power event caused by a power failover due to a utility company outage. Google will complete a full Incident Report in the following days that will provide a detailed root cause.
Customer Impact:
Additional details: After service mitigation and full closure of the incident, continued Persistent Disk impact was identified for a narrower group of customers. This has since been resolved, with no further isolated impact. |
| 8 May 2024 | 22:28 PDT | The issue with Apigee, Cloud Filestore, Cloud Logging, Google BigQuery, Google Cloud Bigtable, Google Cloud Dataflow, Google Cloud Pub/Sub, Google Cloud SQL, Google Compute Engine, Persistent Disk, Virtual Private Cloud (VPC) has been resolved for all affected users as of Wednesday, 2024-05-08 21:40 US/Pacific. We will publish an analysis of this incident once we have completed our internal investigation. We thank you for your patience while we worked on resolving the issue. |
| 8 May 2024 | 21:32 PDT | Summary: Multiple services impacted in australia-southeast1.
Description: We are experiencing an issue with Persistent Disk, Google Cloud Dataflow, Google Cloud Pub/Sub, Google BigQuery, Google Compute Engine, Cloud Filestore, Virtual Private Cloud (VPC), Cloud Logging, Cloud SQL, Cloud Bigtable, and Apigee beginning at Wednesday, 2024-05-08 18:45 US/Pacific. A mitigation strategy has been identified, and the services are now recovering. We will provide an update by Wednesday, 2024-05-08 23:00 US/Pacific with current details.
Diagnosis: Multiple GCP services are experiencing issues in the australia-southeast1 region.
Persistent Disk: While most devices have been restored, some users might encounter slow or unavailable devices.
Google Cloud Dataflow: Users experienced issues with streaming jobs due to increasing watermark lag. The issue with Google Cloud Dataflow was mitigated as of 2024-05-08 19:47:27 PDT.
Google Cloud Pub/Sub: The Pub/Sub impact has been mitigated.
Google BigQuery: Impacted users experienced failures with BigQuery jobs in the australia-southeast1 region. The issue with Google BigQuery has been resolved for all affected users as of Wednesday, 2024-05-08 21:13 US/Pacific.
Google Compute Engine: VMs went into repair for around 45 minutes and have started recovering.
Cloud Filestore: Filestore has partially recovered. However, a small subset of users would not be able to access NFS Filestore instances in the australia-southeast1-a zone.
Virtual Private Cloud (VPC): Impacted users may face delays while creating new VMs, as well as packet loss or unreachability for existing VMs.
Cloud SQL: A subset of Cloud SQL users are experiencing errors when accessing their Cloud SQL database instances in the australia-southeast1-a zone.
Cloud Logging: All requests were failing at the send-request step. The issue with Cloud Logging has been resolved for all affected users as of Wednesday, 2024-05-08 21:16:07 US/Pacific.
Cloud Bigtable: Cloud Bigtable experienced a high error rate for 25 minutes in australia-southeast1-a due to a power event. The issue with Cloud Bigtable has been resolved for all affected users as of Wednesday, 2024-05-08 20:08:30 US/Pacific.
Apigee: There was a minor outage due to a GKE error that caused all of the nodes to restart. The GKE cluster is currently undergoing repair. This resulted in a 30-minute outage for the affected customer. The issue with Apigee has been resolved for all affected users as of Wednesday, 2024-05-08 20:34:47 US/Pacific.
Workaround: None at this time. |
| 8 May 2024 | 20:29 PDT | Summary: Multiple services impacted in australia-southeast1.
Description: We are experiencing an issue with BigQuery, Cloud Filestore, and Cloud Pub/Sub beginning at Wednesday, 2024-05-08 18:45 US/Pacific. A mitigation strategy has been identified, and the services are now recovering. We will provide an update by Wednesday, 2024-05-08 21:30 US/Pacific with current details. We apologize to all who are affected by the disruption.
Diagnosis: Multiple GCP services are experiencing issues in the australia-southeast1 region.
Persistent Disk: While most devices have been restored, some users might encounter slow or unavailable devices.
Google Cloud Dataflow: Users experienced issues with streaming jobs due to increasing watermark lag. The issue with Google Cloud Dataflow was mitigated as of 2024-05-08 19:47:27 PDT.
Google Cloud Pub/Sub: The Pub/Sub impact has been mitigated.
Google BigQuery: Impacted users may experience failures with BigQuery jobs in the australia-southeast1 region.
Google Compute Engine: VMs went into repair for around 45 minutes and have started recovering. The issue with Compute Engine was mitigated as of 2024-05-08 19:43:43 PDT.
Cloud Filestore: Impacted customers are unable to access NFS Filestore instances in the australia-southeast1-a zone.
Workaround: None at this time. |
- All times are US/Pacific