Service Health
Incident affecting AlloyDB for PostgreSQL, Artifact Registry, Cloud Developer Tools, Google Cloud Bigtable, Google Cloud Dataflow, Google Cloud Deploy, Google Cloud Networking, Google Cloud SQL, Google Compute Engine, Hybrid Connectivity, Identity and Access Management, Persistent Disk, Virtual Private Cloud (VPC)
Multiple GCP products impacted in us-west2 region / us-west2-a zone
Incident began at 2024-10-15 23:05 and ended at 2024-10-16 02:49 (all times are US/Pacific).
Previously affected location(s)
Los Angeles (us-west2)
Date | Time | Description | |
---|---|---|---|
| 18 Oct 2024 | 11:28 PDT | Incident ReportSummaryOn Wednesday, 16 October 2024, multiple Google Cloud products became unavailable in the us-west2-a zone / us-west2 region for a duration of 3 hours. To our affected customers whose business was impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability. Root CauseA mismatch in configuration between two components of Google’s internal cluster management system resulted in a failure of the cluster service discovery component. This failure was triggered by a software rollout within the cluster management system that was believed to be non-impactful to operating jobs. The service discovery component is a fundamental dependency of the cluster management system, and its failure resulted in failures of other infrastructure services hosted in the cluster. Those failures resulted in downstream impacts for various Google Cloud services which are dependent on the cluster management system. Remediation and PreventionOnce the mechanism of the fault in the internal lookup and task addressing system was identified, remediation was performed by correcting a file path, allowing the system to restart successfully. The cluster management and other downstream systems recovered once this system was back in operation. The rollout of the portion of the cluster management system that triggered the outage has been paused, and the specific trigger will be remediated before rollouts of that system resume. Additionally, an update to the lookup and task addressing system is being applied to prevent recurrence even if the problematic cluster management software is rolled out again. Detailed Description of ImpactOn 15 October, 2024 23:05 to 16 October, 2024 02:49 US/Pacific, multiple Google Cloud products became unavailable in the us-west2-a zone / us-west2 region for a duration of 3 hours, 44 minutes. Artifact RegistryAsynchronous API operations to create new repositories saw an increased failure rate. Cloud BuildIncreased errors in CreateBuild calls (up to 7% at peak), builds delayed for 2 hours, builds with shorter TTLs expired. Google Cloud DeployRender, predeploy, deploy, verify, and postdeploy operations in us-west2 were unable to complete. Google Cloud DataflowCustomers experienced latency (slowness) while running both batch and streaming jobs. Google Cloud BigtableSome resources in us-west2-a experienced an increase in CANCELED and UNAVAILABLE errors. Google Cloud SQLSome instance creation and backup operations failed. Cloud InterconnectCloud Interconnect attachments to us-west2 could not be created, updated, or deleted. Virtual Private CloudProgramming of new VMs from other regions was not reaching the impacted zone. This also resulted in cross region packet loss. Cloud Identity and Access ManagementReplication Delay of Sessions, Service Accounts, and other resources to the impacted zone Persistent DiskSnapshots intermittently failed in this zone. Compute EngineSome VMs in us-west2-a were unavailable; customers were unable to connect to them, modify them, or delete them. New VM creations in the zone were working and those VMs remained healthy. |
| 16 Oct 2024 | 07:48 PDT | # Mini Incident ReportWe apologize for the inconvenience this service outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support (All Times US/Pacific) Incident Start: 15 October, 2024 23:05 Incident End: 16 October, 2024 02:49 Duration: 3 hours, 44 minutes Affected Services and Features:
Regions/Zones: Region us-west2 / Zone us-west2-a Description: Multiple Google Cloud products were unavailable in us-west2-a zone / us-west2 region for a duration of 3 hours, 44 minutes. From preliminary analysis, the root cause of the issue was due to a failure in an internal lookup and task addressing system. Its failure led to the unavailability of our internal cluster management system in the affected zone, impacting Google Cloud services dependent on it. Google Engineers implemented a fix to return the lookup and task addressing system to full operation and this mitigated the issue. Google will complete a full IR in the following days that will provide a full root cause. Customer Impact:
|
| 16 Oct 2024 | 02:59 PDT | The issue with Google Cloud SQL, Google Compute Engine, Virtual Private Cloud (VPC), Google Cloud Dataflow, Hybrid Connectivity, Google Cloud Networking, Identity and Access Management, Artifact Registry, Persistent Disk, Google Cloud Deploy, Google Cloud Bigtable, AlloyDB for PostgreSQL has been resolved for all affected users as of Wednesday, 2024-10-16 02:49 US/Pacific. We will publish an analysis of this incident once we have completed our internal investigation. We thank you for your patience while we worked on resolving the issue. |
| 16 Oct 2024 | 02:48 PDT | Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone Description: We are experiencing an issue with multiple GCP products. Initial mitigation steps have been successfully completed but still some work is underway by our engineering team. We will provide more information by Wednesday, 2024-10-16 03:30 US/Pacific. Diagnosis:
Workaround:
|
| 16 Oct 2024 | 02:32 PDT | Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone Diagnosis:
Workaround:
|
| 16 Oct 2024 | 02:25 PDT | Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone Description: We are experiencing an issue with multiple GCP products including Google Cloud SQL, Google Compute Engine, Virtual Private Cloud (VPC), Google Cloud Networking, Google Cloud Bigtable, Hybrid Connectivity, Identity and Access Management and Google Cloud Dataflow. Our engineering teams continue to remain fully engaged, and focusing on finding a mitigation. We will provide an update by Wednesday, 2024-10-16 03:00 US/Pacific with current details. Diagnosis:
Workaround:
|
| 16 Oct 2024 | 02:05 PDT | Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone Description: We are experiencing an issue with several GCP products including Google Cloud SQL, Google Compute Engine, Virtual Private Cloud (VPC), Google Cloud Networking, Cloud Bigtable, Hybrid Connectivity, Identity and Access Management and Google Cloud Dataflow. Our engineering teams continue to remain fully engaged, and focusing on finding a mitigation. We will provide an update by Wednesday, 2024-10-16 02:30 US/Pacific with current details. Diagnosis:
Workaround:
|
| 16 Oct 2024 | 01:48 PDT | Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone Description: We are experiencing an issue with several GCP products including Google Cloud SQL, Google Compute Engine, Virtual Private Cloud (VPC), Google Cloud Networking, Hybrid Connectivity, Identity and Access Management and Google Cloud Dataflow. Our engineering teams continue to remain fully engaged, and focusing on finding a mitigation. We will provide an update by Wednesday, 2024-10-16 02:30 US/Pacific with current details. Diagnosis:
Workaround:
|
- All times are US/Pacific