Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting AlloyDB for PostgreSQL, Artifact Registry, Cloud Developer Tools, Google Cloud Bigtable, Google Cloud Dataflow, Google Cloud Deploy, Google Cloud Networking, Google Cloud SQL, Google Compute Engine, Hybrid Connectivity, Identity and Access Management, Persistent Disk, Virtual Private Cloud (VPC)

Multiple GCP products impacted in us-west2 region / us-west2-a zone

Incident began at 2024-10-15 23:05 and ended at 2024-10-16 02:49 (all times are US/Pacific).

Previously affected location(s)

Los Angeles (us-west2)

Date Time Description
18 Oct 2024 11:28 PDT

Incident Report

Summary

Starting on Tuesday, 15 October 2024, at 23:05 US/Pacific, multiple Google Cloud products became unavailable in the us-west2-a zone / us-west2 region for a duration of 3 hours, 44 minutes.

To our affected customers whose business was impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

Root Cause

A mismatch in configuration between two components of Google’s internal cluster management system resulted in a failure of the cluster service discovery component. This failure was triggered by a software rollout within the cluster management system that was believed to be non-impactful to operating jobs. The service discovery component is a fundamental dependency of the cluster management system, and its failure resulted in failures of other infrastructure services hosted in the cluster. Those failures resulted in downstream impacts for various Google Cloud services which are dependent on the cluster management system.

Remediation and Prevention

Once the mechanism of the fault in the internal lookup and task addressing system was identified, remediation was performed by correcting a file path, allowing the system to restart successfully. The cluster management and other downstream systems recovered once this system was back in operation.

The rollout of the portion of the cluster management system that triggered the outage has been paused, and the specific trigger will be remediated before rollouts of that system resume. Additionally, an update to the lookup and task addressing system is being applied to prevent recurrence even if the problematic cluster management software is rolled out again.
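The report does not describe the internal systems in detail, so the following is only a minimal, illustrative sketch of the general idea behind preventing this class of failure: checking before a rollout that the components involved agree on a shared file path. The component names, the `discovery_path` field, and the example paths are all hypothetical and are not taken from Google's cluster management system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentConfig:
    """Hypothetical view of one component's configuration."""
    name: str
    discovery_path: str  # path this component uses for service discovery data


def find_path_mismatches(configs):
    """Return (name, path) pairs whose path differs from the first component's."""
    if not configs:
        return []
    expected = configs[0].discovery_path
    return [
        (c.name, c.discovery_path)
        for c in configs[1:]
        if c.discovery_path != expected
    ]


def safe_to_roll_out(configs):
    """A rollout is considered safe only if every component agrees on the path."""
    return not find_path_mismatches(configs)


# Hypothetical example: the second component points at a different path.
configs = [
    ComponentConfig("cluster-manager", "/data/discovery/v1"),
    ComponentConfig("service-discovery", "/data/discovery/v2"),
]
print(find_path_mismatches(configs))
print(safe_to_roll_out(configs))
```

A check like this would block the rollout in the mismatched example and allow it once both components reference the same path.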

Detailed Description of Impact

From 15 October 2024 23:05 to 16 October 2024 02:49 US/Pacific, multiple Google Cloud products became unavailable in the us-west2-a zone / us-west2 region for a duration of 3 hours, 44 minutes.

Artifact Registry

Asynchronous API operations to create new repositories saw an increased failure rate.

Cloud Build

CreateBuild calls saw increased errors (up to 7% at peak), builds were delayed for 2 hours, and builds with shorter TTLs expired.

Google Cloud Deploy

Render, predeploy, deploy, verify, and postdeploy operations in us-west2 were unable to complete.

Google Cloud Dataflow

Customers experienced latency (slowness) while running both batch and streaming jobs.

Google Cloud Bigtable

Some resources in us-west2-a experienced an increase in CANCELED and UNAVAILABLE errors.

Google Cloud SQL

Some instance creation and backup operations failed.

Cloud Interconnect

Cloud Interconnect attachments to us-west2 could not be created, updated, or deleted.

Virtual Private Cloud

Programming of new VMs from other regions was not reaching the impacted zone. This also resulted in cross-region packet loss.

Cloud Identity and Access Management

Replication of sessions, service accounts, and other resources to the impacted zone was delayed.

Persistent Disk

Snapshots intermittently failed in this zone.

Compute Engine

Some VMs in us-west2-a were unavailable; customers were unable to connect to them, modify them, or delete them. New VM creations in the zone were working and those VMs remained healthy.

16 Oct 2024 07:48 PDT

Mini Incident Report

We apologize for the inconvenience this service outage may have caused. We would like to provide some information about this incident below. Please note that this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.

(All Times US/Pacific)

Incident Start: 15 October, 2024 23:05

Incident End: 16 October, 2024 02:49

Duration: 3 hours, 44 minutes
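As a quick check of the stated duration, the arithmetic can be reproduced with a short sketch. It assumes both timestamps are in the same time zone (US/Pacific) and that no daylight saving change falls inside the window, so naive datetimes are sufficient; the variable names are illustrative only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Incident start and end as reported (both US/Pacific).
start = datetime.strptime("2024-10-15 23:05", FMT)
end = datetime.strptime("2024-10-16 02:49", FMT)

duration = end - start
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours} hours, {minutes} minutes")  # prints "3 hours, 44 minutes"
```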

Affected Services and Features:

  • Artifact Registry
  • Cloud Build
  • Google Cloud Deploy
  • Google Cloud Dataflow
  • Google Cloud Bigtable
  • Google Cloud SQL
  • Cloud Interconnect
  • Virtual Private Cloud (VPC)
  • Cloud Identity and Access Management
  • Google Compute Engine
  • Persistent Disk

Regions/Zones: Region us-west2 / Zone us-west2-a

Description:

Multiple Google Cloud products were unavailable in the us-west2-a zone / us-west2 region for a duration of 3 hours, 44 minutes. From preliminary analysis, the root cause of the issue was a failure in an internal lookup and task addressing system. Its failure led to the unavailability of our internal cluster management system in the affected zone, impacting Google Cloud services dependent on it.

Google engineers implemented a fix to return the lookup and task addressing system to full operation, and this mitigated the issue.

Google will complete a full Incident Report (IR) in the following days that will provide a full root cause.

Customer Impact:

  • Artifact Registry: Asynchronous API operations to create new repositories saw an increased failure rate.
  • Cloud Build: CreateBuild calls saw increased errors (up to 7% at peak), builds were delayed for 2 hours, and builds with shorter TTLs expired.
  • Google Cloud Deploy: Render, predeploy, deploy, verify, and postdeploy operations in us-west2 were unable to complete.
  • Google Cloud Dataflow: Customers experienced latency (slowness) while running both batch and streaming jobs.
  • Google Cloud Bigtable: Clusters in us-west2-a experienced an increase in CANCELED and UNAVAILABLE errors, and admin operations were failing.
  • Google Cloud SQL: Some instance creation and backup operations failed.
  • Cloud Interconnect: Cloud Interconnect attachments to us-west2 could not be created, updated, or deleted.
  • Virtual Private Cloud: Programming of new VMs from other regions was not reaching the impacted zone. This also resulted in cross-region packet loss.
  • Cloud Identity and Access Management: Replication of sessions, service accounts, and other resources to the impacted zone was delayed.
  • Persistent Disk: Snapshots intermittently failed in this zone.
  • Compute Engine: Some VMs in us-west2-a were unavailable; customers were unable to connect to them, modify them, or delete them. New VM creations in the zone were working and those VMs remained healthy.

16 Oct 2024 02:59 PDT

The issue with Google Cloud SQL, Google Compute Engine, Virtual Private Cloud (VPC), Google Cloud Dataflow, Hybrid Connectivity, Google Cloud Networking, Identity and Access Management, Artifact Registry, Persistent Disk, Google Cloud Deploy, Google Cloud Bigtable, AlloyDB for PostgreSQL has been resolved for all affected users as of Wednesday, 2024-10-16 02:49 US/Pacific.

We will publish an analysis of this incident once we have completed our internal investigation.

We thank you for your patience while we worked on resolving the issue.

16 Oct 2024 02:48 PDT

Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone

Description: We are experiencing an issue with multiple GCP products.

Initial mitigation steps have been completed successfully, but our engineering team still has some work underway.

We will provide more information by Wednesday, 2024-10-16 03:30 US/Pacific.

Diagnosis:

  • Google Cloud SQL: Some operations such as instance create and backup may fail.
  • Cloud Interconnect: [mitigated] Cloud Interconnect attachments to us-west2 cannot be created, updated, or deleted.
  • Google Compute Engine: [mitigated] Customers will see some of their VMs unavailable in us-west2-a and will not be able to SSH into, modify, or delete them. New VM creations are working for the zone and those VMs will be healthy.
  • Google Cloud Bigtable: Increase in CANCELED and UNAVAILABLE errors for some resources in us-west2-a.
  • Google Cloud Dataflow: [mitigated] Impacted customers may have experienced latency (slowness) while running both batch and streaming jobs. This has been mitigated and no further impact is expected.
  • Artifact Registry (AR): Asynchronous API operations to create new AR repositories saw an increased failure rate.
  • Persistent Disk: Snapshots might fail in this zone.
  • Cloud Deploy: Cloud Deploy's render, predeploy, deploy, verify, and postdeploy operations in us-west2 are unable to complete.
  • Cloud Pub/Sub: [mitigated] The impact for Pub/Sub is already mitigated as traffic is diverted away from the impacted location.
  • Identity and Access Management: [mitigated] The impact is already mitigated as traffic is diverted away from the impacted location.

Workaround:

  • Google Cloud SQL: Users can perform these operations in a region other than us-west2 for new instance creation. There is no workaround for backups on existing instances at this time.
  • Google Compute Engine: Users can create new VMs. Current VMs are unavailable.
  • Google Cloud Bigtable: Route traffic from resources in us-west2-a to replicated resources in a different zone.

16 Oct 2024 02:32 PDT

Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone

Diagnosis:

  • Google Cloud SQL: Some operations such as instance create and backup may fail.
  • Cloud Interconnect: Cloud Interconnect attachments to us-west2 cannot be created, updated, or deleted.
  • Google Compute Engine: [mitigated] Customers will see some of their VMs unavailable in us-west2-a and will not be able to SSH into, modify, or delete them. New VM creations are working for the zone and those VMs will be healthy.
  • Google Cloud Bigtable: Increase in CANCELED and UNAVAILABLE errors for some resources in us-west2-a.
  • Google Cloud Dataflow: Impacted customers may have experienced latency (slowness) while running both batch and streaming jobs. This has been mitigated and no further impact is expected.
  • Artifact Registry (AR): Asynchronous API operations to create new AR repositories saw an increased failure rate.
  • Persistent Disk: Snapshots might fail in this zone.
  • Cloud Deploy: Cloud Deploy's render, predeploy, deploy, verify, and postdeploy operations in us-west2 are unable to complete.
  • Cloud Pub/Sub: The impact for Pub/Sub is already mitigated as traffic is diverted away from the impacted location.
  • Identity and Access Management: The impact is already mitigated as traffic is diverted away from the impacted location.
  • Google Cloud Networking:
  • Virtual Private Cloud (VPC):

Workaround:

  • Google Cloud SQL: Users can perform these operations in a region other than us-west2 for new instance creation. There is no workaround for backups on existing instances at this time.
  • Google Compute Engine: Users can create new VMs. Current VMs are unavailable.
  • Google Cloud Bigtable: Route traffic from resources in us-west2-a to replicated resources in a different zone.

16 Oct 2024 02:25 PDT

Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone

Description: We are experiencing an issue with multiple GCP products including Google Cloud SQL, Google Compute Engine, Virtual Private Cloud (VPC), Google Cloud Networking, Google Cloud Bigtable, Hybrid Connectivity, Identity and Access Management, and Google Cloud Dataflow. Our engineering teams remain fully engaged and are focused on finding a mitigation. We will provide an update by Wednesday, 2024-10-16 03:00 US/Pacific with current details.

Diagnosis:

  • Google Cloud SQL: Some operations such as instance create and backup may fail.
  • Cloud Interconnect: Cloud Interconnect attachments to us-west2 cannot be created, updated, or deleted.
  • Google Compute Engine: Customers will see some of their VMs unavailable in us-west2-a and will not be able to SSH into, modify, or delete them. New VM creations are working for the zone and those VMs will be healthy.
  • Google Cloud Bigtable: Increase in CANCELED and UNAVAILABLE errors for some resources in us-west2-a.
  • Google Cloud Dataflow: Impacted customers may have experienced latency (slowness) while running both batch and streaming jobs. This has been mitigated and no further impact is expected.
  • Artifact Registry (AR): Asynchronous API operations to create new AR repositories saw an increased failure rate.
  • Persistent Disk: Snapshots might fail in this zone.
  • Cloud Deploy: Cloud Deploy's render, predeploy, deploy, verify, and postdeploy operations in us-west2 are unable to complete.
  • Pub/Sub: The impact for Pub/Sub is already mitigated as traffic is diverted away from the impacted location.

Workaround:

  • Google Cloud SQL: Users can perform these operations in a region other than us-west2 for new instance creation. There is no workaround for backups on existing instances at this time.
  • Google Compute Engine: Users can create new VMs. Current VMs are unavailable.
  • Google Cloud Bigtable: Route traffic from resources in us-west2-a to replicated resources in a different zone.

16 Oct 2024 02:05 PDT

Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone

Description: We are experiencing an issue with several GCP products including Google Cloud SQL, Google Compute Engine, Virtual Private Cloud (VPC), Google Cloud Networking, Cloud Bigtable, Hybrid Connectivity, Identity and Access Management, and Google Cloud Dataflow. Our engineering teams remain fully engaged and are focused on finding a mitigation. We will provide an update by Wednesday, 2024-10-16 02:30 US/Pacific with current details.

Diagnosis:

  • Google Cloud SQL: Some instance operations such as instance create and backup may fail.
  • Cloud Interconnect: Cloud Interconnect attachments to us-west2 cannot be created, updated, or deleted.
  • Google Compute Engine: Customers will see some of their VMs unavailable in us-west2-a and will not be able to SSH into, modify, or delete them. New VM creations are working for the zone and those VMs will be healthy.
  • Google Cloud Bigtable: Increase in CANCELED and UNAVAILABLE errors for some resources in us-west2-a.
  • Google Cloud Dataflow: Impacted customers may experience latency (slowness) while running batch jobs. Streaming jobs are no longer affected.
  • Artifact Registry: Asynchronous API operations to create new AR repositories saw an increased failure rate of up to 5% on average.
  • Persistent Disk: Snapshots might fail in this zone.
  • Cloud Deploy: Cloud Deploy's render, predeploy, deploy, verify, and postdeploy operations in us-west2 are unable to complete.
  • Pub/Sub: The impact for Pub/Sub is already mitigated as traffic is diverted away from the impacted location.

Workaround:

  • Google Cloud SQL: Users can perform these operations in a region other than us-west2.
  • Google Compute Engine: Users can create new VMs. Current VMs are unavailable.
  • Google Cloud Bigtable: Route traffic from resources in us-west2-a to replicated resources in a different zone.

16 Oct 2024 01:48 PDT

Summary: Multiple GCP products impacted in us-west2 region / us-west2-a zone

Description: We are experiencing an issue with several GCP products including Google Cloud SQL, Google Compute Engine, Virtual Private Cloud (VPC), Google Cloud Networking, Hybrid Connectivity, Identity and Access Management and Google Cloud Dataflow.

Our engineering teams remain fully engaged and are focused on finding a mitigation. We will provide an update by Wednesday, 2024-10-16 02:30 US/Pacific with current details.

Diagnosis:

  • Google Cloud SQL: Some instance operations such as instance create and backup may fail.
  • Cloud Interconnect: Cloud Interconnect attachments to us-west2 cannot be created, updated, or deleted.
  • Google Compute Engine: Customers will see some of their VMs unavailable in us-west2-a and will not be able to SSH into, modify, or delete them. New VM creations are working for the zone and those VMs will be healthy.
  • Google Cloud Bigtable: Increase in CANCELED and UNAVAILABLE errors for some resources in us-west2-a.
  • Google Cloud Dataflow: Impacted customers may experience latency (slowness) while running batch jobs. Streaming jobs are no longer affected.
  • Artifact Registry: Asynchronous API operations to create new AR repositories saw an increased failure rate of up to 5% on average.
  • Persistent Disk: Snapshots might fail in this zone.
  • Cloud Deploy: Cloud Deploy's render, predeploy, deploy, verify, and postdeploy operations in us-west2 are unable to complete.
  • Pub/Sub: The impact for Pub/Sub is already mitigated as traffic is diverted away from the impacted location.

Workaround:

  • Google Cloud SQL: Users can perform these operations in a region other than us-west2.
  • Google Compute Engine: Users can create new VMs. Current VMs are unavailable.
  • Google Cloud Bigtable: Route traffic from resources in us-west2-a to replicated resources in a different zone.
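
The Bigtable workaround in these updates (routing traffic away from us-west2-a to a replicated resource in another zone) can be illustrated with a small client-side sketch. This is not the Bigtable client API: the exception classes, the reader callables, and the choice of us-west2-b as the replica zone are assumptions made purely for illustration. In practice, Bigtable routing is normally configured through app profiles rather than application code.

```python
class UnavailableError(Exception):
    """Stand-in for an UNAVAILABLE error returned by a zonal resource."""


class CanceledError(Exception):
    """Stand-in for a CANCELED error returned by a zonal resource."""


# Errors that should trigger a failover to the next zone.
FAILOVER_ERRORS = (UnavailableError, CanceledError)


def read_with_failover(readers, key):
    """Try each (zone, reader) pair in order; return (zone, value) on first success."""
    last_error = None
    for zone, reader in readers:
        try:
            return zone, reader(key)
        except FAILOVER_ERRORS as err:
            last_error = err  # remember the failure and try the next zone
    raise RuntimeError("all zones failed") from last_error


# Hypothetical readers: the impacted zone always fails, the replica succeeds.
def impacted_zone_reader(key):
    raise UnavailableError("us-west2-a is unavailable")


def replica_reader(key):
    return f"value-for-{key}"


zone, value = read_with_failover(
    [("us-west2-a", impacted_zone_reader), ("us-west2-b", replica_reader)],
    "row1",
)
print(zone, value)  # prints "us-west2-b value-for-row1"
```

The ordered list keeps the preferred zone first, so traffic returns to it automatically once it stops raising errors.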