Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Google Kubernetes Engine

us-central1-c: GKE experiencing issues with some cluster and nodepool operations. Issue Mitigated.

Incident began at 2021-10-19 11:40 and ended at 2021-10-19 16:21 (all times are US/Pacific).

Date Time Description
21 Oct 2021 14:56 PDT

We apologize for the inconvenience this service outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support

(All Times US/Pacific)

First Impact

Incident Start: 19 October 2021 11:40

Incident End: 19 October 2021 16:21

Duration: 4 hours, 41 minutes

Second Impact

Incident Start: 20 October 2021 02:00

Incident End: 20 October 2021 09:30

Duration: 7 hours, 30 minutes
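The durations above follow directly from the start and end times. The short Python check below recomputes both windows. It treats the timestamps as naive local times, which assumes both windows share the same US/Pacific offset (PDT). In 2021, the switch back to PST happened on 7 November, outside both windows.

```python
from datetime import datetime
from typing import Tuple

# Start/end times copied from the report above (all US/Pacific, PDT throughout).
IMPACTS = {
    "First Impact": ("2021-10-19 11:40", "2021-10-19 16:21"),
    "Second Impact": ("2021-10-20 02:00", "2021-10-20 09:30"),
}

FMT = "%Y-%m-%d %H:%M"


def duration(start: str, end: str) -> Tuple[int, int]:
    """Return (hours, minutes) elapsed between two timestamps in FMT."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


for name, (start, end) in IMPACTS.items():
    hours, minutes = duration(start, end)
    print(f"{name}: {hours} hours, {minutes} minutes")
# First Impact: 4 hours, 41 minutes
# Second Impact: 7 hours, 30 minutes
```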

Affected Services and Features:

Google Kubernetes Engine

Regions/Zones: us-central1-c

Description:

Google Kubernetes Engine experienced two impacts to operations, the first on 19 October 2021 and the second on 20 October 2021.

First Impact: Customers may have experienced failure rates of up to 100% for create-cluster, delete-cluster, and delete-nodepool operations and for node-pool resizes in us-central1-c for 4 hours and 41 minutes. Preliminary analysis indicates that the root cause was resource contention related to an unexpected increase in API operations. Engineers scaled up instances to mitigate the issue.

Second Impact: Customers may have experienced failure rates of up to 80% for create-cluster, delete-cluster, and delete-nodepool operations and for node-pool resizes in us-central1-c for 7 hours and 30 minutes. Preliminary analysis indicates that the root cause was an unexpected increase in node-pool operations in a single customer cluster. Engineers mitigated the issue by enforcing additional quota.
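The report does not say how the additional quota was enforced. Purely as an illustration of the general idea, one common way to stop a single cluster from flooding a shared control plane with node-pool operations is a per-key token bucket. The sketch below is hypothetical and is not a description of GKE internals. `PerClusterQuota`, `capacity` (the maximum burst size), and `rate` (tokens refilled per second) are names and parameters invented for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    """Allows bursts of up to `capacity` operations, refilled at `rate` tokens/second."""
    capacity: float
    rate: float
    tokens: float
    last: float

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClusterQuota:
    """Hypothetical limiter: one independent token bucket per cluster ID."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def try_acquire(self, cluster_id: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(cluster_id)
        if bucket is None:
            # A cluster seen for the first time starts with a full bucket.
            bucket = TokenBucket(self.capacity, self.rate, self.capacity, now)
            self.buckets[cluster_id] = bucket
        return bucket.allow(now)


# Demo with a fake clock so the output is deterministic.
t = [0.0]
quota = PerClusterQuota(capacity=3, rate=0.5, clock=lambda: t[0])
burst = [quota.try_acquire("cluster-a") for _ in range(5)]
print(burst)                           # [True, True, True, False, False]
print(quota.try_acquire("cluster-b"))  # True: other clusters keep their own quota
t[0] = 2.0                             # 2 s at 0.5 tokens/s refills one token
print(quota.try_acquire("cluster-a"))  # True
```

The design choice in this sketch is isolation. A burst from one cluster drains only that cluster's bucket, so other clusters in the zone can still get their operations through.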

Customer Impact:

Affected customers would have received 500/503 errors for create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resize operations.

19 Oct 2021 16:40 PDT

The issue with GKE cluster and node pool operations has been resolved for all affected projects as of Tuesday, 2021-10-19 16:21 US/Pacific.

We thank you for your patience while we worked on resolving the issue.

19 Oct 2021 16:25 PDT

Summary: us-central1-c: GKE experiencing issues with some cluster and nodepool operations. Mitigation underway.

Description: Our engineering team is currently working on mitigation.

We do not have an ETA for mitigation at this point.

We will provide more information by Tuesday, 2021-10-19 17:00 US/Pacific.

Diagnosis: The following operations may fail with 500/503 errors: create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resizes.

Workaround: Failed operations may succeed if retried.
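This workaround amounts to retrying on transient server errors. The following is a minimal Python sketch. It assumes the caller wraps each GKE operation (for example, a node-pool resize) in a zero-argument callable that raises `OperationError` carrying the HTTP status. `OperationError` and `retry_operation` are hypothetical names for this illustration, not part of any Google Cloud client library. The sketch retries only on 500 and 503 and uses capped exponential backoff with full jitter. Because the preliminary root cause involved contention from a surge in API operations, randomized delays help avoid adding a synchronized burst of retries on top of that load.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# HTTP statuses reported during this incident; treated as retryable here.
RETRYABLE_STATUS = {500, 503}


class OperationError(Exception):
    """Hypothetical error raised by a wrapped GKE operation, carrying its HTTP status."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def retry_operation(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying 500/503 failures with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except OperationError as err:
            # Re-raise immediately for non-retryable statuses or on the last attempt.
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # The upper bound doubles each attempt (1 s, 2 s, 4 s, ... with the
            # defaults) up to max_delay; sleeping a random fraction of it spreads
            # out retries from many concurrent clients.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Demo: a hypothetical node-pool resize that returns 503 twice, then succeeds.
attempts = []


def flaky_resize() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise OperationError(503, "service unavailable")
    return "DONE"


result = retry_operation(flaky_resize, sleep=lambda _: None)
print(result, len(attempts))  # prints: DONE 3
```

The demo passes `sleep=lambda _: None` to skip real waits. In normal use, the default `time.sleep` applies. A 500 response does not guarantee that nothing happened on the server side. Before retrying a create or delete, it may therefore be worth checking whether an earlier attempt already took effect.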

19 Oct 2021 15:32 PDT

Summary: us-central1-c: GKE experiencing issues with some cluster and nodepool operations. Investigations ongoing.

Description: Our engineering team continues to investigate the issue with GKE in us-central1-c, which began on Tuesday, 2021-10-19 11:40 US/Pacific.

We will provide an update by Tuesday, 2021-10-19 16:30 US/Pacific with the latest details.

Diagnosis: The following operations may fail with 500/503 errors: create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resizes.

Workaround: Failed operations may succeed if retried.

19 Oct 2021 14:50 PDT

Summary: us-central1-c: GKE experiencing issues with some cluster and nodepool operations. Investigations ongoing.

Description: Our engineering team continues to investigate the issue with GKE in us-central1-c, which began on Tuesday, 2021-10-19 11:40 US/Pacific.

We will provide an update by Tuesday, 2021-10-19 15:30 US/Pacific with the latest details.

Diagnosis: The following operations may fail with 500/503 errors: create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resizes.

Workaround: Failed operations may succeed if retried.

19 Oct 2021 14:28 PDT

Summary: us-central1-c: GKE experiencing issues with some cluster and nodepool operations

Description: We are experiencing an issue with Google Kubernetes Engine beginning at Tuesday, 2021-10-19 11:40 US/Pacific.

Our engineering team continues to investigate the issue.

Existing workloads should continue to run, but attempts to resize node pools or to create or delete node pools and clusters are failing.

We will provide an update by Tuesday, 2021-10-19 15:00 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resize operations fail with 500 and 503 errors.

Workaround: Failed operations may succeed if retried.

19 Oct 2021 14:06 PDT

Summary: us-central1-c: Create / Delete Cluster and NodePool operations failing

Description: We are experiencing an issue with Google Kubernetes Engine beginning at Tuesday, 2021-10-19 11:40 US/Pacific.

Our engineering team continues to investigate the issue.

We will provide an update by Tuesday, 2021-10-19 14:35 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resize operations are failing with 500 and 503 errors.

Workaround: Failed operations may succeed if retried.