Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Google Kubernetes Engine

europe-central2, europe-west3: Elevated error rates for GKE control plane

Incident began at 2021-09-09 12:06 and ended at 2021-09-09 12:54 (all times are US/Pacific).

9 Sep 2021 15:25 PDT

We apologize for the inconvenience this service disruption may have caused. We would like to provide some information about this incident below. Please note that this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case at https://cloud.google.com/support.

(All Times US/Pacific)

Incident Start: 09 September 2021 10:46

Incident End: 09 September 2021 12:44

Duration: 1 hour, 58 minutes

Affected Services and Features:

  • Google Kubernetes Engine (GKE) - Cluster operations

Regions/Zones: europe-west3, europe-central2

Description:

Google Kubernetes Engine (GKE) experienced intermittently elevated error rates in europe-west3 and europe-central2 for 1 hour and 58 minutes. Preliminary analysis indicates that the issue was caused by an in-progress rollout to the backend authentication service. Rolling back the backend authentication service ended the impact at 12:44.

Customer Impact:

  • Elevated error rates affected less than 0.5% of cluster operations that require an authentication token.
  • Elevated error rates for GKE master pods that required authorization to modify Google Compute Engine (GCE) resources.

Additional details:

Retrying a failed operation would have completed successfully.
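Because these errors were intermittent and a retry would have succeeded, a common client-side pattern is to retry with capped exponential backoff and jitter. The following Python sketch is illustrative only: it is not a GKE or Google Cloud client API, `TransientError` and `retry_with_backoff` are hypothetical names, and the attempt count and delay values are assumptions to tune for your own workload.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure, such as an
    intermittent authentication or token error."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on TransientError with capped
    exponential backoff and full jitter (assumed defaults)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Upper bound on the delay grows 1s, 2s, 4s, ... up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random duration in [0, cap].
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Hypothetical operation that fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("intermittent failure")
        return "ok"

    # Skip real sleeping in this demo.
    print(retry_with_backoff(flaky, sleep=lambda _: None))  # prints "ok"
```

Randomizing the delay spreads retries from many clients over time, which helps avoid synchronized bursts against a recovering backend. In practice, catch only the error types your client library documents as retryable rather than a broad exception class.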

9 Sep 2021 12:54 PDT

The issue with Google Kubernetes Engine has been resolved for all affected projects as of Thursday, 2021-09-09 12:53 US/Pacific.

There is no further impact to our customers.

We thank you for your patience while we worked on resolving the issue.

9 Sep 2021 12:48 PDT

Summary: europe-central2, europe-west3: Elevated error rates for GKE control plane

Description: Our engineering team is currently working on mitigating the issue.

The list of affected locations has been reduced and updated in this post.

We do not have an ETA for mitigation at this point.

We will provide more information by Thursday, 2021-09-09 14:00 US/Pacific.

Diagnosis: Customers may experience issues with cluster availability and errors with CreateToken. Service account requests may intermittently succeed or fail.

Workaround: Customers who experience failures can retry the operation.

9 Sep 2021 12:29 PDT

Summary: europe-central2, europe-west3, europe-west4, us-east4, asia-northeast1: Elevated error rates for GKE control plane

Description: We are experiencing an issue with Google Kubernetes Engine.

Our engineering team continues to investigate the issue.

We will provide an update by Thursday, 2021-09-09 14:00 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: Customers may experience issues with cluster availability and errors with CreateToken.

Workaround: None at this time.

9 Sep 2021 12:18 PDT

Summary: europe-central2, europe-west3, europe-west4, us-east4, asia-northeast1: Elevated error rates for GKE control plane

Description: We are investigating a potential issue with Google Kubernetes Engine.

We will provide more information by Thursday, 2021-09-09 12:35 US/Pacific.

Diagnosis: We are currently investigating the impact of this incident.

Workaround: None at this time.

9 Sep 2021 12:06 PDT

Summary: europe-west3, europe-west4, us-east4, asia-northeast1: Elevated error rates for GKE control plane

Description: We are investigating a potential issue with Google Kubernetes Engine.

We will provide more information by Thursday, 2021-09-09 12:35 US/Pacific.

Diagnosis: We are currently investigating the impact of this incident.

Workaround: None at this time.