Service Health

Incident affecting Google Kubernetes Engine

GPU device plugin component version 0.1.11-gke.1 causing missing GPU features in 1.23 and 1.24

Incident began at 2023-07-11 15:05 and ended at 2023-07-12 18:47 (all times are US/Pacific).

Previously affected location(s)

Taiwan (asia-east1), Hong Kong (asia-east2), Tokyo (asia-northeast1), Osaka (asia-northeast2), Seoul (asia-northeast3), Mumbai (asia-south1), Delhi (asia-south2), Singapore (asia-southeast1), Jakarta (asia-southeast2), Sydney (australia-southeast1), Melbourne (australia-southeast2), Warsaw (europe-central2), Finland (europe-north1), Madrid (europe-southwest1), Belgium (europe-west1), Turin (europe-west12), London (europe-west2), Frankfurt (europe-west3), Netherlands (europe-west4), Zurich (europe-west6), Milan (europe-west8), Paris (europe-west9), Doha (me-central1), Tel Aviv (me-west1), Montréal (northamerica-northeast1), Toronto (northamerica-northeast2), São Paulo (southamerica-east1), Santiago (southamerica-west1), Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Columbus (us-east5), Dallas (us-south1), Oregon (us-west1), Los Angeles (us-west2), Salt Lake City (us-west3), Las Vegas (us-west4)

Date Time Description
12 Jul 2023 18:47 PDT

The issue with Google Kubernetes Engine has been resolved for all affected projects as of Wednesday, 2023-07-12 17:51 US/Pacific.

We thank you for your patience while we worked on resolving the issue.

11 Jul 2023 17:03 PDT

Summary: GPU device plugin component version 0.1.11-gke.1 causing missing GPU features in 1.23 and 1.24

Description: Mitigation work is currently underway by our engineering team.

We have identified a mitigation and started implementing it. The mitigation is expected to complete by Monday, 2023-07-17. We will continue to provide updates on any status changes. A permanent fix is expected to be released by 2023-07-19.

We will provide more information by Monday, 2023-07-17 13:00 US/Pacific.

Diagnosis: Impacted customers may not be able to use the GPU time-sharing feature.

Workaround: Upgrade the cluster version to 1.25 or 1.26.
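
As an illustration of the workaround, the following minimal sketch builds (but does not run) a `gcloud container clusters upgrade` command for the control plane. The cluster name, region, and target version string are hypothetical placeholders, and `build_upgrade_command` is a helper introduced here for illustration only; it is not part of any Google tooling.

```python
import shlex

# Target minor versions named in the workaround.
WORKAROUND_MINORS = ("1.25", "1.26")


def is_workaround_version(version: str) -> bool:
    """True if `version` is a 1.25 or 1.26 version (e.g. '1.25' or '1.26.5-gke.1200')."""
    return any(version == m or version.startswith(m + ".") for m in WORKAROUND_MINORS)


def build_upgrade_command(cluster: str, region: str, target_version: str) -> list[str]:
    """Return the argv for a control-plane upgrade; the caller decides whether to run it."""
    if not is_workaround_version(target_version):
        raise ValueError(f"{target_version!r} is not a 1.25 or 1.26 version")
    return [
        "gcloud", "container", "clusters", "upgrade", cluster,
        "--master",
        f"--cluster-version={target_version}",
        f"--region={region}",
    ]


if __name__ == "__main__":
    # Hypothetical cluster name and region, for illustration only.
    argv = build_upgrade_command("my-gpu-cluster", "us-central1", "1.25")
    print(shlex.join(argv))
```

The sketch covers only the control plane. Node pools are upgraded separately (for example with the `--node-pool` flag), and which exact version strings are available depends on the cluster's location and release channel.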

11 Jul 2023 15:31 PDT

Summary: GPU device plugin component version 0.1.11-gke.1 causing missing GPU features in 1.23 and 1.24

Description: We are experiencing an issue with Google Kubernetes Engine.

The following GPU features are missing on clusters on 1.23 and 1.24 using component version 0.1.11-gke.1:

  • Prometheus metrics library version upgrade
  • health checker for multi-instance GPU
  • nvidia-modeset device configuration
  • GPU time-sharing
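
The affected combination described above (cluster minor version 1.23 or 1.24 together with component version 0.1.11-gke.1) can be expressed as a small check. This is a minimal sketch: the function names are hypothetical, and it assumes the cluster version and the device plugin component version have already been obtained as strings by some other means.

```python
import re

# Values taken from the incident summary.
AFFECTED_PLUGIN_VERSION = "0.1.11-gke.1"
AFFECTED_MINORS = {(1, 23), (1, 24)}


def parse_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.24.12-gke.500'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(cluster_version: str, plugin_version: str) -> bool:
    """True only when both the cluster minor version and the plugin version match the incident."""
    return (
        plugin_version == AFFECTED_PLUGIN_VERSION
        and parse_minor(cluster_version) in AFFECTED_MINORS
    )


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    print(is_affected("1.24.12-gke.500", "0.1.11-gke.1"))
    print(is_affected("1.25.8-gke.500", "0.1.11-gke.1"))
```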

Our engineering team continues to investigate the issue.

We will provide an update by Tuesday, 2023-07-11 17:35 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: None at this time.

Workaround: None at this time.

11 Jul 2023 15:05 PDT

Summary: GPU device plugin component version 0.1.11-gke.1 causing missing GPU features in 1.23 and 1.24

Description: We are experiencing an issue with Google Kubernetes Engine.

The following GPU features are missing on clusters on 1.23 and 1.24 using component version 0.1.11-gke.1:

  • Prometheus metrics library version upgrade
  • health checker for multi-instance GPU
  • nvidia-modeset device configuration
  • GPU time-sharing

Our engineering team continues to investigate the issue.

We will provide an update by Tuesday, 2023-07-11 15:35 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: None at this time.

Workaround: None at this time.