Service Health
Incident affecting Google Kubernetes Engine
Google Kubernetes Engine customers with Workload Identity enabled may see high application logging rate
Incident began at 2023-11-02 10:45 and ended at 2023-11-10 13:37 (all times are US/Pacific).
Previously affected location(s)
Taiwan (asia-east1), Hong Kong (asia-east2), Tokyo (asia-northeast1), Osaka (asia-northeast2), Seoul (asia-northeast3), Mumbai (asia-south1), Delhi (asia-south2), Singapore (asia-southeast1), Jakarta (asia-southeast2), Sydney (australia-southeast1), Melbourne (australia-southeast2), Warsaw (europe-central2), Finland (europe-north1), Madrid (europe-southwest1), Belgium (europe-west1), Berlin (europe-west10), Turin (europe-west12), London (europe-west2), Frankfurt (europe-west3), Netherlands (europe-west4), Zurich (europe-west6), Milan (europe-west8), Paris (europe-west9), Doha (me-central1), Dammam (me-central2), Tel Aviv (me-west1), Montréal (northamerica-northeast1), Toronto (northamerica-northeast2), São Paulo (southamerica-east1), Santiago (southamerica-west1), Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Columbus (us-east5), Dallas (us-south1), Oregon (us-west1), Los Angeles (us-west2), Salt Lake City (us-west3), Las Vegas (us-west4)
| Date | Time | Description |
|---|---|---|
| 10 Nov 2023 | 13:37 PST | The issue with Google Kubernetes Engine has been resolved for all affected users as of Thursday, 2023-11-09 23:00 US/Pacific. We thank you for your patience while we worked on resolving the issue. |
| 8 Nov 2023 | 14:06 PST | Summary: Google Kubernetes Engine customers with Workload Identity enabled may see high application logging rate. Description: There is no impact to cluster functionality or performance. The only impact is extra logs in Cloud Logging. gke-metadata-server is a GKE-managed system workload that is part of the GKE Workload Identity feature. Versions 0.4.272 to 0.4.280 of gke-metadata-server contain an incorrect configuration that produces a high rate of debug logs containing the string "Unable to sync sandbox". These logs are ingested into Cloud Logging, consuming Cloud Logging ingestion quota and causing excess billable usage once the free monthly allotment is exceeded. The rollout of a fix that stops these excess logs from being ingested into Cloud Logging is about 25% complete. We will provide an update by Friday, 2023-11-10 14:00 US/Pacific. Diagnosis: Customers can determine whether a cluster is impacted by inspecting the gke-metadata-server DaemonSet with `kubectl get daemonset -n kube-system -l k8s-app=gke-metadata-server -o yaml` and checking the `components.gke.io/component-version` annotation in `.spec.template.metadata.annotations`. If the value is a version between 0.4.272 and 0.4.280 (inclusive), the cluster is currently affected. Workaround: |
| 6 Nov 2023 | 13:47 PST | Summary: Google Kubernetes Engine customers with Workload Identity enabled may see high application logging rate. Description: There is no impact to cluster functionality or performance. The only impact is extra logs in Cloud Logging. gke-metadata-server is a GKE-managed system workload that is part of the GKE Workload Identity feature. Versions 0.4.272 to 0.4.280 of gke-metadata-server contain an incorrect configuration that produces a high rate of debug logs containing the string "Unable to sync sandbox". These logs are ingested into Cloud Logging, consuming Cloud Logging ingestion quota and causing excess billable usage once the free monthly allotment is exceeded. The root cause has been identified and we are starting the rollout of a fix across the fleet. We will provide more information by Wednesday, 2023-11-08 14:00 US/Pacific. Diagnosis: Customers can determine whether a cluster is impacted by inspecting the gke-metadata-server DaemonSet with `kubectl get daemonset -n kube-system -l k8s-app=gke-metadata-server -o yaml` and checking the `components.gke.io/component-version` annotation in `.spec.template.metadata.annotations`. If the value is a version between 0.4.272 and 0.4.280 (inclusive), the cluster is currently affected. Workaround: |
| 3 Nov 2023 | 14:08 PDT | Summary: Google Kubernetes Engine customers using gke-metadata-server versions 0.4.272 to 0.4.280 may see high application logging rate. Description: There is no impact to cluster functionality or performance. The only impact is extra logs in Cloud Logging. gke-metadata-server is a GKE-managed system workload that is part of the GKE Workload Identity feature. Versions 0.4.272 to 0.4.280 of gke-metadata-server contain an incorrect configuration that produces a high rate of debug logs containing the string "Unable to sync sandbox". These logs are ingested into Cloud Logging, consuming Cloud Logging ingestion quota and causing excess billable usage once the free monthly allotment is exceeded. The root cause has been identified and we are working on rolling out a fix across the fleet. Customers can determine whether a cluster is impacted by inspecting the gke-metadata-server DaemonSet with `kubectl get daemonset -n kube-system -l k8s-app=gke-metadata-server -o yaml` and checking the `components.gke.io/component-version` annotation in `.spec.template.metadata.annotations`. If the value is a version between 0.4.272 and 0.4.280 (inclusive), the cluster is currently affected. We will provide more information by Monday, 2023-11-06 14:00 US/Pacific. Diagnosis: Impact is limited to gke-metadata-server versions 0.4.272 to 0.4.280. Workaround: |
| 2 Nov 2023 | 14:09 PDT | Summary: Google Kubernetes Engine customers using gke-metadata-server versions 0.4.272 to 0.4.280 may see high application logging rate. Description: gke-metadata-server is a GKE-managed system workload that is part of the GKE Workload Identity feature. This feature is opt-in on GKE Standard and always enabled on GKE Autopilot. Versions 0.4.272 to 0.4.280 of gke-metadata-server contain a bug that produces a high rate of debug logs containing the string "Unable to sync sandbox". These logs are ingested into Cloud Logging, consuming Cloud Logging ingestion quota and causing excess billable usage once the free monthly allotment is exceeded. gke-metadata-server and the customer applications that depend on it should continue to work, with some increased latency on metadata server calls. The root cause has been identified and we are working on rolling out a fix across the fleet. A fixed gke-metadata-server version is currently available on the Rapid channel; to get it, upgrade your cluster's control plane to 1.28.2-gke.1157000 or later. You can determine whether your cluster is impacted by inspecting the gke-metadata-server DaemonSet with `kubectl get daemonset -n kube-system -l k8s-app=gke-metadata-server -o yaml` and checking the `components.gke.io/component-version` annotation in `.spec.template.metadata.annotations`. If the value is a version between 0.4.272 and 0.4.280 (inclusive), your cluster is currently affected. We will provide more information by Friday, 2023-11-03 14:00 US/Pacific. Diagnosis: Affected clusters should continue to work, with possibly reduced service from gke-metadata-server because it spends CPU on the excessive logging. Workaround: None at this time. |
| 2 Nov 2023 | 13:32 PDT | Summary: Google Kubernetes Engine customers using gke-metadata-server versions 0.4.272 to 0.4.280 may see high application logging rate Description: We are experiencing an issue with Google Kubernetes Engine. Our engineering team continues to investigate the issue. We will provide an update by Thursday, 2023-11-02 15:00 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: None at this time. Workaround: None at this time. |
- All times are US/Pacific
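The Diagnosis check described in these updates can be scripted. The following is a minimal Python sketch, not an official Google tool. It assumes `kubectl` is already configured for the target cluster, requests `-o json` instead of the `-o yaml` used in the updates (so only the standard library is needed), and assumes the annotation holds a plain dotted version such as `0.4.276`. The helper names (`parse_version`, `is_affected`, `component_versions`, `fetch_daemonset_json`) are hypothetical.

```python
import json
import subprocess

# Affected gke-metadata-server range stated in the incident updates (inclusive).
AFFECTED_MIN = (0, 4, 272)
AFFECTED_MAX = (0, 4, 280)
# Annotation named in the Diagnosis notes, under .spec.template.metadata.annotations.
ANNOTATION = "components.gke.io/component-version"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '0.4.276' into a tuple of ints.

    Assumption: the annotation is purely numeric and dot-separated.
    """
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version: str) -> bool:
    """Return True if the version lies within 0.4.272..0.4.280 inclusive."""
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX


def component_versions(daemonset_list_json: str) -> list[str]:
    """Collect the component-version annotation from each DaemonSet in a kubectl JSON list."""
    data = json.loads(daemonset_list_json)
    versions = []
    for item in data.get("items", []):
        template_meta = item.get("spec", {}).get("template", {}).get("metadata", {})
        annotations = template_meta.get("annotations") or {}
        version = annotations.get(ANNOTATION)
        if version is not None:
            versions.append(version)
    return versions


def fetch_daemonset_json() -> str:
    """Run the kubectl query from the Diagnosis notes, with JSON output instead of YAML."""
    result = subprocess.run(
        ["kubectl", "get", "daemonset", "-n", "kube-system",
         "-l", "k8s-app=gke-metadata-server", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

To check a live cluster, pass the output of `fetch_daemonset_json()` to `component_versions()` and test each returned value with `is_affected()`. Comparing integer tuples rather than raw strings avoids lexicographic mistakes, such as "0.4.1000" sorting before "0.4.272".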