Service Health
Incident affecting Google Kubernetes Engine
Global: Calico-enabled GKE clusters’ pods may get stuck Terminating or Pending after upgrading to 1.22+
Incident began at 2022-09-29 14:53 and ended at 2022-10-01 15:24 (all times are US/Pacific).
Previously affected location(s)
Taiwan (asia-east1), Hong Kong (asia-east2), Tokyo (asia-northeast1), Osaka (asia-northeast2), Seoul (asia-northeast3), Mumbai (asia-south1), Delhi (asia-south2), Singapore (asia-southeast1), Jakarta (asia-southeast2), Sydney (australia-southeast1), Melbourne (australia-southeast2), Warsaw (europe-central2), Finland (europe-north1), Madrid (europe-southwest1), Belgium (europe-west1), London (europe-west2), Frankfurt (europe-west3), Netherlands (europe-west4), Zurich (europe-west6), Milan (europe-west8), Paris (europe-west9), Montréal (northamerica-northeast1), Toronto (northamerica-northeast2), São Paulo (southamerica-east1), Santiago (southamerica-west1), Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Columbus (us-east5), Dallas (us-south1), Oregon (us-west1), Los Angeles (us-west2), Salt Lake City (us-west3), Las Vegas (us-west4)
| Date | Time | Description |
| --- | --- | --- |
| 1 Oct 2022 | 15:24 PDT | The issue with Google Kubernetes Engine has been resolved as of Saturday, 2022-10-01 01:30 US/Pacific. The fix is now available in all locations in the following GKE versions: 1.24.4-gke.500+, 1.23.11-gke.300+, and 1.22.14-gke.300+. Customers can manually upgrade to the fixed version (see the example commands after this table), or clusters on the RAPID, REGULAR, or STABLE release channels using 1.22, 1.23, or 1.24 will upgrade automatically over the coming weeks. We thank you for your patience while we worked on resolving the issue. |
| 1 Oct 2022 | 07:10 PDT | Summary: Global: Calico-enabled GKE clusters’ pods may get stuck Terminating or Pending after upgrading to 1.22+. Description: The following GKE versions are vulnerable to a race condition when using the Calico Network Policy, resulting in pods stuck Terminating or Pending: all 1.22 GKE versions, all 1.23 GKE versions, and 1.24 versions before 1.24.4-gke.800. Only a small number of GKE clusters have actually experienced stuck pods; use of the cluster autoscaler can increase the chance of hitting the race condition. A fix is available in GKE v1.24.4-gke.800 or later. The fix is also being made available in v1.23 and v1.22 as part of the next release, which has now started. Once available, customers can manually upgrade to the fixed version, or clusters on the RAPID, REGULAR, or STABLE release channels using 1.22 or 1.23 will upgrade automatically over the coming weeks. We will provide an update by Monday, 2022-10-03 10:00 US/Pacific with current details. The issue was introduced in the Calico component, and GKE has been working closely with the Calico project to produce a fix. We apologize to all who are affected by the disruption. Diagnosis: The Calico CNI plugin shows the following error when terminating Pods: "Warning FailedKillPod 36m (x389 over 121m) kubelet error killing pod: failed to "KillPodSandbox" for "af9ab8f9-d6d6-4828-9b8c-a58441dd1f86" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod "myclient-pod-6474c76996" network: error getting ClusterInformation: connection is unauthorized: Unauthorized" Workaround: Customers currently experiencing the issue are requested to take one of the following actions: |
| 30 Sep 2022 | 19:41 PDT | Summary: Global: Calico-enabled GKE clusters’ pods may get stuck Terminating or Pending after upgrading to 1.22+. Description: The following GKE versions are vulnerable to a race condition when using the Calico Network Policy, resulting in pods stuck Terminating or Pending: all 1.22 GKE versions, all 1.23 GKE versions, and 1.24 versions before 1.24.4-gke.800. Only a small number of GKE clusters have actually experienced stuck pods; use of the cluster autoscaler can increase the chance of hitting the race condition. A fix is available in GKE v1.24.4-gke.800 or later. The fix is also being made available in v1.23 and v1.22 as part of the next release, which has now started. Once available, customers can manually upgrade to the fixed version, or clusters on the RAPID, REGULAR, or STABLE release channels using 1.22 or 1.23 will upgrade automatically over the coming weeks. We will provide an update by Saturday, 2022-10-01 08:00 US/Pacific with current details. The issue was introduced in the Calico component, and GKE has been working closely with the Calico project to produce a fix. We apologize to all who are affected by the disruption. Diagnosis: The Calico CNI plugin shows the following error when terminating Pods: "Warning FailedKillPod 36m (x389 over 121m) kubelet error killing pod: failed to "KillPodSandbox" for "af9ab8f9-d6d6-4828-9b8c-a58441dd1f86" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod "myclient-pod-6474c76996" network: error getting ClusterInformation: connection is unauthorized: Unauthorized" Workaround: Customers currently experiencing the issue are requested to take one of the following actions: |
| 30 Sep 2022 | 15:27 PDT | Summary: Global: Calico-enabled GKE clusters’ pods may get stuck Terminating or Pending after upgrading to 1.22+. Description: The following GKE versions are vulnerable to a race condition when using the Calico Network Policy, resulting in pods stuck Terminating or Pending: all 1.22 GKE versions, all 1.23 GKE versions, and 1.24 versions before 1.24.4-gke.800. Only a small number of GKE clusters have actually experienced stuck pods; use of the cluster autoscaler can increase the chance of hitting the race condition. A fix is available in GKE v1.24.4-gke.800 or later. The fix is also being made available in v1.23 and v1.22 as part of the next release, which has now started. Once available, customers can manually upgrade to the fixed version, or clusters on the RAPID, REGULAR, or STABLE release channels using 1.22 or 1.23 will upgrade automatically over the coming weeks. We will provide an update by Friday, 2022-09-30 20:00 US/Pacific with current details. The issue was introduced in the Calico component, and GKE has been working closely with the Calico project to produce a fix. We apologize to all who are affected by the disruption. Diagnosis: The Calico CNI plugin shows the following error when terminating Pods: "Warning FailedKillPod 36m (x389 over 121m) kubelet error killing pod: failed to "KillPodSandbox" for "af9ab8f9-d6d6-4828-9b8c-a58441dd1f86" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod "myclient-pod-6474c76996" network: error getting ClusterInformation: connection is unauthorized: Unauthorized" Workaround: Customers currently experiencing the issue are requested to take one of the following actions: |
| 30 Sep 2022 | 14:57 PDT | Summary: Global: Calico-enabled GKE clusters’ pods may get stuck Terminating or Pending after upgrading to 1.22+. Description: The following GKE versions are vulnerable to a race condition when using the Calico Network Policy, resulting in pods stuck Terminating or Pending: all 1.22 GKE versions, all 1.23 GKE versions, and 1.24 versions before 1.24.4-gke.800. Only a small number of GKE clusters have actually experienced stuck pods; use of the cluster autoscaler can increase the chance of hitting the race condition. A fix is available in GKE v1.24.4-gke.800 or later. The fix is also being made available in v1.23 and v1.22 as part of the next release, which has now started. Once available, customers can manually upgrade to the fixed version, or clusters on the RAPID, REGULAR, or STABLE release channels using 1.22 or 1.23 will upgrade automatically over the coming weeks. We will provide an update by Friday, 2022-09-30 17:00 US/Pacific with current details. The issue was introduced in the Calico component, and GKE has been working closely with the Calico project to produce a fix. We apologize to all who are affected by the disruption. Diagnosis: The Calico CNI plugin shows the following error when terminating Pods: "Warning FailedKillPod 36m (x389 over 121m) kubelet error killing pod: failed to "KillPodSandbox" for "af9ab8f9-d6d6-4828-9b8c-a58441dd1f86" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod "myclient-pod-6474c76996" network: error getting ClusterInformation: connection is unauthorized: Unauthorized" Workaround: Customers currently experiencing the issue are requested to take one of the following actions: |
| 30 Sep 2022 | 08:35 PDT | Summary: Global: Calico-enabled GKE clusters’ pods may get stuck Terminating or Pending after upgrading to 1.22+. Description: The following GKE versions are vulnerable to a race condition when using the Calico Network Policy, resulting in pods stuck Terminating or Pending: all 1.22 GKE versions, all 1.23 GKE versions, and 1.24 versions before 1.24.4-gke.800. Only a small number of GKE clusters have actually experienced stuck pods; use of the cluster autoscaler can increase the chance of hitting the race condition. A fix is available in GKE v1.24.4-gke.800 or later. The fix is also being made available in v1.23 and v1.22 as part of the next release, which has now started. Once available, customers can manually upgrade to the fixed version, or clusters on the RAPID, REGULAR, or STABLE release channels using 1.22 or 1.23 will upgrade automatically over the coming weeks. We will provide an update by Friday, 2022-09-30 15:10 US/Pacific with current details. The issue was introduced in the Calico component, and GKE has been working closely with the Calico project to produce a fix. We apologize to all who are affected by the disruption. Diagnosis: The Calico CNI plugin shows the following error when terminating Pods: "Warning FailedKillPod 36m (x389 over 121m) kubelet error killing pod: failed to "KillPodSandbox" for "af9ab8f9-d6d6-4828-9b8c-a58441dd1f86" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod "myclient-pod-6474c76996" network: error getting ClusterInformation: connection is unauthorized: Unauthorized" Workaround: Customers currently experiencing the issue are requested to take one of the following actions: |
| 29 Sep 2022 | 14:53 PDT | Summary: Global: Calico-enabled GKE clusters’ pods may get stuck Terminating or Pending after upgrading to 1.22+. Description: The following GKE versions are vulnerable to a race condition when using the Calico Network Policy, resulting in pods stuck Terminating or Pending: all 1.22 GKE versions, all 1.23 GKE versions, and 1.24 versions before 1.24.4-gke.800. Only a small number of GKE clusters have actually experienced stuck pods; use of the cluster autoscaler can increase the chance of hitting the race condition. A fix is available in GKE v1.24.4-gke.800 or later. The fix is also being made available in v1.23 and v1.22 as part of the next release, which has now started. Once available, customers can manually upgrade to the fixed version, or clusters on the RAPID, REGULAR, or STABLE release channels using 1.22 or 1.23 will upgrade automatically over the coming weeks. We will provide an update by Friday, 2022-09-30 15:00 US/Pacific with current details. The issue was introduced in the Calico component, and GKE has been working closely with the Calico project to produce a fix. We apologize to all who are affected by the disruption. Diagnosis: The Calico CNI plugin shows the following error when terminating Pods: "Warning FailedKillPod 36m (x389 over 121m) kubelet error killing pod: failed to "KillPodSandbox" for "af9ab8f9-d6d6-4828-9b8c-a58441dd1f86" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod "myclient-pod-6474c76996" network: error getting ClusterInformation: connection is unauthorized: Unauthorized" Workaround: Customers currently experiencing the issue are requested to take one of the following actions: |
- All times are US/Pacific
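
The notices above ask affected customers to confirm whether a cluster is on a vulnerable version, watch for the FailedKillPod symptom quoted in the diagnosis, and upgrade manually once a fixed release is available in their location. The following is a minimal sketch of those checks using gcloud and kubectl; the cluster name (my-cluster), location (us-central1), node pool name (default-pool), and target version are placeholders, not values from the incident notice, and should be replaced with your own.

```sh
# Placeholders: substitute your own cluster name and location.
CLUSTER_NAME=my-cluster
LOCATION=us-central1   # use --zone instead of --region for zonal clusters

# Check control-plane and node versions against the affected ranges
# (all 1.22.x, all 1.23.x, and 1.24.x before 1.24.4-gke.800).
gcloud container clusters describe "$CLUSTER_NAME" \
  --region "$LOCATION" \
  --format="value(currentMasterVersion,currentNodeVersion)"

# List the GKE versions currently offered in this location to confirm
# a fixed release (e.g. 1.24.4-gke.800+ per the notices above) is available.
gcloud container get-server-config --region "$LOCATION" \
  --format="yaml(channels)"

# Look for the symptom: pods stuck in Pending and the FailedKillPod
# events shown in the diagnosis section.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
kubectl get events --all-namespaces --field-selector=reason=FailedKillPod

# Manually upgrade the control plane, then the node pool, to a fixed version.
gcloud container clusters upgrade "$CLUSTER_NAME" \
  --region "$LOCATION" --master --cluster-version "1.24.4-gke.800"
gcloud container clusters upgrade "$CLUSTER_NAME" \
  --region "$LOCATION" --node-pool default-pool
```

Clusters on the RAPID, REGULAR, or STABLE release channels will receive the fixed versions automatically over the coming weeks, so a manual upgrade mainly matters for clusters that are actively hitting the stuck-pod symptom.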