Service Health
Incident affecting Google Kubernetes Engine
us-central1-c: GKE experiencing issues with some cluster and nodepool operations. Issue Mitigated.
Incident began at 2021-10-19 11:40 and ended at 2021-10-19 16:21 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 21 Oct 2021 | 14:56 PDT | We apologize for the inconvenience this service outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support (All Times US/Pacific) First Impact: Incident Start: 19 October 2021 11:40; Incident End: 19 October 2021 16:21; Duration: 4 hours, 41 minutes. Second Impact: Incident Start: 20 October 2021 02:00; Incident End: 20 October 2021 09:30; Duration: 7 hours, 30 minutes. Affected Services and Features: Google Kubernetes Engine. Regions/Zones: us-central1-c. Description: Google Kubernetes Engine experienced two impacts to operations, the first on 19 October 2021 and the second on 20 October 2021. First Impact: Customers may have experienced a failure rate of up to 100% for create-cluster, delete-cluster, delete-nodepool operations and node-pool resizes in us-central1-c for 4 hours and 41 minutes. From preliminary analysis, the root cause was resource contention related to an unexpected increase in API operations. Engineers scaled up instances to mitigate the issue. Second Impact: Customers may have experienced a failure rate of up to 80% for create-cluster, delete-cluster, delete-nodepool operations and node-pool resizes in us-central1-c for 7 hours and 30 minutes. From preliminary analysis, the root cause was an unexpected increase of nodepool operations in a single customer cluster. Engineers mitigated the issue through additional quota enforcement. Customer Impact: Affected customers would have experienced 500/503 errors for create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resize operations. |
| 19 Oct 2021 | 16:40 PDT | The issue with GKE cluster and node pool operations has been resolved for all affected projects as of Tuesday, 2021-10-19 16:21 US/Pacific. We thank you for your patience while we worked on resolving the issue. |
| 19 Oct 2021 | 16:25 PDT | Summary: us-central1-c: GKE experiencing issues with some cluster and nodepool operations. Mitigation underway. Description: Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Tuesday, 2021-10-19 17:00 US/Pacific. Diagnosis: The following operations may fail with 500/503 errors: create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resizes. Workaround: Failed operations may succeed when retried. |
| 19 Oct 2021 | 15:32 PDT | Summary: us-central1-c: GKE experiencing issues with some cluster and nodepool operations. Investigations ongoing. Description: Our engineering team continues to investigate the issue with GKE at us-central1-c, which started Tuesday, 2021-10-19 11:40 US/Pacific. We will provide an update by Tuesday, 2021-10-19 16:30 US/Pacific with the latest details. Diagnosis: The following operations may fail with 500/503 errors: create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resizes. Workaround: Failed operations may succeed when retried. |
| 19 Oct 2021 | 14:50 PDT | Summary: us-central1-c: GKE experiencing issues with some cluster and nodepool operations. Investigations ongoing. Description: Our engineering team continues to investigate the issue with GKE at us-central1-c, which started Tuesday, 2021-10-19 11:40 US/Pacific. We will provide an update by Tuesday, 2021-10-19 15:30 US/Pacific with the latest details. Diagnosis: The following operations may fail with 500/503 errors: create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resizes. Workaround: Failed operations may succeed when retried. |
| 19 Oct 2021 | 14:28 PDT | Summary: us-central1-c: GKE experiencing issues with some cluster and nodepool operations. Description: We are experiencing an issue with Google Kubernetes Engine beginning at Tuesday, 2021-10-19 11:40 US/Pacific. Our engineering team continues to investigate the issue. Existing workloads should continue to work, while attempts to resize nodepools or to create or delete nodepools and clusters fail. We will provide an update by Tuesday, 2021-10-19 15:00 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: The create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resize operations fail with 500 and 503 errors. Workaround: Failed operations may succeed when retried. |
| 19 Oct 2021 | 14:06 PDT | Summary: us-central1-c: Create / Delete Cluster and NodePool operations failing. Description: We are experiencing an issue with Google Kubernetes Engine beginning at Tuesday, 2021-10-19 11:40 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Tuesday, 2021-10-19 14:35 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: The create-cluster, delete-cluster, create-nodepool, delete-nodepool, and node-pool resize operations are failing with 500 and 503 errors. Workaround: Failed operations may succeed when retried. |
- All times are US/Pacific
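
The workaround in the updates above (retrying failed operations) could be automated along the following lines. This is a minimal sketch, not an official client or recommended tooling: `TransientError`, `retry_with_backoff`, and all parameter names are hypothetical, and the caller is assumed to wrap its own cluster or node-pool request so that an HTTP 500 or 503 response raises `TransientError`. Exponential backoff with jitter is an assumption on our part; it spaces retries out so they add less load to an API that is already under contention.

```python
import random
import time

# Status codes reported in this incident as potentially succeeding on retry.
RETRYABLE_STATUSES = {500, 503}


class TransientError(Exception):
    """Hypothetical wrapper the caller raises for a failed HTTP response."""

    def __init__(self, status):
        super().__init__(f"request failed with HTTP {status}")
        self.status = status


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Only errors whose status is in RETRYABLE_STATUSES are retried; any
    other error, or the final failed attempt, is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

A caller would pass a zero-argument function that issues one request, for example `retry_with_backoff(lambda: resize_node_pool(...))`, where `resize_node_pool` stands in for whatever client call the caller already uses. A retry only makes sense for operations that are safe to repeat.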