Service Health
Incident affecting Google Kubernetes Engine, Google Cloud Storage
Multiple products reporting 403 access denied issue in us-west1 and multiregion-us
Incident began at 2023-06-19 07:29 and ended at 2023-06-19 14:14 (all times are US/Pacific).
Previously affected location(s)
Belgium (europe-west1), Multi-region: us, Oregon (us-west1), Los Angeles (us-west2)
| Date | Time | Description |
|---|---|---|
| 28 Jun 2023 | 10:10 PDT | Incident Report. Summary: On 19 June 2023 at 07:29 US/Pacific, some Google Cloud customers in us-west1 and multiregion-us received intermittent "403 access denied" errors while using Google Kubernetes Engine (GKE) or attempting to access Google Cloud Storage (GCS), for a duration of 6 hours and 45 minutes. Root Cause: Google Cloud's common authentication infrastructure provides short-lived, forwardable internal credentials for establishing user presence. Production jobs use a public key to verify the signature from this authentication platform. This public key is periodically rotated to increase security. In this case, updated public keys were successfully rolled out to >99.99% of machines but failed to roll out to one machine before the previous keys became stale. Tasks scheduled on machines with stale keys returned an internal authentication failure that was translated to an access-denied error (HTTP 403) in the us-west1 region and the US multi-region. As a result, a small number of customers experienced intermittent access-denied errors despite having correct credentials. Remediation and Prevention: Google engineers were alerted to the issue via a customer support case on 19 June 2023 at 07:29 US/Pacific and immediately started an investigation. Engineers identified an affected machine that was using a stale public key as part of our internal authentication infrastructure. Engineers took standard mitigating actions, stopping the serving path process within 10 minutes of identifying the affected machine, and confirmed full mitigation within another 4 minutes. Services were fully recovered on 19 June 2023 at 14:14 US/Pacific. Google is committed to preventing future recurrence, and we are taking the following actions: |
| 20 Jun 2023 | 16:50 PDT | Mini Incident Report. We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support (All Times US/Pacific) Incident Start: 19 June 2023 at 07:29. Incident End: 19 June 2023 at 14:14. Duration: 6 hours, 45 minutes. Affected Services and Features: Google Kubernetes Engine (GKE), Google Cloud Storage (GCS). Regions/Zones: us-west1, multiregion-us. Description: Google Kubernetes Engine in the us-west1 region and Google Cloud Storage services in the us-west1 and multiregion-us regions returned intermittent 403 errors for customers for a period of 6 hours and 45 minutes. From our preliminary analysis, the incident was caused by inconsistent keys being used by one of our machines as part of our internal authenticator infrastructure. This led to a failure to verify all requests that were handled by the affected machine. Google will share a full Incident Report in the following days that will provide a detailed root cause. Google is immediately starting an audit of other machines containing authorization tokens to ensure that the incident is not repeated. We are also working on a permanent fix to prevent this situation from happening in the future, and improving our ability to quickly detect this issue if it ever happens again. Customer Impact: Google Cloud Storage customers in the impacted regions would have received intermittent 403 errors when accessing and/or writing to GCS buckets. Customers may have also received errors stating that the request had invalid authentication credentials, despite using a valid account. A small number of Google Kubernetes Engine customers in the us-west1 region may have received intermittent 403 errors on GKE nodes when calling GKE APIs. |
| 19 Jun 2023 | 14:54 PDT | The issue with Google Cloud Storage and Google Kubernetes Engine has been resolved for all affected projects as of Monday, 2023-06-19 14:13 US/Pacific. We thank you for your patience while we worked on resolving the issue. |
| 19 Jun 2023 | 14:42 PDT | Summary: Multiple products reporting 403 access denied issue in us-west1 and multiregion-us Description: We are experiencing an issue with GCS and GKE products beginning on Monday, 2023-06-19 7:30 US/Pacific. Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Monday, 2023-06-19 15:15 US/Pacific. Diagnosis: Customers may experience 403 access denied errors while accessing these products Workaround: None at this time. |
| 19 Jun 2023 | 14:16 PDT | Summary: Multiple products reporting 403 access denied issue in us-west1 and multiregion-us Description: We are experiencing an issue with GCS and GKE products beginning on Monday, 2023-06-19 7:30 US/Pacific. Our engineering team continues to investigate the root cause of the issue. We have not confirmed any impact on other Google Cloud services. We continue to investigate. We will provide an update by Monday, 2023-06-19 14:45 US/Pacific with current details. Diagnosis: Customers may experience 403 access denied errors while accessing these products Workaround: None at this time. |
| 19 Jun 2023 | 13:42 PDT | Summary: Multiple products reporting 403 access denied issue in us-west1 and multiregion-us Description: We are experiencing an issue with GCS and GKE products beginning on Monday, 2023-06-19 7:30 US/Pacific. Our engineering team continues to investigate the root cause of the issue. We have not confirmed any impact on other Google Cloud services. We continue to investigate. We will provide an update by Monday, 2023-06-19 14:15 US/Pacific with current details. Diagnosis: Customers may experience 403 access denied errors while accessing these products Workaround: None at this time. |
| 19 Jun 2023 | 13:13 PDT | Summary: Multiple products reporting 403 access denied issue in us-west1 and multiregion-us Description: We are experiencing an issue with GCS and GKE products beginning on Monday, 2023-06-19 7:30 US/Pacific. Our engineering team continues to investigate the root cause of the issue. We are also investigating any impact on all our GCP products and making sure they are functioning as expected. We will provide an update by Monday, 2023-06-19 13:45 US/Pacific with current details. Diagnosis: Customers may experience 403 access denied errors while accessing these products Workaround: None at this time. |
| 19 Jun 2023 | 12:43 PDT | Summary: Multiple products reporting 403 access denied issue in us-west1 and multiregion-us Description: We are experiencing an issue with GCS and GKE products beginning on Monday, 2023-06-19 7:30 US/Pacific. Our engineering team continues to investigate the root cause of the issue. We will provide an update by Monday, 2023-06-19 13:15 US/Pacific with current details. Diagnosis: Customers may experience 403 access denied errors while accessing these products Workaround: None at this time. |
| 19 Jun 2023 | 12:14 PDT | Summary: Multiple products reporting 403 access denied issue in us-west1 and multiregion-us Description: We are experiencing an issue with GCS and GKE products beginning on Monday, 2023-06-19 7:30 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Monday, 2023-06-19 12:45 US/Pacific with current details. Diagnosis: Customers may experience 403 access denied errors while accessing these products Workaround: None at this time. |
| 19 Jun 2023 | 11:33 PDT | Summary: Multiple products reporting 403 access denied issue in us-west1 and multiregion-us Description: We are experiencing an issue with multiple products beginning on Monday, 2023-06-19 7:30 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Monday, 2023-06-19 12:15 US/Pacific with current details. Diagnosis: Customers may experience 403 access denied errors while accessing these products Workaround: None at this time. |
| 19 Jun 2023 | 11:15 PDT | Summary: We are investigating a potential issue with multiple products in us-west1 and us-west2 regions Description: We are experiencing an issue with multiple products beginning on Monday, 2023-06-19 7:30 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Monday, 2023-06-19 12:00 US/Pacific with current details. Diagnosis: Customers may experience 403 access denied errors while accessing these products Workaround: None at this time. |
| 19 Jun 2023 | 11:02 PDT | Summary: We are investigating a potential issue with multiple products in us-west1 and us-west2 regions Description: We are experiencing an issue with Google Kubernetes Engine and Google Cloud Storage beginning on Monday, 2023-06-19 7:38 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Monday, 2023-06-19 11:59 US/Pacific with current details. Diagnosis: Customers may experience 403 access denied errors while accessing these products Workaround: None at this time. |
- All times are US/Pacific
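
The failure mode described in the root cause can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Google's implementation: the key IDs (`k1`, `k2`), the `Machine` class, and the `stale_machines` audit helper are all hypothetical names, and HMAC from the Python standard library stands in for the public-key signatures mentioned in the report. It shows how a single machine that missed a key rollout would reject otherwise valid credentials with HTTP 403, and how an audit might flag that machine.

```python
import hashlib
import hmac

# Hypothetical key store: key id -> secret. HMAC is a stand-in for the
# public-key signature scheme described in the report (assumption).
KEYS = {"k1": b"old-secret", "k2": b"new-secret"}


def sign(key_id, payload):
    """Issue a (key_id, signature) credential for the payload."""
    return key_id, hmac.new(KEYS[key_id], payload, hashlib.sha256).digest()


class Machine:
    """A serving machine with its own local copy of the trusted key ids."""

    def __init__(self, name, key_ids):
        self.name = name
        self.key_ids = set(key_ids)

    def handle(self, payload, credential):
        """Return an HTTP-style status code for a request."""
        key_id, signature = credential
        if key_id not in self.key_ids:
            # The rotated key never reached this machine: verification
            # fails internally and is surfaced as an access-denied error.
            return 403
        expected = hmac.new(KEYS[key_id], payload, hashlib.sha256).digest()
        return 200 if hmac.compare_digest(signature, expected) else 403


def stale_machines(fleet, active_key_id):
    """Audit helper (hypothetical): machines missing the active key."""
    return [m.name for m in fleet if active_key_id not in m.key_ids]


# Two machines received the rotated key k2; one did not.
fleet = [
    Machine("m1", {"k1", "k2"}),
    Machine("m2", {"k1", "k2"}),
    Machine("m3", {"k1"}),
]

credential = sign("k2", b"user=alice")
statuses = [m.handle(b"user=alice", credential) for m in fleet]
print(statuses)                     # one status code per machine
print(stale_machines(fleet, "k2"))  # names of machines missing k2
```

Because requests are spread across the fleet, only those routed to the stale machine fail, which matches the intermittent nature of the errors that customers reported.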