Service Health
Incident affecting Cloud Build, Cloud Developer Tools, Google Cloud Dataflow, Google Cloud Deploy, Google Cloud SQL, Google Compute Engine, Google Kubernetes Engine
Google Compute Engine (GCE) VM instance creation/deletion and all related operations for other products were failing in the asia-northeast1 region.
Incident began at 2024-09-07 04:20 and ended at 2024-09-07 06:10 (all times are US/Pacific).
Previously affected location(s)
Tokyo (asia-northeast1)
| Date | Time | Description |
|---|---|---|
| 13 Sep 2024 | 06:44 PDT | Incident Report. Summary: On 7 September 2024 starting at 04:20 US/Pacific, several Google Cloud products experienced a service degradation of varying impact or were unavailable in the asia-northeast1 region for a period of 1 hour 50 minutes. The list of impacted products and services is detailed below. To our Google Cloud customers whose businesses were impacted during this outage, we sincerely apologize. This is not the level of quality and reliability we strive to offer you. Root Cause: Most Google Cloud products and services use a regional metadata store to support their internal operations. The metadata store supports critical functions such as servicing customer requests, load balancing, admin operations, and retrieving/storing metadata including server location information. Google Compute Engine (GCE) internal DNS depends on the regional metadata store for storing instance metadata. A routine update of the metadata store to a new software version included a change that handled a rare resource contention corner case poorly, which caused writes from GCE internal DNS to a zonal replica of the metadata store to fail. During such zonal issues, automated failover mechanisms normally switch to the healthy replicas in other zones. During this disruption, however, a secondary issue prevented automated failover from working, rendering the entire metadata store unavailable despite two other healthy zones being available. This resulted in disruptions to all GCE instance operations in the asia-northeast1 region. Actions such as creating, deleting, starting, and stopping instances or consuming reservations were affected. This, in turn, affected operations of other services dependent on GCE instances, including GKE, Cloud Build, Cloud Dataflow, Cloud Deploy and Cloud SQL. Remediation and Prevention: Google engineers were alerted to the issue by internal monitoring on 7 September 2024 at 04:26 US/Pacific and immediately started an investigation. 
The issue was fully mitigated at 06:10 US/Pacific after failover of the metadata storage operations to the healthy zones was manually initiated by our engineering team. Google is committed to preventing a repeat of this issue in the future and is completing the following actions:
Detailed Description of Impact
|
| 9 Sep 2024 | 06:35 PDT | Mini Incident Report: We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support. (All Times US/Pacific) Incident Start: 07 September, 2024 04:20 Incident End: 07 September, 2024 06:10 Duration: 1 hour, 50 minutes Affected Services and Features:
Regions/Zones: asia-northeast1 Description: Several Google Cloud products experienced a service degradation of varying impact, or were unavailable for 1 hour, 50 minutes in the asia-northeast1 region. Google engineers have identified the cause to be a change rollout to an internal component. This change was subsequently rolled back, which mitigated all known impacts. Google will complete a full Incident Report (IR) in the following days that will provide a full root cause. Customer Impact: Throughout the incident, the impacted Google Cloud services experienced different kinds of service degradation as detailed below.
|
| 7 Sep 2024 | 07:15 PDT | The issue with Cloud Build, Google Cloud Dataflow, Google Cloud Deploy, Google Cloud SQL, Google Compute Engine, Google Kubernetes Engine has been resolved for all affected customers as of Saturday, 2024-09-07 06:10 US/Pacific. We will publish an analysis of this incident once we have completed our internal investigation. We thank you for your patience while we worked on resolving the issue. |
| 7 Sep 2024 | 07:13 PDT | Summary: Google Compute Engine (GCE) VM instance creation/deletion and all related operations for other products were failing in the asia-northeast1 region. Description: The engineering team has rolled out the fix, which mitigated the impact on Saturday, 2024-09-07 06:10 US/Pacific. We will provide more information by Saturday, 2024-09-07 07:30 US/Pacific. Diagnosis: GCE customers impacted by this issue were experiencing "Internal error. Please try again or contact Google Support. (Code: '-1343002181035865699')" while attempting instance creation. Cloud Build customers were observing their builds not being executed. Google Cloud Dataflow customers were unable to create jobs or scale existing jobs. Workaround: Customers may retry their operation if they experience failures. |
| 7 Sep 2024 | 06:11 PDT | Summary: Google Compute Engine (GCE) VM instance creation/deletion operations, Google Cloud Dataflow and Cloud Build are failing in the asia-northeast1 region. Description: Mitigation work is currently underway by our engineering teams. We do not have an ETA for mitigation at this point. We will provide more information by Saturday, 2024-09-07 07:30 US/Pacific. Diagnosis: GCE customers impacted by this issue may experience "Internal error. Please try again or contact Google Support. (Code: '-1343002181035865699')" while attempting instance creation. Cloud Build customers may observe their builds not being executed. Google Cloud Dataflow customers are unable to create jobs or scale existing jobs. Workaround: Customers may attempt their GCE, Cloud Build and Cloud Dataflow operations in another region. |
- All times are US/Pacific
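
The workarounds listed in the updates above (retrying a failed operation, or attempting it in another region) can be sketched on the client side roughly as follows. This is a minimal illustration under stated assumptions, not Google's recommended implementation: `TransientError`, `retry_with_backoff`, and `run_with_region_fallback` are hypothetical names introduced here, the `operation` callable stands in for whatever client call creates the resource, and the backoff parameters are arbitrary defaults.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as an 'Internal error' response."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts calls have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Exponential backoff with full jitter, capped at max_delay,
            # so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


def run_with_region_fallback(
    operation: Callable[[str], T],
    regions: Sequence[str],
    **retry_kwargs,
) -> T:
    """Try operation in each region in order, retrying within a region first."""
    last_error: Optional[TransientError] = None
    for region in regions:
        try:
            return retry_with_backoff(lambda: operation(region), **retry_kwargs)
        except TransientError as exc:
            last_error = exc  # this region is exhausted; move to the next one
    if last_error is None:
        raise ValueError("regions must not be empty")
    raise last_error
```

A caller would pass its own creation function and an ordered list of acceptable regions, for example `run_with_region_fallback(create_instance, ["asia-northeast1", "asia-northeast2"])`, where `create_instance` is a hypothetical function that raises `TransientError` on a retryable failure. Falling back to another region only makes sense for workloads that are not tied to a specific location.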