Service Health

Incident affecting Google Cloud Composer

Creation and Upgrades are failing for some Environments while using Cloud Composer 2

Incident began at 2024-04-16 02:20 and ended at 2024-04-17 03:40 (all times are US/Pacific).

Previously affected location(s)

Taiwan (asia-east1), Hong Kong (asia-east2), Tokyo (asia-northeast1), Osaka (asia-northeast2), Seoul (asia-northeast3), Mumbai (asia-south1), Singapore (asia-southeast1), Jakarta (asia-southeast2), Sydney (australia-southeast1), Melbourne (australia-southeast2), Warsaw (europe-central2), Finland (europe-north1), Belgium (europe-west1), London (europe-west2), Frankfurt (europe-west3), Netherlands (europe-west4), Zurich (europe-west6), Paris (europe-west9), Montréal (northamerica-northeast1), Toronto (northamerica-northeast2), São Paulo (southamerica-east1), Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Columbus (us-east5), Oregon (us-west1), Los Angeles (us-west2), Salt Lake City (us-west3), Las Vegas (us-west4)

Date Time Description
19 Apr 2024 10:21 PDT

Incident Report

Summary

For 1 day, 1 hour and 20 minutes between 16 and 17 April 2024, Cloud Composer users experienced an elevated failure rate when creating, resizing, or upgrading to newer versions of Cloud Composer 2 Environments with “Private IP” configuration.

Existing Private IP environments continued to operate normally if they were not upgraded or resized.

To our Cloud Composer customers whose businesses were impacted during this disruption: we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

Root Cause

The root cause was an issue introduced by a recent change to the latest stable Container-Optimized OS (COS) image used by Cloud Composer in one of its workloads.

The new version of the COS image (M113) switched its default from the iptables-legacy package to the iptables-nft package, which broke the handling of iptables by Konlet (the system that executes containers).
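
As an illustration of the legacy versus nft distinction described above, the following is a minimal sketch of how a script could tell which iptables backend is the default on a host. It is not taken from the incident report and is not how Google diagnosed the issue. It assumes iptables 1.8 or later, where `iptables --version` prints the backend in parentheses (for example `iptables v1.8.7 (nf_tables)`); the function names are hypothetical.

```python
import re
import subprocess


def parse_iptables_backend(version_output: str) -> str:
    """Return 'nf_tables', 'legacy', or 'unknown' from `iptables --version` text.

    Assumption: iptables 1.8+ appends the backend in parentheses,
    e.g. "iptables v1.8.7 (nf_tables)". Older releases print no backend.
    """
    match = re.search(r"\((nf_tables|legacy)\)", version_output)
    return match.group(1) if match else "unknown"


def current_iptables_backend() -> str:
    """Run `iptables --version` on this host and classify its backend."""
    try:
        result = subprocess.run(
            ["iptables", "--version"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # iptables is not installed or could not be executed.
        return "unknown"
    return parse_iptables_backend(result.stdout)


if __name__ == "__main__":
    print(current_iptables_backend())
```

A check like this could run before a workload that manipulates iptables rules, so that a change of default backend in a base image is detected up front rather than appearing as a failure later.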

Remediation and Prevention

Google engineers were alerted to the outage via our monitoring tools on 16 April at 09:17 US/Pacific and immediately started an investigation. Once the nature and scope of the issue became clear, Google engineers reverted the recently introduced rollout.

Google is committed to preventing a repeat of this issue in the future and is completing the following actions:

  • Improve our monitoring systems so that, in similar scenarios, we are notified sooner and can resolve the issue earlier.
  • The Cloud Composer team will modify its approach to ingesting new container base images so that ingestion is regionalized and independent of the ongoing rollouts of base image versions.
  • The intake and testing process for container base images will be extended with more extensive testing.

Detailed Description of Impact

Between 16 April 2024 02:20 and 17 April 2024 03:40 US/Pacific, impacted customers might have experienced the following:

  • Creating new Composer Environments with Private IP configuration failed. In total we observed fewer than 200 customer projects where creations failed due to the issue.
  • Upgrades and resize operations for existing Composer Environments in the Private IP configuration failed. In total we observed fewer than 20 environments with failed upgrades due to the issue.
  • Environments in Private IP configuration might have encountered issues with scaling for increased database-intensive workloads.
  • Existing Composer 2 Environments, if not modified, functioned correctly.
  • Customers who triggered an upgrade during the outage were left with an unhealthy Composer environment, including workloads with failed upgrades, which could have caused ongoing performance issues.
  • Once the issue was mitigated, the impacted environments returned to a healthy state.

During the outage, customers were asked to refrain from performing upgrade operations until mitigation had been confirmed.

Additional Information for Customers:

  • Fewer than 20 customer environments encountered upgrade failures. All of the customer environments with “Private IP” configuration that experienced upgrade failures during the incident will now upgrade successfully.

If you are one of the customers that experienced an upgrade failure during the incident and continue to have issues with further upgrades, please reach out to Google Cloud Support using https://cloud.google.com/support for assistance with recovery.

17 Apr 2024 09:39 PDT

Mini Incident Report

We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.

(All Times US/Pacific)

Incident Start: 16 April 2024 02:20

Incident End: 17 April 2024 03:40

Duration: 1 day, 1 hour, 20 minutes
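
The stated duration can be checked against the start and end timestamps above. The snippet below is a small sketch of that arithmetic; the helper name `format_duration` is hypothetical. It uses naive datetimes on the assumption that both timestamps are US/Pacific wall-clock times with no daylight-saving change between them.

```python
from datetime import datetime

# Timestamps copied from the report (US/Pacific wall-clock time).
# Assumption: no daylight-saving transition falls between them,
# so timezone-naive datetimes give the correct elapsed time.
incident_start = datetime(2024, 4, 16, 2, 20)
incident_end = datetime(2024, 4, 17, 3, 40)


def format_duration(start: datetime, end: datetime) -> str:
    """Break the elapsed time between two datetimes into days, hours, minutes."""
    total_minutes = int((end - start).total_seconds()) // 60
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return f"{days} day(s), {hours} hour(s), {minutes} minute(s)"


print(format_duration(incident_start, incident_end))
# prints "1 day(s), 1 hour(s), 20 minute(s)"
```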

Affected Services and Features:

Google Cloud Composer

Regions/Zones:

Global

Description:

Google Cloud Composer users experienced an elevated failure rate when creating, resizing, or upgrading to newer versions of Cloud Composer 2 Environments with “Private IP” configuration. This was due to an issue inadvertently introduced by a recent change to the latest stable Container-Optimized OS (COS) image used by Cloud Composer in one of its workloads.

Existing Private IP environments continued to operate normally if they were not upgraded or resized.

Google engineers executed a rollback of the change to mitigate the issue on 17 April 2024 at 03:40 US/Pacific.

Google will complete a full Incident Report in the following days that will provide a detailed root cause.

Customer Impact:

  • Impacted users experienced issues with creating new Composer Environments, upgrades, and resize operations for existing Composer Environments in Private IP configuration.
  • Composer environments with failed upgrades were unhealthy, and workloads could fail or experience performance issues.
  • Environments might have encountered issues with scaling for increased database-intensive workloads.

Additional details:

  • Less than 0.25% of customer environments encountered upgrade failures. Some of the customer environments with “Private IP” configuration that experienced upgrade failures during the incident will now upgrade successfully.
  • A few environments may still experience failures on further upgrades. These environments are still functioning normally and will continue serving workloads without any issues. Our engineers are working internally to identify these environments and take additional actions so that they can be further upgraded successfully. These actions are expected to be completed by early next week.

If you are one of the customers that experienced upgrade failure during the incident and still continue to have issues with further upgrades, please reach out to Google Cloud Support using https://cloud.google.com/support for assistance with recovery. Alternatively, you can recreate these environments.


17 Apr 2024 04:41 PDT

The issue with Google Cloud Composer has been resolved for all affected users as of Wednesday, 2024-04-17 04:09 US/Pacific.

Users are now able to create new Composer Environments and upgrade existing ones. Some upgrades that failed during the incident may have been automatically recovered. If you're still experiencing issues, please contact us via a customer support case for our repair procedure.

We thank you for your patience while we worked on resolving the issue.

17 Apr 2024 03:46 PDT

Summary: Creation and Upgrades are failing for some Environments while using Cloud Composer 2

Description: Mitigation work is currently underway by our engineering team.

We do not have an ETA for mitigation at this point.

We will provide more information by Wednesday, 2024-04-17 05:30 US/Pacific.

Diagnosis: Impacted customers may experience issues with creating new Composer Environments and upgrades for existing Composer Environments. Existing Composer 2 Environments, if not modified, should function correctly.

Workaround: None at this time.

17 Apr 2024 01:45 PDT

Summary: Creation and Upgrades are failing for some Environments while using Cloud Composer 2

Description: Customers might experience issues with creating or upgrading to newer versions of Cloud Composer 2 Environments. The problem affects Composer Environments with the “Private IP” configuration.

We have identified the root cause of the above issue, and are working on a fix.

Users are requested to refrain from performing upgrade operations until mitigation has been confirmed.

We will provide more information by Wednesday, 2024-04-17 04:00 US/Pacific.

Diagnosis: Impacted customers may experience issues with creating new Composer Environments and upgrades for existing Composer Environments. Existing Composer 2 Environments, if not modified, should function correctly.

Workaround: None at this time.