Service Health
Incident affecting Google Cloud Support
Customers may experience intermittent issues with support case creation
Incident began at 2024-10-02 14:18 and ended at 2024-10-02 16:23 (all times are US/Pacific).
Previously affected location(s)
Global
| Date | Time | Description |
|---|---|---|
| 7 Oct 2024 | 16:28 PDT | Incident Report. Summary: On Wednesday, 2 October 2024, Google Cloud and Google Workspace Support experienced intermittent issues with case creation for 2 hours and 5 minutes, from 14:18 to 16:23 US/Pacific. As a result, some users encountered unexpected errors and slight delays in case creation. Additionally, the backup case creation process (through the UI) experienced elevated latency. To our Google Cloud and Google Workspace customers who experienced delays in case creation during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability. Root Cause: Google Support’s ticket persistence layer has a capacity management system in its control plane to ensure availability. Google services that integrate with the ticket persistence layer are expected to ramp traffic in a controlled way and respond to backpressure signals. If an integration continues sending additional load after receiving backpressure signals, the capacity management system applies a temporary throttle to that integration to protect the health of the system. Beginning at 14:18 on Wednesday, 2 October 2024, an existing internal Google reporting service that integrates with the ticket persistence layer responded to an increase in data volume by rapidly ramping traffic. Due to a latent issue in the reporting service, it sent traffic in a way that bypassed normal backpressure signals. Over time, traffic from the reporting service rose high enough to cause high query latency for some operations in our support UI. Once load on the persistence layer exceeded pre-set limits, the capacity management system restored service by temporarily throttling the reporting service’s traffic.
Due to a latent issue in the reporting service, when the temporary throttle expired, the service again rapidly ramped traffic until the capacity management system applied another temporary throttle. Remediation and Prevention: As each impact period was short, internal monitoring metrics for the support UI did not reach alarm thresholds until 15:01. At that point, Google engineers were alerted to the incident and immediately started working on the issue. After thorough analysis, our engineering team identified the source of the traffic to the ticket persistence layer and applied a long-lived throttle, restoring service. Google is committed to preventing a repeat of this issue in the future and is completing the following actions:
Detailed Description of Impact: Between 14:18 and 16:23 US/Pacific on Wednesday, 2 October 2024, users experienced intermittent delays in case creation (up to a minute). Fewer than 5% of users who attempted to create a support case saw the error message “Unable to create case. Try again.” Almost all users who retried succeeded. A small number of users experienced three consecutive failures and were presented with an alternate support contact form. All customers were able to receive support for existing cases during this time. |
| 3 Oct 2024 | 12:53 PDT | Mini Incident Report. We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support or to Google Workspace Support using help article https://support.google.com/a/answer/1047213. (All Times US/Pacific) Incident Start: 2 Oct 2024 14:18. Incident End: 2 Oct 2024 16:23. Duration: 2 hours, 5 minutes. Affected Services and Features: Google Cloud Support and Google Workspace Support. Regions/Zones: Global. Description: For 2 hours and 5 minutes, Google Cloud Support and Google Workspace Support customers attempting to create cases experienced intermittent case creation delays, each lasting less than 5 minutes. During the period of impact, the case creation UI directed customers experiencing the longest delays to create their case using a backup system. We have not received any reports of customers being unable to contact Support during the impacted period. From preliminary analysis, the root cause of the issue was the scaling behavior of a routine data pipeline. The data pipeline ramped up its use of a common persistence layer in a manner that bypassed normal load shedding and isolation. When the persistence layer became unhealthy, the control plane applied a temporary throttle to the data pipeline, which allowed the Support UI to create user cases successfully until the temporary throttle expired, at which time the pipeline repeated its behavior. Google will complete a full Incident Report (IR) in the following days that will provide a full root cause. Customer Impact:
|
| 2 Oct 2024 | 18:50 PDT | The issue with Google Cloud Support case creation has been resolved for all affected customers as of Wednesday, 2024-10-02 16:12 US/Pacific. We thank you for your patience while we worked on resolving the issue. |
| 2 Oct 2024 | 18:07 PDT | Summary: Customers may experience intermittent issues with support case creation Description: We experienced an issue where some customers were unable to create cases. This issue has been mitigated as of 16:12 US/Pacific. The support case creation process is now working successfully, but we continue to investigate the underlying cause and monitor our environment for stability. Initial investigation points to a temporary issue caused by an unexpected increase in traffic. We will provide an update by Wednesday, 2024-10-02 20:00 US/Pacific with current details. Diagnosis: Impacted customers may receive an unexpected error while attempting case creation. Workaround: Retrying the case creation process may be successful due to the intermittent nature of the issue. |
| 2 Oct 2024 | 16:47 PDT | Summary: Customers may experience intermittent issues with support case creation Description: We are experiencing an intermittent issue with Google Cloud Support case creation beginning on Wednesday, 2024-10-02 14:20 US/Pacific. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2024-10-02 18:00 US/Pacific with current details. We apologize to all who are affected by the disruption. Diagnosis: Impacted customers may receive an unexpected error while attempting case creation. Workaround: Retrying the case creation process may be successful due to the intermittent nature of the issue. |
- All times are US/Pacific
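
The throttle-and-re-ramp cycle described under Root Cause can be illustrated with a small simulation. This is only a toy sketch, not Google's implementation: the names `simulate` and `next_rate`, the limits, the tick-based clock, and the ramp rules are all assumptions made for illustration. It contrasts an integration that ramps gradually and backs off on a backpressure signal with one that doubles its traffic and ignores the signal.

```python
# Toy model of a capacity manager and one integrating service.
# All names and numbers here are hypothetical, chosen only for illustration.
HARD_LIMIT = 100     # load above this triggers a temporary throttle (assumed)
SOFT_LIMIT = 80      # load above this raises a backpressure signal (assumed)
THROTTLE_TICKS = 3   # how long a temporary throttle lasts (assumed)
START_RATE = 10      # load an integration sends when it begins ramping (assumed)


def next_rate(rate: int, backpressure: bool, heeds_backpressure: bool) -> int:
    """Choose the load to send on the next tick."""
    if not heeds_backpressure:
        return rate * 2                       # rapid ramp that ignores signals
    if backpressure:
        return max(START_RATE, rate // 2)     # back off when signalled
    return rate + START_RATE                  # controlled, additive ramp


def simulate(ticks: int, heeds_backpressure: bool) -> list[str]:
    """Return the persistence layer's state at each tick."""
    rate = START_RATE
    throttled_until = 0
    states: list[str] = []
    for tick in range(ticks):
        if tick < throttled_until:
            states.append("throttled")        # temporary throttle still active
            continue
        if rate > HARD_LIMIT:
            states.append("overloaded")       # limit exceeded: apply a throttle
            throttled_until = tick + 1 + THROTTLE_TICKS
            rate = START_RATE                 # ramp restarts after the throttle
            continue
        backpressure = rate > SOFT_LIMIT
        states.append("backpressure" if backpressure else "ok")
        rate = next_rate(rate, backpressure, heeds_backpressure)
    return states


if __name__ == "__main__":
    for heeds in (True, False):
        states = simulate(24, heeds_backpressure=heeds)
        print(f"heeds_backpressure={heeds}: "
              f"{states.count('overloaded')} overloads, "
              f"{states.count('throttled')} throttled ticks")
```

In this toy model, the integration that ignores backpressure goes through ramp, overload, and throttle three times in 24 ticks, while the one that backs off never reaches the hard limit. That repeating pattern is why a temporary throttle alone did not end the incident and a long-lived throttle was needed.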