Service Health
Incident affecting Operations, Cloud Monitoring
Elevated error rate across all monitoring API endpoints globally
Incident began at 2021-08-06 14:25 and ended at 2021-08-06 20:53 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 9 Aug 2021 | 09:02 PDT | We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support. (All times US/Pacific.) Incident Start: 06 August 2021 14:25. Incident End: 06 August 2021 20:53. Duration: 6 hours, 28 minutes. Affected Services and Features: Google Cloud Monitoring. Regions/Zones: Global. Description: Google Cloud Monitoring experienced increased latency and error rates for monitoring endpoints globally for 6 hours, 28 minutes. From preliminary analysis, the root cause of the issue is an overload of a monitoring API dependency that serves metric and monitored resource descriptors. Customer Impact: Requests against the monitoring API would have seen increased timeouts, errors, and latency. Cloud Monitoring dashboards would have failed to load due to timeout. Additional details: This service disruption was mitigated by increasing the resources available to the affected dependency, and we are confident that there will not be a recurrence. |
| 6 Aug 2021 | 21:32 PDT | The issue with Cloud Monitoring has been resolved for all affected users as of Friday, 2021-08-06 20:53 US/Pacific. We thank you for your patience while we worked on resolving the issue. |
| 6 Aug 2021 | 17:51 PDT | Summary: Elevated error rate across all monitoring API endpoints globally Description: Mitigation work is still underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Friday, 2021-08-06 21:29 US/Pacific. Diagnosis: None at this time. Workaround: None at this time. |
| 6 Aug 2021 | 16:56 PDT | Summary: Elevated error rate across all monitoring API endpoints globally Description: Mitigation work is still underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Friday, 2021-08-06 17:59 US/Pacific. Diagnosis: None at this time. Workaround: None at this time. |
| 6 Aug 2021 | 15:58 PDT | Summary: Elevated error rate across all monitoring API endpoints globally Description: Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Friday, 2021-08-06 16:59 US/Pacific. Diagnosis: None at this time. Workaround: None at this time. |
| 6 Aug 2021 | 15:25 PDT | Summary: Elevated error rate across all monitoring API endpoints globally Description: Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Friday, 2021-08-06 15:59 US/Pacific. Diagnosis: None at this time. Workaround: None at this time. |
| 6 Aug 2021 | 15:11 PDT | Summary: Elevated error rate across all monitoring API endpoints globally Description: We are experiencing an issue with Cloud Monitoring beginning on Friday, 2021-08-06 14:25 US/Pacific. Our engineering team continues to investigate the issue. We will provide the next update by Friday, 2021-08-06 15:40 US/Pacific. Diagnosis: None at this time. Workaround: None at this time. |