Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Operations, Cloud Monitoring

Elevated error rate across all monitoring API endpoints globally

Incident began at 2021-08-06 14:25 and ended at 2021-08-06 20:53 (all times are US/Pacific).

Date Time Description
9 Aug 2021 09:02 PDT

We apologize for the inconvenience this service disruption/outage may have caused. Details of the incident are provided below. Please note that this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you experienced impact beyond what is listed below, please reach out to Google Support by opening a case at https://cloud.google.com/support.

(All Times US/Pacific)

Incident Start: 06 August 2021 14:25

Incident End: 06 August 2021 20:53

Duration: 6 hours, 28 minutes
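The stated duration can be checked from the start and end times above. The following is a minimal sketch, not part of the original report; the function name `incident_duration` is hypothetical, and it assumes both timestamps are in US/Pacific on the same calendar day, so no offset handling is needed.

```python
from datetime import datetime


def incident_duration(start: str, end: str) -> str:
    """Return the elapsed time between two 'YYYY-MM-DD HH:MM' timestamps.

    Assumption: both timestamps share one time zone (US/Pacific here),
    so naive datetimes are sufficient for the subtraction.
    """
    fmt = "%Y-%m-%d %H:%M"
    began = datetime.strptime(start, fmt)
    ended = datetime.strptime(end, fmt)
    total_minutes = int((ended - began).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours, {minutes} minutes"


# Start and end times as reported for this incident.
print(incident_duration("2021-08-06 14:25", "2021-08-06 20:53"))
# prints "6 hours, 28 minutes"
```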

Affected Services and Features:

Google Cloud Monitoring

Regions/Zones: Global

Description:

Google Cloud Monitoring experienced increased latency and error rates on monitoring endpoints globally for 6 hours, 28 minutes. Preliminary analysis indicates that the root cause was an overload of a monitoring API dependency that serves metric descriptors and monitored resource descriptors.

Customer Impact:

Requests against the monitoring API would have seen increased timeouts, errors, and latency. Cloud Monitoring dashboards would have failed to load due to timeout.
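Clients that call an API showing elevated timeouts and errors commonly retry with exponential backoff and jitter, so that retries do not add to the load on an already overloaded dependency. The sketch below is a generic illustration and not guidance from this report; `call_with_backoff`, its parameters, and the choice of retryable exceptions are all assumptions, and `request` stands in for any caller-supplied function.

```python
import random
import time


def call_with_backoff(request, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call `request()` and retry on timeout or connection errors.

    Hypothetical helper: waits a random time between 0 and an
    exponentially growing cap ("full jitter") before each retry.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable so the helper can be exercised without real delays. Errors that are not transient (for example, invalid arguments) are deliberately not retried.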

Additional details:

This service disruption was mitigated by increasing the resources available to the affected dependency, and we are confident that there will not be a recurrence.

6 Aug 2021 21:32 PDT

The issue with Cloud Monitoring has been resolved for all affected users as of Friday, 2021-08-06 20:53 US/Pacific.

We thank you for your patience while we worked on resolving the issue.

6 Aug 2021 17:51 PDT

Summary: Elevated error rate across all monitoring API endpoints globally

Description: Mitigation work is still underway by our engineering team.

We do not have an ETA for mitigation at this point.

We will provide more information by Friday, 2021-08-06 21:29 US/Pacific.

Diagnosis: None at this time.

Workaround: None at this time.

6 Aug 2021 16:56 PDT

Summary: Elevated error rate across all monitoring API endpoints globally

Description: Mitigation work is still underway by our engineering team.

We do not have an ETA for mitigation at this point.

We will provide more information by Friday, 2021-08-06 17:59 US/Pacific.

Diagnosis: None at this time.

Workaround: None at this time.

6 Aug 2021 15:58 PDT

Summary: Elevated error rate across all monitoring API endpoints globally

Description: Mitigation work is currently underway by our engineering team.

We do not have an ETA for mitigation at this point.

We will provide more information by Friday, 2021-08-06 16:59 US/Pacific.

Diagnosis: None at this time.

Workaround: None at this time.

6 Aug 2021 15:25 PDT

Summary: Elevated error rate across all monitoring API endpoints globally

Description: Mitigation work is currently underway by our engineering team.

We do not have an ETA for mitigation at this point.

We will provide more information by Friday, 2021-08-06 15:59 US/Pacific.

Diagnosis: None at this time.

Workaround: None at this time.

6 Aug 2021 15:11 PDT

Summary: Elevated error rate across all monitoring API endpoints globally

Description: We are experiencing an issue with Cloud Monitoring beginning at Friday, 2021-08-06 14:25 US/Pacific.

Our engineering team continues to investigate the issue.

We will provide the next update by Friday, 2021-08-06 15:40 US/Pacific.

Diagnosis: None at this time.

Workaround: None at this time.