Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Cloud Monitoring, Operations, Google Cloud Pub/Sub

Global: Cloud Monitoring Metrics may be unavailable or underreported for Cloud Pub/Sub

Incident began at 2022-03-31 08:30 and ended at 2022-03-31 15:54 (all times are US/Pacific).

Previously affected location(s)

Global

Date Time Description
18 Apr 2022 13:16 PDT

Summary

On Thursday March 31st, starting at 08:30 PT, Cloud Pub/Sub metrics were missing or were underreported in Cloud Monitoring for some Cloud Pub/Sub customers for a duration of 7 hours, 24 minutes. Google apologizes to customers who were affected by this outage and is taking steps to ensure that this type of outage does not reoccur.

Root Cause

Our investigation found the cause was a backend configuration change to our Cloud Monitoring service. The change altered the computation of some metrics that are not directly related to Cloud Pub/Sub but are shared by it. The change rolled out progressively across all Google Cloud regions over two hours.

This configuration change increased the latency of requests to record metrics sent from Cloud Pub/Sub to Cloud Monitoring and, in some cases, resulted in failures due to write operations timing out.

Remediation and Prevention

Engineers mitigated the issue by reverting the configuration change that caused it, restoring service for all customers at 15:54 US/Pacific.

We are taking the following actions to ensure this does not happen again:

  • Improving the monitoring of Cloud Pub/Sub metrics reporting to allow for quicker error detection.
  • Making Cloud Pub/Sub metrics reporting operations more resilient to high latency.
  • Improving internal visibility and vetting of Cloud Monitoring backend configuration changes.
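
As an illustration of what "more resilient to high latency" can mean for a metrics writer, the sketch below retries a timed-out write with exponential backoff. It is not Google's implementation; `write_with_backoff`, its parameters, and the use of `TimeoutError` to signal a slow backend are all assumptions made for this example.

```python
import time
from typing import Callable


def write_with_backoff(
    write: Callable[[], None],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `write` until it succeeds or the attempts run out.

    Returns the number of attempts used. Re-raises the last TimeoutError
    if every attempt timed out. (Hypothetical helper, for illustration.)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            write()
            return attempt
        except TimeoutError:
            if attempt == max_attempts:
                raise
            # Double the wait after each timed-out attempt: 0.5s, 1s, 2s, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Bounding the number of attempts keeps a slow backend from stalling the caller indefinitely; a real writer would also need to decide what to do with data points that still fail after the last attempt.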

Detailed Description of Impact

On Thursday March 31st, between 08:30 and 15:54 US/Pacific time:

Cloud Pub/Sub Metrics in Cloud Monitoring

  • The metric values lost during this timeframe are not recoverable.
  • Any alerting based on these metrics might have fired erroneously or not fired when it should have.
  • Auto scaling of Google Kubernetes Engine (GKE) based on these metrics may not have functioned as expected.
1 Apr 2022 13:13 PDT

We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Support by opening a case using https://cloud.google.com/support or help article https://support.google.com/a/answer/1047213.

(All Times US/Pacific)

Incident Start: 31 March 2022 08:30

Incident End: 31 March 2022 15:54

Duration: 7 hours, 24 minutes
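
The stated duration can be checked from the start and end times. The snippet below is a small arithmetic check using naive timestamps, which is safe here because both times fall on the same day in the same time zone.

```python
from datetime import datetime

# Incident start and end as listed above (US/Pacific, same calendar day).
start = datetime(2022, 3, 31, 8, 30)
end = datetime(2022, 3, 31, 15, 54)

duration = end - start
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours} hours, {minutes} minutes")  # prints "7 hours, 24 minutes"
```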

Affected Services and Features:

Google Cloud Pub/Sub, Google Cloud Monitoring

Regions/Zones: Global Locale

Description:

Google Cloud Pub/Sub customers experienced issues with metrics in Google Cloud Monitoring for a duration of 7 hours, 24 minutes. The issue was caused by a configuration change to the backend for Cloud Monitoring that affected Cloud Pub/Sub metric recording. The issue was mitigated by reverting this change.

Customer Impact:

  • Cloud Pub/Sub metrics in Cloud Monitoring for times during the incident may be missing or underreported. The metric values lost in this timeframe will not be recoverable.
  • Any alerting based on these metrics might have fired erroneously or not fired when it should have during the time of the incident.
  • Any auto scaling of Google Kubernetes Engine (GKE) based on these metrics may not have functioned as expected during the time of the incident.
  • Cloud Pub/Sub administrative, publish, and subscribe operations were not affected by the incident.
31 Mar 2022 16:11 PDT

The issue with Google Cloud Pub/Sub monitoring has been resolved for all affected projects as of Thursday, 2022-03-31 15:54 US/Pacific.

We will publish an analysis of this incident once we have completed our internal investigation.

We thank you for your patience while we worked on resolving the issue.

31 Mar 2022 15:23 PDT

Summary: Global: Cloud Monitoring Metrics may be unavailable or underreported for Cloud Pub/Sub

Description: We believe the issue with Google Cloud Pub/Sub monitoring was partially resolved as of 14:57, and are continuing to monitor the recovery of the service.

We do not have an ETA for full resolution at this point.

We will provide an update by Thursday, 2022-03-31 16:30 US/Pacific with current details.

Diagnosis:

  • Customers impacted by this issue may see Cloud Monitoring metrics for Cloud Pub/Sub that show no or underreported values.
  • Any alerting based on these metrics may fire erroneously.
  • Any auto scaling of GKE based on these metrics may not function as expected due to lack of or underreported values.

Publish and subscribe metrics are currently affected for publishers and subscribers in the following regions:

  • asia-east1
  • europe-north1
  • europe-west4
  • us-central1
  • us-central2
  • us-east1
  • us-east4
  • us-east7
  • us-west1
  • us-west4

Backlog metrics for subscriptions and snapshot metrics are no longer affected in any region.

Workaround: Non-Cloud-Pub/Sub metrics and logs on publisher and subscriber clients can be used as a proxy to confirm that publishing and subscribing are still behaving as expected. For example, metrics available for clients running on GCE include:

  • instance/cpu/utilization
  • instance/network/received_bytes_count
  • instance/network/sent_bytes_count
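
One way to query the proxy metrics listed above is through a Cloud Monitoring filter string. The sketch below only builds the filter text. It assumes the usual `compute.googleapis.com/` prefix for Compute Engine metric types and the `gce_instance` monitored resource type; `proxy_metric_filter` is a hypothetical helper name, not part of any Google library.

```python
# Short metric names from the workaround list above.
PROXY_METRICS = (
    "instance/cpu/utilization",
    "instance/network/received_bytes_count",
    "instance/network/sent_bytes_count",
)


def proxy_metric_filter(short_name: str) -> str:
    """Build a Cloud Monitoring filter for one GCE proxy metric.

    Assumes Compute Engine metric types are prefixed with
    "compute.googleapis.com/" and reported on "gce_instance" resources.
    """
    if short_name not in PROXY_METRICS:
        raise ValueError(f"not a listed proxy metric: {short_name}")
    return (
        f'metric.type = "compute.googleapis.com/{short_name}" '
        'AND resource.type = "gce_instance"'
    )


for name in PROXY_METRICS:
    print(proxy_metric_filter(name))
```

The resulting strings can be pasted into a time series query or passed to a Monitoring client; steady CPU and network traffic on publisher and subscriber instances suggests that message flow is continuing even while the Pub/Sub metrics themselves are missing.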
31 Mar 2022 14:49 PDT

Summary: Global: Cloud Monitoring Metrics may be unavailable or underreported for Cloud Pub/Sub

Description: We are experiencing an issue with Cloud Monitoring metrics for Cloud Pub/Sub beginning Thursday, 2022-03-31 09:30 US/Pacific. There is no known impact on Cloud Pub/Sub administrative, publish, or subscribe operations at this time.

Engineering is continuing to investigate the issue.

We will provide an update by Thursday, 2022-03-31 15:25 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis:

  • Customers impacted by this issue may see Cloud Monitoring metrics for Cloud Pub/Sub that show no or underreported values.
  • Any alerting based on these metrics may fire erroneously.
  • Any auto scaling of GKE based on these metrics may not function as expected due to lack of or underreported values.

Publish and subscribe metrics are currently affected for publishers and subscribers in the following regions:

  • asia-east1
  • europe-north1
  • europe-west4
  • us-central1
  • us-central2
  • us-east1
  • us-east4
  • us-east7
  • us-west1
  • us-west4

Backlog metrics for subscriptions and snapshot metrics are no longer affected in any region.

Workaround: Non-Cloud-Pub/Sub metrics and logs on publisher and subscriber clients can be used as a proxy to confirm that publishing and subscribing are still behaving as expected. For example, metrics available for clients running on GCE include:

  • instance/cpu/utilization
  • instance/network/received_bytes_count
  • instance/network/sent_bytes_count
31 Mar 2022 14:20 PDT

Summary: Global: Cloud Monitoring Metrics may be unavailable or underreported for Cloud Pub/Sub

Description: We are experiencing an issue with Cloud Monitoring metrics for Cloud Pub/Sub beginning Thursday, 2022-03-31 09:30 US/Pacific. There is no known impact on Cloud Pub/Sub administrative, publish, or subscribe operations at this time.

Engineering is continuing to investigate the issue.

We will provide an update by Thursday, 2022-03-31 14:55 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis:

  • Customers impacted by this issue may see Cloud Monitoring metrics for Cloud Pub/Sub that show no or underreported values.
  • Any alerting based on these metrics may fire erroneously.
  • Any auto scaling of GKE based on these metrics may not function as expected due to lack of or underreported values.

Publish and subscribe metrics are currently affected for publishers and subscribers in the following regions:

  • asia-east1
  • europe-north1
  • europe-west4
  • us-central1
  • us-central2
  • us-east1
  • us-east4
  • us-east7
  • us-west1
  • us-west4

Backlog metrics for subscriptions and snapshot metrics in all regions are currently affected.

Workaround: Non-Cloud-Pub/Sub metrics and logs on publisher and subscriber clients can be used as a proxy to confirm that publishing and subscribing are still behaving as expected. For example, metrics available for clients running on GCE include:

  • instance/cpu/utilization
  • instance/network/received_bytes_count
  • instance/network/sent_bytes_count
31 Mar 2022 13:49 PDT

Summary: Global: Cloud Monitoring Metrics may be unavailable or underreported for Cloud Pub/Sub

Description: We are experiencing an issue with Cloud Monitoring metrics for Cloud Pub/Sub beginning Thursday, 2022-03-31 09:30 US/Pacific. There is no known impact on Cloud Pub/Sub administrative, publish, or subscribe operations at this time.

Engineering is continuing to investigate the issue.

We will provide an update by Thursday, 2022-03-31 14:25 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis:

  • Customers impacted by this issue may see Cloud Monitoring metrics for Cloud Pub/Sub that show no or underreported values.
  • Any alerting based on these metrics may fire erroneously.
  • Any auto scaling of GKE based on these metrics may not function as expected due to lack of or underreported values.

Publish and subscribe metrics are currently affected for publishers and subscribers in the following regions:

  • asia-east1
  • europe-north1
  • europe-west4
  • us-central1
  • us-central2
  • us-east1
  • us-east4
  • us-east7
  • us-west1
  • us-west4

Backlog metrics for subscriptions and snapshot metrics in all regions are currently affected.

Workaround: Non-Cloud-Pub/Sub metrics and logs on publisher and subscriber clients can be used as a proxy to confirm that publishing and subscribing are still behaving as expected. For example, metrics available for clients running on GCE include:

  • instance/cpu/utilization
  • instance/network/received_bytes_count
  • instance/network/sent_bytes_count
31 Mar 2022 13:18 PDT

Summary: Global: Cloud Monitoring Metrics may be unavailable or underreported for Cloud Pub/Sub

Description: We are experiencing an issue with Cloud Monitoring and Cloud Pub/Sub beginning Thursday, 2022-03-31 09:30 US/Pacific. There is no known impact on Cloud Pub/Sub administrative, publish, or subscribe operations at this time.

Engineering is continuing to investigate the issue.

We will provide an update by Thursday, 2022-03-31 13:55 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis:

  • Customers impacted by this issue may see Cloud Monitoring metrics for Cloud Pub/Sub that show no or underreported values.
  • Any alerting based on these metrics may fire erroneously.
  • Any auto scaling of GKE based on these metrics may not function as expected due to lack of or underreported values.

Workaround: Non-Cloud-Pub/Sub metrics and logs on publisher and subscriber clients can be used as a proxy to confirm that publishing and subscribing are still behaving as expected. For example, metrics available for clients running on GCE include:

  • instance/cpu/utilization
  • instance/network/received_bytes_count
  • instance/network/sent_bytes_count
31 Mar 2022 12:52 PDT

Summary: Global: Cloud Monitoring Metrics may be unavailable or underreported for Cloud Pub/Sub

Description: We are experiencing an issue with Cloud Monitoring and Cloud Pub/Sub beginning Thursday, 2022-03-31 09:30 US/Pacific. There is no known impact on Cloud Pub/Sub administrative, publish, or subscribe operations at this time.

Engineering is continuing to investigate the issue.

We will provide an update by Thursday, 2022-03-31 13:25 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: Customers impacted by this issue may see Cloud Monitoring metrics for Cloud Pub/Sub that show no or underreported values. Any alerting based on these metrics may fire erroneously.

Workaround: Non-Cloud-Pub/Sub metrics and logs on publisher and subscriber clients can be used as a proxy to confirm that publishing and subscribing are still behaving as expected. For example, metrics available for clients running on GCE include:

  • instance/cpu/utilization
  • instance/network/received_bytes_count
  • instance/network/sent_bytes_count
31 Mar 2022 12:23 PDT

Summary: Global: Cloud Monitoring Metrics may be unavailable or underreported for Cloud Pub/Sub

Description: We are experiencing an issue with Cloud Monitoring and Cloud Pub/Sub beginning Thursday, 2022-03-31 09:30 US/Pacific. There is no known impact on Cloud Pub/Sub administrative, publish, or subscribe operations at this time.

Engineering is continuing to investigate the issue.

We will provide an update by Thursday, 2022-03-31 12:55 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: Customers impacted by this issue may see Cloud Monitoring metrics for Cloud Pub/Sub that show no or underreported values. Any alerting based on these metrics may fire erroneously.

Workaround: Non-Cloud-Pub/Sub metrics and logs on publisher and subscriber clients can be used as a proxy to confirm that publishing and subscribing are still behaving as expected. For example, metrics available for clients running on GCE include:

  • instance/cpu/utilization
  • instance/network/received_bytes_count
  • instance/network/sent_bytes_count
31 Mar 2022 11:58 PDT

Summary: Global: Cloud Monitoring Metrics may be unavailable for Cloud Pub/Sub

Description: We are experiencing an issue with Cloud Monitoring and Cloud Pub/Sub beginning Thursday, 2022-03-31 09:30 US/Pacific.

Engineering is continuing to investigate the issue.

We will provide an update by Thursday, 2022-03-31 12:30 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: Customers may find that Cloud Pub/Sub metrics are unavailable in Cloud Monitoring.

Workaround: None at this time.

31 Mar 2022 11:40 PDT

Summary: Global: Cloud Monitoring Metrics may be unavailable

Description: We are experiencing an issue with Cloud Monitoring beginning Thursday, 2022-03-31 09:30 US/Pacific.

Our engineering team continues to investigate the issue.

We will provide an update by Thursday, 2022-03-31 12:00 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis: Customers may find that metrics are unavailable in Cloud Monitoring.

Workaround: None at this time.