Service Health

This page provides status information on the services that are part of Google Cloud. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://cloud.google.com/.

Incident affecting Google Cloud Pub/Sub

Multiple regions completely blocked for subscribe for Pubsub

Incident began at 2025-01-08 06:54 and ended at 2025-01-08 08:07 (all times are US/Pacific).

Previously affected location(s)

Mumbai (asia-south1), Delhi (asia-south2), Jakarta (asia-southeast2), Belgium (europe-west1), Berlin (europe-west10), Doha (me-central1), Iowa (us-central1), South Carolina (us-east1), Columbus (us-east5), Dallas (us-south1)

Date Time Description
10 Jan 2025 11:23 PST

Incident Report

Summary

On Wednesday, 8 January 2025, from 06:54 to 08:07 US/Pacific, Google Cloud Pub/Sub experienced a service outage in multiple regions that left customers unable to publish or subscribe to messages for 1 hour and 13 minutes.

The outage also caused an increased backlog for a small subset of customer subscriptions using message ordering [1]. This backlog was identified at 8 January 2025 09:07 US/Pacific and persisted beyond the unavailability window. The affected subscriptions were repaired and the backlog mitigated by 8 January 2025 23:09 US/Pacific.

We deeply regret the disruption this outage caused for our Google Cloud customers. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s availability.

Root Cause

Cloud Pub/Sub uses a regional database for the metadata state of its storage system, including information about published messages and the order in which those messages were published for ordered delivery. This regional metadata database is on the critical path of most Cloud Pub/Sub data plane operations. From 8 January 2025 06:54 to 07:30 US/Pacific, a faulty service configuration change, which unintentionally over-restricted the permissions needed to access this database, was rolled out to multiple regions. The issue did not surface in our pre-production environment because its configuration did not match that of production. In addition, the change was mistakenly rolled out to multiple regions within a short time period and did not follow the standard rollout process. The change prevented Cloud Pub/Sub from accessing the regional metadata store, causing publish, subscribe, and backlog metrics failures and service unavailability until the impact was mitigated on 8 January 2025 08:07 US/Pacific.

Though the configuration change was rolled back and the outage mitigated on 8 January 2025 08:07 US/Pacific, the database unavailability exposed a latent bug in the way Cloud Pub/Sub enforces ordered delivery for subscriptions with ordering enabled. In particular, when the database was unavailable for an extended period of time, the metadata pertaining to ordering became inconsistent with the metadata about published messages. This inconsistency prevented the delivery of a subset of messages until the subscriptions were repaired, after which they received all backlogged messages in the proper order. Mitigation was completed by 8 January 2025 23:09 US/Pacific. Note that the bug delayed delivery but did not impact ordering or guaranteed delivery.
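This report does not describe how Cloud Pub/Sub stores its metadata, so the following is only a toy model of why an inconsistency between the two kinds of metadata can stall ordered delivery without reordering or losing messages. The names `ordering_index`, `message_store`, and `deliverable` are hypothetical and are not part of any Pub/Sub API.

```python
def deliverable(ordering_index, message_store):
    """Return, per ordering key, the message IDs that can be delivered now.

    Toy assumption: delivery for a key stops at the first ID that the
    ordering index references but the message store cannot resolve,
    because skipping that message would violate ordering.
    """
    result = {}
    for key, message_ids in ordering_index.items():
        ready = []
        for message_id in message_ids:
            if message_id not in message_store:
                break  # later messages for this key wait in the backlog
            ready.append(message_id)
        result[key] = ready
    return result


# Hypothetical state: the index expects m2, but the store has no record of it.
ordering_index = {"key-a": ["m1", "m2", "m3"], "key-b": ["m4", "m5"]}
message_store = {"m1": b"a1", "m3": b"a3", "m4": b"b1", "m5": b"b2"}

before_repair = deliverable(ordering_index, message_store)

# "Repair" in this toy model: make the two views consistent again.
message_store["m2"] = b"a2"
after_repair = deliverable(ordering_index, message_store)
```

In this model, `key-a` delivers only `m1` before the repair while `key-b` is unaffected. After the repair, `key-a` delivers `m1`, `m2`, and `m3` in their original order, which mirrors the report's statement that delivery was delayed but ordering was preserved.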

Remediation and Prevention

Google engineers were alerted to the outage via internal telemetry on 8 January 2025 07:03 US/Pacific, 9 minutes after impact started. The configuration change that caused the issue was identified, and its rollback was completed by 8 January 2025 08:07 US/Pacific. At 8 January 2025 09:07 US/Pacific, Google engineers were alerted via internal telemetry that a small subset of ordered subscriptions was unable to consume its backlog, and they identified the metadata inconsistency as the root cause at 8 January 2025 12:20 US/Pacific. Google engineers then identified and repaired all impacted ordered subscriptions, completing the work by 8 January 2025 23:09 US/Pacific.
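The intervals in this timeline can be checked with a few lines of Python. This is a minimal sketch that uses only the timestamps quoted in this report (all US/Pacific); the variable names and the `minutes_between` helper are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps as stated in this report (all US/Pacific, same day).
impact_start = datetime.strptime("2025-01-08 06:54", FMT)
engineers_alerted = datetime.strptime("2025-01-08 07:03", FMT)
rollback_complete = datetime.strptime("2025-01-08 08:07", FMT)
backlog_alert = datetime.strptime("2025-01-08 09:07", FMT)
root_cause_found = datetime.strptime("2025-01-08 12:20", FMT)
repair_complete = datetime.strptime("2025-01-08 23:09", FMT)


def minutes_between(start, end):
    """Whole minutes elapsed between two naive datetimes."""
    return int((end - start).total_seconds() // 60)


time_to_detect = minutes_between(impact_start, engineers_alerted)
outage_duration = minutes_between(impact_start, rollback_complete)
backlog_diagnosis = minutes_between(backlog_alert, root_cause_found)
backlog_repair = minutes_between(backlog_alert, repair_complete)

print(f"Time to detect: {time_to_detect} min")
print(f"Outage duration: {outage_duration // 60} h {outage_duration % 60} min")
print(f"Backlog alert to root cause: {backlog_diagnosis} min")
print(f"Backlog alert to repair: {backlog_repair} min")
```

The computed values agree with the stated 9-minute detection time and the 1 hour 13 minute outage. The ordered-subscription backlog took a further 14 hours 2 minutes from its alert to full repair.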

Google is committed to preventing a repeat of this issue in the future and is completing the following actions:

  • Our engineering team is working on implementing stronger enforcement of parity between pre-production and production environments in order to ensure the impact of configuration changes can be caught before changes move to production. ETA: 31 January 2025.
  • We are reviewing our change management process to ensure that future configuration changes roll out in a progressive fashion aligned with the priority of the change. ETA: 31 January 2025.
  • We are working on implementing additional monitoring that proactively detects ordering metadata inconsistency. ETA: 31 March 2025.
  • We are implementing a fix to the Cloud Pub/Sub ordering metadata management bug, which led to undelivered, ordered messages. ETA: 30 June 2025.
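The progressive rollout mentioned in the action items above can be sketched as follows. This is an illustration under stated assumptions, not Google's rollout tooling: the wave layout, `apply_change`, and `is_healthy` are hypothetical placeholders, and a real process would also include bake time between waves and an automated rollback.

```python
def progressive_rollout(waves, apply_change, is_healthy):
    """Apply a change one wave at a time and halt at the first unhealthy wave.

    Returns (completed_regions, failed_wave); failed_wave is None when
    every wave passed its health check.
    """
    completed = []
    for wave in waves:
        for region in wave:
            apply_change(region)
        if not all(is_healthy(region) for region in wave):
            return completed, wave  # stop before touching later waves
        completed.extend(wave)
    return completed, None


# Hypothetical wave plan: a single canary first, then wider waves.
waves = [["canary"], ["region-a", "region-b"], ["region-c", "region-d"]]
touched = []

completed, failed_wave = progressive_rollout(
    waves,
    apply_change=touched.append,
    # Pretend the change breaks metadata access in region-a.
    is_healthy=lambda region: region != "region-a",
)
```

Here the rollout stops after the second wave, so `region-c` and `region-d` never receive the faulty change. The point of the design is that the blast radius is limited to the wave in which the failure is first observed.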

Detailed Description of Impact

On Wednesday, 8 January 2025, from 06:54 to 08:07 US/Pacific, Google Cloud Pub/Sub, Cloud Logging, and BigQuery Data Transfer Service experienced a service outage in the europe-west10, asia-south1, europe-west1, us-central1, asia-southeast2, us-east1, us-east5, asia-south2, us-south1, and me-central1 regions.

Customers publishing from other regions may have also experienced the issue if the message storage policies [2] are set to store and process the messages in the above-mentioned regions.
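To make the storage-policy condition concrete, the sketch below checks whether a topic's allowed persistence regions overlap the impacted regions listed in this report. It assumes the region list has already been read from each topic's message storage policy; the topic names and their policies are hypothetical, and topics without an explicit policy are not covered.

```python
# Impacted regions as listed in this report.
IMPACTED_REGIONS = frozenset({
    "europe-west10", "asia-south1", "europe-west1", "us-central1",
    "asia-southeast2", "us-east1", "us-east5", "asia-south2",
    "us-south1", "me-central1",
})


def exposed_regions(allowed_persistence_regions):
    """Return the impacted regions that a storage policy allows, sorted."""
    return sorted(IMPACTED_REGIONS.intersection(allowed_persistence_regions))


# Hypothetical topics mapped to the regions their storage policies allow.
topic_policies = {
    "orders": ["europe-west1", "europe-west4"],
    "audit": ["australia-southeast1"],
}

exposure = {
    topic: exposed_regions(regions)
    for topic, regions in topic_policies.items()
}

for topic, regions in exposure.items():
    status = ", ".join(regions) if regions else "no impacted regions"
    print(f"{topic}: {status}")
```

In this example, `orders` could have been affected even when published from an unaffected region, because its policy allows storage in europe-west1, while `audit` has no overlap with the impacted regions.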

Google Cloud Pub/Sub: Customers were unable to publish or subscribe to messages in the impacted regions. Publishing from other regions may also have been impacted if customers had any of the impacted regions in their message storage policies. Backlog metrics might have been stale or missing.

Google BigQuery Data Transfer Service: Customers experienced failed data transfer runs, which were unable to publish to Pub/Sub, for a duration of 20 minutes.

Cloud Logging: All Cloud Logging customers exporting logs to Cloud Pub/Sub experienced a delay in log export for a duration of 26 minutes.

Appendix:

8 Jan 2025 13:23 PST

Mini Incident Report

We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support

(All Times US/Pacific)

Incident Start: 8 January 2025 6:54

Incident End: 8 January 2025 8:07

Duration: 1 hour, 13 minutes

Affected Services and Features:

  • Google Cloud Pub/Sub
  • Cloud Logging
  • BigQuery Data Transfer Service

Regions/Zones: europe-west10, asia-south1, europe-west1, us-central1, asia-southeast2, us-east1, us-east5, asia-south2, us-south1, me-central1

Customers publishing from other regions may have also experienced the issue if the message storage policies [1] are set to store and process the messages in the above-mentioned regions.

Description:

Google Cloud Pub/Sub experienced a service outage in multiple regions for a duration of 1 hour and 13 minutes, leaving customers unable to publish or subscribe to messages.

From preliminary analysis, the root cause of the issue was a configuration change which was rolled back to restore the service. Google will complete a full Incident Report in the following days that will provide a full root cause.

Customer Impact:

  • Google Cloud Pub/Sub: Customers were unable to publish or subscribe to messages in the impacted regions. Publishing from other regions may also have been impacted if customers had any of the impacted regions in their message storage policies. Backlog metrics might be stale or missing.

  • Google BigQuery Data Transfer Service: Customers experienced failures with data transfer runs.

  • Cloud Logging: All Cloud Logging customers exporting logs to Cloud Pub/Sub experienced a delay in log export for a duration of 26 minutes.

Reference(s):

[1] https://cloud.google.com/pubsub/docs/resource-location-restriction#message_storage_policy_overview

8 Jan 2025 08:12 PST

The issue with Google Cloud Pub/Sub has been resolved for all affected projects as of Wednesday, 2025-01-08 08:07 US/Pacific.

We will publish an analysis of this incident once we have completed our internal investigation.

We thank you for your patience while we worked on resolving the issue.

8 Jan 2025 08:03 PST

Summary: Multiple regions completely blocked for subscribe for Pubsub

Description: We are experiencing an issue with Google Cloud Pub/Sub, across multiple regions affecting publish and subscribe.

Our engineering team continues to investigate the issue.

We will provide an update by Wednesday, 2025-01-08 09:15 US/Pacific with current details.

Diagnosis: Customers in the impacted regions are unable to subscribe to messages

Workaround: None at this time.

8 Jan 2025 07:56 PST

Summary: Multiple regions completely blocked for publish for Pubsub

Description: We are experiencing an issue with Google Cloud Pub/Sub, across multiple regions affecting publish and subscribe.

Our engineering team continues to investigate the issue.

We will provide an update by Wednesday, 2025-01-08 09:00 US/Pacific with current details.

Diagnosis: Customers in the impacted regions are unable to publish messages

Workaround: None at this time.