Service Health
Incident affecting Google Cloud Pub/Sub
Multiple regions completely blocked for subscribe for Pubsub
Incident began at 2025-01-08 06:54 and ended at 2025-01-08 08:07 (all times are US/Pacific).
Previously affected location(s)
Mumbai (asia-south1), Delhi (asia-south2), Jakarta (asia-southeast2), Belgium (europe-west1), Berlin (europe-west10), Doha (me-central1), Iowa (us-central1), South Carolina (us-east1), Columbus (us-east5), Dallas (us-south1)
Date | Time | Description | |
---|---|---|---|
| 10 Jan 2025 | 11:23 PST | Incident Report. Summary: On Wednesday, 8 January 2025, from 06:54 to 08:07 US/Pacific, Google Cloud Pub/Sub experienced a service outage in multiple regions that left customers unable to publish or subscribe to messages for 1 hour and 13 minutes. The outage also caused an increased backlog for a small subset of customer subscriptions using message ordering [1]; this backlog was identified at 8 January 2025 09:07 US/Pacific and extended beyond the unavailability window. These subscriptions were repaired and mitigated by 8 January 2025 23:09 US/Pacific. We deeply regret the disruption this outage caused for our Google Cloud customers. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s availability. Root Cause: Cloud Pub/Sub uses a regional database for the metadata state of its storage system, including information about published messages and the order in which those messages were published for ordered delivery. The regional metadata database is on the critical path of most Cloud Pub/Sub data plane operations. From 8 January 2025 06:54 to 07:30 US/Pacific, a faulty service configuration change, which unintentionally over-restricted permission to access this database, was rolled out to multiple regions. The issue did not surface in our pre-production environment because of a configuration mismatch between the pre-production and production environments. In addition, the change was mistakenly rolled out to multiple regions within a short time period and did not follow the standard rollout process. The change prevented Cloud Pub/Sub from accessing the regional metadata store, causing publish, subscribe, and backlog metrics failures until the impact was mitigated at 8 January 2025 08:07 US/Pacific.
Though the configuration change was rolled back and mitigated on 8 January 2025 08:07 US/Pacific, the database unavailability during the issue exposed a latent bug in the way Cloud Pub/Sub enforces ordered delivery for subscriptions with ordering enabled. In particular, when the database was unavailable for an extended period of time, the metadata pertaining to ordering became inconsistent with the metadata about published messages. This inconsistency prevented the delivery of a subset of messages until the subscriptions were repaired, after which they received all backlogged messages in the proper order. Mitigation was completed by 8 January 2025 23:09 US/Pacific. Note that this did not impact ordering or guaranteed delivery. Remediation and Prevention: Google engineers were alerted to the outage via internal telemetry on 8 January 2025 07:03 US/Pacific, 9 minutes after impact started. The config change that caused the issue was identified and its rollback completed by 8 January 2025 08:07 US/Pacific. At 8 January 2025 09:07 US/Pacific, Google engineers were alerted via internal telemetry that a small subset of ordered subscriptions were unable to consume their backlog, and they identified the metadata inconsistency as the root cause at 8 January 2025 12:20 US/Pacific. Google engineers then identified and repaired all impacted ordered subscriptions, completing the work by 8 January 2025 23:09 US/Pacific. Google is committed to preventing a repeat of this issue in the future and is completing the following actions:
Detailed Description of Impact: On Wednesday, 8 January 2025, from 06:54 to 08:07 US/Pacific, Google Cloud Pub/Sub, Cloud Logging, and BigQuery Data Transfer Service experienced a service outage in the europe-west10, asia-south1, europe-west1, us-central1, asia-southeast2, us-east1, us-east5, asia-south2, us-south1, and me-central1 regions. Customers publishing from other regions may have also experienced the issue if their message storage policies [2] are set to store and process messages in the above-mentioned regions. Google Cloud Pub/Sub: Customers were unable to publish or subscribe to messages in the impacted regions. Publishing from other regions may also have been impacted if any of the impacted regions were included in the message storage policies. Backlog metrics might have been stale or missing. Google BigQuery Data Transfer Service: Customers experienced data transfer runs failing to publish to Pub/Sub for a duration of 20 minutes. Cloud Logging: All Cloud Logging customers exporting logs to Cloud Pub/Sub experienced a delay in log export for a duration of 26 minutes. Appendix: |
| 8 Jan 2025 | 13:23 PST | Mini Incident Report: We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support (All Times US/Pacific) Incident Start: 8 January 2025 06:54 Incident End: 8 January 2025 08:07 Duration: 1 hour, 13 minutes Affected Services and Features:
Regions/Zones: europe-west10, asia-south1, europe-west1, us-central1, asia-southeast2, us-east1, us-east5, asia-south2, us-south1, me-central1. Customers publishing from other regions may have also experienced the issue if their message storage policies [1] are set to store and process messages in the above-mentioned regions. Description: Google Cloud Pub/Sub experienced a service outage in multiple regions for a duration of 1 hour and 13 minutes, leaving customers unable to publish or subscribe to messages. From preliminary analysis, the root cause of the issue was a configuration change, which was rolled back to restore the service. Google will complete a full Incident Report in the following days that will provide a full root cause. Customer Impact:
Reference(s): [1] https://cloud.google.com/pubsub/docs/resource-location-restriction#message_storage_policy_overview |
| 8 Jan 2025 | 08:12 PST | The issue with Google Cloud Pub/Sub has been resolved for all affected projects as of Wednesday, 2025-01-08 08:07 US/Pacific. We will publish an analysis of this incident once we have completed our internal investigation. We thank you for your patience while we worked on resolving the issue. |
| 8 Jan 2025 | 08:03 PST | Summary: Multiple regions completely blocked for subscribe for Pubsub Description: We are experiencing an issue with Google Cloud Pub/Sub, across multiple regions affecting publish and subscribe. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2025-01-08 09:15 US/Pacific with current details. Diagnosis: Customers in the impacted regions are unable to subscribe to messages Workaround: None at this time. |
| 8 Jan 2025 | 07:56 PST | Summary: Multiple regions completely blocked for publish for Pubsub Description: We are experiencing an issue with Google Cloud Pub/Sub, across multiple regions affecting publish and subscribe. Our engineering team continues to investigate the issue. We will provide an update by Wednesday, 2025-01-08 09:00 US/Pacific with current details. Diagnosis: Customers in the impacted regions are unable to publish messages Workaround: None at this time. |
- All times are US/Pacific
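
The intervals quoted in the incident report can be cross-checked from its own timestamps. The following is a minimal sketch, not part of the original report: the dictionary keys and the `minutes_between` helper are hypothetical names chosen for illustration, and the timestamps are treated as naive local times, which is safe here only because all of them fall on the same day in US/Pacific.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps copied from the incident report (all US/Pacific).
# The key names are illustrative labels, not terms used by Google.
EVENTS = {
    "impact_start": "2025-01-08 06:54",
    "engineers_alerted": "2025-01-08 07:03",
    "rollout_end": "2025-01-08 07:30",
    "rollback_complete": "2025-01-08 08:07",
    "backlog_alert": "2025-01-08 09:07",
    "backlog_root_cause": "2025-01-08 12:20",
    "ordered_subs_repaired": "2025-01-08 23:09",
}

TIMES = {name: datetime.strptime(ts, FMT) for name, ts in EVENTS.items()}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes elapsed between two named events."""
    delta = TIMES[end] - TIMES[start]
    return int(delta.total_seconds() // 60)


outage_min = minutes_between("impact_start", "rollback_complete")
detection_min = minutes_between("impact_start", "engineers_alerted")
rollout_min = minutes_between("impact_start", "rollout_end")
backlog_repair_min = minutes_between("backlog_alert", "ordered_subs_repaired")

print(f"Outage duration:        {outage_min // 60} h {outage_min % 60} min")
print(f"Time to detection:      {detection_min} min")
print(f"Bad config rollout:     {rollout_min} min")
print(f"Backlog alert to fix:   {backlog_repair_min // 60} h {backlog_repair_min % 60} min")
```

The outage duration (1 hour 13 minutes) and detection delay (9 minutes) computed this way match the figures stated in the report. The 36-minute rollout window and the roughly 14-hour span from the backlog alert to the final repair are derived values that the report does not state directly.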