Service Health
Incident affecting Google Compute Engine
Intermittent Connectivity Issues In us-central1-b
Incident began at 2015-10-31 05:52 and ended at 2015-10-31 07:05 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 4 Nov 2015 | 22:00 PST | SUMMARY: Between Saturday 31 October 2015 and Sunday 1 November 2015, Google Compute Engine networking in the us-central1-b zone was impaired on 3 occasions for an aggregate total of 4 hours 10 minutes. We apologize if your service was affected in one of these incidents, and we are working to improve the platform's performance and availability to meet our customers' expectations. DETAILED DESCRIPTION OF IMPACT (all times are US/Pacific): Outage timeframes for Saturday 31 October 2015: 05:52 to 07:05 for 73 minutes. Outage timeframes for Sunday 1 November 2015: 14:10 to 15:30 for 80 minutes, 19:03 to 22:40 for 97 minutes. During the affected timeframes, up to 14% of the VMs in us-central1-b experienced up to 100% packet loss communicating with other VMs in the same project. The issue impacted both in-zone and inter-zone communications. ROOT CAUSE: Google network control fabrics are designed to permit simultaneous failure of one or more components. When such failures occur, redundant components on the network may assume new roles within the control fabric. A race condition in one of these role transitions resulted in the loss of flow information for a subset of the VMs controlled by the fabric. REMEDIATION AND PREVENTION: Google engineers began rolling out a change to eliminate this race condition at 18:03 PST on Monday 2 November 2015. The rollout completed at 11:13 PST on Wednesday 4 November 2015. Additionally, monitoring is being improved to reduce the time required to detect, identify and resolve problematic changes to the network control fabric. |
| 31 Oct 2015 | 11:52 PDT | The issue with sending and receiving traffic between VMs in us-central1-b should have been resolved for all affected instances as of 07:08 US/Pacific. We will conduct an internal investigation of this issue and make appropriate improvements to our systems to prevent or minimize future recurrence. We sincerely apologize for any effect this disruption had on your applications and/or services. |
| 31 Oct 2015 | 09:32 PDT | The issue with sending and receiving internal traffic in us-central1-b should have been resolved for the majority of instances, and we expect a full resolution in the near future. We will provide an update with the affected timeframe after our investigation is complete. |
| 31 Oct 2015 | 08:29 PDT | We are continuing to investigate an intermittent issue with sending and receiving internal traffic in us-central1-b and will provide another update by 09:30 US/Pacific. |
| 31 Oct 2015 | 07:43 PDT | We are currently investigating a transient issue with sending internal traffic to and from us-central1-b. |
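
The outage durations quoted in the summary can be cross-checked with simple arithmetic. The following is a minimal sketch and not part of the original report: it assumes the three windows exactly as printed in the summary, treats them as naive wall-clock times (none of them crosses the 02:00 daylight-saving change on 1 November 2015), and uses the hypothetical names `WINDOWS`, `minutes_between`, and `summarize`.

```python
from datetime import datetime

# Outage windows as printed in the summary (US/Pacific wall-clock times),
# each paired with the duration in minutes that the summary states for it.
# These tuples are transcribed from the report; nothing here is measured.
WINDOWS = [
    ("2015-10-31 05:52", "2015-10-31 07:05", 73),
    ("2015-11-01 14:10", "2015-11-01 15:30", 80),
    ("2015-11-01 19:03", "2015-11-01 22:40", 97),
]

FMT = "%Y-%m-%d %H:%M"


def minutes_between(start: str, end: str) -> int:
    """Return whole wall-clock minutes between two naive timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


def summarize(windows):
    """Return per-window rows plus the stated and computed totals in minutes."""
    rows = [
        (start, end, stated, minutes_between(start, end))
        for start, end, stated in windows
    ]
    stated_total = sum(row[2] for row in rows)
    computed_total = sum(row[3] for row in rows)
    return rows, stated_total, computed_total


if __name__ == "__main__":
    rows, stated_total, computed_total = summarize(WINDOWS)
    for start, end, stated, computed in rows:
        flag = "" if stated == computed else "  <-- mismatch"
        print(f"{start} to {end}: stated {stated} min, computed {computed} min{flag}")
    print(f"stated total:   {stated_total // 60} h {stated_total % 60} min")
    print(f"computed total: {computed_total // 60} h {computed_total % 60} min")
```

The stated per-window durations add up to 250 minutes, which matches the aggregate of 4 hours 10 minutes. The endpoints of the third window as printed (19:03 to 22:40) span 217 minutes rather than the stated 97; the sketch only surfaces that mismatch and cannot say which figure was intended.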