Service Health
Incident affecting Google Compute Engine
We're currently investigating an inbound connectivity issue with Compute Engine affecting zones “europe-west1-a” and “europe-west1-b”, which started around 03:15 US/Pacific. The problem appears to be localized to Finland, Sweden, and Russia. We will provide more information shortly.
Incident began at 2014-06-06 03:12 and ended at 2014-06-06 07:34 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 10 Jul 2014 | 02:29 PDT | SUMMARY: On Friday 6 June 2014, approximately 0.4% of network traffic to Google Compute Engine instances was interrupted for a duration of 4 hours and 22 minutes. We apologize for any impact this had on your service or application, and are making changes to ensure this issue does not occur again. DETAILED DESCRIPTION OF IMPACT: On 6 June 2014, from 3:12 AM to 7:34 AM US/Pacific, 0.42% of network traffic directed to GCE instances was unintentionally discarded due to an error in the turn-up of additional network and load-balancing capacity. The traffic loss was most noticeable to customers attempting to reach GCE from ISPs in Russia, Finland and Sweden. ROOT CAUSE: In the process of adding network and load-balancing capacity to Google Compute Engine, new systems were added to advertise Compute Engine IP addresses, but in their initial testing mode these systems were not configured with the information required for them to forward incoming traffic to Compute Engine instances. This is a safety precaution to prevent unintended traffic from being sent to live Compute Engine instances. Once the systems were verified to be working correctly in test mode, they were updated with the information required to forward incoming traffic to live instances. However, due to human error, these systems did not reload their configuration and, as a result, continued to operate without the ability to forward traffic to live instances. When the systems were then added to the global IP address advertisements, they received a fraction of incoming live traffic, which was not forwarded correctly. REMEDIATION AND PREVENTION: When Google engineers identified the issue, the misconfigured systems were temporarily removed from production until their configurations could be updated and verified as correct. The systems were then returned to live service, now operating correctly. To prevent recurrence, we have updated our procedures for adding new load-balancing capacity to verify that necessary routes are functional before allowing live traffic to be delivered to the systems. This will prevent IP addresses from being advertised without the necessary configuration to deliver packets to Compute Engine instances. In addition, we will add specific monitors to quickly detect and alert on this class of failure in the future. |
| 10 Jul 2014 | 02:28 PDT | Connectivity to Google Compute Engine instances from Finland, Sweden, and Russia to “europe-west1-a” and “europe-west1-b” should be restored as of 07:40 US/Pacific. We apologize for any issues this may have caused you or your users and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better. We will provide a more detailed analysis of this incident once we have completed our internal investigation. |
| 10 Jul 2014 | 02:27 PDT | We're currently investigating an inbound connectivity issue with Compute Engine affecting zones “europe-west1-a” and “europe-west1-b”, which started around 03:15 US/Pacific. The problem appears to be localized to Finland, Sweden, and Russia. We will provide more information shortly. |
- All times are US/Pacific
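
The prevention step in the detailed report (verify that the necessary routes are functional before live traffic is delivered) can be illustrated with a small readiness gate. This is only a sketch under stated assumptions, not a description of Google's actual tooling: the `LoadBalancer` record, its fields, and the `advertise_if_ready` function are all hypothetical names invented for illustration. The idea is that a system is added to the IP address advertisements only if it has actually loaded the intended configuration and holds a forwarding route for every required destination.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    """Hypothetical model of a newly added load-balancing system."""
    name: str
    desired_config_version: int   # version operators pushed to the system
    loaded_config_version: int    # version the running process actually loaded
    forwarding_routes: set = field(default_factory=set)  # prefixes it can forward to


def readiness_problems(lb, required_routes):
    """Return a list of reasons why lb must not receive live traffic."""
    problems = []
    # A pushed-but-not-reloaded configuration is the failure described above.
    if lb.loaded_config_version != lb.desired_config_version:
        problems.append(
            f"{lb.name}: running config v{lb.loaded_config_version}, "
            f"expected v{lb.desired_config_version}"
        )
    # Every destination the system will be advertised for needs a route.
    missing = set(required_routes) - lb.forwarding_routes
    if missing:
        problems.append(f"{lb.name}: no forwarding route for {sorted(missing)}")
    return problems


def advertise_if_ready(lb, required_routes, advertised):
    """Add lb to the advertised set only when every readiness check passes."""
    problems = readiness_problems(lb, required_routes)
    if not problems:
        advertised.add(lb.name)
    return problems


if __name__ == "__main__":
    advertised = set()
    stale = LoadBalancer("lb-new-1", desired_config_version=2, loaded_config_version=1)
    for problem in advertise_if_ready(stale, {"10.0.0.0/24"}, advertised):
        print("BLOCKED:", problem)
    print("advertised:", sorted(advertised))
```

In this sketch a system whose configuration was updated but never reloaded fails both checks and is never advertised, so it cannot attract live traffic that it is unable to forward. The same `readiness_problems` check could be run periodically against systems already in service as a simple monitor for this class of failure.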