Google Cloud Status

This page provides status information on the services that are part of the Google Cloud Platform. Check back here to view the current status of the services listed below. For additional information on these services, please visit cloud.google.com.

Google Compute Engine Incident #15049

Issue with Network Connectivity on April 10th, 2015

Incident began at 2015-04-10 02:10 and ended at 2015-04-10 02:24 (all times are US/Pacific).

Date Time Description
Apr 13, 2015 04:37

SUMMARY:

On Friday 10 April 2015, Google Compute Engine instances in us-central1 experienced elevated packet loss for a duration of 14 minutes. If your service or application was affected, we apologize — this is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

DETAILED DESCRIPTION OF IMPACT:

On Friday 10 April 2015 from 02:10 to 02:24 PDT, instances hosted in Google Compute Engine zone us-central1-b experienced elevated packet loss for internal (VM <-> VM) traffic, and every zone in region us-central1 experienced elevated packet loss for external (Internet <-> VM) traffic. The impact varied by network path: for VM-to-VM and VM-to-Internet traffic, reported packet loss was between 26% and 47% at peak, while for Internet-to-VM traffic, 18% to 34% of total packets were lost.

ROOT CAUSE:

During routine planned maintenance, a miscommunication resulted in traffic being sent to a datacenter router that was running a test configuration. This saturated a link, causing packet loss. The faulty configuration became effective at 02:10 and caused traffic congestion soon after.

REMEDIATION AND PREVENTION:

Google Engineers were notified by our alerting systems at 02:12 and confirmed an unusually high rate of packet loss at 02:18. At 02:21 Google Engineers disabled the problematic router, distributing traffic to other, unsaturated links. Normal operation was restored at 02:24.
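As a reading aid, the timeline above can be expressed as minutes elapsed since the faulty configuration became effective. The following is a minimal sketch in Python, not part of the original report: the event names and the `minutes_since_start` helper are hypothetical labels chosen for illustration, and only the timestamps come from the text.

```python
from datetime import datetime

# Timestamps are taken from the report (US/Pacific, 10 April 2015).
# The event names are illustrative labels, not official terminology.
DAY = "2015-04-10"
TIMELINE = {
    "faulty_config_effective": "02:10",
    "alert_fired": "02:12",
    "packet_loss_confirmed": "02:18",
    "router_disabled": "02:21",
    "normal_operation_restored": "02:24",
}


def minutes_since_start(timeline, start_key="faulty_config_effective"):
    """Return whole minutes elapsed from the start event to each event."""
    fmt = "%Y-%m-%d %H:%M"
    start = datetime.strptime(f"{DAY} {timeline[start_key]}", fmt)
    elapsed = {}
    for event, clock in timeline.items():
        moment = datetime.strptime(f"{DAY} {clock}", fmt)
        elapsed[event] = int((moment - start).total_seconds() // 60)
    return elapsed


elapsed = minutes_since_start(TIMELINE)
for event, minutes in elapsed.items():
    print(f"{event}: +{minutes} min")
```

Under these assumptions, the alert fired 2 minutes after the faulty configuration took effect, the router was disabled at 11 minutes, and normal operation was restored at 14 minutes, which matches the 14-minute duration stated in the summary.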

To prevent similar incidents in the future, we are changing our procedures to include additional validation checks when configuring routers during maintenance activities. We are also implementing a higher degree of automation to remove potential human and communication errors when changing router configurations.

Apr 10, 2015 02:56

The problem with Google Compute Engine network connectivity was resolved as of 02:24 US/Pacific on 10th April 2015. We apologize for any issue this may have caused you or your users and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google and we are constantly working to improve the reliability of our systems.

Apr 10, 2015 02:46

We are currently investigating an issue with Google Compute Engine network connectivity. We will provide an update by Friday 10th April 2015 03:30 US/Pacific.
