Service Health
Incident affecting Google BigQuery
Execution of BigQuery query jobs is delayed, queries may take longer than usual to complete
Incident began at 2014-10-13 00:30 and ended at 2014-10-13 13:55 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 24 Oct 2014 | 11:07 PDT | SUMMARY: On Monday 13 October 2014, some user-submitted BigQuery jobs experienced increased execution time for a period of 13 hours and 18 minutes. If your jobs were affected by this delayed execution, we apologize; we strive to maintain the highest standard of performance and reliability and failed to uphold that standard in this instance. We have implemented changes both to address this issue and to monitor for and prevent recurrences. DETAILED DESCRIPTION OF IMPACT: From 00:37 to 13:55 PDT on Monday 13 October 2014, 1.6% of queries experienced scheduling delays of up to four hours and performance degradation during execution. Affected jobs started to recover by 07:17 and were fully recovered by 13:55. ROOT CAUSE: The BigQuery service received a combination of user queries that led to lock contention in the underlying component responsible for processing large joins and groupings. This lock contention slowed down both scheduling and query execution. REMEDIATION AND PREVENTION: Monitoring systems alerted Google engineers to increased query latency at 02:27. To address the performance of the service, Google engineers restarted several of the service components and focused on identifying specific affected queries and projects. At 07:27, the engineers redirected traffic to an unaffected datacenter to mitigate the effect on new incoming queries. To prevent recurrence of this issue, Google engineers have addressed the sources of lock contention responsible for the performance degradation and have added further instrumentation to help on-call engineers quickly identify problematic combinations of user queries. Finally, Google engineers have added more stringent detection and alerting for latency increases. |
| 13 Oct 2014 | 14:14 PDT | The problem with Google BigQuery should be resolved as of Monday, 2014-10-13 13:55 US/Pacific. We apologize for any issues this may have caused to you or your users and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are constantly working to improve the reliability of our systems. We will provide a more detailed analysis of this incident once we have completed our internal investigation. |
| 13 Oct 2014 | 12:35 PDT | Google BigQuery performance has been restored for most query jobs, and we expect resolution for the remaining affected jobs in the near future. For everyone who is affected, we apologize for any inconvenience you are experiencing. We will provide the next update by 2014-10-13 13:00 (Pacific Time) with further details. |
| 13 Oct 2014 | 11:24 PDT | We are continuing work to correct the ongoing issues with Google BigQuery. We have stabilized system performance and are closely monitoring the situation as we continue to investigate the root cause. For everyone who is affected, we apologize for any inconvenience you may be experiencing. We will provide another status update by 2014-10-13 12:00 US/Pacific. |
| 13 Oct 2014 | 10:12 PDT | We are still investigating the issue with the Google BigQuery service. Execution of some query jobs is taking longer than usual and may time out. We will provide another status update by 2014-10-13 11:00 US/Pacific. |
| 13 Oct 2014 | 09:05 PDT | We're investigating an issue with the BigQuery service beginning at around 2014-10-13 00:30 US/Pacific. Execution of some query jobs is taking longer than usual and might time out. We will provide more information by 2014-10-13 10:00 US/Pacific. |
- All times are US/Pacific