Service Health
Incident affecting Google BigQuery
BigQuery queries return 500 Internal Errors
Incident began at 2014-10-15 07:00 and ended at 2014-10-15 08:19 (all times are US/Pacific).
| Date | Time | Description |
|---|---|---|
| 21 Oct 2014 | 10:11 PDT | SUMMARY: On Wednesday 15 October 2014, 20% of queries to the Google BigQuery streaming API service failed with an internal error over a period of 147 minutes. If your application was affected by the unavailability of the streaming API, we sincerely apologize; our goal is to set a high standard of availability and reliability, which we failed to meet in this case. We have taken and are taking immediate action to prevent future recurrences of this issue. DETAILED DESCRIPTION OF IMPACT: From 05:52 to 08:19 PDT on Wednesday 15 October 2014, some users experienced an increase in error rates when querying recently ingested streaming data. 42% of queries executed by 18% of projects during this time were affected by this issue. Impacted users saw API responses indicating a retriable internal error had occurred. ROOT CAUSE: As part of a standard deployment, Google engineers released new configuration directives between a subset of the systems responsible for ingesting streaming data, and those responsible for serving recently streamed data. However, some components were relying on stale cached configuration data, and when that cached configuration data was refreshed on a subset of systems, queries against tables that contained recently streamed data failed with an internal error. REMEDIATION AND PREVENTION: Google engineers received reports from customers regarding errors with the streaming API service at 07:20 and began to direct traffic to alternate data centers that were not affected by this configuration change at 08:14. Error rates began recovering after the redirection of traffic, and the service recovered fully by 08:19. To prevent further recurrences of this issue, Google engineers are adding more thorough testing and monitoring to the configuration deployment process to detect and prevent stale configuration data from being introduced. Google engineers are also adding more stringent monitoring and alerting to detect and address elevated error rates sooner. |
| 15 Oct 2014 | 09:02 PDT | The problem with Google BigQuery service should be resolved as of Wednesday, 2014-10-15 08:17 US/Pacific. We apologize for any issues this may have caused to you or your users and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are constantly working to improve the reliability of our systems. We will provide a more detailed analysis of this incident once we have completed our internal investigation. |
| 15 Oct 2014 | 08:20 PDT | We're investigating an issue with the BigQuery service beginning at around 2014-10-15 07:00 US/Pacific. BigQuery queries return a 500 Internal Error when issued through the BigQuery API and other tools. We will provide more information by 2014-10-15 09:00 US/Pacific. |
- All times are US/Pacific
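
The report notes that impacted users saw API responses indicating a retriable internal error. As an illustration only, a client might handle such responses by retrying with capped exponential backoff and jitter. The sketch below is a minimal example under stated assumptions: `RetriableError` and `call_with_backoff` are hypothetical names introduced here, not part of any BigQuery client library, and `operation` stands in for whatever call your application makes.

```python
import random
import time


class RetriableError(Exception):
    """Hypothetical stand-in for a retriable 500 Internal Error response."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call operation(), retrying on RetriableError with exponential backoff.

    The delay ceiling doubles after each failed attempt, is capped at
    max_delay, and the actual wait is drawn uniformly from [0, ceiling]
    ("full jitter") so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except RetriableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

A caller would wrap its own query function, for example `call_with_backoff(lambda: run_query(sql))`, where `run_query` is an assumed application function that raises `RetriableError` when the service reports a retriable internal error. Retrying softens short bursts of errors; it would not have masked an outage lasting as long as the one described here.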