Service Health
Incident affecting Cloud Machine Learning, Vertex AI Training
Vertex AI custom training jobs failing if using more than 2GB ephemeral storage
Incident began at 2024-08-16 11:44 and ended at 2024-08-16 16:23 (all times are US/Pacific).
Previously affected location(s)
Taiwan (asia-east1), Hong Kong (asia-east2), Tokyo (asia-northeast1), Seoul (asia-northeast3), Mumbai (asia-south1), Singapore (asia-southeast1), Sydney (australia-southeast1), Belgium (europe-west1), London (europe-west2), Frankfurt (europe-west3), Netherlands (europe-west4), Zurich (europe-west6), Montréal (northamerica-northeast1), Toronto (northamerica-northeast2), Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Oregon (us-west1), Los Angeles (us-west2)
| Date | Time | Description |
|---|---|---|
| 16 Aug 2024 | 16:23 PDT | The issue with Vertex AI Training has been resolved for all affected users as of Friday, 2024-08-16 16:07 US/Pacific. We thank you for your patience while we worked on resolving the issue. Thank you for choosing us. |
| 16 Aug 2024 | 12:03 PDT | Summary: Vertex AI custom training jobs failing if using more than 2GB ephemeral storage. Description: Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Friday, 2024-08-16 17:30 US/Pacific. Diagnosis: Custom Vertex AI training jobs running on GKE and using more than 2GB of ephemeral storage may fail with the error "Pod ephemeral local storage usage exceeds the total limit of containers 2Gi." Workaround: None at this time. |
| 16 Aug 2024 | 11:58 PDT | Summary: Vertex AI custom training jobs failing if using more than 2GB ephemeral storage. Description: Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Friday, 2024-08-16 17:00 US/Pacific. Diagnosis: Custom Vertex AI training jobs running on GKE and using more than 2GB of ephemeral storage may fail with the error "Pod ephemeral local storage usage exceeds the total limit of containers 2Gi." Workaround: None at this time. |
- All times are US/Pacific
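
The diagnosis above refers to a 2Gi limit, which is 2 × 2^30 bytes. As a rough illustration only, and not a workaround from the incident report, the following Python sketch estimates how much local disk a scratch directory uses and compares it to that threshold. The helper names (`directory_usage_bytes`, `exceeds_limit`) and any directory you pass in are hypothetical, and this is not how the platform itself measures usage: Kubernetes ephemeral storage accounting also covers container logs and writable layers, so the figure is an approximation at best.

```python
import os

# 2Gi as quoted in the error message: 2 * 2**30 bytes.
EPHEMERAL_LIMIT_BYTES = 2 * 1024**3


def directory_usage_bytes(root):
    """Sum the sizes of regular files under root (hypothetical helper).

    Symlinks are skipped so that their targets are not counted.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file disappeared or is unreadable; ignore it.
                pass
    return total


def exceeds_limit(used_bytes, limit_bytes=EPHEMERAL_LIMIT_BYTES):
    """Return True when usage is strictly above the limit."""
    return used_bytes > limit_bytes
```

A training script could call `exceeds_limit(directory_usage_bytes(scratch_dir))` on its own scratch directory, where `scratch_dir` is whatever local path the job writes to, to log a warning before local usage approaches the limit named in the error.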