AZURE - East US 2 (Virginia) - SI:20191213
Incident Report for Snowflake
Postmortem

Summary

Between 08:04 PST and 08:53 PST on Dec 13, 2019, some Snowflake customers in AZURE - East US 2 (Virginia) experienced the following symptoms:

  1. Queries taking longer than usual (degraded performance)
  2. Queries waiting in the “Warehouse Provisioning” state
  3. Queries failing to complete

Root Cause

A service host running critical internal warehouse maintenance tasks experienced memory pressure, which slowed its operations and delayed the scheduling of queries on warehouses. Snowflake services use health checks to identify unhealthy hosts. In this instance, however, the health check reading stayed below the threshold for marking a host unhealthy, so the affected host was not replaced automatically.
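The gap described above can be illustrated with a minimal sketch. It is not Snowflake's health check: the field names, threshold values, and both check functions are hypothetical, chosen only to show how a host can stay under a single memory threshold while still delaying query scheduling.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    """One hypothetical health reading from a service host."""
    memory_used_fraction: float       # 0.0 to 1.0
    scheduling_delay_seconds: float   # time queries wait to be scheduled


# Assumed values for illustration only, not Snowflake's real thresholds.
MEMORY_UNHEALTHY_FRACTION = 0.95
SCHEDULING_DELAY_LIMIT_SECONDS = 30.0


def memory_only_check(sample: HostSample) -> bool:
    """Return True when the host is flagged unhealthy on memory alone."""
    return sample.memory_used_fraction >= MEMORY_UNHEALTHY_FRACTION


def combined_check(sample: HostSample) -> bool:
    """Also flag the host when its symptom (scheduling delay) is too high."""
    return (
        memory_only_check(sample)
        or sample.scheduling_delay_seconds >= SCHEDULING_DELAY_LIMIT_SECONDS
    )


# A host under memory pressure that has not crossed the memory threshold.
degraded = HostSample(memory_used_fraction=0.90, scheduling_delay_seconds=120.0)

print(memory_only_check(degraded))  # False: the host would not be replaced
print(combined_check(degraded))     # True: the scheduling delay triggers it
```

Checking a user-visible symptom alongside the resource metric is one possible design; the report does not state which signals the actual health check uses.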

Resolution

We began investigating this issue in response to internal alerts and customer cases. We identified the affected host and took it out of circulation. The affected services migrated to alternate hosts and worked through the backlog of pending requests. Service was fully restored by 08:53 PST.

Improvement/Changes

First, we apologize for the inconvenience caused by this incident.

  • Enhance alerting so that delays in query scheduling are detected and responded to faster.
  • Continue engineering work to further enhance the monitoring of services.
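As a rough illustration of the first item, the sketch below raises an alert when the observed scheduling delay exceeds a limit for several consecutive samples. The class name, limit, and sample count are assumptions made for this example and do not describe Snowflake's alerting system.

```python
from collections import deque


class SchedulingDelayAlert:
    """Hypothetical alert: fires after N consecutive over-limit samples."""

    def __init__(self, limit_seconds: float, consecutive_required: int):
        self.limit_seconds = limit_seconds
        # Keep only the most recent N over-limit flags.
        self.recent = deque(maxlen=consecutive_required)

    def observe(self, delay_seconds: float) -> bool:
        """Record one delay sample; return True when the alert should fire."""
        self.recent.append(delay_seconds > self.limit_seconds)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(self.recent)


alert = SchedulingDelayAlert(limit_seconds=10.0, consecutive_required=3)
results = [alert.observe(d) for d in [2, 15, 20, 25, 3]]
print(results)  # [False, False, False, True, False]
```

Requiring consecutive samples trades a little detection latency for fewer false alarms from a single slow scheduling cycle.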

Note: The information contained in this report is confidential and is intended solely to promote safety and reduce customer risk.

Posted Dec 17, 2019 - 15:12 PST

Resolved
The issue is now resolved.

We will post a preliminary Root Cause Analysis report within the next 48 business hours and follow up with a detailed RCA in the next seven business days.

We apologize for the inconvenience. If you have any questions or see any related issues, please send feedback to support@snowflake.com or submit a support request ticket via the Snowflake Lodge Portal.
Posted Dec 13, 2019 - 09:29 PST
Monitoring
We applied a fix to resolve the issue. We are now monitoring the system for any recurrence of the problem and will remain in this state for 30 minutes.

Please reach out to Support at support@snowflake.com if you continue to experience problems related to this incident.
Posted Dec 13, 2019 - 08:53 PST
Investigating
We are investigating an issue with one of the Snowflake services. We will provide more information as soon as we have identified the problem. We will provide an update in 90 minutes.
Posted Dec 13, 2019 - 08:39 PST
This incident affected: AZURE - US East 2 (Virginia) (Snowflake Data Warehouse (Database)).