AWS - US West (Oregon), AWS - US East (N. Virginia) : SI-20200309
Incident Report for Snowflake
Postmortem

Summary

Between 11:40 and 12:44 PDT on Mar 09, 2020, a few Snowflake customers in AWS - US East (N. Virginia) could not connect to Snowflake services.

Root Cause

The root cause of this issue was a bug in a code change that exhausted resources, which blocked customers from accessing Snowflake Services.

Resolution

  • First, Snowflake engineers worked to mitigate the problem by rolling back the code change that contained the bug.
  • Next, we followed the steps defined in the production playbooks to restore and normalize the systems.
  • Finally, once all the steps outlined in the playbooks were completed, the systems were restored to full functionality, and customers were able to connect to Snowflake Services and execute queries.

Improvement/Changes

First, we apologize for the inconvenience caused by this incident.

  • Add test cases in pre-production systems that assert the exceptions in the problem area [ Apr 3, 2020 ]
  • Enhance the tools used to roll back changes in production, to minimize exposure and downtime for end users [ Q2 2020 ]

Note: The information contained in this report is confidential and is intended solely to promote safety and reduce customer risk.

Posted Mar 17, 2020 - 21:58 PDT

Resolved
The issue is now resolved. A detailed RCA will be posted in the next seven business days.

We apologize for the inconvenience.

If you have any questions or see any related issues, please send feedback to support@snowflake.com or submit a support request ticket via the Snowflake Lodge Portal.
Posted Mar 09, 2020 - 14:03 PDT
Monitoring
We applied a fix to resolve the issue.

We are now monitoring the system for any recurrence of the problem and will continue to do so for 30 minutes.

Please reach out to Support at support@snowflake.com if you continue to experience problems caused by this issue.
Posted Mar 09, 2020 - 13:29 PDT
Identified
We have identified the problem with the Snowflake Service, which is causing the following disruptions:
1. Cannot connect to Snowflake services
2. Cannot use Snowpipe services
We are working on mitigating the problem and will provide an update as soon as the issue is resolved.
Posted Mar 09, 2020 - 12:42 PDT
Investigating
We are investigating an issue with one of the Snowflake services. We will provide more information as soon as we have identified the problem. We will provide an update in 90 minutes.
Posted Mar 09, 2020 - 12:20 PDT
This incident affected: AWS - US East (N. Virginia) (Snowflake Data Warehouse (Database), Snowpipe (Data Ingestion)) and AWS - US West (Oregon) (Snowflake Data Warehouse (Database), Snowpipe (Data Ingestion)).