AWS EU Frankfurt: Disruption
Incident Report for Snowflake Computing
Postmortem

Summary
On Mar 7, 2019, between 9:44 PST and 10:20 PST, some Snowflake customers with deployments on AWS EU - Frankfurt were unable to use Snowflake services and experienced one of the following symptoms:
1. Cannot retrieve or save worksheets from Snowflake UI
2. Failed Queries

Resolution Action
Snowflake Engineering resolved the issue by taking the following steps:
First, the Engineering team identified the impacted clusters and immediately isolated them from production.
Second, the team added new clusters to production after successfully verifying the startup sequence and validating the HSM calls.
Third, the team monitored the affected deployment for errors after HSM v2 became active.

Root Cause(s)
The service disruption was caused by an incorrect configuration file that prevented a service module (HSM) from starting up correctly. The configuration file was incorrect because of a bug in the security permissions for the HSM module.

Improvements/Changes
Our investigation of the problem identified two areas for improvement:
1. Update the Java security file with the appropriate permissions for the HSM module.
2. The upgrade of the HSM module to v2 is a significant change that is being applied sequentially to each of our deployments. For the pending upgrades, we have added a new step (in addition to the code change in item 1 above): verifying the HSM v2 upgrade by checking the permissions and invoking the call to retrieve a test key.
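
As an illustration only, the added verification step could be sketched along the following lines. This is a minimal Python sketch and not Snowflake's actual tooling: the function names, the permission strings, and the `fetch_test_key` callable are hypothetical stand-ins for whatever the real deployment pipeline and HSM client expose. It shows the two checks described above (permissions present, test key retrievable) gating an upgrade.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def verify_hsm_upgrade(
    granted_permissions: Iterable[str],
    required_permissions: Iterable[str],
    fetch_test_key: Callable[[], bytes],
) -> List[CheckResult]:
    """Run both post-upgrade checks and report each result.

    All inputs are hypothetical: the permission names and the
    fetch_test_key callable stand in for the real security file
    contents and the real HSM client call.
    """
    results: List[CheckResult] = []

    # Check 1: every required permission has been granted.
    missing = sorted(set(required_permissions) - set(granted_permissions))
    if missing:
        detail = "missing: " + ", ".join(missing)
    else:
        detail = ""
    results.append(CheckResult("permissions", not missing, detail))

    # Check 2: the HSM call actually returns a non-empty test key.
    try:
        key = fetch_test_key()
        passed = bool(key)
        detail = "" if passed else "empty key returned"
    except Exception as exc:  # any failure means the module is not usable
        passed = False
        detail = f"{type(exc).__name__}: {exc}"
    results.append(CheckResult("test_key", passed, detail))

    return results


def upgrade_is_safe(results: Iterable[CheckResult]) -> bool:
    """The cluster should only enter production if every check passed."""
    return all(r.passed for r in results)
```

In a sketch like this, a cluster whose checks do not all pass would be held out of production rather than serving traffic with a non-functional HSM module.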

We apologize for the inconvenience. If you have any questions or issues, please send feedback to support@snowflake.com or submit a support request ticket via the Snowflake Lodge Portal.

Snowflake Support

Posted Mar 11, 2019 - 00:08 PDT

Resolved
Dear Customer,
All systems are normal.

We have fixed the problem with the Snowflake service that resulted in an interruption to the following:
1. Access via Snowflake UI
2. Queries requesting new data from the data store

The disruption lasted about 36 minutes, from Mar 7, 2019, 9:44 PST to Mar 7, 2019, 10:20 PST. We will provide a detailed RCA as soon as it is available.

We apologize for the inconvenience. If you have any questions or issues, please send feedback to support@snowflake.com or submit a support request ticket via the Snowflake Lodge Portal.

Snowflake Support
Posted Mar 07, 2019 - 11:04 PST
Monitoring
Dear Customer,
We have fixed the problem with the Snowflake service that resulted in an interruption to the following:
1. Access via Snowflake UI
2. Queries requesting new data from the data store

The issue impacted a subset of our customers, and the system is returning to normal. We will continue to monitor the system and provide additional details on the disruption as soon as possible.

We apologize for the inconvenience. If you have any questions or issues, please send feedback to support@snowflake.com or submit a support request ticket via the Snowflake Lodge Portal.

Snowflake Support
Posted Mar 07, 2019 - 10:41 PST
Update
We are continuing to work on a fix for this issue.
Posted Mar 07, 2019 - 10:26 PST
Identified
Dear Customer,
We have identified a problem with one of the Snowflake services that is interrupting a limited number of actions, including:
1. Access via Snowflake UI
2. Queries requesting new data from the data store

The issue has been narrowed down to one module, and we are working toward a resolution.

We apologize for the inconvenience. If you have any questions or issues, please send feedback to support@snowflake.com or submit a support request ticket via the Snowflake Lodge Portal.

Snowflake Support
Posted Mar 07, 2019 - 10:05 PST
This incident affected: AWS - EU Frankfurt (Snowflake Data Warehouse (Database), Snowpipe (Data Ingestion)).