On May 16, 2019, at 20:15 PDT, a few customers could not use Snowflake services. Requests returned the error “Unable to get the result due to an internal error. Please contact Snowflake support.” This issue interrupted query execution requests.
Our first task during the incident was to mitigate the problem and minimize the exposure to our customers.
- First, we moved the customers from the affected instances to other instances.
- Second, we debugged the problem and identified its root cause, which involved the closing of connections.
- We immediately focused on resolving the problem and deployed an emergency fix in the Front End layer, which accepts client close connection requests.
- The rollout of the fix took a little longer than anticipated because we proceeded very carefully to minimize the exposure to our customers.
The root cause of this issue was a software bug in the Cloud Services layer, which serves as the front end for handling all service requests. The bug surfaced while we were performing a routine update on a downstream service that holds the Application Metadata. The update consisted of two changes:
1. Increase the disk size
2. Increase the memory
This change was being made over many days, and the operation was successful in other regions without any incident.
However, in the AWS - US West (Oregon) region, the operation succeeded on a couple of instances before it failed on an instance that required a restart. After this restart, the front end had difficulty processing client close connection requests.
First, we apologize for the inconvenience caused by this incident.
We have performed a thorough Root Cause Analysis (RCA) of the problem and identified areas for improvement that will help us:
1. Prevent the problem from recurring:
i. Better controls in managing the front end server to redirect traffic from faulty instances
ii. Improved error handling for sessions in the front end server
iii. Improved audit trail and logging
2. Reduce the time needed to identify the root cause and roll out emergency patches:
i. Additional logging and alerts to detect issues early
ii. Additional tools to help with isolating the problem instances
iii. Updated internal processes for managing service incidents and providing streamlined communication via the Status Page
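
As a rough illustration of item 1.i above (redirecting traffic from faulty instances), the idea can be sketched as a router that tracks recent request outcomes per instance and stops sending traffic to any instance whose error rate is too high. This is a minimal sketch under stated assumptions, not a description of Snowflake's actual front end: the class name `InstancePool`, the instance names, the window size, and the error-rate threshold are all hypothetical.

```python
from collections import deque


class InstancePool:
    """Hypothetical router that steers traffic away from faulty instances."""

    def __init__(self, instances, window=20, max_error_rate=0.5):
        # window and max_error_rate are illustrative values, not from the report.
        self.max_error_rate = max_error_rate
        # Keep only the most recent `window` outcomes for each instance.
        self.outcomes = {name: deque(maxlen=window) for name in instances}
        self._next = 0

    def record(self, instance, ok):
        """Record whether a request handled by `instance` succeeded."""
        self.outcomes[instance].append(ok)

    def error_rate(self, instance):
        """Fraction of recent requests on `instance` that failed."""
        results = self.outcomes[instance]
        if not results:
            return 0.0
        return results.count(False) / len(results)

    def healthy(self):
        """Instances whose recent error rate is at or below the threshold."""
        return [
            name
            for name in self.outcomes
            if self.error_rate(name) <= self.max_error_rate
        ]

    def pick(self):
        """Round-robin over healthy instances; raise if none are left."""
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy instances available")
        choice = candidates[self._next % len(candidates)]
        self._next += 1
        return choice


if __name__ == "__main__":
    pool = InstancePool(["fe-1", "fe-2", "fe-3"], window=4)
    for _ in range(4):
        pool.record("fe-2", False)  # simulate repeated failures on one instance
    print([pool.pick() for _ in range(4)])
```

Because only a sliding window of outcomes is kept, an instance returns to rotation on its own once its recent requests succeed again. A production system would more likely rely on active health checks and load balancer draining than on this passive counting.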
If you have any questions or issues, please send feedback to firstname.lastname@example.org or submit a support request ticket via the Snowflake Lodge Portal.
Note: The information contained in this report is confidential and is intended solely to promote safety and reduce customer risk.