On May 13, 2019, at 08:28 PDT, a small number of customers could not use Snowflake services and reported the following symptoms:
Cannot read Schema
Cannot execute queries
A high number of random errors while executing queries
First, we isolated the problem machines and removed them from the cluster.
Second, we increased the size of the cluster from 2 to 4 to handle the backlog.
The above two actions cleared the backlog, and we didn’t see a recurrence of the issue.
The root cause was exceptions in our code while reading data from or writing data to the metadata database. The exceptions were caused by two machines that were in a non-responsive state. As a result, subsequent job requests were not scheduled to warehouses, which built up a backlog queue.
First, we apologize for the inconvenience caused by this incident.
We are working on improving the resiliency of our code so that it can handle corner cases in which machines hang without giving any indication that they are down.
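One common way to detect a machine that hangs without reporting itself as down is a heartbeat timeout. The following is a minimal sketch of that general technique only, not a description of Snowflake's implementation: the class name `HeartbeatMonitor`, the machine names, and the 30-second timeout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat per machine (hypothetical helper).

    A machine is treated as unresponsive when its most recent heartbeat
    is older than timeout_s seconds, even if it never reported an error.
    """

    timeout_s: float
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, machine: str, now: float) -> None:
        # Called whenever a machine checks in; 'now' is a timestamp in seconds.
        self.last_seen[machine] = now

    def unresponsive(self, now: float) -> list:
        # Machines that have been silent for longer than the timeout.
        return sorted(
            m for m, t in self.last_seen.items() if now - t > self.timeout_s
        )

    def schedulable(self, now: float) -> list:
        # Machines that are still safe targets for new job requests.
        return sorted(
            m for m, t in self.last_seen.items() if now - t <= self.timeout_s
        )


# Usage: node-b stops sending heartbeats and is excluded from scheduling.
monitor = HeartbeatMonitor(timeout_s=30.0)
monitor.record_heartbeat("node-a", now=0.0)
monitor.record_heartbeat("node-b", now=0.0)
monitor.record_heartbeat("node-a", now=40.0)
silent = monitor.unresponsive(now=45.0)
healthy = monitor.schedulable(now=45.0)
```

In a sketch like this, a scheduler would consult `schedulable` before dispatching work, so a silently hung machine stops receiving jobs once the timeout elapses instead of accumulating a backlog.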
If you have any questions or issues, please send feedback to email@example.com or submit a support request ticket via the Snowflake Lodge Portal.