AWS - US East/West, IE Dublin, Europe (Frankfurt) : SI-20190704
Incident Report for Snowflake
Postmortem

Summary

Between 23:54 PDT on July 3, 2019, and 07:54 PDT on July 4, 2019, a small set of our customers could not access the Snowflake service using the following Snowflake-provided drivers and connectors: ODBC, JDBC, Snowflake Connector for Python, and Snowflake Connector for Spark.

Resolution

  • First, we debugged the reported connection failures and narrowed the issue down to the cache server. We corrected the problem with a fix to our infrastructure.
  • Second, we found that some customers using the Private Link feature were missing a configuration change that clients must make after upgrading to the latest drivers. We corrected the problem by working with those customers directly to update their configuration as outlined in the Release Notes.
  • Third, we found a bug in our ODBC drivers having a version between 2.19.0 and 2.19.6. This bug prevented the drivers from continuing to connect to Snowflake using fail-open behavior.
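
For readers unfamiliar with the term, the fail-open behavior mentioned above can be illustrated with a minimal sketch. This is a general illustration of the concept, not the actual driver code; the names RevocationStatus and allow_connection are hypothetical and do not come from any Snowflake driver.

```python
from enum import Enum


class RevocationStatus(Enum):
    """Possible outcomes of a certificate revocation (OCSP) check."""
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"  # no usable response, e.g. stale cache or unreachable responder


def allow_connection(status: RevocationStatus, fail_open: bool = True) -> bool:
    """Decide whether to proceed with a connection after a revocation check.

    Hypothetical helper: with fail_open=True an undetermined status does not
    block the connection; with fail_open=False (fail-closed) it does.
    """
    if status is RevocationStatus.GOOD:
        return True
    if status is RevocationStatus.REVOKED:
        # A definitive "revoked" answer blocks the connection in either mode.
        return False
    # Status could not be determined: the mode decides the outcome.
    return fail_open
```

Under this sketch, a stale cache would yield an UNKNOWN status, and a client in fail-open mode would still be expected to connect.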

Root Cause

The root cause was a failure of the cache server that maintains OCSP responses for server certificates. This failure resulted from planned, routine infrastructure maintenance work and went unnoticed until the timestamp on the cache contents expired and the cache became stale.

Improvement/Changes

First, we apologize for the inconvenience caused by this incident. We have identified the following steps to make our infrastructure more resilient:

  • Released ODBC driver version 2.19.7, which fixes the bug so that the driver continues to work even if the certificate cache becomes stale.
  • Added a new alert to monitor the cache validity.
  • Planned additional infrastructure improvements to increase resilience and remove the dependency on a single path for updating the cache servers.
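
As an illustration of what a cache validity alert could check, the sketch below flags a cache whose contents have expired or are about to expire. It is a minimal example under stated assumptions, not the monitoring that was actually deployed: the function name cache_needs_alert, the expires_at timestamp, and the six-hour warning window are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def cache_needs_alert(
    expires_at: datetime,
    now: datetime,
    warn_before: timedelta = timedelta(hours=6),  # assumed warning window
) -> bool:
    """Return True when the cache is stale or will become stale within warn_before.

    expires_at is the (hypothetical) timestamp after which the cached
    responses are no longer considered valid.
    """
    return now >= expires_at - warn_before


# Example: a cache expiring at 12:00 UTC, checked at two different times.
expiry = datetime(2019, 7, 4, 12, 0, tzinfo=timezone.utc)
early_check = cache_needs_alert(expiry, datetime(2019, 7, 4, 3, 0, tzinfo=timezone.utc))
late_check = cache_needs_alert(expiry, datetime(2019, 7, 4, 7, 0, tzinfo=timezone.utc))
```

Alerting ahead of the expiry, rather than at it, leaves time to refresh the cache before clients see stale contents.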

Note: The information contained in this report is confidential and is intended solely to promote safety and reduce customer risk.

Posted Jul 09, 2019 - 13:16 PDT

Resolved
This issue has been resolved. See Postmortem for RCA report.
Posted Jul 04, 2019 - 08:28 PDT
Monitoring
This issue has been resolved. See Postmortem for RCA report.
Posted Jul 04, 2019 - 07:24 PDT
Investigating
This issue has been resolved. See Postmortem for RCA report.
Posted Jul 04, 2019 - 05:58 PDT
This incident affected: AWS - Europe (Ireland) (Snowflake Data Warehouse (Database)), AWS - US East (N. Virginia) (Snowflake Data Warehouse (Database)), and AWS - US West (Oregon) (Snowflake Data Warehouse (Database)).