Azure - East US 2 (Virginia): INC0105911
Incident Report for Snowflake
Postmortem

Snowflake Engineering has completed the postmortem of this service incident. A detailed Root Cause Analysis (RCA) is available on the Snowflake Community site:
https://community.snowflake.com/s/article/INC0105911

Posted Apr 15, 2024 - 20:22 PDT

Resolved
Current status: We've implemented the fix for this issue and monitored the environment to confirm that service was restored. If you experience additional issues or have questions, please open a support case via Snowflake Community.

Customer experience: Customers hosted in the specified region may have been unable to use multiple Snowflake services and features.

Incident start time: 09:18 UTC April 04, 2024
Incident end time: 12:30 UTC April 04, 2024

Preliminary root cause: We encountered a potential availability zone connectivity issue with the third-party cloud service provider (CSP) hosting the region, which caused the metadata database infrastructure to enter an unhealthy state. By design, our availability zone redundancy initially prevented this issue from impacting customers. However, when the affected availability zone recovered and was re-introduced, internal automation intended to manage the resulting surge and backlog of queued requests raised throttling to a level that prevented most requests from being processed. As a result, customers hosted in this region may have been unable to access or use multiple Snowflake services.

Customer impact was prolonged because the internal infrastructure responsible for requesting and deploying virtual machines within the cloud services layer was also affected and required a manual restart of the provisioning servers. After the provisioning infrastructure recovered, other cloud services layer systems and the query processing systems had to be re-introduced gradually over an extended period.

A preliminary root cause analysis (RCA) document will be published within two business days, and a complete RCA that includes corrective actions will be provided within seven business days.
Posted Apr 04, 2024 - 06:54 PDT

Monitoring
Current status: We've implemented the fix for this issue, and we'll continue to monitor the environment until we're confident all services are functioning properly.
Customer experience: Customers hosted in the specified region may have been unable to use Snowflake.
Incident start time: 09:24 UTC April 04, 2024
Incident end time: 12:30 UTC April 04, 2024
Preliminary root cause: Snowflake experienced widespread server issues that caused the service to become unavailable.
Posted Apr 04, 2024 - 05:44 PDT

Update
Current status: We've identified the source of the issue, and we're continuing to implement a fix to restore service. We'll provide another update within 60 minutes.
Customer experience: Customers hosted in the specified region may be unable to use Snowflake.
Incident start time: 09:24 UTC April 04, 2024
Posted Apr 04, 2024 - 04:38 PDT

Identified
Current status: We've identified the source of the issue, and we're developing and implementing a fix to restore service. We'll provide another update within 60 minutes.
Customer experience: Customers hosted in the specified region may be unable to use Snowflake.
Incident start time: 09:24 UTC April 04, 2024
Posted Apr 04, 2024 - 03:42 PDT

Investigating
Current status: We're investigating an issue with Snowflake Data Cloud. We'll provide an update within 60 minutes.
Customer experience: Customers hosted in the specified region may be unable to execute queries or use Snowflake.
Incident start time: 09:24 UTC April 04, 2024
Posted Apr 04, 2024 - 02:46 PDT

This incident affected Azure - East US 2 (Virginia): Snowflake Data Warehouse (Database), Snowpipe (Data Ingestion), Replication, and Snowsight.