Labflow App Outage

Incident Report for Labflow

Postmortem

After further investigation with our database vendor, we concluded that two major factors contributed to this outage:

Database queries consumed an unexpected amount of RAM, which caused the cluster's primary node to restart.
When the primary node failed, the cluster did not fail over to the secondary node because of a bug in the vendor's cluster software. As a result, the cluster disconnected from Labflow.

Posted Feb 11, 2022 - 22:09 UTC

Resolved

- Labflow App Outage
Time Frame:

Start: 02/10/2022 11:18:11 AM
End: 02/10/2022 11:45:11 AM

Duration:

27 minutes

Root Cause:

Labflow's underlying database service failed. Labflow's engineering team is working with its database vendor to understand why the database service failed. We will update this incident when more information is available.

Remedies:

The database service was restarted, which allowed all Labflow services to become operational again.

We do not expect future incidents at this point.

Student Impact:

Students were unable to access Labflow during this time.

Posted Feb 10, 2022 - 18:26 UTC

This incident affected: Labflow App.