Service Outage
Incident Report for Labflow
Postmortem

Labflow Service Disruption Incident

Severity: COMPLETE DISRUPTION

Impacted: ALL USERS

Status: Resolved

Time Frame:

Starting at: 3:10PM (CDT) on 10/8/24
Ending at: 3:36PM (CDT) on 10/8/24
Duration: 26 minutes

Incident Overview:
At 3:10PM (CDT) on 10/8/24, the entire Labflow platform became unresponsive, impacting all users attempting to access Labflow.com or Labflow.ca.

The Catalyst engineering team determined that the cause was a primary database failure, which did not trigger the expected failover to the secondary server. We immediately engaged our database infrastructure vendor to resolve the issue. Labflow services were fully restored at 3:36PM (CDT).

Root Cause: 
Catalyst is awaiting the final root cause analysis from our database infrastructure vendor. We will provide further updates on the Labflow status page as soon as they become available: http://status.labflow.com 

Mitigations:
No immediate mitigations were possible during the outage. Our engineering team will continue to monitor the system closely for stability, and we are working with our vendor to prevent similar incidents in the future.

Student Impact:
Students were unable to access Labflow for 26 minutes, from 3:10PM to 3:36PM (CDT).


We understand the disruption this outage may have caused and sincerely apologize for the inconvenience. If you have any questions or concerns, please don't hesitate to reach out to your faculty success manager.

Posted Oct 08, 2024 - 22:02 UTC

Resolved
The incident has been resolved. We will continue to monitor the system to ensure that service remains available.
Posted Oct 08, 2024 - 21:08 UTC
Monitoring
The Labflow site is currently running. We are continuing to monitor the situation.
Posted Oct 08, 2024 - 20:41 UTC
Investigating
The Labflow site is currently unavailable to users. The issue started around 3:10 pm Central.
We are diagnosing the issue and will post another update within 30 minutes, or sooner if we have more information.
Posted Oct 08, 2024 - 20:26 UTC
This incident affected: Labflow App.