Conductor Outage Explained

Author: Jeremy Friesen

This morning, at approximately 3:45am, Conductor's database server stopped responding. At approximately 6:45am, the database server was brought back up.

No information was lost in the lead-up to the outage. During the outage, any updates made through the Conductor admin may not have been saved.

I have investigated the cause and made some initial adjustments to keep this particular problem from happening again.

There is additional work I still need to do before I can guarantee that it will not recur.

What Actually Happened

The /tmp directory on the database server filled up. I had installed a monitoring tool on the database server to track disk usage and processor statistics. The monitoring tool wrote its logs to a /tmp/logs directory.

When I installed the script, I did not review where the logs were being written. Instead, I assumed they would be written to the conventional location (/var/log). Had the logs been written to /var/log, the log rotation scripts already running there would have kept the log file from growing too large.
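
To illustrate the kind of check that could flag a filling partition before it takes a server down, here is a minimal Python sketch. It is not the monitoring tool described above. The function names and the 90% threshold are hypothetical, chosen only for this example.

```python
import shutil


def used_fraction(total_bytes: int, used_bytes: int) -> float:
    """Return the share of a filesystem that is in use, from 0.0 to 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def is_nearly_full(path: str, threshold: float = 0.9) -> bool:
    """Report whether the filesystem holding `path` is at or above `threshold`.

    The 0.9 default is an illustrative assumption, not a recommendation.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return used_fraction(usage.total, usage.used) >= threshold
```

Calling is_nearly_full("/tmp") from a scheduled job and sending an alert when it returns True is one way to get warning before the directory fills. A check like this only reports the problem; it does not replace log rotation.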