The case for turning it off and on again

The other day we started having issues with our mail server. The symptom was a mail queue holding hundreds of emails with a message like “SMTP Server rejected at greeting”. Amavis (the mail scanner / coordinator) was rejecting mail and ClamAV was not working properly. We found that restarting the Amavis daemon and flushing the mail queue would resolve the problem for a short time before it happened again.
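
For illustration, the stopgap described above could be scripted roughly as follows. This is a minimal sketch, not what we actually ran: it assumes the last line of `postqueue -p` is a summary such as `-- 512 Kbytes in 340 Requests.`, that Amavis runs as a systemd unit called `amavis` (the name differs between distributions), and that a threshold of 100 queued messages is a sensible trigger. The function names and the threshold are made up for the example.

```python
import re
import subprocess

# Assumed format of the final line printed by `postqueue -p`,
# e.g. "-- 512 Kbytes in 340 Requests."
SUMMARY_RE = re.compile(r"^-- \d+ Kbytes in (\d+) Requests?\.$")


def queued_count(postqueue_output: str) -> int:
    """Return the number of queued messages reported by `postqueue -p`."""
    lines = postqueue_output.strip().splitlines()
    if not lines:
        return 0
    match = SUMMARY_RE.match(lines[-1].strip())
    # An empty queue prints "Mail queue is empty" rather than a summary line,
    # so anything that does not match is treated as zero queued messages.
    return int(match.group(1)) if match else 0


def restart_and_flush(threshold: int = 100, service: str = "amavis") -> bool:
    """Restart the scanner and flush the queue if it has grown too large.

    `threshold` and `service` are hypothetical defaults; adjust for your host.
    Needs to run as root on the mail server.
    """
    listing = subprocess.run(
        ["postqueue", "-p"], capture_output=True, text=True, check=True
    ).stdout
    if queued_count(listing) < threshold:
        return False
    subprocess.run(["systemctl", "restart", service], check=True)
    # `postqueue -f` asks Postfix to attempt delivery of all queued mail.
    subprocess.run(["postqueue", "-f"], check=True)
    return True
```

Bear in mind that something like this only automates the workaround. It would have kept the queue moving, but it would not have found the underlying cause.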

The postfix mail queue spikes

Before we managed to resolve the problem, it kept recurring, more and more frequently, as you can see in the graph above.

The resolution? I restarted the server and waved goodbye to the 200+ day uptime. Because the file system had not been fsck’d in such a long time, a check was forced at boot and, lo and behold, there were busted inodes and file system errors. These problems were fixed, and since then the mail server has been happily behaving itself. I am also no longer scratching my head as to why load was so high while processing the mail queue and why Amavis was failing!

The IT Crowd – Have you tried turning it off and then on again?

That said, I will not be reaching for the “turn it off and then on again” approach to resolve every problem we encounter, as most of them can be fixed quickly if you look through the logs!