MD RAID 5 with 2 dropped disks

So today I was happily going about my business when there was a brownout. Nothing turned off, so I carried on with whatever I was doing. Then I got these emails:

Subject: Fail event on /dev/md1:server
This is an automatically generated mail message from mdadm
running on server

A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdg.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdi[3] sdh[1](F) sdg[0](F)
2930274304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/1] [__U]

md0 : active raid5 sda[5] sdb[0] sdd[1] sde[3] sdc[4]
7814051840 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

and

Subject: Fail event on /dev/md1:server

This is an automatically generated mail message from mdadm
running on server

A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdh.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdi[3] sdh[1](F) sdg[0](F)
2930274304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/1] [__U]

md0 : active raid5 sda[5] sdb[0] sdd[1] sde[3] sdc[4]
7814051840 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>
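The mdstat lines above tell the whole story. In `[3/1] [__U]`, the first pair means the array expects 3 devices but only 1 is active; the letters show each slot's state, `U` for up and `_` for down. A quick sketch of decoding that status with plain shell string-slicing (nothing mdadm-specific, just illustration):

```shell
# Decode the md1 status from the emails above: "[3/1] [__U]".
status="[3/1] [__U]"

counts=${status%%]*}   # strip from the first "]" onward -> "[3/1"
counts=${counts#\[}    # drop the leading "["            -> "3/1"
expected=${counts%/*}  # devices the array expects       -> 3
active=${counts#*/}    # devices actually active         -> 1

# RAID 5 survives exactly one missing device, so it needs expected-1.
echo "md1: $active of $expected devices active, minimum needed is $((expected - 1))"
```

With two of three members flagged `(F)`, md1 was below the RAID 5 minimum, which is why the array stopped serving rather than just running degraded.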

After the small panic attack that my backup RAID array had gone walkabout, I set about trying to fix it.

Root Cause Analysis

So it turns out the brownout had knocked two of my disks offline for a split second, and when they came back online Ubuntu said “oh hey – new disks!” and promptly gave them new device names. This in turn caused my /dev/md1 array to go bonkers.
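Worth noting: /dev/sdX names are handed out in detection order, so they are not stable when a disk drops and re-appears. The symlinks under /dev/disk/by-id are built from the drive's model and serial number and survive renames, and mdadm --examine reads the RAID superblock straight off a member disk. Between the two you can work out which physical drive is which; roughly (the device name here is just an example from my box):

```shell
# Stable names: these symlinks embed model + serial number, so they
# point at the same physical drive no matter what /dev/sdX name the
# kernel hands out this boot.
ls -l /dev/disk/by-id/

# Read the RAID superblock off one member (example device name) to see
# which array it belongs to and its event counter.
mdadm --examine /dev/sdg
```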

The Fix

Well, I wanted this fixed quickly, so I turned my Linux server off and then on again, thinking the disks would renumber themselves back to what they should be and the /dev/md1 array would reassemble.

I was half right: the disks came back with their original names, but the array still needed to be assembled and run by hand after the reboot. Here is the command I used to get things going again.

mdadm --assemble /dev/md1 --force --run

And there we are: an MD array running properly again after a power failure. I am just lucky that there was no writing going on at the time.
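For the record, the fuller sequence I'd follow next time looks something like this (standard mdadm subcommands; the device names are from my setup). The --force flag tells mdadm to accept members whose superblocks disagree slightly, and --run starts the array even if it is still degraded:

```shell
# Check event counters on the members first; --force is only safe-ish
# when they are close together (a big gap means one disk is stale and
# its data may be out of date).
mdadm --examine /dev/sdg /dev/sdh /dev/sdi | grep -E '/dev/|Events'

# Force assembly despite the failed-marked members and start the array.
mdadm --assemble /dev/md1 --force --run

# Confirm the array is back and watch any resync progress.
cat /proc/mdstat
mdadm --detail /dev/md1
```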

Maybe it's time for a UPS and some ZFS loving?