Design the backup so that if the backup storage completely implodes, totally inexplicably, without warning, and with no chance of recovery, it doesn't make you a sad panda. That might mean there is a primary "active" backup and then a secondary backup or archive that uses separate storage devices. The main idea of RAID is to avoid downtime in the face of a drive failure, not to avoid keeping a second backup copy of everything important.
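For example, one simple way to get a primary and a secondary copy on genuinely separate devices is two independent rsync targets, each on its own disk. This is only a minimal sketch; the source path, mount points, and schedule are hypothetical placeholders:

    #!/bin/bash
    # Hypothetical layout: /srv/data is the source; /mnt/backup-primary and
    # /mnt/backup-secondary are separate physical disks, not the same array.
    set -euo pipefail

    SRC=/srv/data/

    # Primary "active" backup, run frequently (e.g. daily).
    rsync -aHAX --delete "$SRC" /mnt/backup-primary/

    # Secondary backup/archive on its own storage, run less often (e.g. weekly),
    # so a mistake that propagates to the primary still leaves you a good copy.
    rsync -aHAX --delete "$SRC" /mnt/backup-secondary/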

Also, I'm not totally clear on what you mean by virtual RAID 6. Is this Btrfs native raid6? Or is it Btrfs on top of an mdadm or LVM based RAID 6? The former, Btrfs raid56, isn't production ready and comes with numerous expert-level caveats. mdadm and LVM RAID 6 are stable. Btrfs requires quite a bit of familiarity if you run into problems; the normal recovery sequence is different on Btrfs. So you need to be pretty familiar with it, have backups, and you might even hedge your bets and put either the first or second backup on a different file system, such as XFS, which now checksums metadata by default.
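If you're not sure which of the two you actually have, it's quick to check (device and mount point names here are just examples):

    # Native Btrfs raid6 shows a RAID6 profile inside the filesystem itself:
    btrfs filesystem df /mnt/backup
    #   Data, RAID6: total=..., used=...     <- Btrfs raid56 code is in use

    # Btrfs on mdadm RAID 6 is a single-device filesystem on top of an md device:
    cat /proc/mdstat     # lists md arrays and their member drives
    lsblk -f             # e.g. sdb/sdc/sdd/sde -> md0 -> btrfs
    #   Data, single: ...                    <- btrfs fi df on an md-backed fs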

And yes, if you were to create two backups by splitting the same member devices between them, then if a drive dies, both backups are degraded at once. It's not ideal. In my own case, I keep a totally segregated third backup that's mostly offline/shelved except when it's being updated, in which case either the primary or the secondary is taken offline. That way, even in the face of user error (a very common vector for data loss), one of the storage stacks should survive my mistakes.
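In practice the shelved copy's update cycle can be as simple as: attach the drive, mount, sync, unmount, detach. A rough sketch, with hypothetical label and mount point names:

    # Attach the shelved drive, then:
    mount /dev/disk/by-label/backup3 /mnt/backup3

    # Update it from one of the other backups rather than the live data, so the
    # live system and the offline copy are never both writable at the same time.
    rsync -aHAX --delete /mnt/backup-primary/ /mnt/backup3/

    umount /mnt/backup3
    # ...then physically disconnect and shelve the drive again.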

ADDITION: Also, I suggest reading this thread from the linux-raid@ mailing list. It mainly concerns md based arrays, but it applies to LVM, mdadm, and Btrfs arrays alike. The gist is to make certain that the drive's SCT ERC timeout is shorter than the kernel's SCSI command timer. On consumer drives, SCT ERC is disabled or unsupported, which makes it hard to know how long the drive will retry a bad sector. The high-end estimate is 180 seconds, so for drives that don't support SCT ERC, the kernel command timer needs to be raised to 180 seconds for every block device in the array. If SCT ERC is merely disabled, enable it with a sane timeout such as 70 deciseconds (7 seconds) using 'smartctl -l scterc,70,70 <dev>', per drive. Neither setting is persistent: SCT ERC resets if the drive is reset or powered off, and the kernel command timer goes back to its default of 30 seconds on reboot.
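Concretely, you can check and set both from a shell. Device names below are examples; this has to be repeated for every member drive, and re-applied after every boot or drive power cycle (e.g. from a udev rule or a boot-time script):

    # Check whether the drive supports/enables SCT ERC:
    smartctl -l scterc /dev/sda

    # If supported but disabled, enable it with a 7 second (70 decisecond)
    # read/write error recovery timeout:
    smartctl -l scterc,70,70 /dev/sda

    # If the drive doesn't support SCT ERC at all, raise the kernel's SCSI
    # command timer for that device instead (default is 30 seconds):
    echo 180 > /sys/block/sda/device/timeout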

http://marc.info/?l=linux-raid&m=133665797115876&w=2