As most readers know, a RAID system makes use of two or more hard disk drives to provide very reliable storage, often in the form of a network-attached storage system. In a sophisticated way, the system stores information redundantly across the drives. If any one drive were to fail, the system would be able to continue in its normal function using the remaining drive or drives. Importantly, no data would be lost. This article talks about how you might pick the hard drives that you would plug into your RAID system.
Our favorite RAID systems are the kind that distribute the saved data across four drives. If any one drive were to fail, the remaining three drives keep everything working with no loss of data. It is thus important to keep a spare drive or two around at all times. That way, if one drive were to fail, we could swap out the bad drive, put in a good drive, and the RAID system would rebuild itself automatically and we would once again have our redundancy and reliability.
One very nice thing about most RAID systems is that they keep an eye on the performance of the individual drives. This permits some early warning that a particular drive is being a bit flaky: not so bad that the drive has failed, but a sign that maybe it is going to fail some time soon.
Each individual drive in a RAID system will have its own MTBF (mean time between failures). With ordinary drives, the MTBF might be 3 or 5 years. But you can pay extra for a so-called “NAS” drive. It will have an MTBF of maybe 100 years.
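To get a rough feel for what such MTBF figures would mean in practice, here is a minimal sketch in Python. It assumes a constant failure rate (the simple exponential model), which ignores infant mortality and wear-out, and it assumes the drives fail independently of one another. The function name `failure_probability` is made up for this illustration, and the MTBF values are simply the ones mentioned above.

```python
import math


def failure_probability(mtbf_years, period_years=1.0, drives=1):
    """Chance that at least one of `drives` drives fails within `period_years`.

    Assumes a constant failure rate of 1/MTBF per drive and independent
    failures. Both are simplifying assumptions, not measured behavior.
    """
    combined_rate = drives / mtbf_years  # expected failures per year
    return 1.0 - math.exp(-combined_rate * period_years)


if __name__ == "__main__":
    for mtbf in (3, 5, 100):
        one = failure_probability(mtbf)
        any_of_four = failure_probability(mtbf, drives=4)
        print(f"MTBF {mtbf:>3} years: one drive {one:.1%} per year, "
              f"any of four drives {any_of_four:.1%} per year")
```

Under these assumptions, the gap between a 5-year and a 100-year MTBF is the gap between expecting a drive swap in a four-drive system every year or two and expecting one only rarely.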
A 4 terabyte drive might cost $111. An otherwise similar 4TB drive that calls itself NAS might cost $170.
When you are picking drives for your RAID system, the usual precaution is to pick drives made by different companies, or at least drives made at very different times. The idea is that if all of the drives had been made by the same company at the same time, there is the risk of some common failure mode that would make it more likely that two drives might fail at the same time. When I am ordering such drives on Amazon, I will order only one drive at a time and on separate days, so that no two drives would be in the same box and could accidentally get dropped in transit in the same way.
The commonly followed model for device reliability is the so-called “bathtub curve”, shown above. The assumption is that early in the service life there might be some infant mortality. Then failures would diminish. After some time, as the drives wear out, failures would again be observed. Part of what you are paying for when you pay the higher price for a NAS drive is, perhaps, that the right end of the curve is much further in the future, say 100 years rather than 5 years. And maybe part of what you are paying for with a NAS drive is that some testing and screening happens after the drive is manufactured, to try to catch some of the infant mortality before the drive gets sent to the store for purchase by the customer.
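For readers who like to see the shape in numbers, the bathtub curve can be sketched as the sum of three failure rates: one that falls over time (infant mortality), one that stays flat (random failures), and one that rises over time (wear-out). The Python sketch below uses Weibull hazard functions for the falling and rising parts. Every parameter in it is an illustrative assumption chosen only to produce the bathtub shape, not a measurement of any real drive, and the names `weibull_hazard` and `bathtub_hazard` are made up for this example.

```python
def weibull_hazard(t, shape, scale):
    """Weibull failure rate at age t (in years); t must be greater than zero.

    shape < 1 gives a falling rate, shape > 1 gives a rising rate.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t, wearout_scale):
    """Illustrative bathtub curve: infant mortality + random failures + wear-out.

    `wearout_scale` (years) sets roughly where the right wall of the tub sits.
    All constants here are assumptions picked for illustration only.
    """
    infant = weibull_hazard(t, shape=0.5, scale=20.0)   # falls with age
    random_failures = 0.01                              # flat floor of the tub
    wearout = weibull_hazard(t, shape=5.0, scale=wearout_scale)  # rises with age
    return infant + random_failures + wearout


if __name__ == "__main__":
    for age in (0.1, 1, 2, 5, 8):
        ordinary = bathtub_hazard(age, wearout_scale=5.0)
        long_lived = bathtub_hazard(age, wearout_scale=100.0)
        print(f"age {age:>4} years: wear-out near 5 years {ordinary:.4f}, "
              f"wear-out near 100 years {long_lived:.4f}")
```

Moving `wearout_scale` from 5 to 100 pushes the right wall of the tub far into the future but leaves the left wall untouched, which fits the idea that a longer-lived drive can still suffer infant mortality unless it is screened after manufacture.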
What prompts today’s posting is some recent experience in our office with some NAS drives.
One of our RAID systems has been in service now for about fifteen years, and it has four drives. We try to track things pretty closely, and we find that two of the four drives have been in service for the life of the system. A third drive has needed replacement once in fifteen years, about five years ago. A fourth drive needed replacement about a year ago. We replaced it with a NAS drive.
When I say a drive “needed replacement” I mean that it started reporting disk errors, which prompted us to replace it. In this particular RAID system we have not had a drive fail completely.
Anyway, what is a bit interesting and puzzling is that the drive that we plugged into the fourth bay a year ago, despite being a NAS drive that ought to have lived forever, started being flaky just a couple of months later. So we pulled it out of service and swapped in a new NAS drive. After a few months it, too, started being flaky.
It seems sort of counter-intuitive that the NAS drive would fail much sooner than a cheapo non-NAS drive.
Of course, one possibility is that the NAS drives have a more sensitive error-reporting mechanism. A very slight deviation in the NAS drive might get reported to the RAID system as a disk error, where a non-NAS drive might not count that slight deviation as a disk error at all.
These NAS drives have much longer warranties than non-NAS drives, typically three or five years. I have sent in each of these failed NAS drives for warranty replacement, and the manufacturer has replaced it without fuss.
Anyway, despite these two infant mortality experiences with the NAS drives, we plan to continue using NAS drives in our RAID systems going forward.