Update: I am pleased to be able to report that after about six hours, the server finished the “repair”. The new drive is now a full member of the RAID-1 array. The server has carried out the first of what will be a series of periodic SMART tests on the new drive, and the drive passed. All is once again well with the server.
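(For readers who like to double-check such things from a command line: DSM is Linux under the hood, and the smartctl utility from smartmontools is typically available over SSH. Here is a minimal sketch, in Python, of how one might ask smartctl for a drive's overall SMART health. The device path is an assumption; adjust it for your own system.)

```python
#!/usr/bin/env python3
"""Minimal sketch: report a drive's overall SMART health via smartctl.

Assumes smartmontools is installed and the script runs as root.
The device path below is a guess; adjust it for your system.
"""
import subprocess

DEVICE = "/dev/sda"  # hypothetical device path

def smart_health(device: str) -> str:
    # "smartctl -H" prints the overall-health self-assessment result,
    # e.g. "SMART overall-health self-assessment test result: PASSED".
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True, text=True,
    )
    for line in result.stdout.splitlines():
        if "overall-health" in line:
            return line.strip()
    return "no health line found; try running smartctl by hand"

if __name__ == "__main__":
    print(smart_health(DEVICE))
```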
I was minding my own business when I heard a faint beeping. It turned out to be one of our Synology file servers. I could see from the blinky lights on the front of the server that both the “status” light and the “disk 1” light had turned from green to amber. What did I do next?
What I did next, of course, was to click around so that I could log in to the web-based administrative interface for this server. And what I saw, to my dismay, was that disk 1 had crashed.
This server had been set up as “RAID 1”, meaning that its two drives had been set up to be mirrors of each other. Each item of data that got stored would be stored identically on both drives. With disk 1 crashed, the server was limping along on a single drive, with no mirror to protect the data until a replacement could be installed.
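(Under the hood, a Synology runs Linux, and a RAID-1 mirror there is managed by the Linux “md” driver. For the curious, here is a minimal sketch, assuming SSH access to a Linux-based NAS, of peeking at the mirror's state by reading /proc/mdstat; the array and partition names vary from one system to the next.)

```python
#!/usr/bin/env python3
"""Minimal sketch: print the state of Linux md RAID arrays.

/proc/mdstat exists on any Linux system using the md driver,
which includes Synology DSM; array names vary per system.
"""

def print_mdstat() -> None:
    with open("/proc/mdstat") as f:
        for line in f:
            line = line.rstrip()
            # Array lines look like:
            #   md2 : active raid1 sda3[0] sdb3[1]
            # and the line after ends with a status such as [UU]
            # (both mirrors healthy) or [U_] (one mirror missing).
            if line and not line.startswith(("Personalities", "unused")):
                print(line)

if __name__ == "__main__":
    print_mdstat()
```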
By clicking around in the administrative interface, I was able to see that drive 1 was a 4 terabyte drive. So I looked around among my spare equipment to see if I had somehow managed to plan ahead to bank a spare 4TB hard drive. I was pleasantly surprised to see that the answer was “yes”.
So my next step was to set up a workbench. My plan was to carry out what might be termed open-heart surgery on the server. The first step was to open up the case of the file server, while it was powered up. There in front of me were the two hard drives.
The next step was to return to the administrative interface to make sure that I knew the serial number of the drive that had crashed. According to the administrative interface, the serial number of the failed drive ended with “AANRS”. I looked at the two physical drives in front of me and saw which drive had that serial number. As good luck would have it, this was the easiest drive to reach.
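(This serial-number detective work can also be done from a shell, for anyone who prefers it. Here is a sketch that asks smartctl for each drive's identity; the device paths are assumptions, and smartmontools must be present.)

```python
#!/usr/bin/env python3
"""Minimal sketch: map block devices to drive serial numbers,
so a failed drive reported by the admin interface can be matched
to the physical hardware. Device paths below are guesses.
"""
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]  # adjust for your system

def serial_of(device: str) -> str:
    # "smartctl -i" prints identity info, including "Serial Number:".
    out = subprocess.run(
        ["smartctl", "-i", device],
        capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        if line.startswith("Serial Number:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

if __name__ == "__main__":
    for dev in DEVICES:
        print(f"{dev}: {serial_of(dev)}")
```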
The next thing that I wondered was whether this particular server would do a “hot swap”. The idea of a hot swap is that if you get lucky, you can physically remove the drive that has crashed and physically insert a new drive while the server is still running, and the hardware and the operating system will handle the change seamlessly. I actually did not know whether this particular server and its operating system would do a hot swap. But I figured I might as well give it a try. So I unscrewed the four mounting screws for the crashed drive, and pulled it out from the chassis of the server. I then sort of crudely pushed the replacement drive into place in the chassis of the server, and fastened it with the four mounting screws. Then came the moment of truth: I went into the administrative interface and clicked around in the “storage manager” screens. The first bit of encouraging news was that the operating system had indeed successfully recognized the presence, in the “drive 1” physical position, of a new Western Digital drive with 4TB of storage capacity.
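(For anyone who would rather not trust the blinky lights alone, one quick way to confirm that the kernel really did notice a hot-swapped drive is to list the block devices and their capacities. A minimal sketch, assuming a Linux-based NAS where whole drives show up under /sys/block as sd-something:)

```python
#!/usr/bin/env python3
"""Minimal sketch: list drives and their capacities, to confirm
that a hot-swapped drive was recognized by the kernel.
"""
import os

def list_drives() -> None:
    for dev in sorted(os.listdir("/sys/block")):
        if not dev.startswith("sd"):
            continue  # skip md, loop, and other non-drive devices
        # The "size" file reports capacity in 512-byte sectors.
        with open(f"/sys/block/{dev}/size") as f:
            sectors = int(f.read().strip())
        tb = sectors * 512 / 1e12
        print(f"/dev/{dev}: {tb:.2f} TB")

if __name__ == "__main__":
    list_drives()
```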
I then clicked around a bit and it became clear that the next step was for me to ask the operating system to do a “repair”. This requires first that the system carry out a “format” of the new drive. Next, the system needed to persuade the new drive to make nice and join the mirrored array as a full partner of the surviving drive. With the passage of some time, the system seemed fairly stable. The repair is still going on, but at least we do have access to the data on the drive that is still functioning.
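(While the repair grinds along, its progress can be watched from a shell. Here is a sketch that polls /proc/mdstat and reports the rebuild percentage and estimated time remaining; it assumes the Linux md driver is performing the rebuild, as it does on a Synology.)

```python
#!/usr/bin/env python3
"""Minimal sketch: watch an md rebuild by polling /proc/mdstat.

During a rebuild, mdstat contains a progress line such as:
  [==>........]  recovery = 12.3% (...) finish=312.4min speed=...
"""
import re
import time

PATTERN = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%.*finish=([\d.]+)min")

def watch(interval_seconds: int = 60) -> None:
    while True:
        with open("/proc/mdstat") as f:
            match = PATTERN.search(f.read())
        if match is None:
            print("no rebuild in progress")
            return
        kind, pct, minutes = match.groups()
        print(f"{kind}: {pct}% done, about {minutes} minutes to go")
        time.sleep(interval_seconds)

if __name__ == "__main__":
    watch()
```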
“I had somehow managed to plan ahead to bank a spare 4TB hard drive.”
Knowing you, this is not surprising in the least.
Hope it makes a full recovery.