Hardware Woes

The date was February 2nd, 2018. I had noticed that one of Kelly’s favorite Chuck Norris movies was curiously truncated to a little over five minutes. No problem. I had been slowly updating our collection from .avi to .mkv anyhow. I put the DVD into my trusty workstation, and a few minutes later I had Chuck Norris in better quality than ever before.

All that remained was to copy it to our house’s network storage and we would be able to enjoy Chuck from any computer in the house. But something curious happened. My file wouldn’t copy. In fact, my network storage device was unresponsive to any input. Finally I rebooted it, only to find that it contained no files. Since it had previously housed over a terabyte of music, movies, and documents, that seemed potentially troubling.

It did, however, remind me of something. I had set up our network storage with RAID 1, meaning that there are two hard drives that are identical mirrors of each other. If one fails, I can take it out and put in a new one that will become another identical copy. The idea is simple: with two copies of the data, you will not lose anything as long as both hard drives don’t fail at the same time. There are a few complications, but that covers the basics.

When I had logged into my network storage dashboard before the Chuck Norris incident, I had noted that my RAID array was labeled as “Degraded”. I ran health checks on the drives, but nothing appeared amiss, so I eventually learned to live with that label. It wasn’t until after my data mysteriously disappeared that I found that “Degraded” in this case meant that, for reasons I still don’t understand, the second hard drive had been marked as a spare instead of becoming a copy. That meant I had been using only one drive, which had failed, while the other sat idle with nothing on it.
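For anyone who would rather not learn to live with that label, here is a minimal sketch of a check that could run on a schedule and raise an alarm. It assumes the storage box uses Linux software RAID and exposes its status in /proc/mdstat, which may not be true of every network storage device. The function name and the sample status text are made up for illustration.

```python
import re

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid1 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with e.g. "[2/1] [U_]"; "_" marks a missing member.
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded

# Hypothetical status text: a two-disk mirror with one active disk and one spare (S).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[2](S) sda1[0]
      1953382464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real Linux machine the text would come from reading /proc/mdstat instead of the sample string.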

Not to despair. I had my data backed up on Amazon Glacier. The only problem was that restoring a terabyte of data is either very slow or very expensive. Instead, I took my workstation offline, hooked up the broken hard drive, and launched dd_rescue to clone it to the unused drive. After a few days it became apparent that it would take over two months to recover the whole drive.
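The two-month figure is simple arithmetic: divide what is left by the rate so far. A rough sketch follows, assuming the copy rate stays constant, which a failing drive does not promise. The function name and the example numbers are illustrative, not my actual measurements.

```python
def estimate_days_remaining(total_bytes, copied_bytes, elapsed_days):
    """Estimate the days left in a copy, assuming the rate so far holds."""
    if copied_bytes <= 0 or elapsed_days <= 0:
        raise ValueError("need some progress and some elapsed time to estimate")
    rate_per_day = copied_bytes / elapsed_days
    return (total_bytes - copied_bytes) / rate_per_day

if __name__ == "__main__":
    # Hypothetical progress: 100 GB of a 2 TB drive copied after 3 days.
    days = estimate_days_remaining(2 * 10**12, 10**11, 3)
    print(f"about {days:.0f} days to go")
```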

Fortunately, I was able to install a newer, faster version of dd_rescue, which cut the recovery time to a month. In the meantime I ordered new four-terabyte drives, specifically built for network storage. However, there was no way to make a four-terabyte drive mirror a two-terabyte one. I was finally forced to set up a new RAID array with the new drives, then attach the old hard drive to my workstation and copy all the files over the network using rsync.
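The rsync step can be sketched as a small wrapper. This is an illustration rather than the exact command I ran: the helper names and the paths in the example are hypothetical, and it assumes rsync is installed on the workstation and the storage box is reachable under the given destination.

```python
import subprocess

def build_rsync_command(source_dir, destination):
    """Build an rsync command that copies the contents of source_dir."""
    # -a preserves permissions and timestamps and recurses into directories,
    # -v lists files as they are copied, and --partial keeps partially
    # transferred files so an interrupted run can pick up where it left off.
    # The trailing slash tells rsync to copy the directory's contents
    # rather than the directory itself.
    return ["rsync", "-av", "--partial", source_dir.rstrip("/") + "/", destination]

def copy_tree(source_dir, destination):
    """Run rsync and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_rsync_command(source_dir, destination), check=True)

if __name__ == "__main__":
    # Hypothetical mount point and network share; printed, not executed.
    print(" ".join(build_rsync_command("/mnt/old_drive", "nas:/volume1/media")))
```

Because rsync skips files that already match on the destination, rerunning the same command after an interruption only transfers what is still missing.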

The day that copy completed, my household server’s database crashed and refused to start. The website you’re currently reading, along with my Nextcloud instance, Amarok music database, and regular backups, stopped working. I pulled the flash memory from my server and found that it was corrupted.

By now I was the expert. I fired up dd_rescue and copied the tiny sixteen-gigabyte card. Then I swapped in a new flash memory card. That was when I found out that my backups were two months out of date. Fortunately, I was able to mount the disk image from dd_rescue and get newer data, or you would have lost the many exciting developments on this site since December.

Now, finally, I think everything is right in my digital world. Does anyone want a two-terabyte hard drive? It’s barely used.
