My website has been down for 3 days. And no, I’m none too happy about it. I can’t refer clients to it and I can’t generate any new search engine traffic. Heck, I had a potential client email me to send her my demos because my site was down. How embarrassing! I wonder how many other clients had the same experience she did, but weren’t interested enough to email me for my demos. How much business did I lose in the past 3 days?
Today I get a note from my ISP explaining the outage. It goes on and on about technical things I don’t care one wink about and seems to pass the blame to some other company. I only care about getting my site up and running, not why it isn’t running or who’s to blame. Not to mention, it gives no word on financial remuneration to keep me as a loyal customer. In other words, it’s great that you’re sorry, but why the heck should I stay with you instead of switching over to GoDaddy? Oh, and I noticed that your website hasn’t been down. Isn’t that convenient?
So what can we learn from this? As service-oriented people, we always need to be thinking and talking in terms of the customer’s interest. This is one of the principles of the Dale Carnegie program (w00t to fellow alums!). Do you have a problem in your signal chain causing noise? Bummer, but your client doesn’t care. Late delivery? Your client doesn’t care that your dog ate 3 boxes of Junior Mints and you had to rush him to the vet.
Clients care about themselves, and we need to understand that. We need to offer exceptional service when mistakes or other problems delay customers from achieving their goals, even if the fault is not our own. No one can avoid problems or mistakes that affect others, but we need to know how to handle them to keep our customers happy.
For reference, here is the note from my ISP:
Dear Mr. Kafer,
The web server that your site is hosted on has been offline due to some hardware failures in the RAID setup.
RAID stands for “Redundant Array of Independent Disks” and is a technology that employs the simultaneous use of two or more hard disk drives to achieve greater levels of reliability and performance.
Your website is stored across the RAID system twice over different hard drives, if one of the hard drives fails your web site will continue to run. The failed hard drive is replaced and the data that was on the drive copied again from the other drives within the RAID, this is known as rebuilding the RAID, and normally happens seamlessly without any effect to the web hosting server or your website. This is a daily task performed in our data centers and is standard for large data storage systems such as used in the web hosting environment.
In this instance, we replaced the failed drive with a new drive and the RAID started to rebuild. While this was happening the rebuild process failed, corrupting all the data within the RAID set. This should not happen and we have open tickets with the RAID manufacturer to understand what went wrong in this case and to ensure that they can prevent this for the future.
Our system administrators do not rely on the RAID system as our only source of backup. We run a rolling backup of the live system to external backup servers to ensure that in a case like this we have a restore solution.
After the RAID corruption occurred, our engineers analyzed the situation and found that the only solution left to us was to recover the data from our backup systems. At this point the RAID was reinitialized ready to receive data, this process itself takes several hours to perform.
Currently we are copying and restoring the data from our backup systems to the web hosting server that your site runs from. The restore process takes time and is expected to finish early tomorrow morning. When the data is restored to the server we will then turn on the services that deliver your website to the Internet. A small amount of data loss may occur if you uploaded new files to your web space between the time that the backup was made and the failure occurred.
Since the system problems began we have had a dedicated team of administrators working around the clock to monitor the copy of data from our backups and to ensure that all settings are restored so that your website will run again.
We apologize for any inconvenience and thank you for your patience. We will update you again as soon as there is additional information available.
1&1 Internet Inc.