Recent hardware problems?

Logan

In the past year, JH has had problems with hard drives, RAID controllers, motherboards, and network cards. Where is JH getting this hardware? I have fewer problems with the $10 second-hand hardware I get from the store down the street. I hope JH has warranties and priority support from the hardware vendors they buy from.

Care to comment, Yash, Atul?
 
1) Never had an issue with a hard disk
2) Had 2 issues with RAID controllers in 2 years
3) Had 1 issue with a network card in 2 years
4) Had 2 issues with motherboards in 2 years

All our servers are brand-new and have warranties on them. The standard of equipment we use is much higher than that of a lot of hosting companies. For example, we run about 8 Dual Xeon Servers.

Each of these Dual XEON servers is $3K to $3.5K apiece when fully loaded. They feature Dual XEON processors (2x the price of P4 HT processors), SCSI hard disks (each hard disk is 6x the price of a SATA one), an Adaptec RAID controller, hot-swappable hard drive bays, dual power supplies and dual gigabit network cards. You won't find better hardware anywhere else.

We've had 5 hardware failures in 2 years, which is normal (or maybe slightly above normal). What amplifies the problem is that the OS can become unreadable or corrupt, especially when it crashes during backup or heavy activity. That is a major issue... But these happen rarely, and the times they do happen, we have fixed them and have been continually evolving plans to ensure we can fix them even faster the next time.

Please remember that these hardware failures have happened on different servers affecting different customers. Most have had very minimal downtime. Only a few have resulted in extended downtime
 
The amazing thing is that the P4 HT SATA servers we use (we have a number of them, powering mssql servers, etc.) have not had a single hardware crash to date. It's only the Dual XEONs which have been troublesome.
 
Yash, you should know that the most expensive h/w is not always the best h/w. So quoting the price of the h/w doesn't make me feel any better.

Your comment on the XEON's is interesting. It almost proves my point.
 
We've been dealing with these servers for a long time now.

We also have the tools to better diagnose any issues that can result from a hardware crash quickly and efficiently
 
Logan said:
Yash, you should know that the most expensive h/w is not always the best h/w. So quoting the price of the h/w doesn't make me feel any better.

Your comment on the XEON's is interesting. It almost proves my point.

I don't think it's to do with the hardware. Even if a piece of hardware crashes (and they do; I have worked as a trainee in a largish datacenter and used to witness a hard disk crash at least once a week), it can be replaced.

When a server shuts down unexpectedly, there have been times where I've tried to diagnose nasty blue screen and OS booting issues. They can be nightmarish and most of the time I've just reinstalled the damn thing.

It's good to see that JodoHost was able to get everything OK on Win6 (and according to an email I exchanged with Yash, they successfully booted the original Win6 server) quicker than it has taken me on at least some occasions. I can imagine the problem being much worse with servers serving a lot of sites and dealing with heavy activity.

It's actually very funny that JodoHost has had these incidents so close to each other in a short time (and I wouldn't really blame them for hardware failures). I'd hate to be in Yash's shoes but I'm sure they are doing things right.
 
Logan said:
Where is JH getting this hardware? I have fewer problems with the $10 second-hand hardware I get from the store down the street.
You must be very lucky.. My home PC uses fairly standard components, and my past 3 hard disks have all failed within +/- 1 year's time (10 months, 13 months, 4 months). It's very tiresome :p

Considering the amount of hardware JodoHost have running, I would not think the number of hardware failures JodoHost has experienced is anything out of the ordinary.
The vast majority of problems are caused by (handling of) software. There's room for improvement there, but often that will be outside JodoHost's direct influence and improving things is easier said than done.

Another thing is recovering from failures, both hardware and software. There was a problem with the Win5 recovery after a hardware failure, but the backup and restore procedure has been updated twice since then if I'm not miscounting.

I'd like to believe that restoring from a problem such as the one that occurred on Win6 could be done faster. At least if I wanted to prepare for a situation such as this, each machine would have separate OS and data disks and identical hardware. If the server hardware fails, swap the hard disks to the standby machine. If the OS is corrupted as well, leave in the standby machine's OS disk and use tools to recreate webserver virtual hosts and IP addresses. If there's no hardware failure but only an OS failure, the standby system's OS disk could be inserted into the original system and resources could be recreated likewise.

If identical hardware on servers is not possible, each machine would need a backup OS disk of its own, and resource recreation would have to be done even if the OS was not corrupted. Still, that could be done pretty quickly (assuming the IP recreation tool works :O)
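
The procedure described in this post can be sketched as a small decision function. This is only an illustration of the logic as written here, not JodoHost's actual process: the function name `recovery_steps` and its parameters are hypothetical, and the non-identical-hardware branch is one possible reading of the paragraph above.

```python
def recovery_steps(hardware_failed, os_corrupted, identical_hardware=True):
    """Return an ordered list of recovery steps for one server (hypothetical sketch)."""
    if not hardware_failed and not os_corrupted:
        return []  # nothing to recover

    recreate = "recreate web server virtual hosts and IP addresses with tools"

    if identical_hardware:
        if hardware_failed and not os_corrupted:
            # Same hardware, healthy OS: the original disks boot on the standby.
            return ["move OS and data disks to the standby machine"]
        if hardware_failed and os_corrupted:
            return [
                "move data disks to the standby machine",
                "keep the standby machine's OS disk",
                recreate,
            ]
        # OS failure only: the original hardware is still usable.
        return [
            "insert the standby machine's OS disk into the original machine",
            recreate,
        ]

    # Non-identical hardware (assumed reading): the original OS disk cannot be
    # reused on the standby, so resources are recreated in every case.
    if hardware_failed:
        return [
            "move data disks to the standby machine",
            "boot the standby machine from its own backup OS disk",
            recreate,
        ]
    return ["boot the original machine from its backup OS disk", recreate]


# Example: hardware and OS both failed, identical standby available.
for step in recovery_steps(hardware_failed=True, os_corrupted=True):
    print("-", step)
```

Writing the cases out this way shows that only one of them (hardware failure with an intact OS on identical hardware) avoids the resource recreation step, which is why the recreation tool matters so much in this plan.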
 
SubSpace said:
I'd like to believe that restoring from a problem such as the one that occurred on Win6 could be done faster. At least if I wanted to prepare for a situation such as this, each machine would have separate OS and data disks and identical hardware. If the server hardware fails, swap the hard disks to the standby machine. If the OS is corrupted as well, leave in the standby machine's OS disk and use tools to recreate webserver virtual hosts and IP addresses. If there's no hardware failure but only an OS failure, the standby system's OS disk could be inserted into the original system and resources could be recreated likewise.

All standby machines are identical.
It's not as easy as you state. When we were working on Win6, we had 3 simultaneous recovery plans:

1) Get original Win6 to work: After moving hard disks, we gave it a full scandisk and reinstalled the system interrupt controller. It worked
2) Set up new Win6 using Disk Images: This was being performed on a standby server
3) Prepare totally new Win6 for a standard recovery: Here, the resources of the server would be rebuilt and then files copied over. But this is the longest recovery procedure due to the time it takes to copy 50GB of data back to its location.
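
To give a rough sense of why plan 3 is the slowest, the copy time for 50GB can be estimated from a sustained transfer rate. This is a back-of-the-envelope sketch only: the 50GB figure comes from the post, while the function name `copy_time_minutes` and the throughput values are assumptions, not measurements from these servers.

```python
def copy_time_minutes(size_gb, throughput_mb_per_s):
    """Estimate the minutes needed to copy size_gb gigabytes at a sustained rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = size_gb * 1000  # decimal units: 1 GB = 1000 MB
    return size_mb / throughput_mb_per_s / 60


# 50 GB is the figure from the post; the rates below are assumed for illustration.
for rate in (5, 10, 20):
    print(f"{rate} MB/s: about {copy_time_minutes(50, rate):.0f} minutes")
```

The estimate covers only the raw copy. Rebuilding the server's resources beforehand adds to it, and many small files usually copy more slowly than a sustained rate suggests.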
 
Ron said:
...It's good to see that JodoHost was able to get everything OK on Win6 (and according to an email I exchanged with Yash, they successfully booted the original Win6 server) quicker than it has taken me on at least some occasions. I can imagine the problem being much worse with servers serving a lot of sites and dealing with heavy activity.

It's actually very funny that JodoHost has had these incidents so close to each other in a short time (and I wouldn't really blame them for hardware failures). I'd hate to be in Yash's shoes but I'm sure they are doing things right.
Don't get me wrong. I was delighted to see that JH recovered from this recent failure in what I would consider record time, considering the nature of the failure. I am not happy, however, that the failure happened at all.

I don't find it funny at all that JH continues to have problems, be it h/w or s/w. I am certain this won't be the last.
 
SubSpace said:
You must be very lucky.. My home PC uses fairly standard components, and my past 3 hard disks have all failed within +/- 1 year's time (10 months, 13 months, 4 months). It's very tiresome :p

Considering the amount of hardware JodoHost have running, I would not think the number of hardware failures JodoHost has experienced is anything out of the ordinary.
The vast majority of problems are caused by (handling of) software. There's room for improvement there, but often that will be outside JodoHost's direct influence and improving things is easier said than done.

Another thing is recovering from failures, both hardware and software. There was a problem with the Win5 recovery after a hardware failure, but the backup and restore procedure has been updated twice since then if I'm not miscounting.
I don't consider myself lucky, nor do I consider all my friends lucky, nor my previous hosts (of which there have been many). None has experienced as many failures as JH has in the past year. Do I blame JH? Of course. I'm paying for service, and when it fails, it's the fault of the service provider. That's a basic outsourcing concept. JH has, to their credit, taken responsibility for most if not all failures. My primary problem is that failures continue (stability is rarely reached), and my secondary problem is the recovery track record. The latter is improving.
 
We've always taken responsibility.

But I'd like to point out that all our servers have been performing very well (barring just yesterday's Win6 incident).

Win5 has had trouble. We are putting an awful lot of effort into ensuring the servers are up 100%... That will eventually show in the coming months, especially on Win5.
 