Win19 Maintenance for RAID HD problems tonight between 8pm and 10pm

Stephen

US Operations
Staff member
Win19 was built from the same batch of 8 SCSI Hard Drives as WinVPS3.

Tonight during backup this server had a drive failure that took it offline.
We are scheduling datacenter staff to be at the cage between 8pm and 10pm EST tonight, Wednesday December 13th, to replace this hard drive and perform a RAID rebuild.

We do not anticipate a problem but wish to let you know that this maintenance is occurring.

We are also ensuring that the past week's backups are locked away in very safe keeping.
 
Re: Win19 Maintenance for RAID HD problems

The hard drive that went offline is now showing as online, and the server is trying to rebuild it. I am stopping this operation, as there is no point in rebuilding a failing hard drive and doing so puts the other drives on the server at considerable risk.
 
Re: Win19 Maintenance for RAID HD problems

I have set this drive to OFFLINE now, so that its going on and off will not cause the other drives continued high IO usage without reason.
The RAID 5 array is in degraded status, and we will have a tech in the datacenter TONIGHT to replace this bad drive with a new one and rebuild.

Just as a note: when a drive keeps going on and off, it is best not to let the other drives attempt a rebuild, as they see very high disk IO while trying to sync data. A rebuild against a drive that keeps dropping out makes the other drives work harder and puts them at risk of failing too. I am trying to prevent a repeat of what happened with WinVPS3: it went offline, came back up for 30 minutes trying to rebuild a disk that had gone offline and come back on, and a 2nd HD failed in the meantime. We have learned many lessons from that situation, and we are putting them to use here and taking steps to plan for prevention.
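As a rough illustration of why a rebuild is so heavy on the surviving drives: in RAID 5 the parity block of each stripe is the XOR of that stripe's data blocks, so recreating one missing block means reading the matching block from every other drive. The Python sketch below is only a toy model. The 4-byte blocks, the 4-drive layout, and the function names xor_blocks and rebuild_missing are made up for illustration, and it does not describe how the RAID controller in Win19 is actually implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recreate the one missing block of a stripe.

    Every surviving block (data and parity) has to be read, which is
    why a rebuild keeps all the remaining drives busy.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe on a 4-drive array: 3 data blocks + 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] has failed.
survivors = [data[0], data[2], parity]
recovered = rebuild_missing(survivors)
print(recovered == data[1])
```

With only one parity block per stripe, losing a second drive before the rebuild finishes leaves nothing to XOR against, which is the situation we are trying to avoid here.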
Backups have been stored away. Should any issue happen, we have backups of the entire RAID array from Wednesday morning. In the best case, after a Windows installation we could pull the images across and restore, without all the permissions changes that past restores involved.
 
Re: Win19 Maintenance for RAID HD problems

This maintenance will be starting in the next 30 minutes.
I want to note that no backup will be done on Win19 tonight UNTIL the RAID is done rebuilding. The rebuild IO on top of IIS and ASP.NET is already very stressful on the server's hard drives. We will start the backup as soon as the RAID rebuild is complete, which may take 4-5 hours.
 
Re: Win19 Maintenance for RAID HD problems

The RAID rebuild has been running since 10:10PM EST and is going well; it is at 30% at this moment. I expect it to finish before the server hits the Eastern time rush hour traffic, most likely between 5:00 and 5:45am.
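For anyone curious how an estimate like this can be made, here is a minimal sketch of a straight-line projection from the start time and the current percentage. It assumes the rebuild runs at a constant rate, which may not hold on a busy server. The function name estimate_finish, the year, and the 12:20am observation time are all hypothetical placeholders; only the 10:10PM start and the 30% figure come from this post.

```python
from datetime import datetime


def estimate_finish(start, now, percent_done):
    """Project a finish time, assuming the rebuild rate stays constant."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    elapsed = now - start
    total = elapsed / (percent_done / 100)  # projected full duration
    return start + total


# 10:10PM start and 30% are from the post; the year is a placeholder
# and the observation time (12:20am) is an assumed value.
start = datetime(2000, 12, 13, 22, 10)
now = datetime(2000, 12, 14, 0, 20)
finish = estimate_finish(start, now, 30)
print(finish.strftime("%I:%M%p"))
```

Any extra load on the disks during the rebuild would push the real finish time later than this kind of projection suggests.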
 
Re: Win19 Maintenance for RAID HD problems

The rebuild is at 41% now. It may not be done at exactly the time I estimated before, but it will be done before 8am when the server gets busy.

We are monitoring it closely and have seen 0 errors so far in the rebuild process. I am leaving other admins to watch it while I get some rest, and the RAID controller is also set up to email me urgently about any errors.
 
Re: Win19 Maintenance for RAID HD problems

The Win19 rebuild slowed dramatically after 2am, when log analysis started. I have stopped hsphere services on Win19 until the rebuild is complete. They will be restarted afterwards. Thank you for your understanding in this situation.
 
Re: Win19 Maintenance for RAID HD problems

The RAID has been in a verifying state for some time now, but it is going well :)
 
Re: Win19 Maintenance for RAID HD problems

OK, we are complete: 100% rebuilt and verified. Win19 hsphere services are being turned back on now.
 
Re: Win19 Maintenance for RAID HD problems

Maintenance fully completed, Win19 back to top condition.
 
Re: Win19 Maintenance for RAID HD problems

Wow, Win19 has had a 2nd hard drive fail now, only days after the first. I just got alerts from the RAID card of this failure.
We have another batch of drives now, as the batch that was originally in Win19 and WinVPS3 (an order of 8x146GB) was obviously of very poor quality. WinVPS3 already received replacements of a different brand and model with nearly the same capacity (I think 20MB less in total size) when we rebuilt it.
Win19 will have this second drive replaced ASAP. I will again try to do this during the low-traffic time of day so that we do not have slowdowns on the server.
 
Re: Win19 Maintenance for RAID HD problems

The array is rebuilding again. We do not expect any major problems during this time, but wish to make you aware that the rebuild is happening.
 
Re: Win19 Maintenance for RAID HD problems

68% done with the rebuild, moving right along.
 