CL1-WIN22 and CL1-WIN30 issue with RAID

Stephen

US Operations
Staff member
We are having an issue similar to the one on HVM4: both drives in the RAID have bad sectors in the same place. WIN22 is the only server being impacted right now.

Both drives are still in proper RAID at the moment, and I hope to be able to repair the drives and move them ASAP to a new storage system.

Bad Sector accessed on port 5 target 1 at LBA 05b164b
Bad Sector accessed on port 6 target 1 at LBA 05b164b

These two events occur, and then WIN22 reboots itself.
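As a rough illustration of how the pattern above could be spotted automatically, here is a minimal sketch that scans log lines in the format quoted above and reports any LBA that shows up on more than one port. The function name is made up for this example, and treating the LBA as hexadecimal is an assumption based on how it is printed.

```python
import re
from collections import defaultdict

# Matches the two event lines quoted above. Whether the controller
# emits other variants of this message is not known (assumption).
EVENT_RE = re.compile(
    r"Bad Sector accessed on port (\d+) target (\d+) at LBA ([0-9a-fA-F]+)"
)


def shared_bad_lbas(log_lines):
    """Return {lba: sorted ports} for LBAs reported on more than one port."""
    ports_by_lba = defaultdict(set)
    for line in log_lines:
        match = EVENT_RE.search(line)
        if match:
            port, _target, lba_text = match.groups()
            # Assumption: the LBA is printed in hexadecimal.
            ports_by_lba[int(lba_text, 16)].add(int(port))
    return {
        lba: sorted(ports)
        for lba, ports in ports_by_lba.items()
        if len(ports) > 1
    }


sample = [
    "Bad Sector accessed on port 5 target 1 at LBA 05b164b",
    "Bad Sector accessed on port 6 target 1 at LBA 05b164b",
]
result = shared_bad_lbas(sample)
for lba, ports in result.items():
    print(f"LBA {lba:#x} is bad on ports {ports}")
```

An LBA that is bad on both members of a mirror is the case to worry about, since neither copy of that sector can be read cleanly.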

We are going to let WIN22 stay up as much as it can while I work on getting the bad sectors repaired on the mirrored drive.

Win30 is not impacted at this time, but it is on the same RAID, so it will need to move to new storage as well. Impact should be minimal overall.
 
We are going to stop FTP on WIN22 so content changes are as small as possible. The sector scan has found and corrected one bad sector, the one marked in the report above, and that is all so far. Hoping it is not more widespread and that it was just somehow a very important sector (it was about 11GB into the scan, which is most likely an area with some commonly accessed files).

If no more bad sectors are found, the scan should complete in 1 hour or so. If there are more bad sectors somewhere on the disk, it will take incrementally longer based on how many are found.
 
Well, more bad sectors were found. There are vast portions of this hard drive that are unused, however, so I am hoping for the best and that the impact will be minimal once we go to move the drive. It has slowed the scan substantially, though.
 
Win22 has stabilized a bit and has been up over 45 minutes now (the one good thing is that its reboots have been very fast).
 
Win22 is quite stable now, but we still need to move the drives. In about 8 hours we will do a maintenance to move them to a new node that is powered by Hyper-V 3.0 and has a lot of new features, the type that actually allow us to make these moves with only a couple of pings dropped, even live.
 
Win22 is stable, but for the first time since all this, Win30 just rebooted. I am keeping a close eye on it, and still plan the move as stated.
 
Win30 is not even loading Windows right now, which is not very good. Working to make it stable, because the other drive is not quite ready yet; it is in a disk clone operation.
 
That is done, but it is still failing to boot up and goes right into a blue screen error. This does not look too good. The other drive is currently at 33% of its clone process.

At this time WIN30 is down while I evaluate the options. If I do much else, win22 will also be down and it is up and stable at the moment.

We've got good, valid backups as a worst case, but that route will likely take longer than waiting on the clone operation that is running now and then moving the drive.
 
CL1-MSSQL4 needs a quick maintenance to move its config files off the affected RAID array, then right back up.
 
The MSSQL4 config has been moved. WIN22 will now be stopped, and I am going to attempt to run these servers off another node with somewhat more flexible configs (Windows 2012 Hyper-V 3).
 
Win22 is up on the new node (still on the old disk; I was trying a couple of tricks to get win30 up as a stopgap), but win30 will not start now.
 
While I am fully confident we have win30 data intact and restorable from the live data, in order to ensure 100% that we have the server up ASAP I am going to start the restore process on the server now.
 
The HD clone is at 60%. The Win30 restore is in process so that it will be available if needed.
 
Win30 is UP now from a backup about 48 hours old. I have DISABLED FTP because I don't intend to keep this version live, but I think 48 hours old is better than nothing.

I used a new method today to restore and I have to say it was very, very fast for over 250GB of data restored! I can't promise it will work so well every time, but it was quite speedy today!
 
The clone is done and win22 has successfully copied to its new home on a new RAID. We aren't live off that just yet, as I am moving the old logging drive now as well.
 
That was not successful; it blue screened as well. Starting win22 again from the non-RAID drive while I run a complete chkdsk on this volume.
 
We are just getting so many errors here that I think, for win22, I am going to restore from backups as with win30, and we will then sync any file changes since the backup from the live drive.
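To give a rough idea of what "sync any file changes since the backup" means, here is a minimal sketch that lists files on a live drive modified after the backup was taken. It is only an illustration and not the tooling used here; the function name, the temporary folder standing in for the live drive, and the reliance on file modification times are all assumptions.

```python
import os
import tempfile
import time


def files_changed_since(root, cutoff_epoch):
    """Return sorted paths under root whose mtime is newer than cutoff_epoch."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > cutoff_epoch:
                    changed.append(path)
            except OSError:
                # Unreadable entries (for example on a failing disk) are skipped.
                continue
    return sorted(changed)


# Demo on a throwaway directory standing in for the live drive (hypothetical).
with tempfile.TemporaryDirectory() as live_root:
    old_path = os.path.join(live_root, "old.txt")
    new_path = os.path.join(live_root, "new.txt")
    for path in (old_path, new_path):
        with open(path, "w") as handle:
            handle.write("content")

    # Pretend the backup was taken 48 hours ago and old.txt predates it.
    backup_time = time.time() - 48 * 3600
    os.utime(old_path, (backup_time - 3600, backup_time - 3600))

    changed = files_changed_since(live_root, backup_time)
    changed_names = [os.path.basename(path) for path in changed]
    print(changed_names)
```

Only the files in that list would need to be copied over the restored backup, which keeps reads on the failing drive to a minimum.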

(btw both servers are up)
 
Sorry for the lack of updates. Win30 is running stable and on RAID-protected storage, not quite in its final home, but we'll move it quickly this weekend.
Win22 is still running non-RAID but stable, and we are getting it set up to run properly on RAID by the weekend switchover as well. It has synced to backup storage to reduce IO on the non-RAID drive; then we will sync to the live drive and update all contents before it goes live.

Stephen
 