Win18 - Urgent File system maintenance

Stephen

US Operations
Staff member
There is a problem with the file system on the Win18 server. When IIS hits a user's file, the service stops, and sometimes the server even restarts with an error.

We are having to perform an emergency maintenance chkdsk on the server now.
 
Initial results do not look good. It is showing that the partition has an issue on the MFT, which it reports as free space, and that isn't a good thing... working to correct it.
 
This looks like it may take a little while. I am hoping we don't have to go into a full restore process, but it may be required. I will update in the next 30 minutes, hopefully, but the earliest the server looks to be back up is 2 hours from now.
 
At this point I have broken the RAID array and am doing some drive-level checks. The results so far are not good: in the same sector areas, one drive has 15 bad sectors and the other has 40 so far and counting. This is the cause of the issues, and I am working on reallocating those sectors to find a solution.
 
I am not sure how win18 was operating at all at this point, or the other servers that occasionally use it. One drive has over 500 bad sectors and the other is right at 178 and still climbing.

The benefit of RAID is that bad sectors generally don't impact the array as long as they do not overlap between drives. In this case, some of them overlap.
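
To illustrate the point about overlapping bad sectors, here is a minimal sketch. It assumes a simple two-drive mirror (the RAID level is not stated in these updates), and the sector numbers are made up for the example, not taken from the actual drives. A sector that is bad on only one drive can still be read from the other; a sector that is bad on both cannot.

```python
def overlapping_bad_sectors(bad_a, bad_b):
    """Return the sectors that are bad on BOTH drives of a mirrored pair.

    Assumption: a plain two-drive mirror, where a sector is only lost
    when neither drive can read it.
    """
    return sorted(set(bad_a) & set(bad_b))


# Hypothetical sector numbers, for illustration only.
drive_a_bad = [1200, 1201, 5400, 9000]
drive_b_bad = [1201, 7300, 9000, 9001]

overlap = overlapping_bad_sectors(drive_a_bad, drive_b_bad)
print("Sectors bad on both drives:", overlap)
```

In this made-up case, sectors 1201 and 9000 are bad on both drives, so the mirror has no good copy of them.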
 
421 bad sectors on the better drive now... over 1400 on the other. I am going to work on cloning the drive off to another ASAP. It is going better now and is 80% complete. This is the longest period with no issues in the scan.
 
Cloning is now at 45%. If all goes well, we should be able to turn it on in 45 minutes or so.
 
Success there, but chkdsk is having to run again now that the sectors have been cleaned/managed/moved/repaired... but it is going much faster!
 
:( It came up, ping was up, etc... then it all came crashing down with a BSOD. All the thoughts of progress there may be fading.
 
Moved the cloned drive to a different node and it is coming up. This will be non-RAID, but up is good, and we can work on a proper migration (may as well go on to 2008 like we are doing with the others).

Installing the latest drivers on it now. It's up, but getting some request timeouts due to the driver updates needed.
 
Current status: about 15% packet loss, but better than down.
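
For anyone wondering how a packet loss figure like this is worked out, here is a small sketch. The counts are hypothetical (the actual ping counts are not recorded in these updates); the function just turns "sent" and "received" into a percentage.

```python
def packet_loss_percent(sent, received):
    """Percentage of pings that got no reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


# Hypothetical counts: 100 pings sent, 85 replies received.
print(packet_loss_percent(100, 85))
```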

Drivers are installing and hopefully that will make the packet loss go away as well. It will need one more reboot for that.
 
Server is rebooting now, and we think we have found the packet loss issue as well. It wasn't drivers; it was a setting that allowed the host node to give it a dynamic MAC address, which was not the right setting, and this was conflicting with the win31 server. We have corrected it and it is set properly now, so we should not see the conflict once the reboot for the drivers (which were still needed) is done.
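
As a rough sketch of how a MAC conflict like this could be spotted, the snippet below groups machines by MAC address and reports any address used by more than one. This is not the tool we used; the function name and the MAC addresses are made up for illustration, and the list would have to come from your own inventory or host node configuration.

```python
from collections import defaultdict


def find_mac_conflicts(assignments):
    """Return {mac: [hosts]} for every MAC address used by more than one host.

    `assignments` is a list of (hostname, mac) pairs. MACs are normalised
    to lowercase with ':' separators so that different spellings of the
    same address still match.
    """
    by_mac = defaultdict(list)
    for host, mac in assignments:
        by_mac[mac.lower().replace("-", ":")].append(host)
    return {mac: hosts for mac, hosts in by_mac.items() if len(hosts) > 1}


# Hypothetical MAC addresses, for illustration only.
assignments = [
    ("win18", "02:00:00:00:00:1A"),
    ("win31", "02-00-00-00-00-1a"),
    ("other", "02:00:00:00:00:2b"),
]

conflicts = find_mac_conflicts(assignments)
print(conflicts)
```

Two machines answering on the same MAC address will fight over traffic on the switch, which shows up as intermittent packet loss rather than a clean outage.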
 
It is on stage 3, nearing done! It has made several corrections I had seen and been trying to get it to make on the other node, so all this is looking good.
 
It has loaded, and ping and IIS have been stable for about 7 minutes now. I am going to watch it closely for another 10 minutes personally, then step out and have some dinner while the techs watch it and ensure stability!
 