HVM13 Issue - Resolved

The node seems to reboot itself 15-20 minutes after it starts the VMs. It is up now, and I am checking the details.
 
The error appears to be driver-related. I am applying some updates now, so we are performing an urgent reboot.
 
OK, all updates have been applied.

At this point we hope the right updates were applied and the issue is resolved; a couple of them were driver-related.

We will monitor it closely and install the remaining drivers if needed.
All VMs have now been up for a while.
 
I think I have found the issue: one virtual machine is hanging and causing the others to hang. It seems to be related to the RAID array it is on, yet all the other VMs on that array work fine, which is odd. The only difference is that this one has snapshots (something we strongly recommend against using), and they are possibly corrupt in some way, causing the problems. This is the first time we have seen anything like this.
 
The node is up, and I have configured the one troublesome VPS not to start on reboot. I am now rebooting the host node, hopefully for the last time. After that I will work on moving that VPS to another RAID array and repairing it if possible.
 
A handful of VPSes are down because we took the troublesome RAID array offline; it was causing all the others to hang.
 
I have all but the troublesome VPS starting up on a non-RAID configuration. We will need to schedule an export/move time with each user to move their VM to a new RAID array.

I now see what was causing all the trouble: the VPS that had snapshots is trying to merge them on its own, so the merge was something scheduled to run on reboot. That, combined with some issue on the array, was causing the hangs.
 
Because this array is running without RAID and a merge is in progress (I did not start it and would pause it if I could), the VPSes are booting rather slowly and will have slow disk I/O until it finishes. It is at 10% right now, so it should not take an incredibly long time.
 
We have a new RAID array building; it should be fully ready in a couple of hours. In the meantime, virtual machines are running on the temporary arrangement.

The one VPS is still merging and is at about 50%, so for about another hour there may still be some slow I/O on machines sharing this storage.
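
For anyone curious how a rough time estimate can be derived from merge progress, here is a minimal Python sketch of a linear extrapolation. It assumes the merge proceeds at a constant rate, which may not hold while the storage is under load. The function name and the 60-minute elapsed figure are illustrative assumptions, not readings taken from the node.

```python
def remaining_minutes(percent_done: float, elapsed_minutes: float) -> float:
    """Estimate minutes left, assuming the merge runs at a constant rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    # Time left scales with the share of work left relative to the work done.
    return elapsed_minutes * (100 - percent_done) / percent_done


# Hypothetical example: 50% merged after 60 minutes of running.
print(remaining_minutes(50, 60))
```

An estimate like this is only a guide; actual completion depends on disk load from the other machines on the same storage.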
 