One node with issue: several servers rebooting

Stephen · Mar 24, 2014

We are checking a node that hit a blue screen, and its servers are rebooting now.
About 7 servers are down; I will get the list ASAP.

Stephen · Mar 24, 2014

All servers have been back up for about 10 minutes now, except CL1-WINCF2, which is in chkdsk and will likely take another 10-20 minutes.

Stephen · Mar 24, 2014

The node is having some packet loss now, and we are checking into this.

Stephen · Mar 24, 2014

It looks like we could have a network card issue here.

Stephen · Mar 24, 2014

And with that, while trying to load more info, we had a tcpip.sys error come up. We will be checking into this further ASAP, but we are working to get the node live first.

Stephen · Mar 24, 2014

The node is coming back up, and we are working on building a replacement node as well. We had a new node for Linux VPS that was about to go live; we are now working to bring it online as a Windows node instead and move the servers off this node to it. I intend to do this as live migrations, not offline migrations with server downtime.

Stephen · Mar 24, 2014

This is really frustrating: after an hour up, packet loss returned to the server and we are having to reboot it again to resolve it. This is not a good situation at the moment, and we are working to get it resolved ASAP in every way possible.

Stephen · Mar 24, 2014

All servers are up except Cluster1 Win22 and WINCF2, and both of them are in chkdsk.
We are working on updates and configuration of the new node so we can move servers over.

Stephen · Mar 24, 2014

WINCF2 is up; only Win22 remains.

Stephen · Mar 24, 2014

Win22 is on stage 3 of chkdsk. CL2-WIN12 is now having an issue with its network; it seems it disabled itself. This could be related to the packet loss issue we'd seen at the node level, but we are checking into it now.

Stephen · Mar 24, 2014

Win22 chkdsk has completed and the server is up.

Stephen · Mar 24, 2014

We've now passed an hour up without packet loss since CL2-WIN12 has been down. We continue to check into that server; the others are all running properly at the moment, and we hope to keep it this way.

Stephen · Mar 24, 2014

We continue to have no packet loss now that the Win12 network is disabled. It is installing a handful of updates, which are almost complete, and then we will allow it back on the network. If it reboots again, we will keep Win12 offline, move it to a new node, and run it only from there.

Stephen

US Operations
