We have got a total mess with power now, electricians heading back - RESOLVED

Stephen

US Operations
Staff member
The electricians are heading back here. I don't know whether they did something just before leaving, as I had already left the site.

However, 4 breakers have tripped and we are working to resolve this ASAP. This has a LOT down.
 
Not 100% is down, but a LOT is. Our routing and network are on two power feeds, but one feed is out, and the servers on that feed are down.
 
This problem is not good at all. The power subpanel where one breaker blew this morning is out now, and I am quite sure it was caused by a fault further up the chain. The subpanel sits after the UPS, so the UPS does not help here.
 
Good progress, but a couple of servers need to go down. I will detail everything when able, in 30-45 min.
 
We are having to shut some servers down and move them. I will explain everything when we are done moving. There is still a significant amount down due to this, but much more is up now.
 
We have all our techs here now, working hands-on with the servers. We are moving as fast as possible here, and others are waiting for us to get some up.
 
The final servers are coming up. Then confirmations on all the others will start, as we've already seen there are some problems!
 
A short summary: after last night's blown breaker we switched to the other circuit, which seemed a good thing to have, but it was actually the issue, as it was on the wrong phase inside the panel, as wired by the electrician at setup. This caused an overage on the main UPS breaker to the subpanel, which made ALL the circuits on the subpanel blow. We had it up before seeing this issue, and then had to turn off each server on the affected circuit as quickly as possible before everything blew again.

We will update more, but we have now moved it to another phase, as it was supposed to be from the start. The electrician is still here on site while we do this, in case any issue occurs.
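
For anyone wondering how one circuit on the wrong phase can trip everything, here is a rough Python sketch of the idea. All circuit names, amp figures, and the 25 A limit are made up for illustration and are not our real panel values. It also simplifies things by just adding up the amps per phase.

```python
from collections import defaultdict


def phase_loads(circuits):
    """Sum the amps drawn on each phase.

    circuits: iterable of (name, phase, amps) tuples.
    """
    totals = defaultdict(float)
    for _name, phase, amps in circuits:
        totals[phase] += amps
    return dict(totals)


def overloaded_phases(circuits, limit_amps):
    """Return the phases whose summed load exceeds limit_amps."""
    loads = phase_loads(circuits)
    return sorted(phase for phase, amps in loads.items() if amps > limit_amps)


# Hypothetical loads, NOT measurements from our panel.
base = [("rack1", "A", 16.0), ("rack2", "B", 10.0), ("rack3", "C", 15.0)]

# The extra circuit wired onto phase A (the mistake) versus phase B (as intended).
as_wired = base + [("spare", "A", 12.0)]
as_intended = base + [("spare", "B", 12.0)]

print(overloaded_phases(as_wired, 25.0))
print(overloaded_phases(as_intended, 25.0))
```

With these invented numbers only the wiring with the spare on phase A goes over the limit. The total load is the same either way, which is why the problem was not visible until the spare circuit was carrying servers.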
 
It now looks like this may have taken out a switch or two as well. They show as configured properly, but are simply not passing traffic for subnets.
 
We are mostly back up, but a handful of Windows servers are not, and the RAID on cl2-web1 is out. It is a RAID1 that is affected, so we'll be able to make this work somehow. Working on it.
 
We have 3 nodes with issues now: one has just cl2-web1, one has 2 Windows servers, and another has 4 servers. We're working to get them up ASAP.

cl2-mail4 had an fsck that went wild on reboot and looks to have mangled some configs. We are working on it as well.

Servers that still have issues are cl2-web1, cl1-win19 and win21, and cl1-win26/27/28/29.
 
Web1 is up, and win19 and win21 are up now as well. We are working on some slowness on them due to a missing log drive. That was one of the issues where we decided to bring the servers up without logs and fix it once live.
 
Log issue fixed. And I will be working to release a document about what happened, how, why, and what WAS ALREADY DONE to resolve it.

I will work to get it out by late tomorrow night. I've got to get out of here and relax a moment today; high stress would be putting it... mildly.
 
Restoration of the Mail4 server is still going on, as it has a lot of data (in GBs) to restore from the backup server. Once it is done we'll update here.
 
I am closing out this thread and a new one for MAIL4 CL2 only will be made as it is the only one currently with issues.
 