On Friday, Feb 1, 2013, at about 12:15 PM, we suffered a major power outage in a significant portion of the datacenter suite. Upon arrival I checked the basics: our breaker subpanel and the breakers feeding us from the UPS unit. Four circuits were still live, as they are fed directly from the UPS itself; the rest were down because our in-suite subpanel had lost power. We immediately called the electricians, as they had just been onsite repairing a breaker that had blown about 12 hours earlier, and within an hour of their leaving everything on the subpanel they had worked on was down.
After the electrician arrived (not the one who had done the work earlier in the day), we determined that the master set of breakers for the 3-phase power feeding the subpanel was the culprit, and switched them back on. Power started being restored as the power distribution units (PDUs) in our space powered up the servers. This brought some items live, but as the electrician watched each individual feed of the 3 phases, he noticed that one was going well over the allowed amperage limit. We had to respond quickly, with servers already live or coming up, by shutting down an entire circuit's worth of power: about 15 servers as it turned out, a couple of them nameservers. We did this as quickly as possible, and the electrician moved the circuit to another phase that was hardly loaded at all. After this move we re-powered all servers and brought all services back online.
The root cause dates back about 6 months, when we added 2 additional circuits for expansion: the electrician who did the install put more circuits on one of the phases than it could adequately support, even with each individual circuit under its own load limit. The outage was triggered by the breaker malfunction earlier in the day. Power was moved off the failed breaker to one of these circuits, which meant that a circuit on the already overused phase was now carrying production-level load. While having the 'extra circuit' seemed wise and good at the time for restoring power quickly, it turned out to be quite the problem, as it was wired to the wrong phase of power in the subpanel.
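To make the failure mode concrete, here is a rough sketch in Python of the per-phase arithmetic. Everything in it is an assumption for illustration: the circuit names, the amperage readings, and the 100 A per-phase limit are made up and are not measurements from our panel. It also treats the load on a phase as a plain sum of the circuit currents, which is a simplification.

```python
from collections import defaultdict

# Hypothetical per-phase limit in amps; NOT the real rating of our subpanel.
PHASE_LIMIT_AMPS = 100.0


def phase_loads(circuits):
    """Sum the measured amps of every circuit on each phase."""
    totals = defaultdict(float)
    for _name, phase, amps in circuits:
        totals[phase] += amps
    return dict(totals)


def overloaded_phases(circuits, limit=PHASE_LIMIT_AMPS):
    """Return the phases whose summed load exceeds the limit, sorted by name."""
    loads = phase_loads(circuits)
    return sorted(phase for phase, amps in loads.items() if amps > limit)


# Made-up readings: (circuit name, phase, measured amps).
# No single circuit looks alarming, yet phase A adds up to 110 A.
EXAMPLE_CIRCUITS = [
    ("c1", "A", 22.0),
    ("c2", "A", 24.0),
    ("c3", "A", 21.0),
    ("c4", "A", 23.0),
    ("c5", "A", 20.0),
    ("c6", "B", 18.0),
    ("c7", "C", 12.0),
]

if __name__ == "__main__":
    print(phase_loads(EXAMPLE_CIRCUITS))
    print(overloaded_phases(EXAMPLE_CIRCUITS))
```

The point of the sketch is that checking each circuit against its own breaker is not enough; the total per phase has to be checked as well.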
What have we done to resolve the problem?
1. With the electrician on-site, we installed blank breakers to fill the open spaces on the phases of power that are already in use, so that nobody can mistakenly install a circuit in the wrong location in the future.
2. We have labeled each breaker not only with a number and ID as before, but also with the phase it is installed on, for quick and easy reference.
3. The load on the overloaded phase was reduced to acceptable levels and rebalanced across the other phases of power.
4. While I am not an electrician, I have learned a lot through this. I will ask questions and make sure I completely understand the process when future power installs are ordered, and I will try to have the original electrician who installed the subpanel complete them all, even if we have to wait for him to do the work. (He is the one who was onsite helping me during the problems; all of the electricians we use are part of the same company.)
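The kind of question I intend to ask before future installs can be sketched roughly as follows, again in Python. This is not the electrician's actual procedure; the function names, the loads, and the 100 A limit are all hypothetical, and per-phase load is again simplified to a plain sum of amps.

```python
def move_circuit(loads, amps, src, dst):
    """Return new per-phase loads after moving one circuit from src to dst."""
    updated = dict(loads)
    updated[src] -= amps
    updated[dst] += amps
    return updated


def pick_phase(loads, new_circuit_amps, limit):
    """Return the least-loaded phase that can take the new circuit.

    Returns None when even the least-loaded phase would go over the limit.
    """
    phase = min(loads, key=loads.get)
    if loads[phase] + new_circuit_amps > limit:
        return None
    return phase


if __name__ == "__main__":
    # Made-up numbers: phase A is overloaded against a hypothetical 100 A limit.
    before = {"A": 110.0, "B": 18.0, "C": 12.0}
    # Move one 20 A circuit from the overloaded phase to the lightest one.
    after = move_circuit(before, 20.0, "A", "C")
    print(after)
    # Ask where a further 20 A circuit should go.
    print(pick_phase(after, 20.0, 100.0))
```

Labeling each breaker with its phase, as in step 2, is what makes the per-phase totals used here easy to gather in the first place.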
Note that all of this happened in the segments of power after the UPS units, generators, and utility, so none of them was able to protect us in this case.
I've used my phone to draw up a little diagram similar to what happened here. It is a bit lacking in 'details' but gives an overview. The unlabeled little box between the generators and the utility power feeds is an Auto Transfer Switch. There are actually multiples of some of these items, but I've simplified it down to just one of each to make it understandable.
The 4 squares on the bottom left are the 4 racks that kept working, as they have a feed directly from the UPS room to their racks. The racks fed via the subpanel (on the bottom right) were without power because the master breakers on the UPS tripped when one of the phases went over its amperage limit. The circle with the Fail is the single phase that was over its limit, being fed to the subpanel.
Just as an FYI for those curious: yes, there are 4 wires going to the subpanel, as one is a neutral. Since we use the phases as single-phase circuits, the neutral is needed to run the wiring from 3-phase to single-phase on the floor.