Web 10, Web 11 and Web 17 (cluster 1) and PgSQL 1 (cluster 2) are down - Resolved

These servers were all on a large-disk RAID10 array. Last night the array went degraded after a single drive fault, and we swapped that drive. Immediately after the swap, two other drives went 'missing' from the RAID10 array. We decided to wait for the replacement drive to finish rebuilding, which completed at around midnight, and then to export the servers to another disk array so they could be re-imported very quickly if there was any trouble. Unfortunately, trouble came before that: another disk dropped and the RAID entered a fault state. We have backups of all the servers and may have to move toward restoring from them. We are trying everything possible to get the RAID back now, but will move to restoring soon if that fails.
 
We got the array back up for a little while, but the same drive that had gone offline died again during the rebuild. It is time to swap all the drives, build a new RAID array, and restore from backups. That process is already under way.
 
I'm trying to build a new RAID array here with a new set of drives, and the RAID controller keeps hitting a kernel fault. I now strongly believe the RAID card is at fault for all of this; it was already suspect, given that several drives dropped out during a simple hot-swap in the first place.
 
The Facebook page is linked here, and all updates will be posted here. I am too involved with the hardware fixes and the server restores to answer Facebook posts and messages, so all updates will be made publicly here. We expect at least 3 hours before any server is back up, and likely 12 hours for complete resolution. It could take longer, as we are working on several servers at once today.
 
All servers are now being restored, with the web servers prioritized first. The SQL remote access port will also be restored as soon as possible, but the web servers are slightly higher on the priority list.
 
The redirection services for external SQL access are up and running again.
The web server restores are going well, and the PostgreSQL server restore is now in progress as well.
 
Web17 has an issue with its Apache configs not restoring properly, which is causing problems for some users. This is what is being worked on for Web17 now.
 