Network switch issue - A few servers affected

tanmaya

APAC Operations
Staff member
We are seeing a problem with one of our gig switches, and some of the servers are not available on the network. We are looking into it and will post an update soon.
 
It seems to be a packet loss issue, as servers are going up and down. The link is still passing a lot of traffic, but not quite at normal levels.
 
The logs were truncated, so we could not pinpoint the root cause. We still suspect a configuration issue as a possible reason and will plan a maintenance window before the end of this month to make the necessary changes.
 
This has happened again to the same servers. The switch itself is up and has not gone down at all; only a few ports are having trouble. We are checking and will have them up ASAP. I have the logs and other data now and will, if possible, correct this issue in the next 3-5 minutes with the least amount of downtime.
 
All are fine now. I captured the logs, and we are looking into why this is happening on only a few ports.
This is the new switch that we put in to replace the switch that died, and this is a very odd issue to be happening.
 
OK, we believe this is caused by a software fault. I will be on hand at the datacenter on Tuesday night, and we will work on some changes during this time. I will post an announcement regarding this, and the downtime should be quite minimal.
I want to personally be there when these changes are made and am flying into Miami on Tuesday evening. I am going to allow a few hours of leeway and then put this on the schedule.
 
The issue has resurfaced without any prior warning.
We are looking into it.
 
I think I am cursed :( It worked perfectly the entire time I was in Miami after we made some software tweaks, and right now we are getting some hardware errors when the switch boots up. I can see it booting on our management console.

I left Miami yesterday to attend a meeting here in TX, and it all died. We have staff on the way to the datacenter. Our new staffer will be officially introduced in April as well :)

We have him going in to put in another spare switch. This problem is very rare but seems to have been happening a bit recently for us. We are already in contact with the switch manufacturer regarding it and will be escalating this matter.
 
We have the switch up. We are going to continue with a replacement tonight in order to prevent this from happening again.
 
We are doing this switch change in an orderly manner, and each server should see no more than 30 seconds of downtime during the changeover.

I will probably not post an update for each port as we move it, but I will give status updates as we have servers up on the new switch.
My best guess now is that we will be starting in approximately 20 minutes.
 