What happened to Win1 today?

riley

I tried to check my sites at 11:00AM ET and got Server Not Found errors. Sites were back up by about 11:15AM ET. I looked in my logs and found some 500 Internal Server Error messages and some Out Of Memory Error messages from a little earlier. What's going on?

riley
 
Stephen said:
Win1? It is not even around anymore.

I realize that Win1 was "migrated" recently, but I'm not sure of the full scope of "migrated". To the best of my knowledge, my sites are on Win1: when I use FTP to upload or download, the host is win1.jodoshared.com.

So I guess now I have 2 questions:
What server are my sites on?
What was going on with that server this morning?

riley
 
I have not been on duty. I am checking on the problems with the other admins. That out-of-memory error you mentioned is scaring me. However, the patch MS phone support sent us seems to be working very well, so it looks like it might be time to apply it on Win5 too. We will be sure to monitor it more closely.

Edit: and it is Win5 you are on. Ping win1, then ping win5, and you will get the same IP; we forced win1 to have the same IP so it would continue to work as usual. :)
 
Stephen said:
I have not been on duty. I am checking on the problems with the other admins. That out-of-memory error you mentioned is scaring me. However, the patch MS phone support sent us seems to be working very well, so it looks like it might be time to apply it on Win5 too. We will be sure to monitor it more closely.

Edit: and it is Win5 you are on. Ping win1, then ping win5, and you will get the same IP; we forced win1 to have the same IP so it would continue to work as usual. :)

Ok, Win5 it is.

When I noticed that my sites were down, I did a little more looking around. By the time I was ready to open a ticket, my sites were up again, so I didn't bother.

This morning is the first time I've seen these Out Of Memory errors. Based on what I see in the log, the exception was returned while attempting to access the database. Other resources (images, etc.) were successfully accessed, but the aspx page ultimately returned an HTTP 500 error.

riley
 
It does look like IIS had to be restarted, according to my monitoring software. It appears to have been down for a few minutes at most. The patch that was installed on Win6 will be installed on Win5 tonight and should prevent this from happening again.
 
Win5 did have a small problem today, as identified by Riley. IIS had stopped responding and required a restart.

I doubt any out-of-memory errors similar to the one on Win6 were returned, because there is no ticket regarding that. There were 500 errors, but those happened at the same time IIS stopped responding.

I'd like to stress that such problems can occur on production servers from time to time. It is impossible to maintain 100% uptime on a production Windows server.
 
LegalAlien said:
So Win5 was having problems today? I noticed it went down for a few minutes around 2:40pm EST...

It was not down at 2:40pm; I just talked to the admins, and the only outage was at 11:08am. Exact times:

***** Nagios 1.0 *****

Notification Type: PROBLEM

Service: HTTP win5
Host: win5
Address: win5
State: CRITICAL

Date/Time: Mon Jul 26 11:08:52 EDT 2004

Additional Info:

Connection refused by host

***** Nagios 1.0 *****

Notification Type: RECOVERY

Service: HTTP win5
Host: win5
Address: win5
State: OK

Date/Time: Mon Jul 26 11:11:52 EDT 2004

Additional Info:

HTTP ok: HTTP/1.1 200 OK - 0.010 second response time
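The PROBLEM and RECOVERY notifications above bracket the outage as the monitor saw it, so subtracting their timestamps gives its length. A minimal sketch, assuming only the timestamp format shown in the two notifications; the helper name `parse_nagios_time` is hypothetical, and the zone token is dropped because both times are in the same zone (EDT). Because a poller only notices state changes when it runs a check, the real outage could differ slightly from this figure.

```python
from datetime import datetime, timedelta

# Format of the notification timestamps once the zone token ("EDT") is removed.
FMT = "%a %b %d %H:%M:%S %Y"

def parse_nagios_time(stamp: str) -> datetime:
    """Parse a string like 'Mon Jul 26 11:08:52 EDT 2004', ignoring the zone name.

    Hypothetical helper: both timestamps share one zone, so dropping it
    does not change the difference between them.
    """
    parts = stamp.split()
    without_zone = " ".join(parts[:4] + parts[5:])  # drop index 4, the zone
    return datetime.strptime(without_zone, FMT)

# Timestamps copied from the PROBLEM and RECOVERY notifications.
problem = parse_nagios_time("Mon Jul 26 11:08:52 EDT 2004")
recovery = parse_nagios_time("Mon Jul 26 11:11:52 EDT 2004")

outage = recovery - problem
print(f"Outage as seen by the monitor: {outage}")  # prints "Outage as seen by the monitor: 0:03:00"
```

This agrees with the admins' description of an outage lasting only a few minutes around 11:08am.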
 
Good Oyster said:
99.5% is hard enough to maintain! :D

No :) Win5 is doing well above 99.5%. It's at 99.8% according to our monitoring system. And this month is almost up.

Win6 had a few problems this month (which we resolved with the MS hot-fix), but uptime is still above 99.5%. Next month should be much better.
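To put the 99.5% and 99.8% figures in perspective, here is a rough conversion from an uptime percentage into a monthly downtime budget. This is a sketch under simple assumptions: uptime is taken as (total time minus downtime) divided by total time, and the month is July, with 31 days. The hosting company's monitoring system may compute availability differently (for example, per check rather than per minute).

```python
MINUTES_PER_DAY = 24 * 60

def downtime_budget_minutes(uptime_pct: float, days: int = 31) -> float:
    """Minutes of downtime still compatible with uptime_pct over `days` days.

    Assumes uptime = (total - downtime) / total, measured in minutes.
    """
    total = days * MINUTES_PER_DAY
    return total * (1 - uptime_pct / 100)

def uptime_percent(downtime_minutes: float, days: int = 31) -> float:
    """Uptime percentage implied by a given amount of downtime over `days` days."""
    total = days * MINUTES_PER_DAY
    return 100 * (1 - downtime_minutes / total)

for target in (99.5, 99.8):
    budget = downtime_budget_minutes(target)
    print(f"{target}% over 31 days allows {budget:.1f} minutes of downtime")

# The 3-minute outage from the Nagios notifications, on its own:
print(f"A 3-minute outage alone gives {uptime_percent(3):.3f}% uptime")
```

Under these assumptions, 99.5% allows roughly 223 minutes of downtime in July and 99.8% roughly 89 minutes, so a single 3-minute outage uses only a small part of either budget.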
 
Yash said:
Win5 did have a small problem today, as identified by Riley. IIS had stopped responding and required a restart.

I doubt any out-of-memory errors similar to the one on Win6 were returned, because there is no ticket regarding that. There were 500 errors, but those happened at the same time IIS stopped responding.

I'd like to stress that such problems can occur on production servers from time to time. It is impossible to maintain 100% uptime on a production Windows server.

Yash,
I have not experienced a single problem since Win1 was migrated to Win5 (about 1 month now). It's been running great. I'm not complaining about downtime.

I don't know the nature of the problems that Win6 experienced and I'm not implying that today's Win5 problems are the same. I can only provide the following information based on what I see in my logs:

At 09:50:50 IIS returned an HTTP 500 error.

At 10:52:08 ASP.Net returned an "Exception of type System.OutOfMemoryException was thrown" message when trying to read a database. (HTTP status was 200 OK)

At 10:52:13 ASP.Net returned an "Exception of type System.OutOfMemoryException was thrown" message when trying to read a database. (HTTP status was 200 OK)

Based on your logs, IIS stopped responding shortly thereafter and was recycled.

As I stated before, I'm not complaining; the server has been running very well. I'm just providing this information in the hope that it helps you diagnose the incident.

riley
 
Yash said:
It was not down at 2:40pm; I just talked to the admins, and the only outage was at 11:08am. Exact times:
...

Yash - I wasn't complaining, but I wasn't making it up either. I've been MORE than happy with the uptime, but something was definitely wrong at 2:40pm, though only for a couple of minutes at most...
 
To follow on: twice today I have gotten an 'Under Construction' page (the last time just a few minutes ago), followed by the website being unavailable, and then it reappears. I'm talking seconds here, so obviously it's no major problem, but I hope this isn't happening often. Is this related to IIS having to be restarted or something? Whatever the problem is, it obviously affects my websites, since it terminates any active sessions. Not good.
 
IIS on Win5 was restarted only once. If you suspect frequent problems with your site, write to support and ask them to monitor it.
 