Site has been down all day

Dave

Perch
I am very angry with the ticket support people. I was told moving to a Win2000 server would resolve my longstanding problem of constantly losing sessions. Yesterday, I gave very specific instructions that I wanted the move to take place on March 9 at 7:00 pm Pacific (because they were slow in responding to my ticket after saying they could easily do it last night). I wanted to make sure I was online and ready to work on any problems.

I received this confirmation at 11:29 pm yesterday:

Created: Mar 9, 2004 12:27:58 AM
Last Mod: Mar 9, 2004 1:59:38 AM

Assigned To: Customer Support(JodoHost)
[Mar 9, 2004 2:28:52 AM]

A: Hello,

Sorry for the delay. We'll be doing that on 7:00 pm March 9 (Pacific) then. We'll update you as soon as it is done. Regards

I got calls at work, and e-mails at home, telling me my site couldn't be found. When I got home from work my site was still down, and I had this e-mail from support which came in at 10:28 this morning (about 8.5 hours before the move was to take place):

Your support request was answered:
Created: Mar 9, 2004 12:27:58 AM
Last Mod: Mar 9, 2004 1:59:38 AM

Assigned To: Customer Support(JodoHost)

[Mar 9, 2004 1:28:12 PM]

A: Hello,

We have successfully shifted the account voices account 21558 to win5. Please wait till it gets resolved to win5. You can check it from controlpanel.

Regards.

So I put in a ticket right away when I got home, telling support my site had been down all day. AND THIS IS MY REPLY!!!!

Your support request was answered:
Created: Mar 9, 2004 8:42:56 PM
Last Mod: Mar 9, 2004 8:53:16 PM

Assigned To: Customer Support(JodoHost)

[Mar 9, 2004 8:56:05 PM]

A: Hello,

Your both website is running fine on win5.

greatmojo.com
voices-for-change.jodoshared.com

Please check it from
http://www.amegaproxy.com

Regards.

What the hell has amegaproxy.com got to do with my site? Do I tell my visitors my site isn't actually unavailable, just go to amegaproxy.com and find out???

When I go to www.voices-for-change.com I get UNDER CONSTRUCTION. When I go to www.voices-for-change.com/index.asp (my home page) I get PAGE CANNOT BE FOUND.

I guess support thinks I'm too stupid to know the difference between a working site and one that isn't!
 
My site works if I go to www.voices-for-change.jodoshared.com but not when I go to the actual address I use, and all my users use, www.voices-for-change.com. I don't know what the difference is. I knew when the ticket support person told me "The move can be made at anytime you feel comfortable. The move will be completely transparent to you and there will be no downtime as such" that there were going to be problems. I've heard that line before. Which is why I wanted to be online when it took place. Both statements above by the support person were completely untrue.

 
Dave, both sites are working fine from here. Propagation is my least favourite word of the month... you have my sympathy

I like the design by the way ;)
 
After being down for 9+ hours, everything seems to be working again. I'm sure this much downtime would have been avoided if the site was moved when everyone agreed it would be moved.

I am now waiting for a response on my ticket informing me what the problem and fix was.
 
I wonder about that statement too..
The only way Jodohost can affect DNS entry caching is to reduce the TTL on the DNS entry beforehand.
The way the move was performed there is no way it can be completely transparent from an end-user point of view.

It's technically possible to do an almost completely transparent shift (a few minutes downtime max, provided there are no incompatibilities in server configuration), but it's not really possible to do so in H-Sphere without an enormous amount of work.

Would be a nice feature for a control panel actually, to have that (mostly) automated :)
 
Dave said:
After being down for 9+ hours, everything seems to be working again. I'm sure this much downtime would have been avoided if the site was moved when everyone agreed it would be moved.

Wouldn't have made any difference, due to the nature of the problem.
 
LegalAlien said:
Dave, both sites are working fine from here. Propagation is my least favourite word of the month... you have my sympathy

I like the design by the way ;)
Thanks, LA.

If there was going to be a propagation issue, I should not have been told transparent/no downtime. I don't know if that was the issue or not. Propagation usually takes 48 hours if I remember correctly. If that was going to be an issue, my old site should not have been taken offline until propagation was completed. That's how it was done before.
 
I don't really understand very much about DNS configuration, but can anyone tell me why the TTL seems to be set by default at a high level (86440 or something I seem to remember) - and why do these settings in H-Sphere mean anything if your domain registrar is elsewhere? i.e., aren't these settings controlled by your registrar?

If I'm really showing my ignorance, then just ignore me...
 
SubSpace said:
Wouldn't have made any difference, due to the nature of the problem.
I'm pretty sure we were told the move from Win1 to Win5 was supposed to be transparent, with little down time, before it was halted. The original plan was to move everyone off. I don't know how many customers were originally on that server, but I can't see anyone expecting that they would all be down for a large chunk of the day.

If the nature of the problem is such that all this downtime was to be expected, I should not have been given false information on which to make my decision to move.
 
Dave, I am very sorry about this...
This was not an issue about DNS propagation. The migration script doesn't correct the DNS records of domain aliases (this is a bug).

That means you have to recreate your domain aliases from the control panel (takes a minute). I'm very sorry this information was not provided to you.. we recreated it ourselves. In fact, our administrator should have recreated it for you in the first place
 
Yash said this was not a DNS propagation issue, but while we're talking about DNS propagation I thought I'd mention this problem again.

Using dnsreport.com I am still seeing this warning on some of my domains:

WARNING: Your SOA REFRESH interval is : 10800 seconds. This seems a bit high. You should consider decreasing this value to about 3600-7200 seconds. RFC1912 2.2 recommends a value between 1200 to 43200 seconds (20 minutes to 12 hours; 12 hours seems very high to us), although some registrars may limit you to 10000 seconds or higher, and if you are using DNS NOTIFY the refresh value is not as important (RIPE recommend 86400 seconds if using DNS NOTIFY). This value determines how often secondary/slave nameservers check with the master for updates. A value that is too high will cause DNS changes to be in limbo for a long time.
And this one:

Warning: Your NS records at your authoritative DNS servers have TTLs that do not match what the parent servers report:

ns2.jodoshared.com. [TTL 172800 at parent; 86400 at 66.36.230.189]
ns1.jodoshared.com. [TTL 172800 at parent; 86400 at 66.36.230.189]

In some cases, this can cause some serious problems. For example, if the parent servers have a 172800 second TTL (48 hours), and your authoritative DNS servers report a TTL of 3600 seconds (1 hour), you are saying that the parent DNS servers do not have the correct information. But, after 1 hour your DNS records may time out. At that point a DNS resolver will need to get fresh NS records. This can cause a serious problem in some cases.
I have seen DNS propagation take a week before here at JodoHost. I have never seen that with any other host. So I'm wondering why this DNS setting has not been corrected yet.
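
For anyone who wants the numbers side by side, here is a rough Python sketch that only restates the figures from the two warnings quoted above. It does not query any nameserver. The constant and function names are made up for illustration, and whether these values actually matter for a move is a separate question.

```python
# Values copied from the dnsreport.com warnings quoted above.
SOA_REFRESH = 10800
RFC1912_REFRESH_RANGE = (1200, 43200)   # range the warning attributes to RFC 1912 2.2
TOOL_SUGGESTED_RANGE = (3600, 7200)     # narrower range the tool itself suggests
PARENT_NS_TTL = 172800
AUTHORITATIVE_NS_TTL = 86400


def in_range(value: int, bounds: tuple) -> bool:
    """True if value lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def hours(seconds: int) -> float:
    """Convert a number of seconds to hours."""
    return seconds / 3600


print("refresh within RFC 1912 range:", in_range(SOA_REFRESH, RFC1912_REFRESH_RANGE))
print("refresh within tool's suggested range:", in_range(SOA_REFRESH, TOOL_SUGGESTED_RANGE))
print("refresh interval in hours:", hours(SOA_REFRESH))
print("NS TTL at parent (hours):", hours(PARENT_NS_TTL))
print("NS TTL at authoritative server (hours):", hours(AUTHORITATIVE_NS_TTL))
```

So the refresh value sits inside the RFC range the warning cites and only falls outside the tool's own narrower suggestion.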
 
brawney, neither of these values is anything to be concerned about. Propagation has NOTHING at ALL to do with them...

Propagation is the time it takes for a change to your domain's set of name servers to be picked up all around the world...

Our name servers simply serve the purpose of converting a domain name into an IP when a request reaches them.

The SOA refresh value is an internal value. The slave DNS server periodically updates itself from the master DNS server, and the delay between updates is defined by this value. It has no bearing at all on propagation. H-Sphere updates both name servers anyway during a major change.

As for TTL, I have to check this out. From my understanding, this isn't a problem either.

I've seen sites propagate in 24 to 36 hours for new customers. It might take a week if you are switching from another provider, because the older name servers often linger on for some time.
 
LegalAlien said:
I don't really understand very much about DNS configuration, but can anyone tell me why the TTL seems to be set by default at a high level (86440 or something I seem to remember) - and why do these settings in H-Sphere mean anything if your domain registrar is elsewhere? i.e., aren't these settings controlled by your registrar?

The only thing controlled through your registrar is what nameservers are used.
Basically, when someone tries to visit your page, the following happens (slightly simplified):

Visitor's computer contacts his ISP's nameserver (A), requesting the IP for www.somedomain.com.

The nameserver at the ISP checks whether this information is available in its cache (more on this later). If it's not, it will contact a root server (B), forwarding the query.

The root server will tell the ISP's nameserver (A) that it doesn't know the answer to the query, but it will give the address for the nameserver that does know, ns1.jodoshared.com (C). This knowledge it gets from your domain registrar.

The visitor's ISP's nameserver (A) contacts ns1.jodoshared.com and repeats the query. The response is something like this:
www.somedomain.com
internet address = 64.156.223.130
ttl = 86400 (1 day)


Then 2 things happen. First of all, the ISP's nameserver (A) can finally answer the query it received. Secondly, it will store the hostname and IP address pair in its cache and will keep it there for 86400 seconds, or 1 day.

So, no matter what happens to the JodoHost nameserver after that, for any user of that ISP the query will give the same response for the next 24 hours without first contacting the authoritative nameserver (C).

The irony is that, because of this, frequent visitors to your site are likely to experience longer apparent downtime than first-time visitors.
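
To make the caching behaviour concrete, here is a minimal Python simulation of the steps above. It is only a sketch: the class names are invented for illustration, time is a plain counter in seconds rather than a real clock, the root server step is left out, and the two IP addresses are documentation-range placeholders rather than anyone's real servers.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CachedAnswer:
    ip: str
    expires_at: int  # simulated time (seconds) at which the cache entry stops being valid


class AuthoritativeServer:
    """Stands in for the host's nameserver (C): always returns the current record."""

    def __init__(self, ip: str, ttl: int):
        self.ip = ip
        self.ttl = ttl

    def query(self, name: str) -> Tuple[str, int]:
        return self.ip, self.ttl


class CachingResolver:
    """Stands in for the ISP's nameserver (A): answers from cache until the TTL runs out."""

    def __init__(self, upstream: AuthoritativeServer):
        self.upstream = upstream
        self.cache: Dict[str, CachedAnswer] = {}

    def resolve(self, name: str, now: int) -> str:
        entry = self.cache.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.ip  # cache hit: the authoritative server is not consulted
        ip, ttl = self.upstream.query(name)
        self.cache[name] = CachedAnswer(ip, now + ttl)
        return ip


OLD_IP = "192.0.2.10"     # placeholder for the old server
NEW_IP = "198.51.100.20"  # placeholder for the new server

authoritative = AuthoritativeServer(OLD_IP, ttl=86400)
frequent_isp = CachingResolver(authoritative)    # had a visitor shortly before the move
first_time_isp = CachingResolver(authoritative)  # has never looked the name up

frequent_isp.resolve("www.somedomain.com", now=0)  # caches the old IP for 24 hours
authoritative.ip = NEW_IP                          # the site moves right after that lookup

one_hour_later = 3600
print(frequent_isp.resolve("www.somedomain.com", one_hour_later))    # still the old IP
print(first_time_isp.resolve("www.somedomain.com", one_hour_later))  # the new IP
print(frequent_isp.resolve("www.somedomain.com", 86400))             # cache expired: new IP
```

In that run the frequent visitor's ISP keeps handing out the old address for the rest of the day, while the first-time visitor's ISP gets the new one straight away.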

The only way to prevent this is to lower the TTL on the authoritative nameserver 24 hours before the actual shift. Instead of staying in the ISP's cache for 24 hours, it would stay there a much shorter time, increasing the number of requests on the authoritative nameserver, but minimizing the downtime for all visitors.

It's theoretically possible to decrease the TTL while minimizing the impact on nameserver load by dynamically adjusting the record's TTL over the last 24 hours before the shift. If I ever get around to making my own control panel, I'll be sure to automate that process and make almost completely transparent shifts possible :p
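
One possible reading of that dynamic adjustment, sketched in Python: hand out a TTL that never reaches past the planned shift, so the extra load only builds up in the final minutes. The function name, the 300 second floor and the choice to keep the TTL at the floor once the shift time has passed are all assumptions for illustration, not something H-Sphere or any particular nameserver does.

```python
def ttl_to_serve(seconds_until_shift: int, normal_ttl: int = 86400, floor_ttl: int = 300) -> int:
    """TTL to put on the record so a cached answer expires close to the planned shift.

    normal_ttl -- the record's usual TTL (86400 in the example above)
    floor_ttl  -- hypothetical lower bound, so the TTL never drops to a few seconds
    """
    if seconds_until_shift <= 0:
        # Assumption: keep the TTL low until the move has been confirmed as working.
        return floor_ttl
    # Far from the shift the usual TTL is fine; closer in, the TTL shrinks with the
    # remaining time, so no cached answer outlives the shift by more than floor_ttl.
    return min(normal_ttl, max(floor_ttl, seconds_until_shift))


for hours_left in (48, 24, 12, 1, 0):
    ttl = ttl_to_serve(hours_left * 3600)
    print(f"{hours_left:>2} h before the shift: serve TTL {ttl} s")
```

With this schedule an answer handed out at any point before the shift expires no later than floor_ttl seconds after it, and the nameserver only sees the heavier query load shortly before the move.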
 
Thank you for the explanation SubSpace, our unix administrator told me the same thing (I know very little about dns)

I'll pass this on to PSOFT.
 
I understand how DNS works. I don't understand some of this info reported by dnsreport.com. But I do know that the tool you recommend (dnsreport.com) is telling me some of your settings are questionable.

Why is the TTL set to 48 hours (172800 seconds) on the "parent"?
 
Subspace - you are the first person that has ever made any sense to me in this matter. Thank you.

Yash - learning that the TTL can make such a difference in limiting downtime when moving a website, is it possible for me to change the TTL on the 'A' records on a particular domain to prepare for a move? If not, can we request this from you? If I understand this right, you could bring the TTL down to, say, 600 and in theory, a website need only be down for 10 minutes, right?

Sounds too good to be true to me.. ;)
 
LegalAlien said:
Subspace - you are the first person that has ever made any sense to me in this matter. Thank you.

Yash - learning that the TTL can make such a difference in limiting downtime when moving a website, is it possible for me to change the TTL on the 'A' records on a particular domain to prepare for a move? If not, can we request this from you? If I understand this right, you could bring the TTL down to, say, 600 and in theory, a website need only be down for 10 minutes, right?

Sounds too good to be true to me.. ;)
That is the theory. I assume this is what those dynamic DNS host setups do.
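
A back-of-the-envelope sketch of the 600 second example in Python. It assumes every resolver honours the TTL it was given, and the function and parameter names are made up for illustration. It also shows why the lower TTL has to be in place a full day ahead when the old TTL was 86400.

```python
def worst_case_stale_seconds(new_ttl: int, previous_ttl: int, lead_seconds: int) -> int:
    """Longest time a resolver could keep serving the old IP after the move.

    new_ttl       -- TTL on the A record at the time of the move
    previous_ttl  -- TTL the record had before it was lowered
    lead_seconds  -- how long before the move the TTL was lowered
    """
    # An answer cached just before the TTL was lowered lives for previous_ttl,
    # so it outlasts the move by previous_ttl - lead_seconds (if that is positive).
    leftover_from_old_ttl = previous_ttl - lead_seconds
    # An answer cached just before the move lives for new_ttl.
    return max(new_ttl, leftover_from_old_ttl)


for lead_hours in (0, 1, 12, 24):
    stale = worst_case_stale_seconds(600, 86400, lead_hours * 3600)
    print(f"TTL lowered {lead_hours:>2} h before the move: up to {stale / 60:.0f} minutes on the old IP")
```

The 10 minute figure only holds in the last case, where the TTL was lowered at least 24 hours before the move.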
 
Yash said:
Dave, I am very sorry about this...
This was not an issue about DNS propagation. The migration script doesn't correct the DNS records of domain aliases (this is a bug).

That means you have to recreate your domain aliases from the control panel (takes a minute). I'm very sorry this information was not provided to you.. we recreated it ourselves. In fact, our administrator should have recreated it for you in the first place
Thanks for the response and explanation, Yash.

Twice I asked support, via my open ticket, to explain what the problem and fix was once the site was working again. The first time I got a response about how Directory Indexes work (which didn't even explain why my index.asp page was not found when it was second on the list). I suppose I could see how the person may have made the mistake about what I was asking because I brought up the fact that I changed the Directory Index order in my reply. I just assumed they would know that I was talking about why my site was down 9 or 10 hours (or whatever length it was). But then I specifically asked a second time to explain the problem and solution. I never did receive a response.
 