frustration from coding / acceptable % of failure?

Stephen

US Operations
Staff member
I had a request to post this for someone:



hi all,

Please forgive the confusing title; I'm still not sure what to call this one.

let me start...
I pride myself on being a thorough person, especially when it comes to work. Whether the customer pays for it or not, I will always try to give them the most efficient site I can.

BUT, it has happened to me recently on two large, unrelated systems, one an e-commerce site and the other an order management system, that my work hasn't been 100% foolproof, and that after hours of work I still cannot find the problem. It all worked every time I tried, on multiple browsers, time after time. But there was no mistaking it: some users were definitely (in the first case) or probably (in the second case) encountering issues with the system.

I won't go into the technical issues themselves, but my question/request for advice is whether a system is supposed to work 100% of the time. Is it acceptable for it to ever not work on some platform, or should I be assuring 100% uptime (as far as I can) of the system, with any possible platform combination a user may be using and any network/system setup?

It was frustrating for me to have to shrug my shoulders and say to myself (and the client) that 'stuff happens, and there just isn't anything we can do'.
One client employee asked me, "So how come I never see any mistakes or downtime or crashes on any of the major systems on the web?" I answered that, just as I haven't actually seen any problem with our system yet there evidently is an issue, the fact that you don't see problems on their big sites doesn't mean they aren't there. I also mentioned that we can never really know what kind of setup people are on, hardware or software, so there is no way we can test every combination.
But I wasn't altogether convinced by my own answer.

So what do you guys think?

> should we be assuring 100% technical uptime (i.e. not taking into account ISP and hosting downtime)? (some rough downtime arithmetic is sketched below)
> is there an industry-accepted rate of failed processes?
and
> is it at all possible to have a 100% technically workable system?

all comments welcome!
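As a rough frame of reference for the first two questions: uptime targets are usually quoted as percentages, and a quick bit of arithmetic shows what each extra "nine" actually allows. A minimal sketch in Python; the percentages are illustrative examples only, not a claimed industry standard.

# Rough arithmetic: how much downtime per year a given availability target allows.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

for availability in (99.0, 99.9, 99.99, 99.999):
    downtime_hours = HOURS_PER_YEAR * (1 - availability / 100)
    print(f"{availability}% uptime allows about {downtime_hours:.2f} hours "
          f"({downtime_hours * 60:.0f} minutes) of downtime per year")

Even 99.9% still leaves nearly nine hours a year where some user, somewhere, is hitting a broken system.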
 
Now I will offer my answer: just because THEY don't see sites down does not mean it does not happen. In the last month alone, google.com has had numerous outages of 15 minutes or more, and they have redundant clusters that go FAR beyond the N+1 standard.

MSN.com and MSN Web search were down for almost 4 hours at one point in December (I believe) last year.

Walmart.com has had many outages. Again, they are using some huge clustering technologies, and they are the world's largest retailer, yet at one point they could not process online orders for over a week because of a shopping cart issue. I don't know that it was ever fixed; they moved to another JSP system in the end. That was about 1.5-2 years ago.

Amazon has had outages too; I do not visit it very much, so I can't speak from personal experience.

Paypal.com has had many scheduled outages (which is fine) and quite a number of unscheduled ones.

Almost every bank has had unplanned outages with its online banking system; these are among the most security- and privacy-sensitive services online, and they still have a rather high rate of failure.

My point is that even companies that spend HUNDREDS of MILLIONS on clustering their online solutions all have failures, whether in code (walmart.com) or in systems (google.com). Sometimes a fix is found in the code, and sometimes they end up replacing the whole system to get rid of a nagging error.

As development companies grow, I think there will be a growing need for them to get bonded, much like any reputable locksmith, notary, or insurance agent, as a form of protection for their firms if something does go wrong. One project gone wrong can really hurt a great development firm.
 
Hi there,

Interesting post.

First of all, there are no systems that are 100% reliable, even the ones that claim to be. There are so many weak points in most application/hardware setups that you usually end up making a lot of considered compromises in your system. Obviously, the more important the system, the more money/time/effort can be spent on getting closer to 100% reliability.

Second, testing is sooooo important. Applications need as much testing as can be afforded; budgets quite often underestimate the testing required, especially in smaller projects.
From what you have said, you are testing the system yourself. In an ideal world, usability and testing should be done by people other than the developer!

To quote Douglas Adams, "A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools."

I think it is hard to put a percentage on the reliability expected from an application. I think it would depend on its intended use, budget, complexity etc. The only time users will not be having any issues with an application is if they are not using it :)

For smaller projects, our approach is to involve the clients in testing. Once a few cycles of testing/bug fixing have taken place, we get the client to "sign off" on the project, and pretty much anything after that point is charged as new work, especially hunting down obscure problems because user X is a complete loon and is having problems :)

Sorry for waffling on a bit :)

Cheers,


Largerabbit.
 
Wow, I never knew you were programming in your free time, Stephen.

I have programmed quite a bit in Visual Basic and C++ and have written some fairly large applications.

Bugs and failures are always part of any major development project... Even after months of testing, problems can still be discovered...

Look at Windows...
 
Yash, that was not my post; I posted it as a favor for another forum user who asked me to. :)

The second post was mine.
 
Hmm, the best bet is not to be an optimist if you're a programmer. Pessimism all the way. That way you don't get discouraged. :] All programs will have bugs. Period. The best defense against them is to design well and have lots of testing.
 
The way I test the applications I'm making:

Code well in the beginning, i.e. keep it visually clean, so it's easier to look through. Anyone can tell you that.

I try to test each feature I add through all of its methods, then I give it to a team of beta testers; some of them try individual features and some try it on test sites (don't worry, not on your servers).

Then I release a version. Sometimes I end up releasing a bug fix within a day or two, but in general I get the application tested for a couple of weeks, then release it.
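For what it's worth, here is a minimal sketch of what "test each feature through all of its methods" can look like as an automated test. It's Python with the standard unittest module; the discount function is purely hypothetical, just a stand-in for whatever feature is being added.

import unittest

def apply_discount(total, code):
    """Hypothetical order-total helper: one small feature with several paths."""
    if total < 0:
        raise ValueError("total cannot be negative")
    if code == "SAVE10":
        return round(total * 0.90, 2)
    if code == "FREESHIP" and total >= 50:
        return round(total - 5.00, 2)
    return total  # unknown code, or threshold not met: no discount

class ApplyDiscountTests(unittest.TestCase):
    """Exercise every branch of the feature, not just the happy path."""

    def test_percentage_code(self):
        self.assertEqual(apply_discount(100.00, "SAVE10"), 90.00)

    def test_shipping_code_over_threshold(self):
        self.assertEqual(apply_discount(60.00, "FREESHIP"), 55.00)

    def test_shipping_code_under_threshold(self):
        self.assertEqual(apply_discount(40.00, "FREESHIP"), 40.00)

    def test_unknown_code(self):
        self.assertEqual(apply_discount(25.00, "BOGUS"), 25.00)

    def test_negative_total_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(-1.00, "SAVE10")

if __name__ == "__main__":
    unittest.main()

The point is simply that every branch, including the failure path, gets exercised automatically before any beta tester or client ever sees the feature.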
 
Very interesting post you have! But it's too long; next time cut it short and go straight to the point.

We all agree that there is no 100% system, whether hardware or software, but there is such a thing as Quality Assurance.


Have you heard of Six Sigma? A simple explanation in one minute:

Level 6 = about 99.99966%, meaning roughly 3.4 defects per million tries
Level 5 = about 99.977%, meaning roughly 230 defects per million tries
Level 4 = about 99.4% (you can work that one out yourself :) )

The other levels are for non-pros, so if we apply this standard in development and in servicing clients, I'm sure you can call yourself a pro.
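For anyone who wants to compute the other levels, here is a quick sketch in Python, assuming the conventional 1.5-sigma shift that published Six Sigma tables use:

import math

def sigma_level_yield(sigma, shift=1.5):
    # Long-term yield for a given sigma level, via the standard normal CDF.
    return 0.5 * (1.0 + math.erf((sigma - shift) / math.sqrt(2.0)))

for level in (3, 4, 5, 6):
    y = sigma_level_yield(level)
    dpmo = (1.0 - y) * 1_000_000
    print(f"Level {level}: yield ~{y * 100:.4f}%, ~{dpmo:,.0f} defects per million")

Running it reproduces the familiar table: roughly 3.4 defects per million at level 6, 233 at level 5, 6,210 at level 4, and 66,807 (about 93.3% yield) at level 3.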

:)
 