From: Erik Naggum <clerik@naggum.no>
Subject: Re: dealing with errors (was: "Programming is FUN again")
Date: 1998/03/26
Message-ID: <3099927027198005@naggum.no>#1/1
X-Deja-AN: 337979141
References: <bc-1003981239050001@17.127.10.22> <w6zpiw475d.fsf@gromit.nextel.no> <MPG.f720d97f8d467ac98997e@news.demon.co.uk> <6e9p0b$h0j@er3.rutgers.edu> <MPG.f735c9b879087bd989982@news.demon.co.uk> <6ec6os$7a@bgtnsc03.worldnet.att.net> <6ecbd5$epg$1@client3.news.psi.net> <6f0m6n$71u$1@news.hal-pc.org> <351692EA.41C6@bogus.acm.org> <351A72C3.5CEF1977@badlands.nodak.edu>
Followup-To: comp.lang.lisp
mail-copies-to: never
Organization: Naggum Software; +47 8800 8879; http://www.naggum.no
Newsgroups: comp.lang.scheme,comp.lang.lisp


* Brent A Ellingson
| In other words, why the hell did the equipment test for the exception if
| it didn't know what to do with it, except crash and destroy the rocket?
| Had it not tested for the exception on the non-critical code, the rocket
| probably would not have failed.

  it is amazing that the view that "don't test for errors you don't know
  how to handle" is _still_ possible in the light of that report, but I
  guess that's the way with _beliefs_.  I cannot fathom how people will
  read all the contradictory evidence they can find and still end up
  believing in some braindamaged myths.

  the problem is: the equipment did _not_ test for the exception.  the
  exception was allowed to propagate unchecked until the "crash-and-burn"
  exception handler took care of it.  this could be viewed as silly, but
  the report clearly states why this was sound design: unhandled exceptions
  should be really serious and should indicate random hardware failure.

  the _unsound_ design was not in the exception handling at all, it was in
  allowing old code from Ariane 4 to still run in Ariane 5, notably code that
  should run for a while into the launch sequence on Ariane 4 because it
  would enable shorter re-launch cycles -- which was not necessary at all
  on Ariane 5.  the error was thus not in stopping at the wrong time or
  under the wrong conditions -- it was in _running_ code at the wrong time
  and under the wrong conditions.
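
  the distinction above can be sketched in a few lines.  this is only an
  illustration under stated assumptions, not the flight software: the
  names `to_int16`, `alignment_step`, `run` and the `alignment_needed`
  flag are hypothetical, and the 16-bit range merely stands in for any
  assumption that held on the old system but not on the new one.  the
  routine has no local handler, so a violated assumption propagates to
  the last-resort handler, which treats every unhandled exception as
  fatal.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Raised when a value violates the assumed 16-bit range."""


def to_int16(value: float) -> int:
    # The conversion only reports the violated assumption; it does not
    # try to handle it.
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{value} does not fit in 16 bits")
    return n


def alignment_step(reading: float) -> int:
    # Hypothetical routine inherited from an older system.  It has no
    # local handler, so any error propagates to the caller.
    return to_int16(reading)


def run(readings, alignment_needed: bool) -> str:
    # Last-resort handler: an unhandled exception is assumed to mean
    # something is seriously wrong, so the whole unit shuts down.
    try:
        for reading in readings:
            if alignment_needed:
                alignment_step(reading)
    except Exception:
        return "shutdown"
    return "ok"
```

  with the same readings, `run([100.0, 40000.0], True)` ends in the
  shutdown path while `run([100.0, 40000.0], False)` completes: the
  handler behaves identically in both cases, and the only difference is
  whether the inherited code was run at all.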

  "had it not run the bogus code, the rocket would not have failed in it."

  how can you expect to learn from mistakes when you insist that the errors
  you observe are caused by mistakes you think you have _already_ learned
  from, and that others (the dumbass people who use exceptions, in this
  case) are at fault for not learning from?

  rather than "don't test for errors you don't know how to handle", I
  propose "don't run code with errors you aren't prepared to handle".

  did you notice how the report had the brilliant insight that we have
  gotten used to thinking that code is good until proven faulty and that
  this was the major cultural problem pervading the whole design and
  deployment process?  it's high time this insight sank into the right
  people and caused more focus on provably correct code and verification.
  with the extremely arrogant attitude still held by Brent and many others
  like him, we will continue to produce crappy code that crashes rockets
  for millennia to come without learning what the problem really is:
  unchecked assumptions!
  
#:Erik
-- 
  religious cult update in light of new scientific discoveries:
  "when we cannot go to the comet, the comet must come to us."