Reading 05: Learn To Fly

The Therac-25 accidents had three main root causes: serious bugs in the code base, a lack of hardware safety checks, and a poorly designed user interface. The race conditions in the code were serious problems. Because the program shared variables between tasks without proper synchronization, parts of the code could run before the rest of the software and the hardware were actually ready. This meant the machine could operate without the hardware properly set up for the inputs it had been given. There were also overflow errors in the code that again allowed treatment to proceed improperly. Hardware interlocks could have largely protected against both kinds of bug. If there had been hardware checks to ensure that everything was physically in place before operation, many of these problems could have been avoided. Finally, the user interface was poorly designed. The high number of innocuous-sounding error messages, such as “Malfunction 54,” led operators to ignore what was a very serious malfunction. In such cases the machine should probably have locked itself to ensure the operator did not accidentally kill someone. The error messages could also have been more descriptive, which would have allowed the operator to see the problem and fix it rather than simply ignoring the malfunction. Thus the root causes of the Therac-25 accidents were bugs in the code, a lack of hardware safety features, and poor UI design.

The main challenge for software developers working on safety-critical systems is the risk involved. If coded poorly, such systems can seriously harm people and even kill them, as with the Therac-25. In these cases, simply rolling out patches when a bug is reported is not enough, because people could already have died. Developers must extensively test every condition they can before releasing the code. One of the articles suggests using third-party testers to avoid bias. This sounds like a great idea. In-house testers are a great asset, but they may dismiss some bugs as unimportant if doing so gets the product launched faster. A third party checking for bugs would not have the same incentive.

Developers must ensure that their programs for safety-critical systems are as bug-free as possible. Shipping buggy software could have major harmful consequences, and those consequences would be the fault of both the developer and the company they work for. The developer should test their code as thoroughly as they can. At the same time, the company the program is written for must not release products without extensive testing of its own. The company has more resources than the developer, so it should make sure all safety-critical code is tested extensively. Both the developer and the company are therefore responsible for ensuring that the code and the product are as stable and reliable as possible. These steps could save lives.
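
To make the overflow and interlock points above more concrete, here is a minimal sketch in Python. It is not the actual Therac-25 code. It assumes a one-byte counter that doubles as a "run the safety check" flag, and every name in it (`software_check_runs`, `safe_to_fire`, `interlock_installed`) is hypothetical. The idea is only to show how a check that is silently skipped when a counter wraps to zero can let an unsafe setup through, and how an independent hardware-style check would still catch it.

```python
def software_check_runs(pass_count: int) -> bool:
    """Return True if the software safety check would run on this pass.

    Assumption: the flag is a one-byte counter incremented once per pass,
    and a nonzero value means "run the check".
    """
    flag = pass_count & 0xFF  # one byte: wraps to 0 on every 256th pass
    return flag != 0


def safe_to_fire(pass_count: int, hardware_in_position: bool,
                 interlock_installed: bool) -> bool:
    """Decide whether the beam may fire on this pass (hypothetical logic)."""
    if software_check_runs(pass_count):
        software_ok = hardware_in_position
    else:
        software_ok = True  # the bug: a skipped check is treated as a pass
    # An independent interlock looks at the hardware directly and does not
    # depend on the software flag at all.
    interlock_ok = hardware_in_position if interlock_installed else True
    return software_ok and interlock_ok


# Passes (out of the first 512) on which the software check is skipped.
skipped = [n for n in range(1, 513) if not software_check_runs(n)]
print(skipped)

# Hardware is NOT in position on pass 256, where the counter has wrapped.
print(safe_to_fire(256, hardware_in_position=False, interlock_installed=False))
print(safe_to_fire(256, hardware_in_position=False, interlock_installed=True))
```

In this sketch the unsafe setup slips through only on the rare passes where the counter wraps, which is the kind of bug ordinary testing can easily miss. The independent check blocks it no matter what the software flag says.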