What Is a Fault? And Why Does It Matter?

Dr. Ali Mili
New Jersey Institute of Technology


Abstract

The IEEE Standard 7-4.3.2-2003 defines a software fault as an incorrect step, process or data definition in a computer program, while a group of researchers who introduced a dependability terminology define it as the adjudged or hypothesized cause of an error. Neither of these definitions can be considered a sound basis for a formal analysis of faults or related matters (fault avoidance, fault removal, fault tolerance, etc.). In this talk, we define software faults by means of the concept of relative correctness, which is the property of a program to be more-correct than another with respect to a given specification. Whereas traditional absolute correctness distinguishes between two classes of candidate programs, i.e., correct and incorrect, relative correctness ranks candidate programs over a rich partial ordering whose maximal elements are the correct programs. Also, whereas traditionally we use program testing to diagnose and remove faults and program proving to establish the absence of faults, relative correctness enables us to remove a fault and prove that it has been removed, all without testing; we call this debugging without testing. Other implications of relative correctness include programming without refinement, defining faultiness by fault depth, monotonic fault removal, and automated program debugging.
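To make the partial ordering concrete, here is a minimal Python sketch, assuming a finite state space and deterministic programs modelled as partial functions (dicts). The definition used, that P' is more-correct than P with respect to a specification R when the competence domain dom(R ∩ P') contains dom(R ∩ P), is our rendering of the deterministic case and not a transcription of the talk; the nondeterministic case is more involved and is not covered. All names (`competence_domain`, `more_correct`, the toy specification `R` and the programs) are hypothetical. Enumerating a four-state domain stands in for proof only in this toy setting.

```python
from typing import Dict, FrozenSet, Set, Tuple

State = int
Spec = Set[Tuple[State, State]]   # specification R: acceptable (input, output) pairs
Program = Dict[State, State]      # deterministic program as a partial function


def competence_domain(spec: Spec, prog: Program) -> FrozenSet[State]:
    """dom(R ∩ P): inputs on which prog returns an output that spec accepts."""
    return frozenset(s for s, out in prog.items() if (s, out) in spec)


def spec_domain(spec: Spec) -> FrozenSet[State]:
    """dom(R): inputs for which the specification demands some output."""
    return frozenset(s for s, _ in spec)


def more_correct(spec: Spec, p_new: Program, p_old: Program) -> bool:
    """Assumed deterministic definition: competence domain grows (or stays equal)."""
    return competence_domain(spec, p_new) >= competence_domain(spec, p_old)


def strictly_more_correct(spec: Spec, p_new: Program, p_old: Program) -> bool:
    """Strict version: competence domain is a proper superset."""
    return competence_domain(spec, p_new) > competence_domain(spec, p_old)


def is_correct(spec: Spec, prog: Program) -> bool:
    """Correct programs are maximal: their competence domain is all of dom(R)."""
    return competence_domain(spec, prog) == spec_domain(spec)


# Hypothetical toy specification: the output must be twice the input.
R: Spec = {(s, 2 * s) for s in range(4)}

p_old = {0: 0, 1: 3, 2: 3, 3: 6}          # correct on {0, 3}
p_new = {0: 0, 1: 2, 2: 3, 3: 6}          # correct on {0, 1, 3}
p_other = {0: 0, 1: 3, 2: 4, 3: 0}        # correct on {0, 2}
p_fixed = {s: 2 * s for s in range(4)}    # correct on all of dom(R)

if __name__ == "__main__":
    print(sorted(competence_domain(R, p_old)))     # [0, 3]
    print(strictly_more_correct(R, p_new, p_old))  # True
    print(is_correct(R, p_new))                    # False
    print(more_correct(R, p_old, p_other), more_correct(R, p_other, p_old))  # False False
    print(is_correct(R, p_fixed))                  # True
```

Here `p_new` is strictly more-correct than `p_old`, so replacing one by the other removes a fault even though `p_new` is still incorrect; `p_old` and `p_other` are incomparable, which is why the ordering is only partial; and `p_fixed`, being correct, sits at the top.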