Reasoning about action and change is integral to the diagnosis, testing and repair of many artifacts, and yet there is no formal account of diagnostic problem solving which incorporates a theory of action and change. In this abridged report, we provide a situation calculus framework for diagnostic problem solving. Using this framework, we present results towards a characterization of diagnosis for behaviorally static systems whose state can be affected by events which occur in the world, and which require world-altering actions to achieve tests and repairs. Diagnosis is defined more broadly in terms of what happened, in addition to the traditional conjecture of what is wrong. Observations of events and actions in the world are used to diagnose some system malfunctions. By developing our characterization in terms of the situation calculus, we are able to contribute towards a formal characterization and semantics for this broader notion of diagnosis, testing and repair, in addition to dealing formally with issues such as the frame problem.
Recent work has raised the challenge of efficient automated troubleshooting in domains where repairing a set of components in a single repair action is cheaper than repairing each of them separately. This corresponds to cases where there is a non-negligible overhead to initiating a repair action and to testing the system after a repair action. In this work we propose several algorithms for choosing which batch of components to repair, so as to minimize the overall repair costs. Experimentally, we show the benefit of these algorithms over repairing components one at a time.
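To make the cost argument concrete, the following minimal sketch compares repairing components one at a time against repairing them in a single batch. It assumes a simple additive cost model (a fixed per-action overhead plus a per-component repair cost), and it assumes every component in the batch actually needs repair. The function names and the cost model are illustrative assumptions; they are not the batch-selection algorithms proposed in the abstract.

```python
from typing import Dict, Iterable


def batch_repair_cost(batch: Iterable[str],
                      component_cost: Dict[str, float],
                      overhead: float) -> float:
    """Cost of ONE repair action that covers every component in `batch`.

    Assumed model: a fixed overhead (initiating the repair and testing the
    system afterwards) plus the sum of the per-component repair costs.
    """
    batch = list(batch)
    if not batch:
        return 0.0  # no repair action is initiated for an empty batch
    return overhead + sum(component_cost[c] for c in batch)


def sequential_repair_cost(components: Iterable[str],
                           component_cost: Dict[str, float],
                           overhead: float) -> float:
    """Cost of repairing each component in its own repair action."""
    return sum(batch_repair_cost([c], component_cost, overhead)
               for c in components)


# Hypothetical numbers, chosen only for illustration.
costs = {"A": 2.0, "B": 3.0, "C": 1.0}
overhead = 5.0
one_batch = batch_repair_cost(costs, costs, overhead)
one_by_one = sequential_repair_cost(costs, costs, overhead)
```

Under this model the overhead is paid once per repair action, so a batch of k components saves (k - 1) times the overhead compared with k separate actions. In a real troubleshooting setting it is uncertain which components are faulty, so a larger batch also risks paying to repair healthy components.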
A known limitation of many diagnosis algorithms is that the number of diagnoses they return can be very large. This raises the question of how to use such a large set of diagnoses. For example, presenting hundreds of diagnoses to a human operator (charged with repairing the system) is meaningless. In various settings, including decision support for a human operator and automated troubleshooting processes, it is sufficient to be able to answer a basic diagnostic question: is a given component faulty? We propose a way to aggregate an arbitrarily large set of diagnoses to return an estimate of the likelihood that a given component is faulty. The resulting mapping of components to their likelihood of being faulty is called the system's health state. We propose two metrics for evaluating the accuracy of a health state and show that an accurate health state can be found without finding all diagnoses. An empirical study explores the question of how many diagnoses are needed to obtain an accurate enough health state, and a simple online stopping criterion is proposed.
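One plausible way to aggregate diagnoses into a health state is sketched below. It assumes each diagnosis is a set of components conjectured to be faulty and carries a non-negative weight (for example, an estimated probability), and it takes the likelihood of a component to be the normalized weight of the diagnoses that contain it. The abstract does not spell out its aggregation rule, so the weighting scheme and the name `health_state` are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Diagnosis = FrozenSet[str]  # components conjectured to be faulty


def health_state(weighted_diagnoses: List[Tuple[Diagnosis, float]],
                 components: Iterable[str]) -> Dict[str, float]:
    """Map each component to an estimated likelihood of being faulty.

    Assumed rule: the likelihood of component c is the total weight of the
    diagnoses containing c, divided by the total weight of all diagnoses.
    """
    total = sum(weight for _, weight in weighted_diagnoses)
    if total <= 0:
        raise ValueError("diagnosis weights must sum to a positive value")
    return {
        c: sum(w for diag, w in weighted_diagnoses if c in diag) / total
        for c in components
    }


# Hypothetical diagnoses and weights, chosen only for illustration.
diagnoses = [
    (frozenset({"A"}), 6.0),
    (frozenset({"B", "C"}), 3.0),
    (frozenset({"A", "C"}), 1.0),
]
state = health_state(diagnoses, ["A", "B", "C", "D"])
```

Because the estimate only needs the diagnoses found so far, it can be recomputed as more diagnoses arrive, which is the kind of setting in which an online stopping criterion would apply. A component that appears in no diagnosis, such as `D` here, receives a likelihood of zero.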
Numerous methods for probabilistic reasoning in large, complex belief or decision networks are currently being developed. However, there has been little research on automating the dynamic, incremental construction of decision models. A uniform value-driven method of decision model construction is proposed for hierarchical complete diagnosis. Hierarchical complete diagnostic reasoning is formulated as a stochastic process and modeled using influence diagrams. Given observations, this method creates decision models in order to obtain the best actions sequentially for locating and repairing a fault at minimum cost. The method constructs decision models incrementally, interleaving probe actions with model construction and evaluation. The method treats meta-level and base-level tasks uniformly. That is, the method takes a decision-theoretic look at the control of search in causal pathways and structural hierarchies.