
 Bayesian Learning




AI Classics

Case-based reasoning is used extensively by people in both expert and commonsense situations. It provides a wide range of advantages. A second driving force in the evolutionary history of CBR was dissatisfaction with rule-based reasoning (expert systems ...


Categorical and Probabilistic Reasoning in Medical Diagnosis

AI Classics

How do practicing physicians make clinical decisions? What techniques can we use in the computer to produce programs that exhibit medical expertise? Our interest in these questions is motivated by our desire: 1. to provide (by computer) expert medical consultation to general practitioners or paramedical personnel in communities where such consultation is normally unavailable; 2. to come to understand the reasoning processes of expert doctors so that we may improve the teaching of their skills to medical students; and 3. to advance the techniques of artificial intelligence, especially as applied to medicine (AIM), to support our other goals. In other publications, we have described research by our group on programs to take the history of the present illness of a patient with renal disease (Pauker and Gorry, 1976; Szolovits and Pauker, 1976) and to advise the physician in the administration of the drug digitalis to patients with heart disease (Gorry et al., 1978; Silverman, 1975; Swartout, 1977).


Probabilistic Reasoning and Certainty Factors

AI Classics

The development of automated assistance for medical diagnosis and decision making is an area of both theoretical and practical interest. Of methods for utilizing evidence to select diagnoses or decisions, probability theory has the firmest appeal. Probability theory in the form of Bayes' Theorem has been used by a number of workers (Ross, 1972). Notable among recent developments are those of de Dombal and coworkers (de Dombal, 1973; de Dombal et al., 1974; 1975) and Pipberger and coworkers (Pipberger et al., 1975). The usefulness of Bayes' Theorem is limited by practical difficulties, principally the lack of data adequate to estimate accurately the a priori and conditional probabilities used in the theorem. One attempt to mitigate this problem has been to assume statistical independence among various pieces of evidence. How seriously this approximation affects results is often unclear, and correction mechanisms have been explored (Ross, 1972; Norusis and Jacquez, 1975a; 1975b). Even the independence assumption requires an unmanageable number of estimates of probabilities for most applications with realistic complexity.
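The independence assumption described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from any of the cited papers: the function name `naive_bayes_posterior`, the diagnoses, the findings and every probability below are made-up assumptions chosen only to show the arithmetic.

```python
from math import prod


def naive_bayes_posterior(priors, likelihoods, findings):
    """Posterior over diagnoses, assuming findings are independent given the diagnosis.

    priors:      {diagnosis: P(diagnosis)}
    likelihoods: {diagnosis: {finding: P(finding present | diagnosis)}}
    findings:    {finding: True or False}, as observed for one patient
    """
    unnormalized = {}
    for diagnosis, prior in priors.items():
        terms = [
            likelihoods[diagnosis][f] if present else 1.0 - likelihoods[diagnosis][f]
            for f, present in findings.items()
        ]
        # Independence lets the joint likelihood factor into a product.
        unnormalized[diagnosis] = prior * prod(terms)
    total = sum(unnormalized.values())
    return {d: v / total for d, v in unnormalized.items()}


# Hypothetical numbers, for illustration only.
priors = {"appendicitis": 0.2, "nonspecific_pain": 0.8}
likelihoods = {
    "appendicitis": {"rebound_tenderness": 0.8, "fever": 0.6},
    "nonspecific_pain": {"rebound_tenderness": 0.1, "fever": 0.2},
}
posterior = naive_bayes_posterior(
    priors, likelihoods, {"rebound_tenderness": True, "fever": True}
)
```

Even in this toy form, every diagnosis needs one prior plus one conditional probability per finding, which hints at why the excerpt calls the number of required estimates unmanageable at realistic scale.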


A Model of Inexact Reasoning in Medicine Edward H. Shortliffe and Bruce G. Buchanan

AI Classics

Questioning of the expert gradually reveals, however, that despite the apparent similarity to a statement regarding a conditional probability, the number 0.7 differs significantly from a probability. The expert may well agree that $P(h_1 \mid s_1 \,\&\, s_2 \,\&\, s_3) = 0.7$, but he becomes uneasy when he attempts to follow the logical conclusion that therefore $P(\neg h_1 \mid s_1 \,\&\, s_2 \,\&\, s_3) = 0.3$. He claims that the three observations are evidence (to degree 0.7) in favor of the conclusion that the organism is a Streptococcus and should not be construed as evidence (to degree 0.3) against Streptococcus. We shall refer to this problem as Paradox 1 and return to it later in the exposition, after the interpretation of the 0.7 in the rule above has been introduced. It is tempting to conclude that the expert is irrational if he is unwilling to follow the implications of his probabilistic statements to their logical conclusions.


Reasoning Under Uncertainty

AI Classics

Please read it and send me comments, objections, etc. 1) Victor [Yu] has assigned certainty factors to his rules based on the relative strengths of the evidence in these rules. While trying to find a numerical scale that would work as he wanted it to with the system's 0.2 cutoff and combining functions, he had to adjust certainty factors of various rules. Now that this scale has been established, however, he assigns certainty factors using this scale, and does NOT adjust certainty factors of rules if he doesn't like the system's performance. Furthermore, he does NO combinatorial analysis before determining what CF to use; he is satisfied that using the scale he has devised, the system's combining function, and the 0.2 cutoff, the program will arrive at the right results for any combination of factors, and if it doesn't, he looks for missing information to add. 2) Assuming that the parameters IDENT and COVERFOR are disambiguated in Victor's set of rules, Ted [Shortliffe] believes the CF's that Victor uses in his rules, and approves of the idea of using a cutoff for COVERFOR since this is what we've been doing with bacteremia (since it is a binary decision, a cutoff makes sense for COVERFOR). Furthermore, this is quite similar to what clinicians do: they accumulate lots of small bits of clinical evidence, then decide if the total is enough to make them cover for a particular organism--independent of what the microbiological evidence suggests.
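The interplay between a combining function and a cutoff can be sketched as follows. The memo does not spell out the system's combining function, so this sketch assumes the commonly cited rule for merging two pieces of positive evidence, `cf = a + b * (1 - a)`. The names `combine_positive` and `should_cover`, the strict comparison against the cutoff, and the sample certainty factors are all illustrative assumptions.

```python
CUTOFF = 0.2  # the cutoff value mentioned in the memo


def combine_positive(cfs):
    """Accumulate positive certainty factors, each in [0, 1].

    Assumed rule: each new piece of evidence closes a fraction of the
    remaining gap between the running total and full certainty (1.0).
    """
    total = 0.0
    for cf in cfs:
        if not 0.0 <= cf <= 1.0:
            raise ValueError("this sketch only handles positive evidence in [0, 1]")
        total = total + cf * (1.0 - total)
    return total


def should_cover(cfs, cutoff=CUTOFF):
    """Binary decision: cover only if the accumulated CF exceeds the cutoff."""
    return combine_positive(cfs) > cutoff


# Hypothetical rule strengths: no single rule clears the cutoff,
# but three weak pieces of evidence together do.
weak_evidence = [0.1, 0.1, 0.05]
combined = combine_positive(weak_evidence)
```

Under these assumptions the order of the rules does not change the total, which is one reason a fixed scale of rule strengths can be assigned without analysing every combination in advance.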



Consistency Analysis of Nearest Subspace Classifier

arXiv.org Machine Learning

The Nearest subspace classifier (NSS) finds an estimation of the underlying subspace within each class and assigns data points to the class that corresponds to its nearest subspace. This paper mainly studies how well NSS can be generalized to new samples. It is proved that NSS is strongly consistent under certain assumptions. For completeness, NSS is evaluated through experiments on various simulated and real data sets, in comparison with some other linear model based classifiers. It is also shown that NSS can obtain effective classification results and is very efficient, especially for large scale data sets.
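A nearest subspace classifier of the kind this abstract describes might be sketched as below. The abstract does not say how the subspaces are estimated, so this sketch assumes linear subspaces through the origin, fitted with a truncated SVD per class, with a user-chosen dimension `dim`. The function names `fit_subspaces` and `predict` and the toy data are hypothetical.

```python
import numpy as np


def fit_subspaces(X, y, dim):
    """Estimate a dim-dimensional linear subspace for each class."""
    bases = {}
    for label in np.unique(y):
        Xc = X[y == label]
        # Leading right singular vectors span the directions that
        # capture the most energy of the class samples.
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        bases[label] = vt[:dim].T  # shape (n_features, dim), orthonormal columns
    return bases


def predict(bases, X):
    """Assign each row of X to the class whose subspace is nearest."""
    labels = list(bases)
    # Distance to a subspace = norm of the residual after orthogonal projection.
    residuals = np.stack(
        [np.linalg.norm(X - X @ bases[l] @ bases[l].T, axis=1) for l in labels],
        axis=1,
    )
    return np.array(labels)[residuals.argmin(axis=1)]


# Toy data: two classes lying near two different coordinate axes in 3-D.
rng = np.random.default_rng(0)
t = rng.uniform(1.0, 2.0, size=40)
X0 = np.outer(t, [1.0, 0.0, 0.0]) + 0.01 * rng.standard_normal((40, 3))
X1 = np.outer(t, [0.0, 1.0, 0.0]) + 0.01 * rng.standard_normal((40, 3))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 40 + [1] * 40)

bases = fit_subspaces(X_train, y_train, dim=1)
predicted = predict(bases, X_train)
```

Prediction costs only a few matrix products per class, which is consistent with the efficiency the abstract claims for large data sets.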


Bayesian Learning for Low-Rank matrix reconstruction

arXiv.org Machine Learning

We develop latent variable models for Bayesian learning based low-rank matrix completion and reconstruction from linear measurements. For under-determined systems, the developed methods are shown to reconstruct low-rank matrices when neither the rank nor the noise power is known a-priori. We derive relations between the latent variable models and several low-rank promoting penalty functions. The relations justify the use of Kronecker structured covariance matrices in a Gaussian based prior. In the methods, we use evidence approximation and expectation-maximization to learn the model parameters. The performance of the methods is evaluated through extensive numerical simulations.


Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets

arXiv.org Machine Learning

Hierarchical Bayesian networks and neural networks with stochastic hidden units are commonly perceived as two separate types of models. We show that either of these types of models can often be transformed into an instance of the other, by switching between centered and differentiable non-centered parameterizations of the latent variables. The choice of parameterization greatly influences the efficiency of gradient-based posterior inference. We show that the two parameterizations are often complementary to each other, clarify when each is preferred, and show how inference can be made robust. In the non-centered form, a simple Monte Carlo estimator of the marginal likelihood can be used for learning the parameters. Theoretical results are supported by experiments.
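The difference between the two parameterizations can be illustrated with a single Gaussian latent variable. In the centered form, z is drawn directly from N(mu, sigma^2). In the non-centered form, an auxiliary noise variable eps is drawn from N(0, 1) and z = mu + sigma * eps is a deterministic, differentiable function of the parameters. The sketch below is an assumption-laden toy, not the paper's estimator: it uses a hypothetical test function f(z) = z^2 and the name `noncentered_gradients` to show how gradients pass through the non-centered form.

```python
import numpy as np


def noncentered_gradients(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo gradients of E[f(z)] with f(z) = z**2, z = mu + sigma * eps."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)  # noise that does not depend on mu or sigma
    z = mu + sigma * eps                  # non-centered form of z ~ N(mu, sigma**2)
    df_dz = 2.0 * z                       # derivative of f(z) = z**2
    grad_mu = np.mean(df_dz)              # chain rule with dz/dmu = 1
    grad_sigma = np.mean(df_dz * eps)     # chain rule with dz/dsigma = eps
    return grad_mu, grad_sigma


# For this f, E[z**2] = mu**2 + sigma**2, so the exact gradients are
# 2 * mu and 2 * sigma; the estimates should land close to them.
grad_mu, grad_sigma = noncentered_gradients(mu=1.5, sigma=0.5)
```

Because the randomness is isolated in eps, the same samples can be reused while mu and sigma change, which is what makes simple gradient-based learning possible in the non-centered form.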