Bayesian Inference
A Model of Inexact Reasoning in Medicine
Shortliffe, Edward H., Buchanan, Bruce G.
Questioning of the expert gradually reveals, however, that despite the apparent similarity to a statement regarding a conditional probability, the number 0.7 differs significantly from a probability. The expert may well agree that $P(h_1 \mid s_1 \,\&\, s_2 \,\&\, s_3) = 0.7$, but he becomes uneasy when he attempts to follow the logical conclusion that therefore $P(\neg h_1 \mid s_1 \,\&\, s_2 \,\&\, s_3) = 0.3$. He claims that the three observations are evidence (to degree 0.7) in favor of the conclusion that the organism is a Streptococcus and should not be construed as evidence (to degree 0.3) against Streptococcus. We shall refer to this problem as Paradox 1 and return to it later in the exposition, after the interpretation of the 0.7 in the rule above has been introduced. It is tempting to conclude that the expert is irrational if he is unwilling to follow the implications of his probabilistic statements to their logical conclusions.
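The step the expert resists is just the complement rule of probability. Writing $E$ for the conjunction $s_1 \,\&\, s_2 \,\&\, s_3$ (a shorthand introduced here), the inference he is asked to accept is:

```latex
P(h_1 \mid E) + P(\neg h_1 \mid E) = 1
\quad\Longrightarrow\quad
P(\neg h_1 \mid E) = 1 - 0.7 = 0.3
```

The expert's 0.7 is meant to carry no such implication about $\neg h_1$; that mismatch is what Paradox 1 names.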
Reasoning Under Uncertainty
Please read it and send me comments, objections, etc.
1) Victor [Yu] has assigned certainty factors to his rules based on the relative strengths of the evidence in these rules. While trying to find a numerical scale that would work as he wanted it to with the system's 0.2 cutoff and combining functions, he had to adjust the certainty factors of various rules. Now that this scale has been established, however, he assigns certainty factors using this scale and does NOT adjust the certainty factors of rules if he doesn't like the system's performance. Furthermore, he does NO combinatorial analysis before determining what CF to use; he is satisfied that, using the scale he has devised, the system's combining function, and the 0.2 cutoff, the program will arrive at the right results for any combination of factors, and if it doesn't, he looks for missing information to add.
2) Assuming that the parameters IDENT and COVERFOR are disambiguated in Victor's set of rules, Ted [Shortliffe] believes the CFs that Victor uses in his rules and approves of the idea of using a cutoff for COVERFOR, since this is what we've been doing with bacteremia (because it is a binary decision, a cutoff makes sense for COVERFOR). Furthermore, this is quite similar to what clinicians do: they accumulate lots of small bits of clinical evidence, then decide if the total is enough to make them cover for a particular organism, independent of what the microbiological evidence suggests.
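The strategy in the memo, accumulating many weak pieces of evidence and then comparing the total with the 0.2 cutoff, can be sketched as follows. This is a minimal sketch assuming the EMYCIN-style combining function usually cited for certainty factors (x + y(1 - x) when both are positive, x + y(1 + x) when both are negative, (x + y)/(1 - min(|x|, |y|)) otherwise); the rule MYCIN used at the time of the memo may have differed. The function names `combine_cf`, `accumulate` and `should_cover`, and the example CF values, are hypothetical.

```python
from functools import reduce


def combine_cf(x: float, y: float) -> float:
    """Combine two certainty factors in [-1, 1] (assumed EMYCIN-style rule)."""
    if x >= 0 and y >= 0:
        return x + y * (1 - x)          # supporting evidence accumulates toward 1
    if x < 0 and y < 0:
        return x + y * (1 + x)          # disconfirming evidence accumulates toward -1
    denom = 1 - min(abs(x), abs(y))     # mixed signs: evidence partially cancels
    if denom == 0:
        raise ValueError("combining CF +1 with CF -1 is undefined")
    return (x + y) / denom


def accumulate(cfs):
    """Fold the CFs of all rules concluding about one hypothesis into a single CF."""
    return reduce(combine_cf, cfs, 0.0)


def should_cover(cfs, cutoff=0.2):
    """Hypothetical COVERFOR-style binary decision: cover only above the cutoff."""
    return accumulate(cfs) > cutoff


# One weak rule (CF 0.1) stays below the cutoff; three of them together exceed it.
print(should_cover([0.1]), accumulate([0.1, 0.1, 0.1]), should_cover([0.1, 0.1, 0.1]))
```

With this rule, three rules of CF 0.1 combine to 0.1, then 0.19, then 0.271, which crosses the 0.2 cutoff even though no single rule does, mirroring the clinician behavior the memo describes.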
Consistency Analysis of Nearest Subspace Classifier
The nearest subspace classifier (NSS) estimates the underlying subspace of each class and assigns each data point to the class whose subspace is nearest. This paper mainly studies how well NSS generalizes to new samples. It is proved that NSS is strongly consistent under certain assumptions. For completeness, NSS is evaluated through experiments on various simulated and real data sets and compared with other linear-model-based classifiers. It is also shown that NSS obtains effective classification results and is very efficient, especially for large-scale data sets.
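A minimal sketch of the nearest-subspace rule, assuming each class subspace is a d-dimensional linear subspace through the origin estimated by a truncated SVD of that class's training points; the paper's estimator may differ (for example, by centering the data first). The names `fit_subspaces`, `predict` and the dimension `d` are hypothetical, and the data are synthetic.

```python
import numpy as np


def fit_subspaces(X, y, d):
    """Return {label: orthonormal basis of shape (n_features, d)} for each class."""
    bases = {}
    for label in np.unique(y):
        Xc = X[y == label]  # rows are the training samples of this class
        # The leading left singular vectors of Xc.T span the class's dominant directions.
        U, _, _ = np.linalg.svd(Xc.T, full_matrices=False)
        bases[label] = U[:, :d]
    return bases


def predict(bases, X):
    """Assign each row of X to the class whose subspace leaves the smallest residual."""
    labels = np.array(list(bases))
    # Residual norm ||x - U U^T x|| of every sample against every class basis U.
    residuals = np.stack(
        [np.linalg.norm(X - (X @ U) @ U.T, axis=1) for U in bases.values()],
        axis=1,
    )
    return labels[np.argmin(residuals, axis=1)]


# Usage: two classes drawn from different random 2-dimensional subspaces of R^10.
rng = np.random.default_rng(0)
B0, B1 = rng.normal(size=(10, 2)), rng.normal(size=(10, 2))
X = np.vstack([rng.normal(size=(100, 2)) @ B0.T, rng.normal(size=(100, 2)) @ B1.T])
y = np.array([0] * 100 + [1] * 100)
bases = fit_subspaces(X, y, d=2)
acc = np.mean(predict(bases, X) == y)
```

Only one SVD per class is needed at training time and one projection per class at prediction time, which is consistent with the abstract's emphasis on efficiency for large data sets.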
Bayesian Learning for Low-Rank Matrix Reconstruction
Sundin, Martin, Rojas, Cristian R., Jansson, Magnus, Chatterjee, Saikat
We develop latent variable models for Bayesian-learning-based low-rank matrix completion and reconstruction from linear measurements. For under-determined systems, the developed methods are shown to reconstruct low-rank matrices when neither the rank nor the noise power is known a priori. We derive relations between the latent variable models and several low-rank-promoting penalty functions. These relations justify the use of Kronecker-structured covariance matrices in a Gaussian-based prior. In the methods, we use evidence approximation and expectation-maximization to learn the model parameters. The performance of the methods is evaluated through extensive numerical simulations.
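One generic way a Kronecker-structured covariance arises in a Gaussian matrix prior is the identity vec(A Z Bᵀ) = (B ⊗ A) vec(Z), with vec stacking columns. If Z has i.i.d. standard normal entries, then X = A Z Bᵀ has Cov[vec(X)] = (B Bᵀ) ⊗ (A Aᵀ), and a rank-deficient factor A forces every sample X to have low rank. The sketch below only checks this identity numerically; the factors `A` and `B` are hypothetical, and it is not the authors' latent variable model.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 5, 2

# Hypothetical factors: A has rank r, which caps the rank of every sample X.
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, m))
B = rng.normal(size=(n, n))
Z = rng.normal(size=(m, n))  # i.i.d. N(0, 1) latent matrix


def vec(M):
    """Column-stacking vec operator."""
    return M.flatten(order="F")


X = A @ Z @ B.T

# vec(A Z B^T) equals (B kron A) vec(Z).
lhs = vec(X)
rhs = np.kron(B, A) @ vec(Z)

# Hence Cov[vec(X)] = (B kron A)(B kron A)^T = (B B^T) kron (A A^T).
cov_vecX = np.kron(B @ B.T, A @ A.T)

print(np.allclose(lhs, rhs), np.linalg.matrix_rank(X))
```

The Kronecker form stores an m×m and an n×n factor instead of a full mn×mn covariance, which is one practical reason such structure is attractive in a matrix prior.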
Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets
Kingma, Diederik P., Welling, Max
Hierarchical Bayesian networks and neural networks with stochastic hidden units are commonly perceived as two separate types of models. We show that either of these types of models can often be transformed into an instance of the other by switching between centered and differentiable non-centered parameterizations of the latent variables. The choice of parameterization greatly influences the efficiency of gradient-based posterior inference; we show that the two parameterizations are often complementary, clarify when each is preferred, and show how inference can be made robust. In the non-centered form, a simple Monte Carlo estimator of the marginal likelihood can be used for learning the parameters. Theoretical results are supported by experiments.
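The practical difference between the two parameterizations shows up in a one-variable example. In the centered form, z ~ N(μ, σ²) is sampled directly; in the differentiable non-centered form, z = μ + σε with ε ~ N(0, 1), so gradients of a Monte Carlo estimate flow through z to μ and σ. The sketch below assumes the toy objective E[z²], whose exact gradients are 2μ and 2σ, and estimates them from non-centered samples; `reparam_grad` is a hypothetical name and this is an illustration, not the paper's experiments.

```python
import numpy as np


def reparam_grad(mu, sigma, n_samples=200_000, seed=0):
    """Monte Carlo gradient of E[z**2] w.r.t. (mu, sigma) using z = mu + sigma * eps."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)  # noise drawn independently of the parameters
    z = mu + sigma * eps                  # non-centered: z is a differentiable function of mu, sigma
    # Chain rule: d(z**2)/dmu = 2z * 1, d(z**2)/dsigma = 2z * eps.
    grad_mu = np.mean(2 * z)
    grad_sigma = np.mean(2 * z * eps)
    return grad_mu, grad_sigma


g_mu, g_sigma = reparam_grad(mu=1.5, sigma=0.5)  # exact values: 3.0 and 1.0
```

Because the randomness lives in ε alone, the same samples give unbiased estimates for both parameters, which is what makes gradient-based learning in the non-centered form straightforward.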
Difficulties applying recent blind source separation techniques to EEG and MEG
High temporal resolution measurements of human brain activity can be performed by recording the electric potentials on the scalp surface (electroencephalography, EEG), or by recording the magnetic fields near the surface of the head (magnetoencephalography, MEG). The analysis of the data is complicated by the fact that multiple neural generators may be simultaneously active, so the potentials and magnetic fields from these sources are superimposed on the detectors. It is highly desirable to unmix the data into signals representing the behaviors of the original individual generators. This general problem is called blind source separation, and several recent techniques based on maximum entropy, minimum mutual information, and maximum likelihood estimation have been applied to it. These techniques have had much success in separating signals such as natural sounds or speech, but appear to be ineffective when applied to EEG or MEG signals. Many of these techniques implicitly assume that the source distributions have a large kurtosis, whereas an analysis of EEG/MEG signals reveals that their distributions are multimodal. This suggests that more effective separation techniques could be designed for EEG and MEG signals.
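The kurtosis argument can be checked numerically on synthetic data. The sketch below assumes a Laplacian sample as a stand-in for a heavy-tailed source and a two-component mixture as a stand-in for a multimodal one (neither is EEG/MEG data) and computes the sample excess kurtosis of each. The heavy-tailed source comes out strongly positive while the bimodal one comes out negative, the regime in which a method tuned for large kurtosis may be expected to struggle.

```python
import numpy as np


def excess_kurtosis(x):
    """Sample excess kurtosis: E[(x - mean)^4] / Var(x)^2 - 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3.0


rng = np.random.default_rng(0)
n = 200_000
heavy_tailed = rng.laplace(size=n)  # super-Gaussian stand-in (theoretical value +3)
bimodal = rng.choice([-1.0, 1.0], size=n) + 0.1 * rng.standard_normal(n)  # two modes at +/-1

k_heavy = excess_kurtosis(heavy_tailed)
k_bimodal = excess_kurtosis(bimodal)
```

For the bimodal mixture the theoretical value is about -1.96, so the sign of the kurtosis, not just its magnitude, differs from what a large-kurtosis assumption expects.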
Minimax Optimal Sparse Signal Recovery with Poisson Statistics
Rohban, Mohammad H., Motamedvaziri, Delaram, Saligrama, Venkatesh
We are motivated by problems that arise in a number of applications, such as online marketing and explosives detection, where the observations are usually modeled using Poisson statistics. We model each observation as a Poisson random variable whose mean is a sparse linear superposition of known patterns. Unlike in many conventional problems, the observations here are not identically distributed, since they are associated with different sensing modalities. We analyze the performance of a Maximum Likelihood (ML) decoder, which in our Poisson setting involves a nonlinear optimization yet remains computationally tractable. We derive fundamental sample complexity bounds for sparse recovery when the measurements are contaminated with Poisson noise. In contrast to the least-squares linear regression setting with Gaussian noise, we observe that in addition to sparsity, the scale of the parameters also fundamentally impacts the $\ell_2$ error in the Poisson setting. We show both theoretically and experimentally that our upper bounds are tight. In particular, we derive a matching minimax lower bound on the mean-squared error and show that our constrained ML decoder is minimax optimal for this regime.
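To give a sense of what Poisson ML decoding involves, the sketch below minimizes the Poisson negative log-likelihood with the classical multiplicative EM update (often called Richardson-Lucy), which keeps the estimate nonnegative and never decreases the likelihood. This is a swapped-in, generic technique that assumes nonnegative patterns `A` and a nonnegative signal. It enforces no sparsity and is not the authors' constrained decoder; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def poisson_nll(A, x, y):
    """Negative Poisson log-likelihood of counts y given rates A @ x (log(y!) dropped)."""
    rate = A @ x
    return float(np.sum(rate - y * np.log(rate)))


def poisson_ml_em(A, y, iters=300):
    """ML estimate of x >= 0 via the multiplicative EM (Richardson-Lucy) update.

    Assumes A has nonnegative entries and no all-zero column. No sparsity
    constraint is imposed.
    """
    x = np.ones(A.shape[1])
    col_sums = A.sum(axis=0)
    for _ in range(iters):
        rate = A @ x
        x = x * (A.T @ (y / rate)) / col_sums  # each EM step cannot lower the likelihood
    return x


rng = np.random.default_rng(0)
A = rng.uniform(0.0, 1.0, size=(200, 50))             # hypothetical known patterns
x_true = np.zeros(50)
x_true[rng.choice(50, size=5, replace=False)] = 10.0  # sparse signal, hypothetical scale
y = rng.poisson(A @ x_true)

x_hat = poisson_ml_em(A, y)
```

Unlike least squares, the objective weights each residual by the predicted rate, so the scale of `x_true` changes the effective noise level, in line with the abstract's observation that parameter scale matters in the Poisson setting.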
Convergent Bayesian formulations of blind source separation and electromagnetic source estimation
Knuth, Kevin H., Vaughan, Herbert G. Jr
We consider two areas of research that have been developing in parallel over the last decade: blind source separation (BSS) and electromagnetic source estimation (ESE). BSS deals with the recovery of source signals when only mixtures of signals can be obtained from an array of detectors and the only prior knowledge consists of some information about the nature of the source signals. On the other hand, ESE utilizes knowledge of the electromagnetic forward problem to assign source signals to their respective generators, while information about the signals themselves is typically ignored. We demonstrate that these two techniques can be derived from the same starting point using the Bayesian formalism. This suggests a means by which new algorithms can be developed that utilize as much relevant information as possible. We also briefly mention some preliminary work that supports the value of integrating information used by these two techniques and review the kinds of information that may be useful in addressing the ESE problem.
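As a sketch of such a common starting point, assume the linear mixing model $x = A s + n$ (detector data $x$, mixing or lead-field matrix $A$, sources $s$, noise $n$, background information $I$; this notation is introduced here) and that $A$ and $s$ are independent a priori. Bayes' theorem then gives:

```latex
p(A, s \mid x, I) \;\propto\; p(x \mid A, s, I)\, p(A \mid I)\, p(s \mid I)
```

Loosely, BSS puts its assumptions into $p(s \mid I)$ and leaves $p(A \mid I)$ vague, whereas ESE builds the electromagnetic forward problem into $p(A \mid I)$ and leaves $p(s \mid I)$ vague; using informative versions of both factors is the kind of integration the abstract proposes.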
Scalable Multi-Output Label Prediction: From Classifier Chains to Classifier Trellises
Read, J., Martino, L., Olmos, P., Luengo, D.
Multi-output inference tasks, such as multi-label classification, have become increasingly important in recent years. A popular method for multi-label classification is classifier chains, in which the predictions of individual classifiers are cascaded along a chain, thus taking into account inter-label dependencies and improving the overall performance. Several varieties of classifier chain methods have been introduced, and many of them perform very competitively across a wide range of benchmark datasets. However, scalability limitations become apparent on larger datasets when modeling a fully cascaded chain. In particular, the methods' strategies for discovering and modeling a good chain structure constitute a major computational bottleneck. In this paper, we present the classifier trellis (CT) method for scalable multi-label classification. We compare CT with several recently proposed classifier chain methods to show that it occupies an important niche: it is highly competitive on standard multi-label problems, yet it can also scale up to thousands or even tens of thousands of labels.
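The classifier-chain idea that CT builds on can be sketched briefly: the classifier for label j sees the input features plus labels 0..j-1, using ground-truth labels during training and cascaded predictions at test time. The sketch below assumes a fixed chain order and a plain logistic-regression base learner trained by gradient descent (both hypothetical choices, as are all function names). It does not implement the classifier trellis itself, whose structure the abstract does not specify.

```python
import numpy as np


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression by batch gradient descent (bias column appended)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))  # clip avoids overflow
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_logistic(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(float)


def fit_chain(X, Y):
    """Train one classifier per label; label j also sees the true labels 0..j-1."""
    return [fit_logistic(np.hstack([X, Y[:, :j]]), Y[:, j]) for j in range(Y.shape[1])]


def predict_chain(models, X):
    """Cascade predictions along the chain: label j uses predicted labels 0..j-1."""
    preds = np.zeros((len(X), len(models)))
    for j, w in enumerate(models):
        preds[:, j] = predict_logistic(w, np.hstack([X, preds[:, :j]]))
    return preds


# Usage: the third label is the AND of the first two, an inter-label dependency.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y0 = (X[:, 0] > 0).astype(float)
y1 = (X[:, 1] > 0).astype(float)
Y = np.column_stack([y0, y1, y0 * y1])
models = fit_chain(X, Y)
acc = (predict_chain(models, X) == Y).mean(axis=0)  # per-label training accuracy
```

The cost of this design is visible in `predict_chain`: every label waits for all earlier ones, and choosing a good order over thousands of labels is the bottleneck the abstract attributes to chain methods.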