Uncertainty
Evidence with Uncertain Likelihoods
Halpern, Joseph Y., Pucella, Riccardo
An agent often has a number of hypotheses, and must choose among them based on observations, or outcomes of experiments. Each of these observations can be viewed as providing evidence for or against various hypotheses. All the attempts to formalize this intuition up to now have assumed that associated with each hypothesis h there is a likelihood function μ_h, which is a probability measure that intuitively describes how likely each observation is, conditional on h being the correct hypothesis. We consider an extension of this framework where there is uncertainty as to which of a number of likelihood functions is appropriate, and discuss how one formal approach to defining evidence, which views evidence as a function from priors to posteriors, can be generalized to accommodate this uncertainty.
A Logic for Reasoning about Evidence
Halpern, Joseph Y., Pucella, Riccardo
We introduce a logic for reasoning about evidence that essentially views evidence as a function from prior beliefs (before making an observation) to posterior beliefs (after making the observation). We provide a sound and complete axiomatization for the logic, and consider the complexity of the decision problem. Although the reasoning in the logic is mainly propositional, we allow variables representing numbers and quantification over them. This expressive power seems necessary to capture important properties of evidence.
Universal Algorithmic Intelligence: A mathematical top->down approach
Sequential decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameter-free theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. We outline how the AIXI model can formally solve a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXItl that is still effectively more intelligent than any other time t and length l bounded agent. The computation time of AIXItl is of the order t x 2^l. The discussion includes formal definitions of intelligence order relations, the horizon problem and relations of the AIXI theory to other AI approaches.
Causes and Explanations: A Structural-Model Approach. Part II: Explanations
Halpern, Joseph Y., Pearl, Judea
We propose new definitions of (causal) explanation, using structural equations to model counterfactuals. The definition is based on the notion of actual cause, as defined and motivated in a companion paper. Essentially, an explanation is a fact that is not known for certain but, if found to be true, would constitute an actual cause of the fact to be explained, regardless of the agent's initial uncertainty. We show that the definition handles well a number of problematic examples from the literature.
Reinforcement Learning with Linear Function Approximation and LQ control Converges
Szita, Istvan, Lorincz, Andras
Reinforcement learning is commonly used with function approximation. However, very few positive results are known about the convergence of function approximation based RL control algorithms. In this paper we show that TD(0) and Sarsa(0) with linear function approximation is convergent for a simple class of problems, where the system is linear and the costs are quadratic (the LQ control problem). Furthermore, we show that for systems with Gaussian noise and non-completely observable states (the LQG problem), the mentioned RL algorithms are still convergent, if they are combined with Kalman filtering.
Penalized Likelihood Methods for Estimation of Sparse High Dimensional Directed Acyclic Graphs
Shojaie, Ali, Michailidis, George
Directed acyclic graphs (DAGs) are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical, as well as biological systems, where directed edges between nodes represent the influence of components of the system on each other. The general problem of estimating DAGs from observed data is computationally NP-hard, Moreover two directed graphs may be observationally equivalent. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose a penalized likelihood approach that directly estimates the adjacency matrix of DAGs. Both lasso and adaptive lasso penalties are considered and an efficient algorithm is proposed for estimation of high dimensional DAGs. We study variable selection consistency of the two penalties when the number of variables grows to infinity with the sample size. We show that although lasso can only consistently estimate the true network under stringent assumptions, adaptive lasso achieves this task under mild regularity conditions. The performance of the proposed methods is compared to alternative methods in simulated, as well as real, data examples.
Sparse Convolved Multiple Output Gaussian Processes
Álvarez, Mauricio A., Lawrence, Neil D.
Recently there has been an increasing interest in methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different sparse approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data, in particular, we show results in pollution prediction, school exams score prediction and gene expression data.
Likelihood-based semi-supervised model selection with applications to speech processing
White, Christopher M., Khudanpur, Sanjeev P., Wolfe, Patrick J.
In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some other means. In the context of speech processing systems and other large-scale practical applications, however, such labeled development data are typically costly and difficult to obtain. This article proposes an alternative semi-supervised framework for likelihood-based model selection that leverages unlabeled data by using trained classifiers representing each model to automatically generate putative labels. The errors that result from this automatic labeling are shown to be amenable to results from robust statistics, which in turn provide for minimax-optimal censored likelihood ratio tests that recover the nonparametric sign test as a limiting case. This approach is then validated experimentally using a state-of-the-art automatic speech recognition system to select between candidate word pronunciations using unlabeled speech data that only potentially contain instances of the words under test. Results provide supporting evidence for the utility of this approach, and suggest that it may also find use in other applications of machine learning.
Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches
Naseem, T., Snyder, B., Eisenstein, J., Barzilay, R.
We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The central assumption of our work is that by combining cues from multiple languages, the structure of each becomes more apparent. We consider two ways of applying this intuition to the problem of unsupervised part-of-speech tagging: a model that directly merges tag structures for a pair of languages into a single sequence and a second model which instead incorporates multilingual context using latent variables. Both approaches are formulated as hierarchical Bayesian models, using Markov Chain Monte Carlo sampling techniques for inference. Our results demonstrate that by incorporating multilingual evidence we can achieve impressive performance gains across a range of scenarios. We also found that performance improves steadily as the number of available languages increases.
A Hierarchical Bayesian Model for Frame Representation
Chaâri, L., Pesquet, J. -C., Tourneret, J. -Y., Ciuciu, Ph., Benazza-Benyahia, A.
In many signal processing problems, it may be fruitful to represent the signal under study in a frame. If a probabilistic approach is adopted, it becomes then necessary to estimate the hyper-parameters characterizing the probability distribution of the frame coefficients. This problem is difficult since in general the frame synthesis operator is not bijective. Consequently, the frame coefficients are not directly observable. This paper introduces a hierarchical Bayesian model for frame representation. The posterior distribution of the frame coefficients and model hyper-parameters is derived. Hybrid Markov Chain Monte Carlo algorithms are subsequently proposed to sample from this posterior distribution. The generated samples are then exploited to estimate the hyper-parameters and the frame coefficients of the target signal. Validation experiments show that the proposed algorithms provide an accurate estimation of the frame coefficients and hyper-parameters. Application to practical problems of image denoising show the impact of the resulting Bayesian estimation on the recovered signal quality.