A Measure-Free Approach to Conditioning

arXiv.org Artificial Intelligence

In an earlier paper, a new theory of measurefree "conditional" objects was presented. In this paper, emphasis is placed upon the motivation of the theory. The central part of this motivation is established through an example involving a knowledge-based system. In order to evaluate combination of evidence for this system, using observed data, auxiliary at tribute and diagnosis variables, and inference rules connecting them, one must first choose an appropriate algebraic logic description pair (ALDP): a formal language or syntax followed by a compatible logic or semantic evaluation (or model). Three common choices- for this highly non-unique choice - are briefly discussed, the logics being Classical Logic, Fuzzy Logic, and Probability Logic. In all three,the key operator representing implication for the inference rules is interpreted as the often-used disjunction of a negation (b => a) = (b'v a), for any events a,b. However, another reasonable interpretation of the implication operator is through the familiar form of probabilistic conditioning. But, it can be shown - quite surprisingly - that the ALDP corresponding to Probability Logic cannot be used as a rigorous basis for this interpretation! To fill this gap, a new ALDP is constructed consisting of "conditional objects", extending ordinary Probability Logic, and compatible with the desired conditional probability interpretation of inference rules. It is shown also that this choice of ALDP leads to feasible computations for the combination of evidence evaluation in the example. In addition, a number of basic properties of conditional objects and the resulting Conditional Probability Logic are given, including a characterization property and a developed calculus of relations.


Active Learning for Function Approximation

Neural Information Processing Systems

We develop a principled strategy to sample a function optimally for function approximation tasks within a Bayesian framework. Using ideas from optimal experiment design, we introduce an objective function (incorporating both bias and variance) to measure the degree ofapproximation, and the potential utility of the data points towards optimizing this objective. We show how the general strategy canbe used to derive precise algorithms to select data for two cases: learning unit step functions and polynomial functions. In particular, we investigate whether such active algorithms can learn the target with fewer examples. We obtain theoretical and empirical resultsto suggest that this is the case. 1 INTRODUCTION AND MOTIVATION Learning from examples is a common supervised learning paradigm that hypothesizes atarget concept given a stream of training examples that describes the concept. In function approximation, example-based learning can be formulated as synthesizing anapproximation function for data sampled from an unknown target function (Poggio and Girosi, 1990). Active learning describes a class of example-based learning paradigms that seeks out new training examples from specific regions of the input space, instead of passively accepting examples from some data generating source. By judiciously selecting ex- 594 KahKay Sung, Parlha Niyogi amples instead of allowing for possible random sampling, active learning techniques can conceivably have faster learning rates and better approximation results than passive learning methods. This paper presents a Bayesian formulation for active learning within the function approximation framework.


Learning From What You Don't Observe

arXiv.org Artificial Intelligence

The process of diagnosis involves learning about the state of a system from various observations of symptoms or findings about the system. Sophisticated Bayesian (and other) algorithms have been developed to revise and maintain beliefs about the system as observations are made. Nonetheless, diagnostic models have tended to ignore some common sense reasoning exploited by human diagnosticians; In particular, one can learn from which observations have not been made, in the spirit of conversational implicature. There are two concepts that we describe to extract information from the observations not made. First, some symptoms, if present, are more likely to be reported before others. Second, most human diagnosticians and expert systems are economical in their data-gathering, searching first where they are more likely to find symptoms present. Thus, there is a desirable bias toward reporting symptoms that are present. We develop a simple model for these concepts that can significantly improve diagnostic inference.


Convergence Rates of Active Learning for Maximum Likelihood Estimation

arXiv.org Machine Learning

An active learner is given a class of models, a large set of unlabeled examples, and the ability to interactively query labels of a subset of these examples; the goal of the learner is to learn a model in the class that fits the data well. Previous theoretical work has rigorously characterized label complexity of active learning, but most of this work has focused on the PAC or the agnostic PAC model. In this paper, we shift our attention to a more general setting -- maximum likelihood estimation. Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem. The conditions we require are fairly general, and cover the widely popular class of Generalized Linear Models, which in turn, include models for binary and multi-class classification, regression, and conditional random fields. We provide an upper bound on the label requirement of our algorithm, and a lower bound that matches it up to lower order terms. Our analysis shows that unlike binary classification in the realizable case, just a single extra round of interaction is sufficient to achieve near-optimal performance in maximum likelihood estimation. On the empirical side, the recent work in ~\cite{Zhang12} and~\cite{Zhang14} (on active linear and logistic regression) shows the promise of this approach.


Convergence Rates of Active Learning for Maximum Likelihood Estimation

Neural Information Processing Systems

An active learner is given a class of models, a large set of unlabeled examples, and the ability to interactively query labels of a subset of these examples; the goal of the learner is to learn a model in the class that fits the data well. Previous theoretical work has rigorously characterized label complexity of active learning, but most of this work has focused on the PAC or the agnostic PAC model. In this paper, we shift our attention to a more general setting -- maximum likelihood estimation. Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem. The conditions we require are fairly general, and cover the widely popular class of Generalized Linear Models, which in turn, include models for binary and multi-class classification, regression, and conditional random fields. We provide an upper bound on the label requirement of our algorithm, and a lower bound that matches it up to lower order terms. Our analysis shows that unlike binary classification in the realizable case, just a single extraround of interaction is sufficient to achieve near-optimal performance in maximum likelihood estimation. On the empirical side, the recent work in (Gu et al. 2012) and (Gu et al. 2014) (on active linear and logistic regression) shows the promise of this approach.