Uncertainty
Gaussian Processes for Machine Learning: Book webpage
The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed.
Norm-Product Belief Propagation: Primal-Dual Message-Passing for Approximate Inference
In this paper we treat both forms of probabilistic inference, estimating marginal probabilities of the joint distribution and finding the most probable assignment, through a unified message-passing algorithm architecture. We generalize the Belief Propagation (BP) algorithms of sum-product and max-product and tree-rewaighted (TRW) sum and max product algorithms (TRBP) and introduce a new set of convergent algorithms based on "convex-free-energy" and Linear-Programming (LP) relaxation as a zero-temprature of a convex-free-energy. The main idea of this work arises from taking a general perspective on the existing BP and TRBP algorithms while observing that they all are reductions from the basic optimization formula of $f + \sum_i h_i$ where the function $f$ is an extended-valued, strictly convex but non-smooth and the functions $h_i$ are extended-valued functions (not necessarily convex). We use tools from convex duality to present the "primal-dual ascent" algorithm which is an extension of the Bregman successive projection scheme and is designed to handle optimization of the general type $f + \sum_i h_i$. Mapping the fractional-free-energy variational principle to this framework introduces the "norm-product" message-passing. Special cases include sum-product and max-product (BP algorithms) and the TRBP algorithms. When the fractional-free-energy is set to be convex (convex-free-energy) the norm-product is globally convergent for estimating of marginal probabilities and for approximating the LP-relaxation. We also introduce another branch of the norm-product, the "convex-max-product". The convex-max-product is convergent (unlike max-product) and aims at solving the LP-relaxation.
Feature Construction for Relational Sequence Learning
Di Mauro, Nicola, Basile, Teresa M. A., Ferilli, Stefano, Esposito, Floriana
We tackle the problem of multi-class relational sequence learning using relevant patterns discovered from a set of labelled sequences. To deal with this problem, firstly each relational sequence is mapped into a feature vector using the result of a feature construction method. Since, the efficacy of sequence learning algorithms strongly depends on the features used to represent the sequences, the second step is to find an optimal subset of the constructed features leading to high classification accuracy. This feature selection task has been solved adopting a wrapper approach that uses a stochastic local search algorithm embedding a naive Bayes classifier. The performance of the proposed method applied to a real-world dataset shows an improvement when compared to other established methods, such as hidden Markov models, Fisher kernels and conditional random fields for relational sequences.
Learning to Predict Combinatorial Structures
The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.
Heavy-Tailed Processes for Selective Shrinkage
Wauthier, Fabian L., Jordan, Michael I.
Heavy-tailed distributions are frequently used to enhance the robustness of regression and classification methods to outliers in output space. Often, however, we are confronted with "outliers" in input space, which are isolated observations in sparsely populated regions. We show that heavy-tailed stochastic processes (which we construct from Gaussian processes via a copula), can be used to improve robustness of regression and classification estimators to such outliers by selectively shrinking them more strongly in sparse regions than in dense regions. We carry out a theoretical analysis to show that selective shrinkage occurs, provided the marginals of the heavy-tailed process have sufficiently heavy tails. The analysis is complemented by experiments on biological data which indicate significant improvements of estimates in sparse regions while producing competitive results in dense regions.
Vagueness of Linguistic variable
Raheja, Supriya, Rajpal, Smita
In the area of computer science focusing on creating machines that can engage on behaviors that humans consider intelligent. The ability to create intelligent machines has intrigued humans since ancient times and today with the advent of the computer and 50 years of research into various programming techniques, the dream of smart machines is becoming a reality. Researchers are creating systems which can mimic human thought, understand speech, beat the best human chessplayer, and countless other feats never before possible. Ability of the human to estimate the information is most brightly shown in using of natural languages. Using words of a natural language for valuation qualitative attributes, for example, the person pawns uncertainty in form of vagueness in itself estimations. Vague sets, vague judgments, vague conclusions takes place there and then, where and when the reasonable subject exists and also is interested in something. The vague sets theory has arisen as the answer to an illegibility of language the reasonable subject speaks. Language of a reasonable subject is generated by vague events which are created by the reason and which are operated by the mind. The theory of vague sets represents an attempt to find such approximation of vague grouping which would be more convenient, than the classical theory of sets in situations where the natural language plays a significant role. Such theory has been offered by known American mathematician Gau and Buehrer .In our paper we are describing how vagueness of linguistic variables can be solved by using the vague set theory.This paper is mainly designed for one of directions of the eventology (the theory of the random vague events), which has arisen within the limits of the probability theory and which pursue the unique purpose to describe eventologically a movement of reason.
Human Disease Diagnosis Using a Fuzzy Expert System
Hasan, Mir Anamul, Sher-E-Alam, Khaja Md., Chowdhury, Ahsan Raja
Human disease diagnosis is a complicated process and requires high level of expertise. Any attempt of developing a web-based expert system dealing with human disease diagnosis has to overcome various difficulties. This paper describes a project work aiming to develop a web-based fuzzy expert system for diagnosing human diseases. Now a days fuzzy systems are being used successfully in an increasing number of application areas; they use linguistic rules to describe systems. This research project focuses on the research and development of a web-based clinical tool designed to improve the quality of the exchange of health information between health care professionals and patients. Practitioners can also use this web-based tool to corroborate diagnosis. The proposed system is experimented on various scenarios in order to evaluate it's performance. In all the cases, proposed system exhibits satisfactory results.
A Novel Rough Set Reduct Algorithm for Medical Domain Based on Bee Colony Optimization
Feature selection refers to the problem of selecting relevant features which produce the most predictive outcome. In particular, feature selection task is involved in datasets containing huge number of features. Rough set theory has been one of the most successful methods used for feature selection. However, this method is still not able to find optimal subsets. This paper proposes a new feature selection method based on Rough set theory hybrid with Bee Colony Optimization (BCO) in an attempt to combat this. This proposed work is applied in the medical domain to find the minimal reducts and experimentally compared with the Quick Reduct, Entropy Based Reduct, and other hybrid Rough Set methods such as Genetic Algorithm (GA), Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO).
Computing p-values of LiNGAM outputs via Multiscale Bootstrap
Komatsu, Yusuke, Shimizu, Shohei, Shimodaira, Hidetoshi
Structural equation models and Bayesian networks have been widely used to study causal relationships between continuous variables. Recently, a non-Gaussian method called LiNGAM was proposed to discover such causal models and has been extended in various directions. An important problem with LiNGAM is that the results are affected by the random sampling of the data as with any statistical method. Thus, some analysis of the statistical reliability or confidence level should be conducted. A common method to evaluate a confidence level is a bootstrap method. However, a confidence level computed by ordinary bootstrap method is known to be biased as a probability-value ($p$-value) of hypothesis testing. In this paper, we propose a new procedure to apply an advanced bootstrap method called multiscale bootstrap to compute confidence levels, i.e., p-values, of LiNGAM outputs. The multiscale bootstrap method gives unbiased $p$-values with asymptotic much higher accuracy. Experiments on artificial data demonstrate the utility of our approach.
Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
Liu, Han, Roeder, Kathryn, Wasserman, Larry
A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include $K$-fold cross-validation ($K$-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including $K$-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.