Genre
A Minimum Relative Entropy Principle for Learning and Acting
Ortega, Pedro A., Braun, Daniel A.
This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is an agent that has been designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.
Algebraic Comparison of Partial Lists in Bioinformatics
Jurman, Giuseppe, Riccadonna, Samantha, Visintainer, Roberto, Furlanello, Cesare
The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or just within a meta-analysis comparison, instead of one list it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.
Faster Algorithms for Max-Product Message-Passing
McAuley, Julian J., Caetano, Tiberio S.
It is well-known that exact inference in tree-structured graphical models can be accomplished efficiently by message-passing operations following a simple protocol making use of the distributive law (Aji and McEliece, 2000; Kschischang et al., 2001). It is also well-known that exact inference in arbitrary graphical models can be solved by the junction-tree algorithm; its efficiency is determined by the size of the maximal cliques after triangulation, a quantity related to the treewidth of the graph. Figure 1 illustrates an attempt to apply the junction-tree algorithm to some graphical models containing cycles. If the graphs are not chordal ((a) and (b)), they need to be triangulated, or made chordal (red edges in (c) and (d)). Their clique-graphs are then guaranteed to be junction-trees, and the distributive law can be applied with the same protocol used for trees; see Aji and McEliece (2000) for a beautiful tutorial on exact inference in arbitrary graphs.
On Tsallis Entropy Bias and Generalized Maximum Entropy Models
Hou, Yuexian, Yan, Tingxu, Zhang, Peng, Song, Dawei, Li, Wenjie
In density estimation task, maximum entropy model (Maxent) can effectively use reliable prior information via certain constraints, i.e., linear constraints without empirical parameters. However, reliable prior information is often insufficient, and the selection of uncertain constraints becomes necessary but poses considerable implementation complexity. Improper setting of uncertain constraints can result in overfitting or underfitting. To solve this problem, a generalization of Maxent, under Tsallis entropy framework, is proposed. The proposed method introduces a convex quadratic constraint for the correction of (expected) Tsallis entropy bias (TEB). Specifically, we demonstrate that the expected Tsallis entropy of sampling distributions is smaller than the Tsallis entropy of the underlying real distribution. This expected entropy reduction is exactly the (expected) TEB, which can be expressed by a closed-form formula and act as a consistent and unbiased correction. TEB indicates that the entropy of a specific sampling distribution should be increased accordingly. This entails a quantitative re-interpretation of the Maxent principle. By compensating TEB and meanwhile forcing the resulting distribution to be close to the sampling distribution, our generalized TEBC Maxent can be expected to alleviate the overfitting and underfitting. We also present a connection between TEB and Lidstone estimator. As a result, TEB-Lidstone estimator is developed by analytically identifying the rate of probability correction in Lidstone. Extensive empirical evaluation shows promising performance of both TEBC Maxent and TEB-Lidstone in comparison with various state-of-the-art density estimation methods.
Ontology-supported processing of clinical text using medical knowledge integration for multi-label classification of diagnosis coding
Waraporn, Phanu, Meesad, Phayung, Clayton, Gareth
Abstract--This paper discusses the knowledge integration of clinical information extracted from distributed medical ontology in order to ameliorate a machine learning-based multi-label coding assignment system. The proposed approach is implemented using a decision tree based cascade hierarchical technique on the university hospital data for patients with Coronary Heart Disease (CHD). The preliminary results obtained show a satisfactory finding. An ontology is a specification of a conceptualization that defines and/or specifies the concepts, relationships, and other distinctions that are relevant for modeling a domain. Such specification takes the form of the definitions of representational vocabulary (classes, relations, and so on), which provide meanings to the vocabulary and formal constraints on its coherent use [3].
A Little More, a Lot Better: Improving Path Quality by a Simple Path Merging Algorithm
Raveh, Barak, Enosh, Angela, Halperin, Dan
Sampling-based motion planners are an effective means for generating collision-free motion paths. However, the quality of these motion paths (with respect to quality measures such as path length, clearance, smoothness or energy) is often notoriously low, especially in high-dimensional configuration spaces. We introduce a simple algorithm for merging an arbitrary number of input motion paths into a hybrid output path of superior quality, for a broad and general formulation of path quality. Our approach is based on the observation that the quality of certain sub-paths within each solution may be higher than the quality of the entire path. A dynamic-programming algorithm, which we recently developed for comparing and clustering multiple motion paths, reduces the running time of the merging algorithm significantly. We tested our algorithm in motion-planning problems with up to 12 degrees of freedom. We show that our algorithm is able to merge a handful of input paths produced by several different motion planners to produce output paths of much higher quality.
Exploratory Analysis of Functional Data via Clustering and Optimal Segmentation
Hébrail, Georges, Hugueney, Bernard, Lechevallier, Yves, Rossi, Fabrice
We propose in this paper an exploratory analysis algorithm for functional data. The method partitions a set of functions into $K$ clusters and represents each cluster by a simple prototype (e.g., piecewise constant). The total number of segments in the prototypes, $P$, is chosen by the user and optimally distributed among the clusters via two dynamic programming algorithms. The practical relevance of the method is shown on two real world datasets.
Inference with Transposable Data: Modeling the Effects of Row and Column Correlations
Allen, Genevera I., Tibshirani, Robert
We consider the problem of large-scale inference on the row or column variables of data in the form of a matrix. Often this data is transposable, meaning that both the row variables and column variables are of potential interest. An example of this scenario is detecting significant genes in microarrays when the samples or arrays may be dependent due to underlying relationships. We study the effect of both row and column correlations on commonly used test-statistics, null distributions, and multiple testing procedures, by explicitly modeling the covariances with the matrix-variate normal distribution. Using this model, we give both theoretical and simulation results revealing the problems associated with using standard statistical methodology on transposable data. We solve these problems by estimating the row and column covariances simultaneously, with transposable regularized covariance models, and de-correlating or sphering the data as a pre-processing step. Under reasonable assumptions, our method gives test statistics that follow the scaled theoretical null distribution and are approximately independent. Simulations based on various models with structured and observed covariances from real microarray data reveal that our method offers substantial improvements in two areas: 1) increased statistical power and 2) correct estimation of false discovery rates.
On the Schoenberg Transformations in Data Analysis: Theory and Illustrations
The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A simple distance-based discriminant algorithm illustrates the theory, intimately connected to the Gaussian kernels of Machine Learning.