Genre
PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification
Morvant, Emilie, Koço, Sokol, Ralaivola, Liva
In this work, we propose a PAC-Bayes bound for the generalization risk of the Gibbs classifier in the multi-class classification framework. The novelty of our work is the critical use of the confusion matrix of a classifier as an error measure; this puts our contribution in the line of work aiming at dealing with performance measure that are richer than mere scalar criterion such as the misclassification rate. Thanks to very recent and beautiful results on matrix concentration inequalities, we derive two bounds showing that the true confusion risk of the Gibbs classifier is upper-bounded by its empirical risk plus a term depending on the number of training examples in each class. To the best of our knowledge, this is the first PAC-Bayes bounds based on confusion matrices.
Generic identification of binary-valued hidden Markov processes
The generic identification problem is to decide whether a stochastic process $(X_t)$ is a hidden Markov process and if yes to infer its parameters for all but a subset of parametrizations that form a lower-dimensional subvariety in parameter space. Partial answers so far available depend on extra assumptions on the processes, which are usually centered around stationarity. Here we present a general solution for binary-valued hidden Markov processes. Our approach is rooted in algebraic statistics hence it is geometric in nature. We find that the algebraic varieties associated with the probability distributions of binary-valued hidden Markov processes are zero sets of determinantal equations which draws a connection to well-studied objects from algebra. As a consequence, our solution allows for algorithmic implementation based on elementary (linear) algebraic routines.
Optimally fuzzy temporal memory
Shankar, Karthik H., Howard, Marc W.
Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register---a moving window extending into the past, requiring storage resources that linearly grows with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system.
Distributed parameter estimation of discrete hierarchical models via marginal likelihoods
We consider discrete graphical models Markov with respect to a graph $G$ and propose two distributed marginal methods to estimate the maximum likelihood estimate of the canonical parameter of the model. Both methods are based on a relaxation of the marginal likelihood obtained by considering the density of the variables represented by a vertex $v$ of $G$ and a neighborhood. The two methods differ by the size of the neighborhood of $v$. We show that the estimates are consistent and that those obtained with the larger neighborhood have smaller asymptotic variance than the ones obtained through the smaller neighborhood.
Bayesian Extensions of Kernel Least Mean Squares
Park, Il Memming, Seth, Sohan, Van Vaerenbergh, Steven
The kernel least mean squares (KLMS) algorithm is a computationally efficient nonlinear adaptive filtering method that "kernelizes" the celebrated (linear) least mean squares algorithm. We demonstrate that the least mean squares algorithm is closely related to the Kalman filtering, and thus, the KLMS can be interpreted as an approximate Bayesian filtering method. This allows us to systematically develop extensions of the KLMS by modifying the underlying state-space and observation models. The resulting extensions introduce many desirable properties such as "forgetting", and the ability to learn from discrete data, while retaining the computational simplicity and time complexity of the original algorithm.
A Survey of Multi-Objective Sequential Decision-Making
Roijers, D. M., Vamplew, P., Whiteson, S., Dazeley, R.
Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
Regularization in Relevance Learning Vector Quantization Using l one Norms
Riedel, Martin, Kästner, Marika, Rossi, Fabrice, Villmann, Thomas
We propose in this contribution a method for l one regularization in prototype based relevance learning vector quantization (LVQ) for sparse relevance profiles. Sparse relevance profiles in hyperspectral data analysis fade down those spectral bands which are not necessary for classification. In particular, we consider the sparsity in the relevance profile enforced by LASSO optimization. The latter one is obtained by a gradient learning scheme using a differentiable parametrized approximation of the $l_{1}$-norm, which has an upper error bound. We extend this regularization idea also to the matrix learning variant of LVQ as the natural generalization of relevance learning.
Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods
Arenas-García, Jerónimo, Petersen, Kaare Brandt, Camps-Valls, Gustavo, Hansen, Lars Kai
Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and low-sized problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance in a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring.
On the Internal Topological Structure of Plane Regions
The study of topological information of spatial objects has for a long time been a focus of research in disciplines like computational geometry, spatial reasoning, cognitive science, and robotics. While the majority of these researches emphasised the topological relations between spatial objects, this work studies the internal topological structure of bounded plane regions, which could consist of multiple pieces and/or have holes and islands to any finite level. The insufficiency of simple regions (regions homeomorphic to closed disks) to cope with the variety and complexity of spatial entities and phenomena has been widely acknowledged. Another significant drawback of simple regions is that they are not closed under set operations union, intersection, and difference. This paper considers bounded semi-algebraic regions, which are closed under set operations and can closely approximate most plane regions arising in practice.
On the Suitable Domain for SVM Training in Image Coding
Camps-Valls, Gustavo, Gutiérrez, Juan, Gómez-Pérez, Gabriel, Malo, Jesús
Conventional SVM-based image coding methods are founded on independently restricting the distortion in every image coefficient at some particular image representation. Geometrically, this implies allowing arbitrary signal distortions in an $n$-dimensional rectangle defined by the $\varepsilon$-insensitivity zone in each dimension of the selected image representation domain. Unfortunately, not every image representation domain is well-suited for such a simple, scalar-wise, approach because statistical and/or perceptual interactions between the coefficients may exist. These interactions imply that scalar approaches may induce distortions that do not follow the image statistics and/or are perceptually annoying. Taking into account these relations would imply using non-rectangular $\varepsilon$-insensitivity regions (allowing coupled distortions in different coefficients), which is beyond the conventional SVM formulation. In this paper, we report a condition on the suitable domain for developing efficient SVM image coding schemes. We analytically demonstrate that no linear domain fulfills this condition because of the statistical and perceptual inter-coefficient relations that exist in these domains. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains and in a recently proposed non-linear perceptual domain that simultaneously reduces the statistical and perceptual relations (so it is closer to fulfilling the proposed condition). These results highlight the relevance of an appropriate choice of the image representation before SVM learning.