AITopics | Genre

Collaborating Authors

Genre

PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification

Morvant, Emilie, Koço, Sokol, Ralaivola, Liva

arXiv.org Machine LearningOct-22-2013

In this work, we propose a PAC-Bayes bound for the generalization risk of the Gibbs classifier in the multi-class classification framework. The novelty of our work is the critical use of the confusion matrix of a classifier as an error measure; this puts our contribution in the line of work aiming at dealing with performance measure that are richer than mere scalar criterion such as the misclassification rate. Thanks to very recent and beautiful results on matrix concentration inequalities, we derive two bounds showing that the true confusion risk of the Gibbs classifier is upper-bounded by its empirical risk plus a term depending on the number of training examples in each class. To the best of our knowledge, this is the first PAC-Bayes bounds based on confusion matrices.

artificial intelligence, machine learning, matrix, (14 more...)

arXiv.org Machine Learning

1202.6228

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

Generic identification of binary-valued hidden Markov processes

Schönhuth, Alexander

arXiv.org Machine LearningOct-22-2013

The generic identification problem is to decide whether a stochastic process $(X_t)$ is a hidden Markov process and if yes to infer its parameters for all but a subset of parametrizations that form a lower-dimensional subvariety in parameter space. Partial answers so far available depend on extra assumptions on the processes, which are usually centered around stationarity. Here we present a general solution for binary-valued hidden Markov processes. Our approach is rooted in algebraic statistics hence it is geometric in nature. We find that the algebraic varieties associated with the probability distributions of binary-valued hidden Markov processes are zero sets of determinantal equations which draws a connection to well-studied objects from algebra. As a consequence, our solution allows for algorithmic implementation based on elementary (linear) algebraic routines.

artificial intelligence, generic identification, machine learning, (17 more...)

arXiv.org Machine Learning

1101.3712

Country:

North America > United States (0.68)
Europe > United Kingdom > England (0.28)

Genre:

Research Report (0.50)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Optimally fuzzy temporal memory

Shankar, Karthik H., Howard, Marc W.

arXiv.org Artificial IntelligenceOct-22-2013

Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register---a moving window extending into the past, requiring storage resources that linearly grows with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Artificial Intelligence

1211.5189

Country: North America > United States (0.46)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Hardware (0.99)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Distributed parameter estimation of discrete hierarchical models via marginal likelihoods

Massam, Helene, Wang, Nanwei

arXiv.org Machine LearningOct-21-2013

We consider discrete graphical models Markov with respect to a graph $G$ and propose two distributed marginal methods to estimate the maximum likelihood estimate of the canonical parameter of the model. Both methods are based on a relaxation of the marginal likelihood obtained by considering the density of the variables represented by a vertex $v$ of $G$ and a neighborhood. The two methods differ by the size of the neighborhood of $v$. We show that the estimates are consistent and that those obtained with the larger neighborhood have smaller asymptotic variance than the ones obtained through the smaller neighborhood.

artificial intelligence, machine learning, variance, (18 more...)

arXiv.org Machine Learning

1310.5666

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Add feedback

Bayesian Extensions of Kernel Least Mean Squares

Park, Il Memming, Seth, Sohan, Van Vaerenbergh, Steven

arXiv.org Machine LearningOct-20-2013

The kernel least mean squares (KLMS) algorithm is a computationally efficient nonlinear adaptive filtering method that "kernelizes" the celebrated (linear) least mean squares algorithm. We demonstrate that the least mean squares algorithm is closely related to the Kalman filtering, and thus, the KLMS can be interpreted as an approximate Bayesian filtering method. This allows us to systematically develop extensions of the KLMS by modifying the underlying state-space and observation models. The resulting extensions introduce many desirable properties such as "forgetting", and the ability to learn from discrete data, while retaining the computational simplicity and time complexity of the original algorithm.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1310.5347

Country:

North America > United States (0.68)
Europe (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

A Survey of Multi-Objective Sequential Decision-Making

Roijers, D. M., Vamplew, P., Whiteson, S., Dazeley, R.

Journal of Artificial Intelligence ResearchOct-18-2013

Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.

momdp, objective, scalarization function, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3987

AI Access Foundation

10836

Journal of Artificial Intelligence Research

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(10 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Energy > Power Industry (1.00)
Banking & Finance (1.00)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Regularization in Relevance Learning Vector Quantization Using l one Norms

Riedel, Martin, Kästner, Marika, Rossi, Fabrice, Villmann, Thomas

arXiv.org Machine LearningOct-18-2013

We propose in this contribution a method for l one regularization in prototype based relevance learning vector quantization (LVQ) for sparse relevance profiles. Sparse relevance profiles in hyperspectral data analysis fade down those spectral bands which are not necessary for classification. In particular, we consider the sparsity in the relevance profile enforced by LASSO optimization. The latter one is obtained by a gradient learning scheme using a differentiable parametrized approximation of the $l_{1}$-norm, which has an upper error bound. We extend this regularization idea also to the matrix learning variant of LVQ as the natural generalization of relevance learning.

artificial intelligence, machine learning, relevance profile, (11 more...)

arXiv.org Machine Learning

1310.5095

Country:

North America > United States (0.15)
Europe > Germany (0.15)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)

Add feedback

Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods

Arenas-García, Jerónimo, Petersen, Kaare Brandt, Camps-Valls, Gustavo, Hansen, Lars Kai

arXiv.org Machine LearningOct-18-2013

Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and low-sized problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance in a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring.

artificial intelligence, machine learning, mva method, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/MSP.2013.2250591

1310.5089

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report (0.82)
Overview (0.66)

Industry: Education (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

On the Internal Topological Structure of Plane Regions

Li, Sanjiang

arXiv.org Artificial IntelligenceOct-18-2013

The study of topological information of spatial objects has for a long time been a focus of research in disciplines like computational geometry, spatial reasoning, cognitive science, and robotics. While the majority of these researches emphasised the topological relations between spatial objects, this work studies the internal topological structure of bounded plane regions, which could consist of multiple pieces and/or have holes and islands to any finite level. The insufficiency of simple regions (regions homeomorphic to closed disks) to cope with the variety and complexity of spatial entities and phenomena has been widely acknowledged. Another significant drawback of simple regions is that they are not closed under set operations union, intersection, and difference. This paper considers bounded semi-algebraic regions, which are closed under set operations and can closely approximate most plane regions arising in practice.

artificial intelligence, simple region, spatial reasoning, (16 more...)

arXiv.org Artificial Intelligence

0909.0109

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)

Add feedback

On the Suitable Domain for SVM Training in Image Coding

Camps-Valls, Gustavo, Gutiérrez, Juan, Gómez-Pérez, Gabriel, Malo, Jesús

arXiv.org Machine LearningOct-18-2013

Conventional SVM-based image coding methods are founded on independently restricting the distortion in every image coefficient at some particular image representation. Geometrically, this implies allowing arbitrary signal distortions in an $n$-dimensional rectangle defined by the $\varepsilon$-insensitivity zone in each dimension of the selected image representation domain. Unfortunately, not every image representation domain is well-suited for such a simple, scalar-wise, approach because statistical and/or perceptual interactions between the coefficients may exist. These interactions imply that scalar approaches may induce distortions that do not follow the image statistics and/or are perceptually annoying. Taking into account these relations would imply using non-rectangular $\varepsilon$-insensitivity regions (allowing coupled distortions in different coefficients), which is beyond the conventional SVM formulation. In this paper, we report a condition on the suitable domain for developing efficient SVM image coding schemes. We analytically demonstrate that no linear domain fulfills this condition because of the statistical and perceptual inter-coefficient relations that exist in these domains. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains and in a recently proposed non-linear perceptual domain that simultaneously reduces the statistical and perceptual relations (so it is closer to fulfilling the proposed condition). These results highlight the relevance of an appropriate choice of the image representation before SVM learning.

artificial intelligence, coefficient, machine learning, (17 more...)

arXiv.org Machine Learning

1310.5082

Country:

Europe > Spain (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.71)

Add feedback