AITopics

New functionals for parameter (model) selection of Support Vector Machines are introduced based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that using these functionals, one can both predict the best choice of parameters of the model and the relative quality of performance for any value of parameter.

leave-one-out procedure, support vector, vector, (13 more...)

Country:

North America > United States > New York (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Burges, Christopher J. C., Crisp, David J.

Uniqueness of the SVM Solution

We give necessary and sufficient conditions for uniqueness of the support vector solution for the problems of pattern recognition and regression estimation, for a general class of cost functions. We show that if the solution is not unique, all support vectors are necessarily at bound, and we give some simple examples of non-unique solutions. We note that uniqueness of the primal (dual) solution does not necessarily imply uniqueness of the dual (primal) solution. We show how to compute the threshold b when the solution is unique, but when all support vectors are at bound, in which case the usual method for determining b does not work. 1 Introduction Support vector machines (SVMs) have attracted wide interest as a means to implement structural risk minimization for the problems of classification and regression estimation. The fact that training an SVM amounts to solving a convex quadratic programming problem means that the solution found is global, and that if it is not unique, then the set of global solutions is itself convex; furthermore, if the objective function is strictly convex, the solution is guaranteed to be unique [1]1.

convex, objective function, support vector, (13 more...)

Country:

Oceania > Australia > South Australia > Adelaide (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > New York (0.04)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Buhmann, Joachim M., Held, Marcus

Model Selection in Clustering by Uniform Convergence Bounds

Unsupervised learning algorithms are designed to extract structure from data samples. Reliable and robust inference requires a guarantee that extracted structures are typical for the data source, Le., similar structures have to be inferred from a second sample set of the same data source. The overfitting phenomenon in maximum entropy based annealing algorithms is exemplarily studied for a class of histogram clustering models. Bernstein's inequality for large deviations is used to determine the maximally achievable approximation quality parameterized by a minimal temperature. Monte Carlo simulations support the proposed model selection criterion by finite temperature annealing.

algorithm, empirical risk, hypothesis class, (15 more...)

Country:

North America > United States > New York (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

A Variational Baysian Framework for Graphical Models

Attias, Hagai

This paper presents a novel practical framework for Bayesian model averaging and model selection in probabilistic graphical models. Our approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner. These posteriors fall out of a free-form optimization procedure, which naturally incorporates conjugate priors. Unlike in large sample approximations, the posteriors are generally non Gaussian and no Hessian needs to be computed. Predictive quantities are obtained analytically. The resulting algorithm generalizes the standard Expectation Maximization algorithm, and its convergence is guaranteed. We demonstrate that this approach can be applied to a large class of models in several domains, including mixture models and source separation. 1 Introduction

algorithm, posterior, quantity, (15 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

A Variational Baysian Framework for Graphical Models

Attias, Hagai

algorithm, parameter posterior, posterior, (16 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Wan, Eric A., Merwe, Rudolph van der, Nelson, Alex T.

Dual Estimation and the Unscented Transformation

Dual estimation refers to the problem of simultaneously estimating the state of a dynamic system and the model which gives rise to the dynamics.

algorithm, ekf, estimation, (16 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Oregon > Washington County > Beaverton (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)

Sundararajan, S., Keerthi, S. Sathiya

Predictive App roaches for Choosing Hyperparameters in Gaussian Processes

Gaussian Processes are powerful regression models specified by parametrized mean and covariance functions. Standard approaches to estimate these parameters (known by the name Hyperparameters) areMaximum Likelihood (ML) and Maximum APosterior (MAP) approaches. In this paper, we propose and investigate predictive approaches,namely, maximization of Geisser's Surrogate Predictive Probability (GPP) and minimization of mean square error withrespect to GPP (referred to as Geisser's Predictive mean square Error (GPE)) to estimate the hyperparameters. We also derive results for the standard Cross-Validation (CV) error and make a comparison. These approaches are tested on a number of problems and experimental results show that these approaches are strongly competitive to existing approaches. 1 Introduction Gaussian Processes (GPs) are powerful regression models that have gained popularity recently,though they have appeared in different forms in the literature for years.

choosing hyperparameter, gaussian process, rasmussen, (12 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
Asia > Singapore (0.05)
North America > United States > New York (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)

The Relaxed Online Maximum Margin Algorithm

Li, Yi, Long, Philip M.

We describe a new incremental algorithm for training linear threshold functions:the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctlywith the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the more computationally intensive maximum-margin algorithm alsosatisfies this mistake bound; this is the first worst-case performance guaranteefor this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms is similar to that of perceptron algorithm,but their generalization is much better. We describe a sense in which the performance of ROMMA converges to that of SVM in the limit if bias isn't considered.

algorithm, perceptron algorithm, romma, (9 more...)

Country:

North America > United States > District of Columbia > Washington (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Mjolsness, Eric, Mann, Tobias, Castaño, Rebecca, Wold, Barbara J.

From Coexpression to Coregulation: An Approach to Inferring Transcriptional Regulation among Gene Classes from Large-Scale Expression Data

We provide preliminary evidence that eXlstmg algorithms for inferring small-scale gene regulation networks from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continious-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities in the circuit parameter sets including the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determ ining transcriptional regulatory relationships such as coregulation.

algorithm, inferring transcriptional regulation, time course, (12 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.05)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > Denmark (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

III, John W. Fisher, Ihler, Alexander T., Viola, Paul A.

Learning Informative Statistics: A Nonparametnic Approach

We discuss an information theoretic approach for categorizing and modeling dynamicprocesses. The approach can learn a compact and informative statistic which summarizes past states to predict future observations. Furthermore, the uncertainty of the prediction is characterized nonparametrically bya joint density over the learned statistic and present observation. We discuss the application of the technique to both noise driven dynamical systems and random processes sampled from a density which is conditioned on the past. In the first case we show results in which both the dynamics of random walk and the statistics of the driving noise are captured. In the second case we present results in which a summarizing statistic is learned on noisy random telegraph waves with differing dependencies onpast states. In both cases the algorithm yields a principled approach for discriminating processes with differing dynamics and/or dependencies. Themethod is grounded in ideas from information theory and nonparametric statistics.

artificial intelligence, machine learning, statistics, (17 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)