AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Neural Information Processing Systems, Dec-31-2011

t-divergence Based Approximate Inference

Ding, Nan, Qi, Yuan, Vishwanathan, S.v.n.

Approximate inference is an important technique for dealing with large, intractable graphical models based on the exponential family of distributions. We extend the idea of approximate inference to the t-exponential family by defining a new t-divergence. This divergence measure is obtained via convex duality between the log-partition function of the t-exponential family and a new t-entropy. We illustrate our approach on the Bayes Point Machine with a Student's t-prior.

artificial intelligence, bayesian inference, t-exponential family, (15 more...)

Country: North America > United States (0.14)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Multiple Kernel Learning and the SMO Algorithm

Sun, Zhaonan, Ampornpunt, Nawanol, Varma, Manik, Vishwanathan, S.v.n.

Our objective is to train $p$-norm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and efficiently scales to large problems. As a result, it has gained widespread acceptance and SVMs are routinely trained using SMO in diverse real-world applications. Training using SMO has been a long-standing goal in MKL for the very same reasons. Unfortunately, the standard MKL dual is not differentiable, and therefore cannot be optimised using SMO-style co-ordinate ascent. In this paper, we demonstrate that linear MKL regularised with the $p$-norm squared, or with certain Bregman divergences, can indeed be trained using SMO. The resulting algorithm retains both simplicity and efficiency and is significantly faster than the state-of-the-art specialised $p$-norm MKL solvers. We show that we can train on a hundred thousand kernels in approximately seven minutes and on fifty thousand points in less than half an hour on a single core.

algorithm, artificial intelligence, machine learning, (13 more...)

Country: Asia (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Multitask Learning without Label Correspondences

Quadrianto, Novi, Petterson, James, Caetano, Tibério S., Smola, Alex J., Vishwanathan, S.v.n.

We propose an algorithm to perform multitask learning where each task has potentially distinct label sets and label correspondences are not readily available. This is in contrast with existing methods which either assume that the label sets shared by different tasks are the same or that there exists a label mapping oracle. Our method directly maximizes the mutual information among the labels, and we show that the resulting objective function can be efficiently optimized using existing algorithms. Our proposed approach has a direct application for data integration with different label spaces for the purpose of classification, such as integrating Yahoo! and DMOZ web directories.
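The quantity this abstract maximizes, the mutual information among labels, can be illustrated with a minimal sketch. The function name and the joint-probability-table input are assumptions made for illustration; this only evaluates mutual information for a known joint distribution of two label variables and is not the paper's optimization procedure.

```python
import math
from typing import Sequence


def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """Mutual information I(X; Y) in nats for a joint probability table.

    joint[i][j] is assumed to hold P(X = i, Y = j), with entries summing to 1.
    """
    row_marginal = [sum(row) for row in joint]        # P(X = i)
    col_marginal = [sum(col) for col in zip(*joint)]  # P(Y = j)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (row_marginal[i] * col_marginal[j]))
    return mi


# Two label sets with no relationship versus a perfect one-to-one relationship.
independent = [[0.25, 0.25], [0.25, 0.25]]
matched = [[0.5, 0.0], [0.0, 0.5]]
mi_independent = mutual_information(independent)
mi_matched = mutual_information(matched)
```

In this toy case the matched table scores higher than the independent one, which is the sense in which a high mutual information indicates a useful correspondence between two label sets.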

artificial intelligence, machine learning, yahoo, (16 more...)

Country: North America > United States > Indiana > Tippecanoe County (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.62)

t-logistic regression

Ding, Nan, Vishwanathan, S.v.n.

We extend logistic regression by using t-exponential families which were introduced recently in statistical physics. This gives rise to a regularized risk minimization problem with a non-convex loss function. An efficient block coordinate descent optimization scheme can be derived for estimating the parameters. Because of the nature of the loss function, our algorithm is tolerant to label noise. Furthermore, unlike other algorithms which employ non-convex loss functions, our algorithm is fairly robust to the choice of initial values. We verify both these observations empirically on a number of synthetic and real datasets.
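As a rough illustration of the building block behind t-exponential families, the sketch below implements the t-exponential and its inverse, the t-logarithm, using the definition common in the statistical physics literature. The abstract does not spell out these formulas, so the exact form and the function names `exp_t` and `log_t` are assumptions here.

```python
import math


def exp_t(x: float, t: float) -> float:
    """t-exponential: [1 + (1 - t) * x]_+ ** (1 / (1 - t)); equals exp(x) at t = 1."""
    if t == 1.0:
        return math.exp(x)
    base = max(1.0 + (1.0 - t) * x, 0.0)  # the [.]_+ clipping
    if base == 0.0:
        # Positive exponent (t < 1) gives 0; negative exponent (t > 1) diverges.
        return 0.0 if t < 1.0 else math.inf
    return base ** (1.0 / (1.0 - t))


def log_t(x: float, t: float) -> float:
    """t-logarithm: (x ** (1 - t) - 1) / (1 - t); equals log(x) at t = 1."""
    if x <= 0.0:
        raise ValueError("log_t is only defined for x > 0")
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)


# For t > 1 the t-exponential decays polynomially rather than exponentially.
heavy_tail = exp_t(-10.0, 1.5)
light_tail = math.exp(-10.0)
```

With t > 1, `exp_t` assigns far more mass to large negative arguments than the ordinary exponential does. This heavier tail is one plausible reading of why a loss built on it would penalize badly misclassified (possibly mislabeled) points less severely.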

artificial intelligence, logistic regression, machine learning, (16 more...)

Country:

North America > United States (0.29)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Research Report > Experimental Study (0.77)
Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.77)

Lower Bounds on Rate of Convergence of Cutting Plane Methods

Zhang, Xinhua, Saha, Ankan, Vishwanathan, S.v.n.

In a recent paper, Joachims (2006) presented SVM-Perf, a cutting plane method (CPM) for training linear Support Vector Machines (SVMs) which converges to an $\epsilon$-accurate solution in $O(1/\epsilon^{2})$ iterations. By tightening the analysis, Teo et al. (2010) showed that $O(1/\epsilon)$ iterations suffice. Given the impressive convergence speed of CPM on a number of practical problems, it was conjectured that these rates could be further improved. In this paper we disprove this conjecture. We present counterexamples which are not only applicable for training linear SVMs with hinge loss, but also hold for support vector methods which optimize a multivariate performance score. However, surprisingly, these problems are not inherently hard. By exploiting the structure of the objective function we can devise an algorithm that converges in $O(1/\sqrt{\epsilon})$ iterations.
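To give a feel for how far apart the three rates in this abstract are, the sketch below evaluates their leading terms for a given accuracy. It ignores the constants hidden by the O-notation, so the numbers are orders of magnitude only, not iteration counts of any actual solver; the function name is hypothetical.

```python
import math


def iteration_orders(eps: float) -> dict:
    """Leading-order terms of the three rates, with all hidden constants set to 1."""
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return {
        "O(1/eps^2)": 1.0 / eps ** 2,             # original SVM-Perf analysis
        "O(1/eps)": 1.0 / eps,                    # tightened CPM analysis
        "O(1/sqrt(eps))": 1.0 / math.sqrt(eps),   # structure-exploiting algorithm
    }


orders = iteration_orders(0.01)
```

For a target accuracy of 0.01 the three terms are roughly ten thousand, one hundred, and ten, and the gap widens as the accuracy requirement tightens.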

algorithm, artificial intelligence, optimization problem, (17 more...)

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Neural Information Processing Systems, Dec-31-2008

Bundle Methods for Machine Learning

Le, Quoc V., Smola, Alex J., Vishwanathan, S.v.n.

We present a globally convergent method for regularized risk minimization problems. Our method applies to Support Vector estimation, regression, Gaussian Processes, and any other regularized risk minimization setting which leads to a convex optimization problem. SVMPerf can be shown to be a special case of our approach. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in O(1/ɛ) steps to ɛ precision for general convex problems and in O(log(1/ɛ)) steps for continuously differentiable problems. We demonstrate in experiments the performance of our approach.

artificial intelligence, convergence, optimization problem, (17 more...)

Country: Oceania > Australia (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.34)

Neural Information Processing Systems, Dec-31-2007

Fast Computation of Graph Kernels

Borgwardt, Karsten M., Schraudolph, Nicol N., Vishwanathan, S.v.n.

Using extensions of linear algebra concepts to Reproducing Kernel Hilbert Spaces (RKHS), we define a unifying framework for random walk kernels on graphs.

artificial intelligence, graph, machine learning, (16 more...)

Country:

Oceania > Australia (0.30)
Europe > Germany (0.28)
North America > United States > Massachusetts > Middlesex County (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.49)

Neural Information Processing Systems, Dec-31-2007

Implicit Online Learning with Kernels

Cheng, Li, Schuurmans, Dale, Wang, Shaojun, Caelli, Terry, Vishwanathan, S.v.n.

The learner then updates its parameter vector to minimize a risk functional, and the process repeats.

algorithm, computer based training, educational technology, (22 more...)

Country:

Oceania > Australia (0.29)
North America > Canada > Alberta (0.14)

Industry: Education > Educational Setting > Online (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.42)

Neural Information Processing Systems, Dec-31-2007

Fast Iterative Kernel PCA

Schraudolph, Nicol N., Günter, Simon, Vishwanathan, S.v.n.

We introduce two methods to improve convergence of the Kernel Hebbian Algorithm (KHA) for iterative kernel PCA. KHA has a scalar gain parameter which is either held constant or decreased as 1/t, leading to slow convergence. Our KHA/et algorithm accelerates KHA by incorporating the reciprocal of the current estimated eigenvalues as a gain vector. We then derive and apply Stochastic Meta-Descent (SMD) to KHA/et; this further speeds convergence by performing gain adaptation in RKHS. Experimental results for kernel PCA and spectral clustering of USPS digits as well as motion capture and image de-noising problems confirm that our methods converge substantially faster than conventional KHA.

algorithm, artificial intelligence, machine learning, (16 more...)

Country: North America > United States (0.90)

Industry: Government > Regional Government (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)