Goto

Collaborating Authors

 Support Vector Machines


Adaptive Stochastic Resource Control: A Machine Learning Approach

Journal of Artificial Intelligence Research

The paper investigates stochastic resource allocation problems with scarce, reusable resources and non-preemtive, time-dependent, interconnected tasks. This approach is a natural generalization of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as control policies of suitably reformulated Markov decision processes (MDPs). We argue that this reformulation has several favorable properties, such as it has finite state and action spaces, it is aperiodic, hence all policies are proper and the space of control policies can be safely restricted. Next, approximate dynamic programming (ADP) methods, such as fitted Q-learning, are suggested for computing an efficient control policy. In order to compactly maintain the cost-to-go function, two representations are studied: hash tables and support vector regression (SVR), particularly, nu-SVRs. Several additional improvements, such as the application of limited-lookahead rollout algorithms in the initial phases, action space decomposition, task clustering and distributed sampling are investigated, too. Finally, experimental results on both benchmark and industry-related data are presented.


Decoding Beta-Decay Systematics: A Global Statistical Model for Beta^- Halflives

arXiv.org Machine Learning

Rev. C) Statistical modeling of nuclear data provides a novel approach to nuclear systematics complementary to established theoretical and phenomenological approaches based on quantum theory. More specifically, fully-connected, multilayer feedforward artificial neural network models are developed using the Levenberg-Marquardt optimization algorithm together with Bayesian regularization and cross-validation. The predictive performance of models emerging from extensive computer experiments is compared with that of traditional microscopic and phenomenological models as well as with the performance of other learning systems, including earlier neural network models as well as the support vector machines recently applied to the same problem. In discussing the results, emphasis is placed on predictions for nuclei that are far from the stability line, and especially those involved in the r-process nucleosynthesis. It is found that the new statistical models can match or even surpass the predictive performance of conventional models for beta-decay systematics and accordingly should provide a valuable additional tool for exploring the expanding nuclear landscape. I. INTRODUCTION "Numbers are the within of all things." Among nuclear physicists this need is driven both by the experimental programs of existing and future radioactive ion beam facilities and by the stresses placed on established nuclear structure theory as totally new areas of the nuclear landscape are opened for exploration. For nuclear astrophysicists, such information is intrinsic to an understanding of supernova explosions - the initialization of the explosion, the subsequent neutronization of the core material, and the strength and fate of the shock wave formed - and the nucleosynthesis of heavy elements above Fe, notably the r-process [3, 4, 5]. Both the element distribution on the r-path and the time scale of the r-process are highly sensitive to the β-decay properties of the neutron-rich nuclei involved. Except for a few key nuclei, β decay of r-process nuclei cannot be studied in terrestrial laboratories, so the required information must come from nuclear models. These include the more phenomenological treatments, such as the Gross Theory (GT), as well as microscopic approaches based on the shell model and the proton-neutron Quasiparticle Random-Phase Approximation (pnQRPA) in various versions.


FINE: Fisher Information Non-parametric Embedding

arXiv.org Machine Learning

We consider the problems of clustering, classification, and visualization of high-dimensional data when no straightforward Euclidean representation exists. Typically, these tasks are performed by first reducing the high-dimensional data to some lower dimensional Euclidean space, as many manifold learning methods have been developed for this task. In many practical problems however, the assumption of a Euclidean manifold cannot be justified. In these cases, a more appropriate assumption would be that the data lies on a statistical manifold, or a manifold of probability density functions (PDFs). In this paper we propose using the properties of information geometry in order to define similarities between data sets using the Fisher information metric. We will show this metric can be approximated using entirely non-parametric methods, as the parameterization of the manifold is generally unknown. Furthermore, by using multi-dimensional scaling methods, we are able to embed the corresponding PDFs into a low-dimensional Euclidean space. This not only allows for classification of the data, but also visualization of the manifold. As a whole, we refer to our framework as Fisher Information Non-parametric Embedding (FINE), and illustrate its uses on a variety of practical problems, including bio-medical applications and document classification.



Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

Neural Information Processing Systems

Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose "generalized maximum margin clustering" framework that addresses the above three problems simultaneously.


Large Margin Component Analysis

Neural Information Processing Systems

Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines.


An Oracle Inequality for Clipped Regularized Risk Minimizers

Neural Information Processing Systems

We establish a general oracle inequality for clipped approximate minimizers of regularized empirical risks and apply this inequality to support vector machine (SVM) type algorithms. We then show that for SVMs using Gaussian RBF kernels for classification this oracle inequality leads to learning rates that are faster than the ones established in [9]. Finally, we use our oracle inequality to show that a simple parameter selection approach based on a validation set can yield the same fast learning rates without knowing the noise exponents which were required to be known a-priori in [9].


An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models

Neural Information Processing Systems

We consider the task of tuning hyperparameters in SVM models based on minimizing a smooth performance validation function, e.g., smoothed k-fold crossvalidation error, using nonlinear optimization techniques. The key computation in this approach is that of the gradient of the validation function with respect to hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be very efficiently done; often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations. .


Support Vector Machines on a Budget

Neural Information Processing Systems

The standard Support Vector Machine formulation does not provide its user with the ability to explicitly control the number of support vectors used to define the generated classifier. We present a modified version of SVM that allows the user to set a budget parameter B and focuses on minimizing the loss attained by the B worst-classified examples while ignoring the remaining examples. This idea can be used to derive sparse versions of both L1-SVM and L2-SVM. Technically, we obtain these new SVM variants by replacing the 1-norm in the standard SVM formulation with various interpolation-norms. We also adapt the SMO optimization algorithm to our setting and report on some preliminary experimental results.