Technology
A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
A novel linear feature selection algorithm is presented based on the global minimization of a data-dependent generalization error bound. Feature selection and scaling algorithms often lead to non-convex optimization problems,which in many previous approaches were addressed through gradient descent procedures that can only guarantee convergence to a local minimum. We propose an alternative approach, whereby the global solution of the non-convex optimization problem is derived via an equivalent optimization problem. Moreover, the convex optimization task is reduced to a conic quadratic programming problem for which efficient solversare available. Highly competitive numerical results on both artificial and real-world data sets are reported.
Efficient Out-of-Sample Extension of Dominant-Set Clusters
Pavan, Massimiliano, Pelillo, Marcello
Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. Theygeneralize the notion of a maximal clique to edgeweighted graphs and have intriguing, nontrivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burdenassociated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set offers asimple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach.
Approximately Efficient Online Mechanism Design
Parkes, David C., Yanovsky, Dimah, Singh, Satinder P.
Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-basedapproach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility thatagents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement ษ- efficient policies, and retain truth-revelation as an approximate Bayesian-Nash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse.
Log-concavity Results on Gaussian Process Methods for Supervised and Unsupervised Learning
Log-concavity is an important property in the context of optimization, Laplace approximation, and sampling; Bayesian methods based on Gaussian processpriors have become quite popular recently for classification, regression, density estimation, and point process intensity estimation. Here we prove that the predictive densities corresponding to each of these applications are log-concave, given any observed data. We also prove that the likelihood is log-concave in the hyperparameters controlling the mean function of the Gaussian prior in the density and point process intensity estimationcases, and the mean, covariance, and observation noise parameters in the classification and regression cases; this result leads to a useful parameterization of these hyperparameters, indicating a suitably large class of priors for which the corresponding maximum a posteriori problem is log-concave. Introduction Bayesian methods based on Gaussian process priors have recently become quite popular for machine learning tasks (1). These techniques have enjoyed a good deal of theoretical examination, documenting their learning-theoretic (generalization) properties (2), and developing avariety of efficient computational schemes (e.g., (3-5), and references therein).
Synergistic Face Detection and Pose Estimation with Energy-Based Models
Osadchy, Margarita, Miller, Matthew L., Cun, Yann L.
We describe a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional networkto map face images to points on a manifold, parametrized by pose, and non-face images to points far from that manifold. This network is trained by optimizing a loss function of three variables: image, pose,and face/non-face label. We test the resulting system, in a single configuration, on three standard data sets - one for frontal pose, one for rotated faces, and one for profiles - and find that its performance on each set is comparable to previous multi-view face detectors that can only handle one form of pose variation. We also show experimentally that the system's accuracy on both face detection and pose estimation is improved by training for the two tasks together.
Expectation Consistent Free Energies for Approximate Inference
We propose a novel a framework for deriving approximations for intractable probabilisticmodels. This framework is based on a free energy (negative log marginal likelihood) and can be seen as a generalization of adaptive TAP [1, 2, 3] and expectation propagation (EP) [4, 5]. The free energy is constructed from two approximating distributions which encode different aspects of the intractable model such a single node constraints andcouplings and are by construction consistent on a chosen set of moments. We test the framework on a difficult benchmark problem with binary variables on fully connected graphs and 2D grid graphs. We find good performance using sets of moments which either specify factorized nodesor a spanning tree on the nodes (structured approximation). Surprisingly, the Bethe approximation gives very inferior results even on grids.
Mass Meta-analysis in Talairach Space
We provide a method for mass meta-analysis in a neuroinformatics database containing stereotaxic Talairach coordinates from neuroimaging experiments.Database labels are used to group the individual experiments, e.g., according to cognitive function, and the consistent pattern of the experiments within the groups are determined.
Stable adaptive control with online learning
Learning algorithms have enjoyed numerous successes in robotic control tasks. In problems with time-varying dynamics, online learning methods have also proved to be a powerful tool for automatically tracking and/or adapting to the changing circumstances. However, for safety-critical applications suchas airplane flight, the adoption of these algorithms has been significantly hampered by their lack of safety, such as "stability," guarantees. Rather than trying to show difficult, a priori, stability guarantees forspecific learning methods, in this paper we propose a method for "monitoring" the controllers suggested by the learning algorithm online, andrejecting controllers leading to instability. We prove that even if an arbitrary online learning method is used with our algorithm to control a linear dynamical system, the resulting system is stable.
Detecting Significant Multidimensional Spatial Clusters
Neill, Daniel B., Moore, Andrew W., Pereira, Francisco, Mitchell, Tom M.
Each of these problems can be solved using a spatial scan statistic (Kulldorff, 1997), where we compute the maximum of a likelihood ratio statistic over all spatial regions, and find the significance of this region by randomization. However, computing the scan statistic for all spatial regions is generally computationally infeasible, so we introduce a novel fast spatial scan algorithm, generalizing the 2D scan algorithm of (Neill and Moore, 2004) to arbitrary dimensions. Our new multidimensional multiresolution algorithm allows us to find spatial clusters up to 1400x faster than the naive spatial scan, without any loss of accuracy.