Goto

Collaborating Authors

 Country


Randomized Nonlinear Component Analysis

arXiv.org Machine Learning

Classical methods such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are ubiquitous in statistics. However, these techniques are only able to reveal linear relationships in data. Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive in the large scale. In a separate strand of recent research, randomized methods have been proposed to construct features that help reveal nonlinear patterns in data. For basic tasks such as regression or classification, random features exhibit little or no loss in performance, while achieving drastic savings in computational requirements. In this paper we leverage randomness to design scalable new variants of nonlinear PCA and CCA; our ideas extend to key multivariate analysis tools such as spectral clustering or LDA. We demonstrate our algorithms through experiments on real-world data, on which we compare against the state-of-the-art. A simple R implementation of the presented algorithms is provided.


Scalable sparse covariance estimation via self-concordance

arXiv.org Machine Learning

We consider the class of convex minimization problems, composed of a self-concordant function, such as the $\log\det$ metric, a convex data fidelity term $h(\cdot)$ and, a regularizing -- possibly non-smooth -- function $g(\cdot)$. This type of problems have recently attracted a great deal of interest, mainly due to their omnipresence in top-notch applications. Under this \emph{locally} Lipschitz continuous gradient setting, we analyze the convergence behavior of proximal Newton schemes with the added twist of a probable presence of inexact evaluations. We prove attractive convergence rate guarantees and enhance state-of-the-art optimization schemes to accommodate such developments. Experimental results on sparse covariance estimation show the merits of our algorithm, both in terms of recovery efficiency and complexity.


Thresholding Classifiers to Maximize F1 Score

arXiv.org Machine Learning

This paper provides new insight into maximizing F1 scores in the context of binary classification and also in the context of multilabel classification. The harmonic mean of precision and recall, F1 score is widely used to measure the success of a binary classifier when one class is rare. Micro average, macro average, and per instance average F1 scores are used in multilabel classification. For any classifier that produces a real-valued output, we derive the relationship between the best achievable F1 score and the decision-making threshold that achieves this optimum. As a special case, if the classifier outputs are well-calibrated conditional probabilities, then the optimal threshold is half the optimal F1 score. As another special case, if the classifier is completely uninformative, then the optimal behavior is to classify all examples as positive. Since the actual prevalence of positive examples typically is low, this behavior can be considered undesirable. As a case study, we discuss the results, which can be surprising, of applying this procedure when predicting 26,853 labels for Medline documents.


Effects of Sampling Methods on Prediction Quality. The Case of Classifying Land Cover Using Decision Trees

arXiv.org Machine Learning

Clever sampling methods can be used to improve the handling of big data and increase its usefulness. The subject of this study is remote sensing, specifically airborne laser scanning point clouds representing different classes of ground cover. The aim is to derive a supervised learning model for the classification using CARTs. In order to measure the effect of different sampling methods on the classification accuracy, various experiments with varying types of sampling methods, sample sizes, and accuracy metrics have been designed. Numerical results for a subset of a large surveying project covering the lower Rhine area in Germany are shown. General conclusions regarding sampling design are drawn and presented.


Dimensionality reduction for click-through rate prediction: Dense versus sparse representation

arXiv.org Machine Learning

In online advertising, display ads are increasingly being placed based on real-time auctions where the advertiser who wins gets to serve the ad. This is called real-time bidding (RTB). In RTB, auctions have very tight time constraints on the order of 100ms. Therefore mechanisms for bidding intelligently such as clickthrough rate prediction need to be sufficiently fast. In this work, we propose to use dimensionality reduction of the user-website interaction graph in order to produce simplified features of users and websites that can be used as predictors of clickthrough rate. We demonstrate that the Infinite Relational Model (IRM) as a dimensionality reduction offers comparable predictive performance to conventional dimensionality reduction schemes, while achieving the most economical usage of features and fastest computations at run-time. For applications such as real-time bidding, where fast database I/O and few computations are key to success, we thus recommend using IRM based features as predictors to exploit the recommender effects from bipartite graphs.


Circulant Binary Embedding

arXiv.org Machine Learning

Binary embedding of high-dimensional data requires long codes to preserve the discriminative power of the input space. Traditional binary coding methods often suffer from very high computation and storage costs in such a scenario. To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix. The circulant structure enables the use of Fast Fourier Transformation to speed up the computation. Compared to methods that use unstructured matrices, the proposed method improves the time complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d\log{d})$, and the space complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d)$ where $d$ is the input dimensionality. We also propose a novel time-frequency alternating optimization to learn data-dependent circulant projections, which alternatively minimizes the objective in original and Fourier domains. We show by extensive experiments that the proposed approach gives much better performance than the state-of-the-art approaches for fixed time, and provides much faster computation with no performance degradation for fixed number of bits.


Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling

arXiv.org Machine Learning

Stochastic Gradient Descent (SGD) is a popular optimization method which has been applied to many important machine learning tasks such as Support Vector Machines and Deep Neural Networks. In order to parallelize SGD, minibatch training is often employed. The standard approach is to uniformly sample a minibatch at each step, which often leads to high variance. In this paper we propose a stratified sampling strategy, which divides the whole dataset into clusters with low within-cluster variance; we then take examples from these clusters using a stratified sampling technique. It is shown that the convergence rate can be significantly improved by the algorithm. Encouraging experimental results confirm the effectiveness of the proposed method.


SUNNY: a Lazy Portfolio Approach for Constraint Solving

arXiv.org Artificial Intelligence

*** To appear in Theory and Practice of Logic Programming (TPLP) *** Within the context of constraint solving, a portfolio approach allows one to exploit the synergy between different solvers in order to create a globally better solver. In this paper we present SUNNY: a simple and flexible algorithm that takes advantage of a portfolio of constraint solvers in order to compute --- without learning an explicit model --- a schedule of them for solving a given Constraint Satisfaction Problem (CSP). Motivated by the performance reached by SUNNY vs. different simulations of other state of the art approaches, we developed sunny-csp, an effective portfolio solver that exploits the underlying SUNNY algorithm in order to solve a given CSP. Empirical tests conducted on exhaustive benchmarks of MiniZinc models show that the actual performance of SUNNY conforms to the predictions. This is encouraging both for improving the power of CSP portfolio solvers and for trying to export them to fields such as Answer Set Programming and Constraint Logic Programming.


Querying Geometric Figures Using a Controlled Language, Ontological Graphs and Dependency Lattices

arXiv.org Artificial Intelligence

Dynamic geometry systems (DGS) have become basic tools in many areas of geometry as, for example, in education. Geometry Automated Theorem Provers (GATP) are an active area of research and are considered as being basic tools in future enhanced educational software as well as in a next generation of mechanized mathematics assistants. Recently emerged Web repositories of geometric knowledge, like TGTP and Intergeo, are an attempt to make the already vast data set of geometric knowledge widely available. Considering the large amount of geometric information already available, we face the need of a query mechanism for descriptions of geometric constructions. In this paper we discuss two approaches for describing geometric figures (declarative and procedural), and present algorithms for querying geometric figures in declaratively and procedurally described corpora, by using a DGS or a dedicated controlled natural language for queries.


A Neuron as a Signal Processing Device

arXiv.org Machine Learning

A neuron is a basic physiological and computational unit of the brain. While much is known about the physiological properties of a neuron, its computational role is poorly understood. Here we propose to view a neuron as a signal processing device that represents the incoming streaming data matrix as a sparse vector of synaptic weights scaled by an outgoing sparse activity vector. Formally, a neuron minimizes a cost function comprising a cumulative squared representation error and regularization terms. We derive an online algorithm that minimizes such cost function by alternating between the minimization with respect to activity and with respect to synaptic weights. The steps of this algorithm reproduce well-known physiological properties of a neuron, such as weighted summation and leaky integration of synaptic inputs, as well as an Oja-like, but parameter-free, synaptic learning rule. Our theoretical framework makes several predictions, some of which can be verified by the existing data, others require further experiments. Such framework should allow modeling the function of neuronal circuits without necessarily measuring all the microscopic biophysical parameters, as well as facilitate the design of neuromorphic electronics.