Goto

Collaborating Authors

 Statistical Learning


Learning to Rank With Bregman Divergences and Monotone Retargeting

arXiv.org Machine Learning

This paper introduces a novel approach for learning to rank (LETOR) based on the notion of monotone retargeting. It involves minimizing a divergence between all monotonic increasing transformations of the training scores and a parameterized prediction function. The minimization is both over the transformations as well as over the parameters. It is applied to Bregman divergences, a large class of "distance like" functions that were recently shown to be the unique class that is statistically consistent with the normalized discounted gain (NDCG) criterion [19]. The algorithm uses alternating projection style updates, in which one set of simultaneous projections can be computed independent of the Bregman divergence and the other reduces to parameter estimation of a generalized linear model. This results in easily implemented, efficiently parallelizable algorithm for the LETOR task that enjoys global optimum guarantees under mild conditions. We present empirical results on benchmark datasets showing that this approach can outperform the state of the art NDCG consistent techniques.


Hilbert Space Embedding for Dirichlet Process Mixtures

arXiv.org Machine Learning

This paper proposes a Hilbert space embedding for Dirichlet Process mixture models via a stick-breaking construction of Sethuraman. Although Bayesian nonparametrics offers a powerful approach to construct a prior that avoids the need to specify the model size/complexity explicitly, an exact inference is often intractable. On the other hand, frequentist approaches such as kernel machines, which suffer from the model selection/comparison problems, often benefit from efficient learning algorithms. This paper discusses the possibility to combine the best of both worlds by using the Dirichlet Process mixture model as a case study.


The Kernel Pitman-Yor Process

arXiv.org Artificial Intelligence

Nonparametric Bayesian modeling techniques, especially Dirichlet process mixture (DPM) models, have become very popular in statistics over the last few years, for performing nonparametric density estimation [1], [2], [3]. This theory is based on the observation that an infinite number of component distributions in an ordinary finite mixture model (clustering model) tends on the limit to a Dirichlet process (DP) prior [2], [4]. Eventually, the nonparametric Bayesian inference scheme induced by a DPM model yields a posterior distribution on the proper number of model component densities (inferred clusters) [5], rather than selecting a fixed number of mixture components. Hence, the obtained nonparametric Bayesian formulation eliminates the need of doing inference (or making arbitrary choices) on the number of mixture components (clusters) necessary to represent the modeled data. An interesting alternative to the Dirichlet process prior for nonparametric Bayesian modeling is the Pitman-Yor process (PYP) prior [6]. Pitman-Yor processes produce power-law distributions that allow for better modeling populations comprising a high number of clusters with low popularity and a low number of clusters with high popularity [7]. Indeed, the Pitman-Yor process prior can be viewed as a generalization of the Dirichlet process prior, and reduces to it for a specific selection of its parameter values. In [8], a Gaussian process-based coupled PYP method for joint segmentation of multiple images is proposed.


Local optima networks and the performance of iterated local search

arXiv.org Artificial Intelligence

Local Optima Networks (LONs) have been recently proposed as an alternative model of combinatorial fitness landscapes. The model compresses the information given by the whole search space into a smaller mathematical object that is the graph having as vertices the local optima and as edges the possible weighted transitions between them. A new set of metrics can be derived from this model that capture the distribution and connectivity of the local optima in the underlying configuration space. This paper departs from the descriptive analysis of local optima networks, and actively studies the correlation between network features and the performance of a local search heuristic. The NK family of landscapes and the Iterated Local Search metaheuristic are considered. With a statistically-sound approach based on multiple linear regression, it is shown that some LONs' features strongly influence and can even partly predict the performance of a heuristic search algorithm. This study validates the expressive power of LONs as a model of combinatorial fitness landscapes.


Opinion Mining for Relating Subjective Expressions and Annual Earnings in US Financial Statements

arXiv.org Artificial Intelligence

Financial statements contain quantitative information and manager's subjective evaluation of firm's financial status. Using information released in U.S. 10-K filings. Both qualitative and quantitative appraisals are crucial for quality financial decisions. To extract such opinioned statements from the reports, we built tagging models based on the conditional random field (CRF) techniques, considering a variety of combinations of linguistic factors including morphology, orthography, predicate-argument structure, syntax, and simple semantics. Our results show that the CRF models are reasonably effective to find opinion holders in experiments when we adopted the popular MPQA corpus for training and testing. The contribution of our paper is to identify opinion patterns in multiword expressions (MWEs) forms rather than in single word forms. We find that the managers of corporations attempt to use more optimistic words to obfuscate negative financial performance and to accentuate the positive financial performance. Our results also show that decreasing earnings were often accompanied by ambiguous and mild statements in the reporting year and that increasing earnings were stated in assertive and positive way.


Unsupervised Detection and Tracking of Arbitrary Objects with Dependent Dirichlet Process Mixtures

arXiv.org Machine Learning

This paper proposes a technique for the unsupervised detection and tracking of arbitrary objects in videos. It is intended to reduce the need for detection and localization methods tailored to specific object types and serve as a general framework applicable to videos with varied objects, backgrounds, and image qualities. The technique uses a dependent Dirichlet process mixture (DDPM) known as the Generalized Polya Urn (GPUDDPM) to model image pixel data that can be easily and efficiently extracted from the regions in a video that represent objects. This paper describes a specific implementation of the model using spatial and color pixel data extracted via frame differencing and gives two algorithms for performing inference in the model to accomplish detection and tracking. This technique is demonstrated on multiple synthetic and benchmark video datasets that illustrate its ability to, without modification, detect and track objects with diverse physical characteristics moving over non-uniform backgrounds and through occlusion.


Adaptive sequential Monte Carlo by means of mixture of experts

arXiv.org Machine Learning

Appropriately designing the proposal kernel of particle filters is an issue of significant importance, since a bad choice may lead to deterioration of the particle sample and, consequently, waste of computational power. In this paper we introduce a novel algorithm adaptively approximating the so-called optimal proposal kernel by a mixture of integrated curved exponential distributions with logistic weights. This family of distributions, referred to as mixtures of experts, is broad enough to be used in the presence of multi-modality or strongly skewed distributions. The mixtures are fitted, via online-EM methods, to the optimal kernel through minimisation of the Kullback-Leibler divergence between the auxiliary target and instrumental distributions of the particle filter. At each iteration of the particle filter, the algorithm is required to solve only a single optimisation problem for the whole particle sample, yielding an algorithm with only linear complexity. In addition, we illustrate in a simulation study how the method can be successfully applied to optimal filtering in nonlinear state-space models.


Optimization in Differentiable Manifolds in Order to Determine the Method of Construction of Prehistoric Wall-Paintings

arXiv.org Artificial Intelligence

In this paper a general methodology is introduced for the determination of potential prototype curves used for the drawing of prehistoric wall-paintings. The approach includes a) preprocessing of the wall-paintings contours to properly partition them, according to their curvature, b) choice of prototype curves families, c) analysis and optimization in 4-manifold for a first estimation of the form of these prototypes, d) clustering of the contour parts and the prototypes, to determine a minimal number of potential guides, e) further optimization in 4-manifold, applied to each cluster separately, in order to determine the exact functional form of the potential guides, together with the corresponding drawn contour parts. The introduced methodology simultaneously deals with two problems: a) the arbitrariness in data-points orientation and b) the determination of one proper form for a prototype curve that optimally fits the corresponding contour data. Arbitrariness in orientation has been dealt with a novel curvature based error, while the proper forms of curve prototypes have been exhaustively determined by embedding curvature deformations of the prototypes into 4-manifolds. Application of this methodology to celebrated wall-paintings excavated at Tyrins, Greece and the Greek island of Thera, manifests it is highly probable that these wall-paintings had been drawn by means of geometric guides that correspond to linear spirals and hyperbolae. These geometric forms fit the drawings' lines with an exceptionally low average error, less than 0.39mm. Hence, the approach suggests the existence of accurate realizations of complicated geometric entities, more than 1000 years before their axiomatic formulation in Classical Ages.


Rank/Norm Regularization with Closed-Form Solutions: Application to Subspace Clustering

arXiv.org Machine Learning

When data is sampled from an unknown subspace, principal component analysis (PCA) provides an effective way to estimate the subspace and hence reduce the dimension of the data. At the heart of PCA is the Eckart-Young-Mirsky theorem, which characterizes the best rank k approximation of a matrix. In this paper, we prove a generalization of the Eckart-Young-Mirsky theorem under all unitarily invariant norms. Using this result, we obtain closed-form solutions for a set of rank/norm regularized problems, and derive closed-form solutions for a general class of subspace clustering problems (where data is modelled by unions of unknown subspaces). From these results we obtain new theoretical insights and promising experimental results.


Multi-view constrained clustering with an incomplete mapping between views

arXiv.org Artificial Intelligence

Multi-view learning algorithms typically assume a complete bipartite mapping between the different views in order to exchange information during the learning process. However, many applications provide only a partial mapping between the views, creating a challenge for current methods. To address this problem, we propose a multi-view algorithm based on constrained clustering that can operate with an incomplete mapping. Given a set of pairwise constraints in each view, our approach propagates these constraints using a local similarity measure to those instances that can be mapped to the other views, allowing the propagated constraints to be transferred across views via the partial mapping. It uses co-EM to iteratively estimate the propagation within each view based on the current clustering model, transfer the constraints across views, and then update the clustering model. By alternating the learning process between views, this approach produces a unified clustering model that is consistent with all views. We show that this approach significantly improves clustering performance over several other methods for transferring constraints and allows multi-view clustering to be reliably applied when given a limited mapping between the views. Our evaluation reveals that the propagated constraints have high precision with respect to the true clusters in the data, explaining their benefit to clustering performance in both single- and multi-view learning scenarios.