Goto

Collaborating Authors

 Genre


Hybrid Linear Modeling via Local Best-fit Flats

arXiv.org Machine Learning

We present a simple and fast geometric method for modeling data by a union of affine subspaces. The method begins by forming a collection of local best-fit affine subspaces, i.e., subspaces approximating the data in local neighborhoods. The correct sizes of the local neighborhoods are determined automatically by the Jones' $\beta_2$ numbers (we prove under certain geometric conditions that our method finds the optimal local neighborhoods). The collection of subspaces is further processed by a greedy selection procedure or a spectral method to generate the final model. We discuss applications to tracking-based motion segmentation and clustering of faces under different illuminating conditions. We give extensive experimental evidence demonstrating the state of the art accuracy and speed of the suggested algorithms on these problems and also on synthetic hybrid linear data as well as the MNIST handwritten digits data; and we demonstrate how to use our algorithms for fast determination of the number of affine subspaces.


Learning to Win by Reading Manuals in a Monte-Carlo Framework

Journal of Artificial Intelligence Research

Domain knowledge is crucial for effective performance in autonomous control systems. Typically, human effort is required to encode this knowledge into a control algorithm. In this paper, we present an approach to language grounding which automatically interprets text in the context of a complex control application, such as a game, and uses domain knowledge extracted from the text to improve control performance. Both text analysis and control strategies are learned jointly using only a feedback signal inherent to the application. To effectively leverage textual information, our method automatically extracts the text segment most relevant to the current game state, and labels it with a task-centric predicate structure. This labeled text is then used to bias an action selection policy for the game, guiding it towards promising regions of the action space. We encode our model for text analysis and game playing in a multi-layer neural network, representing linguistic decisions via latent variables in the hidden layers, and game action quality via the output layer. Operating within the Monte-Carlo Search framework, we estimate model parameters using feedback from simulated games. We apply our approach to the complex strategy game Civilization II using the official game manual as the text guide. Our results show that a linguistically-informed game-playing agent significantly outperforms its language-unaware counterpart, yielding a 34% absolute improvement and winning over 65% of games when playing against the built-in AI of Civilization.


A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems

arXiv.org Machine Learning

In binary classification problems, mainly two approaches have been proposed; one is loss function approach and the other is uncertainty set approach. The loss function approach is applied to major learning algorithms such as support vector machine (SVM) and boosting methods. The loss function represents the penalty of the decision function on the training samples. In the learning algorithm, the empirical mean of the loss function is minimized to obtain the classifier. Against a backdrop of the development of mathematical programming, nowadays learning algorithms based on loss functions are widely applied to real-world data analysis. In addition, statistical properties of such learning algorithms are well-understood based on a lots of theoretical works. On the other hand, the learning method using the so-called uncertainty set is used in hard-margin SVM, mini-max probability machine (MPM) and maximum margin MPM. In the learning algorithm, firstly, the uncertainty set is defined for each binary label based on the training samples. Then, the best separating hyperplane between the two uncertainty sets is employed as the decision function. This is regarded as an extension of the maximum-margin approach. The uncertainty set approach has been studied as an application of robust optimization in the field of mathematical programming. The statistical properties of learning algorithms with uncertainty sets have not been intensively studied. In this paper, we consider the relation between the above two approaches. We point out that the uncertainty set is described by using the level set of the conjugate of the loss function. Based on such relation, we study statistical properties of learning algorithms using uncertainty sets.


Hybrid Batch Bayesian Optimization

arXiv.org Artificial Intelligence

Bayesian Optimization aims at optimizing an unknown non-convex/concave function that is costly to evaluate. We are interested in application scenarios where concurrent function evaluations are possible. Under such a setting, BO could choose to either sequentially evaluate the function, one input at a time and wait for the output of the function before making the next selection, or evaluate the function at a batch of multiple inputs at once. These two different settings are commonly referred to as the sequential and batch settings of Bayesian Optimization. In general, the sequential setting leads to better optimization performance as each function evaluation is selected with more information, whereas the batch setting has an advantage in terms of the total experimental time (the number of iterations). In this work, our goal is to combine the strength of both settings. Specifically, we systematically analyze Bayesian optimization using Gaussian process as the posterior estimator and provide a hybrid algorithm that, based on the current state, dynamically switches between a sequential policy and a batch policy with variable batch sizes. We provide theoretical justification for our algorithm and present experimental results on eight benchmark BO problems. The results show that our method achieves substantial speedup (up to %78) compared to a pure sequential policy, without suffering any significant performance loss.


Dissimilarity Clustering by Hierarchical Multi-Level Refinement

arXiv.org Machine Learning

We introduce in this paper a new way of optimizing the natural extension of the quantization error using in k-means clustering to dissimilarity data. The proposed method is based on hierarchical clustering analysis combined with multilevel heuristic refinement. The method is computationally efficient and achieves better quantization errors than the relational k-means.


Recovery of Low-Rank Plus Compressed Sparse Matrices with Application to Unveiling Traffic Anomalies

arXiv.org Machine Learning

Given the superposition of a low-rank matrix plus the product of a known fat compression matrix times a sparse matrix, the goal of this paper is to establish deterministic conditions under which exact recovery of the low-rank and sparse components becomes possible. This fundamental identifiability issue arises with traffic anomaly detection in backbone networks, and subsumes compressed sensing as well as the timely low-rank plus sparse matrix recovery tasks encountered in matrix decomposition problems. Leveraging the ability of $\ell_1$- and nuclear norms to recover sparse and low-rank matrices, a convex program is formulated to estimate the unknowns. Analysis and simulations confirm that the said convex program can recover the unknowns for sufficiently low-rank and sparse enough components, along with a compression matrix possessing an isometry property when restricted to operate on sparse vectors. When the low-rank, sparse, and compression matrices are drawn from certain random ensembles, it is established that exact recovery is possible with high probability. First-order algorithms are developed to solve the nonsmooth convex optimization problem with provable iteration complexity guarantees. Insightful tests with synthetic and real network data corroborate the effectiveness of the novel approach in unveiling traffic anomalies across flows and time, and its ability to outperform existing alternatives.


A consistent adjacency spectral embedding for stochastic blockmodel graphs

arXiv.org Machine Learning

We present a method to estimate block membership of nodes in a random graph generated by a stochastic blockmodel. We use an embedding procedure motivated by the random dot product graph model, a particular example of the latent position model. The embedding associates each node with a vector; these vectors are clustered via minimization of a square error criterion. We prove that this method is consistent for assigning nodes to blocks, as only a negligible number of nodes will be mis-assigned. We prove consistency of the method for directed and undirected graphs. The consistent block assignment makes possible consistent parameter estimation for a stochastic blockmodel. We extend the result in the setting where the number of blocks grows slowly with the number of nodes. Our method is also computationally feasible even for very large graphs. We compare our method to Laplacian spectral clustering through analysis of simulated data and a graph derived from Wikipedia documents.


PAC learnability under non-atomic measures: a problem by Vidyasagar

arXiv.org Machine Learning

In response to a 1997 problem of M. Vidyasagar, we state a criterion for PAC learnability of a concept class $\mathscr C$ under the family of all non-atomic (diffuse) measures on the domain $\Omega$. The uniform Glivenko--Cantelli property with respect to non-atomic measures is no longer a necessary condition, and consistent learnability cannot in general be expected. Our criterion is stated in terms of a combinatorial parameter $\VC({\mathscr C}\,{\mathrm{mod}}\,\omega_1)$ which we call the VC dimension of $\mathscr C$ modulo countable sets. The new parameter is obtained by "thickening up" single points in the definition of VC dimension to uncountable "clusters". Equivalently, $\VC(\mathscr C\modd\omega_1)\leq d$ if and only if every countable subclass of $\mathscr C$ has VC dimension $\leq d$ outside a countable subset of $\Omega$. The new parameter can be also expressed as the classical VC dimension of $\mathscr C$ calculated on a suitable subset of a compactification of $\Omega$. We do not make any measurability assumptions on $\mathscr C$, assuming instead the validity of Martin's Axiom (MA). Similar results are obtained for function learning in terms of fat-shattering dimension modulo countable sets, but, just like in the classical distribution-free case, the finiteness of this parameter is sufficient but not necessary for PAC learnability under non-atomic measures.


Quantitative Concept Analysis

arXiv.org Artificial Intelligence

Formal Concept Analysis (FCA) begins from a context, given as a binary relation between some objects and some attributes, and derives a lattice of concepts, where each concept is given as a set of objects and a set of attributes, such that the first set consists of all objects that satisfy all attributes in the second, and vice versa. Many applications, though, provide contexts with quantitative information, telling not just whether an object satisfies an attribute, but also quantifying this satisfaction. Contexts in this form arise as rating matrices in recommender systems, as occurrence matrices in text analysis, as pixel intensity matrices in digital image processing, etc. Such applications have attracted a lot of attention, and several numeric extensions of FCA have been proposed. We propose the framework of proximity sets (proxets), which subsume partially ordered sets (posets) as well as metric spaces. One feature of this approach is that it extracts from quantified contexts quantified concepts, and thus allows full use of the available information. Another feature is that the categorical approach allows analyzing any universal properties that the classical FCA and the new versions may have, and thus provides structural guidance for aligning and combining the approaches.


A Market-Inspired Approach for Intersection Management in Urban Road Traffic Networks

Journal of Artificial Intelligence Research

Traffic congestion in urban road networks is a costly problem that affects all major cities in developed countries. To tackle this problem, it is possible (i) to act on the supply side, increasing the number of roads or lanes in a network, (ii) to reduce the demand, restricting the access to urban areas at specific hours or to specific vehicles, or (iii) to improve the efficiency of the existing network, by means of a widespread use of so-called Intelligent Transportation Systems (ITS). In line with the recent advances in smart transportation management infrastructures, ITS has turned out to be a promising field of application for artificial intelligence techniques. In particular, multiagent systems seem to be the ideal candidates for the design and implementation of ITS. In fact, drivers can be naturally modelled as autonomous agents that interact with the transportation management infrastructure, thereby generating a large-scale, open, agent-based system. To regulate such a system and maintain a smooth and efficient flow of traffic, decentralised mechanisms for the management of the transportation infrastructure are needed. In this article we propose a distributed, market-inspired, mechanism for the management of a future urban road network, where intelligent autonomous vehicles, operated by software agents on behalf of their human owners, interact with the infrastructure in order to travel safely and efficiently through the road network. Building on the reservation-based intersection control model proposed by Dresner and Stone, we consider two different scenarios: one with a single intersection and one with a network of intersections. In the former, we analyse the performance of a novel policy based on combinatorial auctions for the allocation of reservations. In the latter, we analyse the impact that a traffic assignment strategy inspired by competitive markets has on the drivers' route choices. Finally we propose an adaptive management mechanism that integrates the auction-based traffic control policy with the competitive traffic assignment strategy.