Goto

Collaborating Authors

 Asia


The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use

arXiv.org Machine Learning

The natural gradient, as introduced by [Amari, 1987], allows for more efficient gradient descent by removing dependencies and biases inherent in a function's parameterization. Several papers present the topic thoroughly and precisely [Amari, 1987, Amari, 1998, Amari and Nagaoka, 2000, Theis, 2005, Amari, 2010]. It remains a very difficult idea to get your head around however. The intent of this note is to provide simple intuition for the natural gradient and its uses. We review how an ill conditioned parameter space can undermine learning, introduce the natural gradient by analogy to the more widely understood concept of signal whitening, and present tricks and specific prescriptions for applying the natural gradient to learning problems.


Document summarization using positive pointwise mutual information

arXiv.org Artificial Intelligence

The degree of success in document summarization processes depends on the performance of the method used in identifying significant sentences in the documents. The collection of unique words characterizes the major signature of the document, and forms the basis for Term-Sentence-Matrix (TSM). The Positive Pointwise Mutual Information, which works well for measuring semantic similarity in the Term-Sentence-Matrix, is used in our method to assign weights for each entry in the Term-Sentence-Matrix. The Sentence-Rank-Matrix generated from this weighted TSM, is then used to extract a summary from the document. Our experiments show that such a method would outperform most of the existing methods in producing summaries from large documents.


Hybrid Linear Modeling via Local Best-fit Flats

arXiv.org Machine Learning

We present a simple and fast geometric method for modeling data by a union of affine subspaces. The method begins by forming a collection of local best-fit affine subspaces, i.e., subspaces approximating the data in local neighborhoods. The correct sizes of the local neighborhoods are determined automatically by the Jones' $\beta_2$ numbers (we prove under certain geometric conditions that our method finds the optimal local neighborhoods). The collection of subspaces is further processed by a greedy selection procedure or a spectral method to generate the final model. We discuss applications to tracking-based motion segmentation and clustering of faces under different illuminating conditions. We give extensive experimental evidence demonstrating the state of the art accuracy and speed of the suggested algorithms on these problems and also on synthetic hybrid linear data as well as the MNIST handwritten digits data; and we demonstrate how to use our algorithms for fast determination of the number of affine subspaces.


Learning to Win by Reading Manuals in a Monte-Carlo Framework

Journal of Artificial Intelligence Research

Domain knowledge is crucial for effective performance in autonomous control systems. Typically, human effort is required to encode this knowledge into a control algorithm. In this paper, we present an approach to language grounding which automatically interprets text in the context of a complex control application, such as a game, and uses domain knowledge extracted from the text to improve control performance. Both text analysis and control strategies are learned jointly using only a feedback signal inherent to the application. To effectively leverage textual information, our method automatically extracts the text segment most relevant to the current game state, and labels it with a task-centric predicate structure. This labeled text is then used to bias an action selection policy for the game, guiding it towards promising regions of the action space. We encode our model for text analysis and game playing in a multi-layer neural network, representing linguistic decisions via latent variables in the hidden layers, and game action quality via the output layer. Operating within the Monte-Carlo Search framework, we estimate model parameters using feedback from simulated games. We apply our approach to the complex strategy game Civilization II using the official game manual as the text guide. Our results show that a linguistically-informed game-playing agent significantly outperforms its language-unaware counterpart, yielding a 34% absolute improvement and winning over 65% of games when playing against the built-in AI of Civilization.


A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems

arXiv.org Machine Learning

In binary classification problems, mainly two approaches have been proposed; one is loss function approach and the other is uncertainty set approach. The loss function approach is applied to major learning algorithms such as support vector machine (SVM) and boosting methods. The loss function represents the penalty of the decision function on the training samples. In the learning algorithm, the empirical mean of the loss function is minimized to obtain the classifier. Against a backdrop of the development of mathematical programming, nowadays learning algorithms based on loss functions are widely applied to real-world data analysis. In addition, statistical properties of such learning algorithms are well-understood based on a lots of theoretical works. On the other hand, the learning method using the so-called uncertainty set is used in hard-margin SVM, mini-max probability machine (MPM) and maximum margin MPM. In the learning algorithm, firstly, the uncertainty set is defined for each binary label based on the training samples. Then, the best separating hyperplane between the two uncertainty sets is employed as the decision function. This is regarded as an extension of the maximum-margin approach. The uncertainty set approach has been studied as an application of robust optimization in the field of mathematical programming. The statistical properties of learning algorithms with uncertainty sets have not been intensively studied. In this paper, we consider the relation between the above two approaches. We point out that the uncertainty set is described by using the level set of the conjugate of the loss function. Based on such relation, we study statistical properties of learning algorithms using uncertainty sets.


Reformulating the Situation Calculus and the Event Calculus in the General Theory of Stable Models and in Answer Set Programming

Journal of Artificial Intelligence Research

Circumscription and logic programs under the stable model semantics are two well-known nonmonotonic formalisms. The former has served as a basis of classical logic based action formalisms, such as the situation calculus, the event calculus and temporal action logics; the latter has served as a basis of a family of action languages, such as language A and several of its descendants. Based on the discovery that circumscription and the stable model semantics coincide on a class of canonical formulas, we reformulate the situation calculus and the event calculus in the general theory of stable models. We also present a translation that turns the reformulations further into answer set programs, so that efficient answer set solvers can be applied to compute the situation calculus and the event calculus.


Avoiding and Escaping Depressions in Real-Time Heuristic Search

Journal of Artificial Intelligence Research

Heuristics used for solving hard real-time search problems have regions with depressions. Such regions are bounded areas of the search space in which the heuristic function is inaccurate compared to the actual cost to reach a solution. Early real-time search algorithms, like LRTA*, easily become trapped in those regions since the heuristic values of their states may need to be updated multiple times, which results in costly solutions. State-of-the-art real-time search algorithms, like LSS-LRTA* or LRTA*(k), improve LRTA*'s mechanism to update the heuristic, resulting in improved performance. Those algorithms, however, do not guide search towards avoiding depressed regions. This paper presents depression avoidance, a simple real-time search principle to guide search towards avoiding states that have been marked as part of a heuristic depression. We propose two ways in which depression avoidance can be implemented: mark-and-avoid and move-to-border. We implement these strategies on top of LSS-LRTA* and RTAA*, producing 4 new real-time heuristic search algorithms: aLSS-LRTA*, daLSS-LRTA*, aRTAA*, and daRTAA*. When the objective is to find a single solution by running the real-time search algorithm once, we show that daLSS-LRTA* and daRTAA* outperform their predecessors sometimes by one order of magnitude. Of the four new algorithms, daRTAA* produces the best solutions given a fixed deadline on the average time allowed per planning episode. We prove all our algorithms have good theoretical properties: in finite search spaces, they find a solution if one exists, and converge to an optimal after a number of trials.


Automatic Sampling of Geographic objects

arXiv.org Artificial Intelligence

Today, one's disposes of large datasets composed of thousands of geographic objects. However, for many processes, which require the appraisal of an expert or much computational time, only a small part of these objects can be taken into account. In this context, robust sampling methods become necessary. In this paper, we propose a sampling method based on clustering techniques. Our method consists in dividing the objects in clusters, then in selecting in each cluster, the most representative objects. A case-study in the context of a process dedicated to knowledge revision for geographic data generalisation is presented. This case-study shows that our method allows to select relevant samples of objects.


The Stick-Breaking Construction of the Beta Process as a Poisson Process

arXiv.org Machine Learning

We show that the stick-breaking construction of the beta process due to Paisley, et al. (2010) can be obtained from the characterization of the beta process as a Poisson process. Specifically, we show that the mean measure of the underlying Poisson process is equal to that of the beta process. We use this underlying representation to derive error bounds on truncated beta processes that are tighter than those in the literature. We also develop a new MCMC inference algorithm for beta processes, based in part on our new Poisson process construction.


The Discrete Infinite Logistic Normal Distribution

arXiv.org Machine Learning

We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables, and study its statistical properties. We consider applications to topic modeling and derive a variational inference algorithm for approximate posterior inference. We study the empirical performance of the DILN topic model on four corpora, comparing performance with the HDP and the correlated topic model (CTM). To deal with large-scale data sets, we also develop an online inference algorithm for DILN and compare with online HDP and online LDA on the Nature magazine, which contains approximately 350,000 articles.