Goto

Collaborating Authors

 Markov Models


Generalized Biwords for Bitext Compression and Translation Spotting

Journal of Artificial Intelligence Research

Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be more efficiently compressed if the fact that the two texts are mutual translations is exploited. For example, a bitext can be seen as a sequence of biwords ---pairs of parallel words with a high probability of co-occurrence--- that can be used as an intermediate representation in the compression process. However, the simple biword approach described in the literature can only exploit one-to-one word alignments and cannot tackle the reordering of words. We therefore introduce a generalization of biwords which can describe multi-word expressions and reorderings. We also describe some methods for the binary compression of generalized biword sequences, and compare their performance when different schemes are applied to the extraction of the biword sequence. In addition, we show that this generalization of biwords allows for the implementation of an efficient algorithm to look on the compressed bitext for words or text segments in one of the texts and retrieve their counterpart translations in the other text ---an application usually referred to as translation spotting--- with only some minor modifications in the compression algorithm.


Spectral dimensionality reduction for HMMs

arXiv.org Machine Learning

Hidden Markov Models (HMMs) can be accurately approximated using co-occurrence frequencies of pairs and triples of observations by using a fast spectral method in contrast to the usual slow methods like EM or Gibbs sampling. We provide a new spectral method which significantly reduces the number of model parameters that need to be estimated, and generates a sample complexity that does not depend on the size of the observation vocabulary. We present an elementary proof giving bounds on the relative accuracy of probability estimates from our model. (Correlaries show our bounds can be weakened to provide either L1 bounds or KL bounds which provide easier direct comparisons to previous work.) Our theorem uses conditions that are checkable from the data, instead of putting conditions on the unobservable Markov transition matrix.


Graphical Models for Integrated Intelligent Robot Architectures

AAAI Conferences

The theoretically elegant yet broadly functional capability of graphical models shows intriguing potential to span in a uniform manner perception, cognition and action; and thus to ultimately yield simpler yet more powerful integrated architectures for intelligent robots and other comparable systems. This position paper explores this potential, with initial support from an effort underway to develop a graphical architecture that is based on factor graphs (with piecewise continuous functions).


A Multitask Representation Using Reusable Local Policy Templates

AAAI Conferences

Constructing robust controllers to perform tasks in large, continually changing worlds is a difficult problem. A long-lived agent placed in such a world could be required to perform a variety of different tasks. For this to be possible, the agent needs to be able to abstract its experiences in a reusable way. This paper addresses the problem of online multitask decision making in such complex worlds, with inherent incompleteness in models of change. A fully general version of this problem is intractable but many interesting domains are rendered manageable by the fact that all instances of tasks may be described using a finite set of qualitatively meaningful contexts. We suggest an approach to solving the multitask problem through decomposing the domain into a set of capabilities based on these local contexts. Capabilities resemble the options of hierarchical reinforcement learning, but provide robust behaviours capable of achieving some subgoal with the associated guarantee of achieving at least a particular aspiration level of performance. This enables using these policies within a planning framework, and they become a level of abstraction which factorises an otherwise large domain into task-independent sub-problems, with well-defined interfaces between the perception, control and planning problems. This is demonstrated in a stochastic navigation example, where an agent reaches different goals in different world instances without relearning.


SNARE: Social Network Analysis and Reasoning Environment

AAAI Conferences

The importance of diversity in reasoning and learning to successfully address complex problems is examined. We discuss an approach by which a multiagent framework with decentralized control mechanisms provides diverse perspectives and hypotheses addressing a class of complex problems. We introduce the SNARE multiagent system. SNARE performs tasks to gain situational awareness of situations of interest in a Social Media Space. It applies a decentralized control mechanism for each agent; this mechanism enables an agent to interact with other agents to reason and learn. This approach facilitates dynamic agent organizations that adapt the topologies of interactions between agents based on the problem context.


A Novel Training Algorithm for HMMs with Partial and Noisy Access to the States

arXiv.org Machine Learning

This paper proposes a new estimation algorithm for the parameters of an HMM as to best account for the observed data. In this model, in addition to the observation sequence, we have \emph{partial} and \emph{noisy} access to the hidden state sequence as side information. This access can be seen as "partial labeling" of the hidden states. Furthermore, we model possible mislabeling in the side information in a joint framework and derive the corresponding EM updates accordingly. In our simulations, we observe that using this side information, we considerably improve the state recognition performance, up to 70%, with respect to the "achievable margin" defined by the baseline algorithms. Moreover, our algorithm is shown to be robust to the training conditions.


Context tree selection and linguistic rhythm retrieval from written texts

arXiv.org Machine Learning

The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry on this approach, we compare texts from European and Brazilian Portuguese. These texts are previously encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion which is a constant free procedure for model selection. As a by-product, this provides a solution for the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also make a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.


Learning Feature Hierarchies with Centered Deep Boltzmann Machines

arXiv.org Artificial Intelligence

Deep Boltzmann machines are in principle powerful models for extracting the hierarchical structure of data. Unfortunately, attempts to train layers jointly (without greedy layer-wise pretraining) have been largely unsuccessful. We propose a modification of the learning algorithm that initially recenters the output of the activation functions to zero. This modification leads to a better conditioned Hessian and thus makes learning easier. We test the algorithm on real data and demonstrate that our suggestion, the centered deep Boltzmann machine, learns a hierarchy of increasingly abstract representations and a better generative model of data.


The Hierarchical Dirichlet Process Hidden Semi-Markov Model

arXiv.org Machine Learning

There is much interest in the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) as a natural Bayesian nonparametric extension of the traditional HMM. However, in many settings the HDP-HMM's strict Markovian constraints are undesirable, particularly if we wish to learn or encode non-geometric state durations. We can extend the HDP-HMM to capture such structure by drawing upon explicit-duration semi-Markovianity, which has been developed in the parametric setting to allow construction of highly interpretable models that admit natural prior information on state durations. In this paper we introduce the explicitduration HDP-HSMM and develop posterior sampling algorithms for efficient inference in both the direct-assignment and weak-limit approximation settings. We demonstrate the utility of the model and our inference methods on synthetic data as well as experiments on a speaker diarization problem and an example of learning the patterns in Morse code.


Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream

arXiv.org Machine Learning

Topic models have proven to be a useful tool for discovering latent structures in document collections. However, most document collections often come as temporal streams and thus several aspects of the latent structure such as the number of topics, the topics' distribution and popularity are time-evolving. Several models exist that model the evolution of some but not all of the above aspects. In this paper we introduce infinite dynamic topic models, iDTM, that can accommodate the evolution of all the aforementioned aspects. Our model assumes that documents are organized into epochs, where the documents within each epoch are exchangeable but the order between the documents is maintained across epochs. iDTM allows for unbounded number of topics: topics can die or be born at any epoch, and the representation of each topic can evolve according to a Markovian dynamics. We use iDTM to analyze the birth and evolution of topics in the NIPS community and evaluated the efficacy of our model on both simulated and real datasets with favorable outcome.