Country
A unifying representation for a class of dependent random measures
Foti, Nicholas J., Futoma, Joseph D., Rockmore, Daniel N., Williamson, Sinead
We present a general construction for dependent random measures based on thinning Poisson processes on an augmented space. The framework is not restricted to dependent versions of a specific nonparametric model, but can be applied to all models that can be represented using completely random measures. Several existing dependent random measures can be seen as specific cases of this framework. Interesting properties of the resulting measures are derived and the efficacy of the framework is demonstrated by constructing a covariate-dependent latent feature model and topic model that obtain superior predictive performance.
Domain Adaptations for Computer Vision Applications
A basic assumption of statistical learning theory is that train and test data are drawn from the same underlying distribution. Unfortunately, this assumption doesn't hold in many applications. Instead, ample labeled data might exist in a particular `source' domain while inference is needed in another, `target' domain. Domain adaptation methods leverage labeled data from both domains to improve classification on unseen data in the target domain. In this work we survey domain transfer learning methods for various application domains with focus on recent work in Computer Vision.
A survey of non-exchangeable priors for Bayesian nonparametric models
Foti, Nicholas J., Williamson, Sinead
Dependent nonparametric processes extend distributions over measures, such as the Dirichlet process and the beta process, to give distributions over collections of measures, typically indexed by values in some covariate space. Such models are appropriate priors when exchangeability assumptions do not hold, and instead we want our model to vary fluidly with some set of covariates. Since the concept of dependent nonparametric processes was formalized by MacEachern [1], there have been a number of models proposed and used in the statistics and machine learning literatures. Many of these models exhibit underlying similarities, an understanding of which, we hope, will help in selecting an appropriate prior, developing new models, and leveraging inference techniques.
Random Input Sampling for Complex Models Using Markov Chain Monte Carlo
Mahmutoglu, A. Gokcen, Erdogan, Alper T., Demir, Alper
Many random processes can be simulated as the output of a deterministic model accepting random inputs. Such a model usually describes a complex mathematical or physical stochastic system and the randomness is introduced in the input variables of the model. When the statistics of the output event are known, these input variables have to be chosen in a specific way for the output to have the prescribed statistics. Because the probability distribution of the input random variables is not directly known but dictated implicitly by the statistics of the output random variables, this problem is usually intractable for classical sampling methods. Based on Markov Chain Monte Carlo we propose a novel method to sample random inputs to such models by introducing a modification to the standard Metropolis-Hastings algorithm. As an example we consider a system described by a stochastic differential equation (sde) and demonstrate how sample paths of a random process satisfying this sde can be generated with our technique.
Gliders2012: Development and Competition Results
Moore, Edward, Obst, Oliver, Prokopenko, Mikhail, Wang, Peter, Held, Jason
The RoboCup 2D Simulation League incorporates several challenging features, setting a benchmark for Artificial Intelligence (AI). In this paper we describe some of the ideas and tools around the development of our team, Gliders2012. In our description, we focus on the evaluation function as one of our central mechanisms for action selection. We also point to a new framework for watching log files in a web browser that we release for use and further development by the RoboCup community. Finally, we also summarize results of the group and final matches we played during RoboCup 2012, with Gliders2012 finishing 4th out of 19 teams.
A New Similarity Measure for Taxonomy Based on Edge Counting
K, Manjula Shenoy., Shet, K. C., Acharya, U. Dinesh
This paper introduces a new similarity measure based on edge counting in a taxonomy like WorldNet or Ontology. Measurement of similarity between text segments or concepts is very useful for many applications like information retrieval, ontology matching, text mining, and question answering and so on. Several measures have been developed for measuring similarity between two concepts: out of these we see that the measure given by Wu and Palmer [1] is simple, and gives good performance. Our measure is based on their measure but strengthens it. Wu and Palmer [1] measure has a disadvantage that it does not consider how far the concepts are semantically. In our measure we include the shortest path between the concepts and the depth of whole taxonomy together with the distances used in Wu and Palmer [1]. Also the measure has following disadvantage i.e. in some situations, the similarity of two elements of an IS-A ontology contained in the neighborhood exceeds the similarity value of two elements contained in the same hierarchy. Our measure introduces a penalization factor for this case based upon shortest length between the concepts and depth of whole taxonomy.
Bayesian nonparametric models for ranked data
Caron, Francois, Teh, Yee Whye
We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books.
A Dataset for StarCraft AI \& an Example of Armies Clustering
Synnaeve, Gabriel, Bessiere, Pierre
This paper advocates the exploration of the full state of recorded real-time strategy (RTS) games, by human or robotic players, to discover how to reason about tactics and strategy. We present a dataset of StarCraft games encompassing the most of the games' state (not only player's orders). We explain one of the possible usages of this dataset by clustering armies on their compositions. This reduction of armies compositions to mixtures of Gaussian allow for strategic reasoning at the level of the components. We evaluated this clustering method by predicting the outcomes of battles based on armies compositions' mixtures components
A Rule-Based Approach For Aligning Japanese-Spanish Sentences From A Comparable Corpora
Ramírez, Jessica C., Matsumoto, Yuji
The performance of a Statistical Machine Translation System (SMT) system is proportionally directed to the quality and length of the parallel corpus it uses. However for some pair of languages there is a considerable lack of them. The long term goal is to construct a Japanese-Spanish parallel corpus to be used for SMT, whereas, there are a lack of useful Japanese-Spanish parallel Corpus. To address this problem, In this study we proposed a method for extracting Japanese-Spanish Parallel Sentences from Wikipedia using POS tagging and Rule-Based approach. The main focus of this approach is the syntactic features of both languages. Human evaluation was performed over a sample and shows promising results, in comparison with the baseline.
Data Clustering via Principal Direction Gap Partitioning
Abbey, Ralph, Diepenbrock, Jeremy, Langville, Amy, Meyer, Carl, Race, Shaina, Zhou, Dexin
We explore the geometrical interpretation of the PCA based clustering algorithm Principal Direction Divisive Partitioning (PDDP). We give several examples where this algorithm breaks down, and suggest a new method, gap partitioning, which takes into account natural gaps in the data between clusters. Geometric features of the PCA space are derived and illustrated and experimental results are given which show our method is comparable on the datasets used in the original paper on PDDP.