Goto

Collaborating Authors

 Genre


Global Sensitivity Analysis with Dependence Measures

arXiv.org Machine Learning

Global sensitivity analysis with variance-based measures suffers from several theoretical and practical limitations, since they focus only on the variance of the output and handle multivariate variables in a limited way. In this paper, we introduce a new class of sensitivity indices based on dependence measures which overcomes these insufficiencies. Our approach originates from the idea to compare the output distribution with its conditional counterpart when one of the input variables is fixed. We establish that this comparison yields previously proposed indices when it is performed with Csiszar f-divergences, as well as sensitivity indices which are well-known dependence measures between random variables. This leads us to investigate completely new sensitivity indices based on recent state-of-the-art dependence measures, such as distance correlation and the Hilbert-Schmidt independence criterion. We also emphasize the potential of feature selection techniques relying on such dependence measures as alternatives to screening in high dimension.


Predictable Feature Analysis

arXiv.org Machine Learning

Every organism in an environment, whether biological, robotic or virtual, must be able to predict certain aspects of its environment in order to survive or perform whatever task is intended. It needs a model that is capable of estimating the consequences of possible actions, so that planning, control, and decision-making become feasible. For scientific purposes, such models are usually created in a problem specific manner using differential equations and other techniques from control- and system-theory. In contrast to that, we aim for an unsupervised approach that builds up the desired model in a self-organized fashion. Inspired by Slow Feature Analysis (SFA), our approach is to extract sub-signals from the input, that behave as predictable as possible. These "predictable features" are highly relevant for modeling, because predictability is a desired property of the needed consequence-estimating model by definition. In our approach, we measure predictability with respect to a certain prediction model. We focus here on the solution of the arising optimization problem and present a tractable algorithm based on algebraic methods which we call Predictable Feature Analysis (PFA). We prove that the algorithm finds the globally optimal signal, if this signal can be predicted with low error. To deal with cases where the optimal signal has a significant prediction error, we provide a robust, heuristically motivated variant of the algorithm and verify it empirically. Additionally, we give formal criteria a prediction-model must meet to be suitable for measuring predictability in the PFA setting and also provide a suitable default-model along with a formal proof that it meets these criteria.


Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization

arXiv.org Machine Learning

We address the problem of finding the maximizer of a nonlinear smooth function, that can only be evaluated point-wise, subject to constraints on the number of permitted function evaluations. This problem is also known as fixed-budget best arm identification in the multi-armed bandit literature. We introduce a Bayesian approach for this problem and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods. The Bayesian approach places emphasis on detailed modelling, including the modelling of correlations among the arms. As a result, it can perform well in situations where the number of arms is much larger than the number of allowed function evaluation, whereas the frequentist counterpart is inapplicable. This feature enables us to develop and deploy practical applications, such as automatic machine learning toolboxes. The paper presents comprehensive comparisons of the proposed approach, Thompson sampling, classical Bayesian optimization techniques, more recent Bayesian bandit approaches, and state-of-the-art best arm identification methods. This is the first comparison of many of these methods in the literature and allows us to examine the relative merits of their different features.


Motility at the origin of life: Its characterization and a model

arXiv.org Artificial Intelligence

Due to recent advances in synthetic biology and artificial life, the origin of life is currently a hot topic of research. We review the literature and argue that the two traditionally competing "replicator-first" and "metabolism-first" approaches are merging into one integrated theory of individuation and evolution. We contribute to the maturation of this more inclusive approach by highlighting some problematic assumptions that still lead to an impoverished conception of the phenomenon of life. In particular, we argue that the new consensus has so far failed to consider the relevance of intermediate timescales. We propose that an adequate theory of life must account for the fact that all living beings are situated in at least four distinct timescales, which are typically associated with metabolism, motility, development, and evolution. On this view, self-movement, adaptive behavior and morphological changes could have already been present at the origin of life. In order to illustrate this possibility we analyze a minimal model of life-like phenomena, namely of precarious, individuated, dissipative structures that can be found in simple reaction-diffusion systems. Based on our analysis we suggest that processes in intermediate timescales could have already been operative in prebiotic systems. They may have facilitated and constrained changes occurring in the faster- and slower-paced timescales of chemical self-individuation and evolution by natural selection, respectively.


Using Expressive Language Generation to Increase Authorial Leverage

AAAI Conferences

It is widely agreed that the interaction possibilities in interactive narrative are limited by the current approach to dialog creation, which typically relies on teams of scriptwriters. This paper reports on experiments testing whether authorial leverage can be increased by methods for natural language generation of dialog in a story world called Heart of Shadows. Our experiments show that (1) expert writers spend less time authoring dialogue variations by editing automatically generated content than creating it from scratch and (2) expert writers are more critical of automatically generated dialogue than amateur writers.


The Combinatorial Multi-Armed Bandit Problem and Its Application to Real-Time Strategy Games

AAAI Conferences

Game tree search in games with large branching factors is a notoriously hard problem. In this paper, we address this problem with a new sampling strategy for Monte Carlo Tree Search (MCTS) algorithms, called "Naive Sampling", based on a variant of the Multi-armed Bandit problem called the "Combinatorial Multi-armed Bandit" (CMAB) problem. We present a new MCTS algorithm based on Naive Sampling called NaiveMCTS, and evaluate it in the context of real-time strategy (RTS) games. Our results show that as the branching factor grows, NaiveMCTS performs significantly better than other algorithms.


Learning Gaussian Graphical Models with Observed or Latent FVSs

arXiv.org Machine Learning

Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity $O(k^{2}n)$ using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of high-degree nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate in $O(kn^2+n^2\log n)$ if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing a inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank correction with complexity $O(kn^{2}+n^{2}\log n)$ per iteration. We also perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes.


Pattern-Coupled Sparse Bayesian Learning for Recovery of Block-Sparse Signals

arXiv.org Machine Learning

We consider the problem of recovering block-sparse signals whose structures are unknown \emph{a priori}. Block-sparse signals with nonzero coefficients occurring in clusters arise naturally in many practical scenarios. However, the knowledge of the block structure is usually unavailable in practice. In this paper, we develop a new sparse Bayesian learning method for recovery of block-sparse signals with unknown cluster patterns. Specifically, a pattern-coupled hierarchical Gaussian prior model is introduced to characterize the statistical dependencies among coefficients, in which a set of hyperparameters are employed to control the sparsity of signal coefficients. Unlike the conventional sparse Bayesian learning framework in which each individual hyperparameter is associated independently with each coefficient, in this paper, the prior for each coefficient not only involves its own hyperparameter, but also the hyperparameters of its immediate neighbors. In doing this way, the sparsity patterns of neighboring coefficients are related to each other and the hierarchical model has the potential to encourage structured-sparse solutions. The hyperparameters, along with the sparse signal, are learned by maximizing their posterior probability via an expectation-maximization (EM) algorithm. Numerical results show that the proposed algorithm presents uniform superiority over other existing methods in a series of experiments.


Low-Rank Matrix and Tensor Completion via Adaptive Sampling

arXiv.org Machine Learning

Recently, the machine learning and signal processing communities have focused considerable attention toward understanding the benefits of adaptive sensing. This theme is particularly relevant to modern data analysis, where adaptive sensing has emerged as an efficient alternative to obtaining and processing the large data sets associated with scientific investigation. These empirical observations have lead to a number of theoretical studies characterizing the performance gains offered by adaptive sensing over conventional, passive approaches. In this work, we continue in that direction and study the role of adaptive data acquisition in low rank matrix and tensor completion problems. Our study is motivated not only by prior theoretical results in favor of adaptive sensing but also by several applications where adaptive sensing is feasible. In recommender systems, obtaining a measurement amounts to asking a user about an item, an interaction that has been deployed in production systems. Another application pertains to network tomography, where a network operator is interested in inferring latencies between hosts in a communication network while injecting few packets into the network.


Moment-based Uniform Deviation Bounds for $k$-means and Friends

arXiv.org Machine Learning

Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution? This question is resolved here for distributions with $p\geq 4$ bounded moments; in particular, the difference between the sample cost and distribution cost decays with $m$ and $p$ as $m^{\min\{-1/4, -1/2+2/p\}}$. The essential technical contribution is a mechanism to uniformly control deviations in the face of unbounded parameter sets, cost functions, and source distributions. To further demonstrate this mechanism, a soft clustering variant of $k$-means cost is also considered, namely the log likelihood of a Gaussian mixture, subject to the constraint that all covariance matrices have bounded spectrum. Lastly, a rate with refined constants is provided for $k$-means instances possessing some cluster structure.