Uncertainty
Comment on "An excess of massive stars in the local 30 Doradus starburst"
Schneider et al. (Reports, 5 January 2018, p. 69) used an ad hoc statistical method in their calculation of the stellar initial mass function. Adopting an improved approach, we reanalyze their data and determine a power-law exponent of . Alternative assumptions regarding dataset completeness and the star formation history model can shift the inferred exponent to and, respectively. They estimate the ages and masses of individual stars with the BONNSAI Bayesian code (3), then obtain an overall mass distribution by effectively adding together the posterior probability density functions of individual stars. There is no statistical meaning to a distribution obtained in this way, which does not represent the posterior probability density function on the mass distribution. Hierarchical Bayesian inference provides the statistically justified solution to this problem (4).
Entropic Latent Variable Discovery
Kocaoglu, Murat, Shakkottai, Sanjay, Dimakis, Alexandros G., Caramanis, Constantine, Vishwanath, Sriram
We consider the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. This problem has appeared in the literature as probabilistic latent semantic analysis (pLSA), and has connections to non-negative matrix factorization. When the simplicity of the variable is measured through its cardinality, we show that a solution to this latent variable discovery problem can be used to distinguish direct causal relations from spurious correlations among almost all joint distributions on simple causal graphs with two observed variables. Conjecturing a similar identifiability result holds with Shannon entropy, we study a loss function that trades-off between entropy of the latent variable and the conditional mutual information of the observed variables. We then propose a latent variable discovery algorithm -- LatentSearch -- and show that its stationary points are the stationary points of our loss function. We experimentally show that LatentSearch can indeed be used to distinguish direct causal relations from spurious correlations.
A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning and its aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey for MTL. First, we classify different MTL algorithms into several categories, including feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach, and then discuss the characteristics of each approach. In order to improve the performance of learning tasks further, MTL can be combined with other learning paradigms including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning and graphical models. When the number of tasks is large or the data dimensionality is high, batch MTL models are difficult to handle this situation and online, parallel and distributed MTL models as well as dimensionality reduction and feature hashing are reviewed to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance and we review representative works. Finally, we present theoretical analyses and discuss several future directions for MTL.
Variational Bayesian Reinforcement Learning with Regret Bounds
We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
Deep Contextual Multi-armed Bandits
Collier, Mark, Llorens, Hector Urdiales
Contextual multi-armed bandit problems arise frequently in important industrial applications. Existing solutions model the context either linearly, which enables uncertainty driven (principled) exploration, or non-linearly, by using epsilon-greedy exploration policies. Here we present a deep learning framework for contextual multi-armed bandits that is both non-linear and enables principled exploration at the same time. We tackle the exploration vs. exploitation trade-off through Thompson sampling by exploiting the connection between inference time dropout and sampling from the posterior over the weights of a Bayesian neural network. In order to adjust the level of exploration automatically as more data is made available to the model, the dropout rate is learned rather than considered a hyperparameter. We demonstrate that our approach substantially reduces regret on two tasks (the UCI Mushroom task and the Casino Parity task) when compared to 1) non-contextual bandits, 2) epsilon-greedy deep contextual bandits, and 3) fixed dropout rate deep contextual bandits. Our approach is currently being applied to marketing optimization problems at HubSpot.
Convergence Rates of Gaussian ODE Filters
Kersting, Hans, Sullivan, T. J., Hennig, Philipp
A recently-introduced class of probabilistic (uncertainty-aware) solvers for ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to initial value problems. These methods model the true solution $x$ and its first $q$ derivatives a priori as a Gauss--Markov process $\boldsymbol{X}$, which is then iteratively conditioned on information about $\dot{x}$. We prove worst-case local convergence rates of order $h^{q+1}$ for a wide range of versions of this Gaussian ODE filter, as well as global convergence rates of order $h^q$ in the case of $q=1$ and an integrated Brownian motion prior, and analyze how inaccurate information on $\dot{x}$ coming from approximate evaluations of $f$ affects these rates. Moreover, we present explicit formulas for the steady states and show that the posterior confidence intervals are well calibrated in all considered cases that exhibit global convergence---in the sense that they globally contract at the same rate as the truncation error.
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
Xue, Tianfan, Wu, Jiajun, Bouman, Katherine L., Freeman, William T.
We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, and on real-world video frames. We present analyses of the learned network representations, showing it is implicitly learning a compact encoding of object appearance and motion. We also demonstrate a few of its applications, including visual analogy-making and video extrapolation.
Tractable Querying and Learning in Hybrid Domains via Sum-Product Networks
Bueff, Andreas, Speichert, Stefanie, Belle, Vaishak
Probabilistic representations, such as Bayesian and Markov networks, are fundamental to much of statistical machine learning. Thus, learning probabilistic representations directly from data is a deep challenge, the main computational bottleneck being inference that is intractable. Tractable learning is a powerful new paradigm that attempts to learn distributions that support efficient probabilistic querying. By leveraging local structure, representations such as sum-product networks (SPNs) can capture high tree-width models with many hidden layers, essentially a deep architecture, while still admitting a range of probabilistic queries to be computable in time polynomial in the network size. The leaf nodes in SPNs, from which more intricate mixtures are formed, are tractable univariate distributions, and so the literature has focused on Bernoulli and Gaussian random variables. This is clearly a restriction for handling mixed discrete-continuous data, especially if the continuous features are generated from non-parametric and non-Gaussian distribution families. In this work, we present a framework that systematically integrates SPN structure learning with weighted model integration, a recently introduced computational abstraction for performing inference in hybrid domains, by means of piecewise polynomial approximations of density functions of arbitrary shape. Our framework is instantiated by exploiting the notion of propositional abstractions, thus minimally interfering with the SPN structure learning module, and supports a powerful query interface for conditioning on interval constraints. Our empirical results show that our approach is effective, and allows a study of the trade off between the granularity of the learned model and its predictive power.
Global Artificial Intelligence (AI) Industry
Germany Market Analysis Table 35: German Recent Past, Current & Future Analysis for Artificial Intelligence Analyzed with Annual Revenue Figures in US$ Million for Years 2015 through 2024 (includes corresponding Graph/Chart) 9.4.3 Italy Market Analysis Table 36: Italian Recent Past, Current & Future Analysis for Artificial Intelligence Analyzed with Annual Revenue Figures in US$ Million for Years 2015 through 2024 (includes corresponding Graph/Chart) 9.4.4
Artificial intelligence saves water for water users associations
Agriculture uses 70 percent of the water in the world, and this appears to be an upward trend regarding water needs. As the demand in other industry sectors is also increasing, and the effects of climate change exacerbate water shortages, water saving measures have become an unavoidable challenge for maintaining the sector and preserving life. Agronomy researcher Rafael González has developed a model to predict in advance the water that users will need each day. This tool came about from a drive to ally with water resource sustainability. The model applies artificial intelligence techniques including fuzzy logic, a system used to explain the behavior of decision making.