Directed Networks
Physics-constrained, data-driven discovery of coarse-grained dynamics
Felsberger, L., Koutsourelakis, P. S.
The combination of high dimensionality and disparity of time scales encountered in many problems in computational physics has motivated the development of coarse-grained (CG) models. In this paper, we advocate the paradigm of data-driven discovery for extracting governing equations by employing fine-scale simulation data. In particular, we cast the coarse-graining process under a probabilistic state-space model where the transition law dictates the evolution of the CG state variables and the emission law the coarse-to-fine map. The directed probabilistic graphical model implied suggests that, given values for the fine-grained (FG) variables, probabilistic inference tools must be employed to identify the corresponding values for the CG states, and to that end we employ Stochastic Variational Inference. We advocate a sparse Bayesian learning perspective which avoids overfitting and reveals the most salient features in the CG evolution law. The formulation adopted enables the quantification of a crucial, and often neglected, component in the CG process, i.e. the predictive uncertainty due to information loss. Furthermore, it is capable of reconstructing the evolution of the full, fine-scale system. We demonstrate the efficacy of the proposed framework in high-dimensional systems of random walkers.
Deep learning with t-exponential Bayesian kitchen sinks
Partaourides, Harris, Chatzis, Sotirios
Bayesian learning has recently been considered as an effective means of accounting for uncertainty in trained deep network parameters. This is of crucial importance when dealing with small or sparse training datasets. On the other hand, shallow models that compute weighted sums of their inputs, after passing them through a bank of arbitrary randomized nonlinearities, have recently been shown to enjoy good test error bounds that depend on the number of nonlinearities. Inspired by these advances, in this paper we examine novel deep network architectures, where each layer comprises a bank of arbitrary nonlinearities, linearly combined using multiple alternative sets of weights. We effect model training by means of approximate inference based on a t-divergence measure; this generalizes the Kullback-Leibler divergence in the context of the t-exponential family of distributions. We adopt the t-exponential family since, compared to conventional Gaussian model assumptions, it can more flexibly accommodate real-world data that entail outliers and distributions with fat tails. We extensively evaluate our approach using several challenging benchmarks, and provide comparative results to related state-of-the-art techniques.
Enhanced version of AdaBoostM1 with J48 Tree learning method
Kang, Kyongche, Michalak, Jack
Machine Learning focuses on the construction and study of systems that can learn from data. This is connected with the classification problem, which is usually what Machine Learning algorithms are designed to solve. When a machine learning method is used by people with no special expertise in machine learning, it is important that the method be 'robust' in classification, in the sense that reasonable performance is obtained with minimal tuning for the problem at hand. Algorithms are evaluated based on how robustly they classify the given data. In this paper, we propose a quantifiable measure of 'robustness', and describe a particular learning method that is robust according to this measure in the context of the classification problem. We propose Adaptive Boosting (AdaBoostM1) with J48 (C4.5 tree) as the base learner, tuning the weight threshold (P) and the number of iterations (I) of the boosting algorithm. To benchmark the performance, we use a baseline classifier, AdaBoostM1 with Decision Stump as the base learner and no parameter tuning. By tuning parameters and using J48 as the base learner, we are able to reduce the overall average error rate ratio (errorC/errorNB) from 2.4 to 0.9 for development sets of data and from 2.1 to 1.2 for evaluation sets of data.
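The boosting loop that the abstract tunes can be sketched in a few lines. The following is a minimal illustration of the standard AdaBoost.M1 reweighting scheme with a decision stump as the base learner (the baseline configuration mentioned above). It is not the Weka implementation the authors used: the weight threshold (P) is not modelled, J48 is not implemented, and the toy data and all function names are assumptions made for illustration.

```python
import math

def train_stump(xs, ys, w):
    """Return (threshold, left_label, right_label) minimising the weighted error."""
    labels = sorted(set(ys))
    best = None
    for thr in sorted(set(xs)):
        for left in labels:
            for right in labels:
                err = sum(wi for x, y, wi in zip(xs, ys, w)
                          if (left if x <= thr else right) != y)
                if best is None or err < best[0]:
                    best = (err, thr, left, right)
    return best[1:]

def stump_predict(stump, x):
    thr, left, right = stump
    return left if x <= thr else right

def adaboost_m1(xs, ys, n_rounds):
    """AdaBoost.M1: returns a list of (stump, vote_weight) pairs."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(xs, ys, w)
        preds = [stump_predict(stump, x) for x in xs]
        eps = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
        if eps >= 0.5:            # base learner too weak: stop boosting
            break
        if eps <= 1e-12:          # perfect base learner: keep it and stop
            ensemble.append((stump, 1.0))
            break
        beta = eps / (1.0 - eps)
        ensemble.append((stump, math.log(1.0 / beta)))
        # Shrink the weights of correctly classified points, then renormalise.
        w = [wi * (beta if p == y else 1.0) for wi, p, y in zip(w, preds, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    votes = {}
    for stump, alpha in ensemble:
        label = stump_predict(stump, x)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)

def accuracy(ensemble, xs, ys):
    return sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)

# Toy 1-D data (hypothetical): class 1 occupies an interval, so one stump cannot fit it.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
acc_one_round = accuracy(adaboost_m1(xs, ys, 1), xs, ys)
acc_three_rounds = accuracy(adaboost_m1(xs, ys, 3), xs, ys)
```

On this toy set a single stump misclassifies the whole interval, while three boosting rounds combine stumps that together separate it, which is the effect the number-of-iterations parameter (I) controls.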
Minimally Faithful Inversion of Graphical Models
Webb, Stefan, Golinski, Adam, Zinkov, Robert, Siddharth, N., Rainforth, Tom, Teh, Yee Whye, Wood, Frank
Inference amortization methods allow the sharing of statistical strength across related observations when learning to perform posterior inference. Generally this requires the inversion of the dependency structure in the generative model, as the modeller must design and learn a distribution to approximate the posterior. Previous methods invert the dependency structure in a heuristic way and fail to capture the dependencies in the model, therefore limiting the performance of the eventual inference algorithm. We introduce an algorithm for faithfully and minimally inverting the graphical model structure of any generative model. Such an inversion has two crucial properties: a) it does not encode any independence assertions absent from the model, and b) for a given inversion, it encodes as many true independence assertions as possible. Our algorithm works by simulating variable elimination on the generative model to reparametrize the distribution. We show with experiments how such minimal inversions can assist in performing better inference.
Machine Learning in Robotics - 5 Modern Applications
As the term "machine learning" has heated up, interest in "robotics" (as expressed in Google Trends) has not altered much over the last three years. So how much of a place is there for machine learning in robotics? While only a portion of recent developments in robotics can be credited to developments and uses of machine learning, I've aimed to collect some of the more prominent applications together in this article, along with links and references. Before I delve into machine learning in robotics, it is worth defining "robot". Though at first this might seem simple, it's no easy task to come to an agreement on just what a robot is and what it is not, even amongst roboticists.
Probabilistic Planning With Influence Diagrams
Lee, Junkyu (University of California, Irvine)
Graphical models provide a powerful framework for reasoning under uncertainty, and an influence diagram (ID) is a graphical model of a sequential decision problem that maximizes the total expected utility of a non-forgetting agent. By relaxing the regular modeling assumptions, an ID can be flexibly extended to general decision scenarios involving a limited-memory agent or multiple agents. The approach of probabilistic planning with IDs is expected to gain computational leverage by exploiting the local structure as well as the representational flexibility of influence diagram frameworks. My research focuses on graphical model inference for IDs and its application to probabilistic planning, targeting online MDP/POMDP planning as testbeds in the evaluation.
Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method
Su, Jinyue (Fudan University) | Xu, Jiacheng (Fudan University) | Qiu, Xipeng (Fudan University) | Huang, Xuanjing (Fudan University)
Generating plausible and fluent sentences with desired properties has long been a challenge. Most recent works use recurrent neural networks (RNNs) and their variants to predict the following words given the previous sequence and a target label. In this paper, we propose a novel framework to generate constrained sentences via Gibbs Sampling. The candidate sentences are revised and updated iteratively, with sampled new words replacing old ones. Our experiments show the effectiveness of the proposed method in generating plausible and diverse sentences.
Bayesian Verb Sense Clustering
Peterson, Daniel W (University of Colorado at Boulder) | Palmer, Martha (University of Colorado at Boulder)
This work performs verb sense induction and clustering based on observed syntactic distributions in a large corpus. VerbNet is a hierarchical clustering of verbs and a useful semantic resource. We address the main drawbacks of VerbNet, by proposing a Bayesian model to build VerbNet-like clusters automatically and with full coverage. Relative to the prior state of the art, we improve accuracy on verb sense induction by over 20% absolute F1. We then propose a new model, inspired by the positive pointwise mutual information (PPMI). Our PPMI-based mixture model permits an extremely efficient sampler, while improving performance. Our best model shows a 4.5% absolute F1 improvement over the best non-PPMI model, with over an order of magnitude less computation time. Though this model is inspired by clustering verb senses, it may be applicable in other situations where multiple items are being sampled as a group.
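For readers unfamiliar with the statistic that inspires the second model, positive pointwise mutual information can be computed directly from co-occurrence counts as PPMI(v, c) = max(0, log(p(v, c) / (p(v) p(c)))). The sketch below shows only that statistic, not the authors' mixture model or sampler; the verb and syntactic-slot counts are made-up illustrative data, and the function name is an assumption.

```python
import math
from collections import Counter

def ppmi(pair_counts):
    """Map each (verb, context) pair to max(0, PMI) using raw co-occurrence counts."""
    total = sum(pair_counts.values())
    verb_totals = Counter()
    context_totals = Counter()
    for (verb, context), n in pair_counts.items():
        verb_totals[verb] += n
        context_totals[context] += n
    scores = {}
    for (verb, context), n in pair_counts.items():
        # p(v, c) / (p(v) * p(c)) simplifies to n * total / (count(v) * count(c)).
        pmi = math.log(n * total / (verb_totals[verb] * context_totals[context]))
        scores[(verb, context)] = max(0.0, pmi)
    return scores

# Hypothetical counts of verbs observed with two syntactic slots.
counts = {
    ("eat", "dobj"): 8,
    ("eat", "prep_with"): 2,
    ("run", "dobj"): 1,
    ("run", "prep_with"): 9,
}
scores = ppmi(counts)
```

Pairs that co-occur less often than chance, such as ("eat", "prep_with") in this toy table, are clipped to zero, so only positive associations contribute.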
SELF: Structural Equational Likelihood Framework for Causal Discovery
Cai, Ruichu (Guangdong University of Technology) | Qiao, Jie (Guangdong University of Technology) | Zhang, Zhenjie (Advanced Digital Sciences Center, Illinois at Singapore Pte. Ltd.) | Hao, Zhifeng (Guangdong University of Technology)
Causal discovery without intervention is well recognized as a challenging yet powerful data analysis tool, boosting the development of other scientific areas, such as biology, astronomy, and social science. The major technical difficulty behind observation-based causal discovery is to effectively and efficiently identify causes and effects from correlated variables in the presence of significant noise. Previous studies mostly employ two very different methodologies under the Bayesian network framework, namely global likelihood maximization and local complexity analysis over marginal distributions. While these approaches are effective in their respective problem domains, in this paper we show that they can be combined to formulate a new global optimization model with local statistical significance, called the structural equational likelihood framework (SELF for short). We provide a thorough analysis of the soundness of the model under mild conditions and present efficient heuristic-based algorithms for scalable model training. Empirical evaluations using XGBoost validate the superiority of our proposal over state-of-the-art solutions, on both synthetic and real-world causal structures.
Thompson Sampling for Dynamic Pricing
Ganti, Ravi, Sustik, Matyas, Tran, Quoc, Seaman, Brian
In this paper we apply active learning algorithms to dynamic pricing on a prominent e-commerce website. Dynamic pricing involves changing the price of items on a regular basis, and uses the feedback from the pricing decisions to update the prices of the items. Most popular approaches to dynamic pricing are passive: the algorithm uses historical data to learn various parameters of the pricing problem, and uses the updated parameters to generate a new set of prices. We show that one can use active learning algorithms such as Thompson sampling to learn the underlying parameters of a pricing problem more efficiently. We apply our algorithms to a real e-commerce system and show that they indeed improve revenue compared to pricing algorithms that use passive learning.
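As a rough illustration of the idea, Thompson sampling for pricing can be sketched as a Beta-Bernoulli bandit over a small set of candidate prices: sample a purchase probability for each price from its posterior, post the price with the highest sampled expected revenue, and update that posterior with the observed outcome. The abstract does not specify the authors' demand model, so the simulated buyer, the price grid, the purchase probabilities, and all names below are assumptions for illustration only.

```python
import random

def thompson_pricing(prices, true_buy_prob, n_rounds, seed=0):
    """Simulate Thompson sampling over candidate prices.

    true_buy_prob is a hypothetical ground truth used only to simulate buyers.
    Returns (times each price was posted, total revenue).
    """
    rng = random.Random(seed)
    k = len(prices)
    successes = [1.0] * k   # Beta(1, 1) prior on each purchase probability
    failures = [1.0] * k
    posted = [0] * k
    revenue = 0.0
    for _ in range(n_rounds):
        # Draw one plausible purchase probability per price from its posterior.
        sampled = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        # Post the price whose sampled expected revenue is highest.
        i = max(range(k), key=lambda j: prices[j] * sampled[j])
        posted[i] += 1
        if rng.random() < true_buy_prob[i]:
            successes[i] += 1.0
            revenue += prices[i]
        else:
            failures[i] += 1.0
    return posted, revenue

# Hypothetical setting: expected revenues are 3.0, 4.0 and 1.5 per visitor.
prices = [5.0, 10.0, 15.0]
true_buy_prob = [0.6, 0.4, 0.1]
posted, revenue = thompson_pricing(prices, true_buy_prob, n_rounds=5000)
```

Because exploration is driven by posterior uncertainty, the price with the highest expected revenue ends up being posted most often, while clearly inferior prices are tried only until their posteriors rule them out.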