Directed Networks
Active Learning of Spin Network Models
Jiang, Jialong, Sivak, David A., Thomson, Matt
Complex networks can be modeled as a probabilistic graphical model, where the interactions between binary variables, "spins", on nodes are described by a coupling matrix that is inferred from observations. The inverse statistical problem of finding direct interactions is difficult, especially for large systems, because of the exponential growth in the possible number of states and the possible number of networks. In the context of the experimental sciences, well-controlled perturbations can be applied to a system, shedding light on the internal structure of the network. Therefore, we propose a method to improve the accuracy and efficiency of inference by iteratively applying perturbations to a network that are advantageous under a Bayesian framework. The spectrum of the empirical Fisher information can be used as a measure for the difficulty of the inference during the training process. We significantly improve the accuracy and efficiency of inference in medium-sized networks based on this strategy with a reasonable number of experimental queries. Our method could be powerful in the analysis of complex networks as well as in the rational design of experiments.
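As an illustration of the quantities this abstract mentions, the following minimal sketch builds a small pairwise spin model and computes the spectrum of its Fisher information. It is not the authors' active-learning procedure. It assumes spins take values in {-1, +1}, a distribution of the form P(s) ∝ exp(Σ_{i<j} J_ij s_i s_j + Σ_i h_i s_i), and a network small enough to enumerate all states; the names `J`, `h`, `stats` and `fisher` are hypothetical. It also computes the exact Fisher information by enumeration rather than the empirical one estimated from samples.

```python
import itertools
import numpy as np

n = 4  # number of spins; 2**n states are enumerated, so keep n small
rng = np.random.default_rng(1)

# Hypothetical couplings (upper triangle only, so each pair counts once) and fields.
J = np.triu(rng.normal(0.0, 0.5, size=(n, n)), k=1)
h = rng.normal(0.0, 0.2, size=n)

# All 2**n spin configurations with entries in {-1, +1}.
states = np.array(list(itertools.product([-1, 1], repeat=n)))

# Unnormalized log-probability of each state, then normalize.
log_w = np.einsum("si,ij,sj->s", states, J, states) + states @ h
p = np.exp(log_w - log_w.max())
p /= p.sum()

# Sufficient statistics: the n single spins followed by the n(n-1)/2 pair products.
pairs = list(itertools.combinations(range(n), 2))
pair_stats = np.column_stack([states[:, i] * states[:, j] for i, j in pairs])
stats = np.column_stack([states, pair_stats]).astype(float)

# For an exponential family, the Fisher information with respect to the
# natural parameters (h, J) is the covariance of the sufficient statistics.
mean = p @ stats
centered = stats - mean
fisher = centered.T @ (centered * p[:, None])

# Small eigenvalues mark parameter directions that the data constrain weakly.
eigvals = np.linalg.eigvalsh(fisher)
print(eigvals)
```

Under these assumptions, directions with small eigenvalues are the ones that are hard to infer from unperturbed observations, which gives some intuition for why targeted perturbations could help.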
The Mysterious Math of How Cells Determine Their Own Fate
In 1891, when the German biologist Hans Driesch split two-cell sea urchin embryos in half, he found that each of the separated cells then gave rise to its own complete, albeit smaller, larva. Somehow, the halves "knew" to change their entire developmental program: At that stage, the blueprint for what they would become had apparently not yet been drawn out, at least not in ink. Since then, scientists have been trying to understand what goes into making this blueprint, and how instructive it is. It's now known that some form of positional information makes genes variously switch on and off throughout the embryo, giving cells distinct identities based on their location. Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.
From both sides now: the math of linear regression
Linear regression is the most basic and the most widely used technique in machine learning; yet for all its simplicity, studying it can unlock some of the most important concepts in statistics. If you have a basic understanding of linear regression expressed as $\hat{Y} = \theta_0 + \theta_1 X$, but don't have a background in statistics and find statements like "ridge regression is equivalent to the maximum a posteriori (MAP) estimate with a zero-mean Gaussian prior" bewildering, then this post is for you. With a superficial goal of understanding that somewhat obtuse statement, its main objective is to explore the topic, starting from the standard formulation of linear regression, moving on to the probabilistic approach (maximum likelihood formulation) and from there to Bayesian linear regression. I'll use the $\theta$ character throughout to refer to the coefficients (weights) of a regression model, either explicitly broken out as $\theta_0$ and $\theta_1$ for intercept and slope respectively, or just $\theta$ referring to the vector of coefficients. I'll usually use the expression $\theta^T x_i$ for the prediction a model gives at $x_i$, the assumption being that a 1 has been added to the vector of values at $x_i$. In the single predictor case, we know that the least squares fit is the line that minimizes the sum of the squared distances between observed data and predicted values, i.e. it minimizes the Residual Sum of Squares (RSS): $\mathrm{RSS}(\theta) = \sum_{i=1}^{n} (y_i - \theta^T x_i)^2$. These residuals are pretty important in how we reason about our model.
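The quoted statement about ridge regression and MAP can be checked numerically. The sketch below is a minimal illustration, not code from the post: it assumes synthetic data, a Gaussian noise standard deviation `sigma`, a zero-mean Gaussian prior with standard deviation `tau` on every coefficient (including the intercept), and the ridge penalty `lam = sigma**2 / tau**2`. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-predictor data (assumed for illustration).
n = 50
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])  # prepend a 1 so theta^T x_i includes the intercept
theta_true = np.array([0.5, 2.0])
sigma = 0.3                           # noise standard deviation
y = X @ theta_true + rng.normal(0.0, sigma, size=n)

# Ordinary least squares: the theta that minimizes RSS(theta).
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(np.sum((y - X @ theta_ols) ** 2))

# Ridge regression: minimize RSS(theta) + lam * ||theta||^2 (closed form).
tau = 1.0                             # prior standard deviation
lam = sigma**2 / tau**2
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# MAP estimate: the posterior under likelihood N(y | X theta, sigma^2 I) and
# prior N(0, tau^2 I) is Gaussian, so its mode equals its mean.
theta_map = np.linalg.solve(
    X.T @ X / sigma**2 + np.eye(2) / tau**2,
    X.T @ y / sigma**2,
)

print("OLS:  ", theta_ols)
print("ridge:", theta_ridge)
print("MAP:  ", theta_map)
```

With this choice of `lam`, the ridge and MAP estimates coincide up to floating-point error, and both are pulled slightly toward zero relative to the least squares fit.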
How to Improve Political Forecasts - Issue 70: Variables
The 2020 Democratic candidates are out of the gate and the pollsters have the call! Bernie Sanders is leading by two lengths with Kamala Harris and Elizabeth Warren right behind, but Cory Booker and Beto O'Rourke are coming on fast! The political horse-race season is upon us and I bet I know what you are thinking: "Stop!" Every election we complain about horse-race coverage and every election we stay glued to it all the same. The problem with this kind of coverage is not that it's unimportant.
Time Series Imputation
Arcadinho, Samuel, Mateus, Paulo
Nowadays the world is full of digital data, due to the large deployment of sensors, fast internet and more computational power to generate all that data. This data might be very useful for extracting information and predicting events, allowing us to control or profit from them. In order to achieve such a goal, we need fast algorithms that are capable of finding features that could bring useful information. However, this is a nontrivial task, as the data is very large and the usual simple statistics are slow and inaccurate. Thus, the term data mining appeared to describe the problem of finding useful information in large data sets by integrating methods from many fields, like machine learning, statistics and database systems, spatial or temporal data analysis, pattern recognition, image and signal processing. In recent years, much work has been done on using machine learning techniques to extract useful information from data.
Scalable Data Augmentation for Deep Learning
Wang, Yuexi, Polson, Nicholas G., Sokolov, Vadim O.
Scalable Data Augmentation (SDA) provides a framework for training deep neural networks (DNNs). Our methodology exploits auxiliary hidden units which are designed to avoid backtracking and traverse local modes in an efficient way. This allows us to exploit recent advances in high performance computing such as scalable linear algebra (CUDA, XLA). We show that standard activation and objective functions, including ReLU (Polson and Ročková, 2018), logit (Zhou et al., 2012) and SVM (Mallick et al., 2005), can all be implemented as data augmentation schemes. Data augmentation strategies are commonplace in statistical applications such as EM, ECM and MM algorithms, as they accelerate convergence and can use Nesterov acceleration (Nesterov, 1983).
Multi-modal Probabilistic Prediction of Interactive Behavior via an Interpretable Model
Hu, Yeping, Zhan, Wei, Tomizuka, Masayoshi
For autonomous agents to operate successfully in the real world, the ability to anticipate future motions of surrounding entities in the scene can greatly enhance their safety levels, since potentially dangerous situations could be avoided in advance. While impressive results have been shown on predicting each agent's behavior independently, we argue that it is not valid to consider road entities individually, since transitions of vehicle states are highly coupled. Moreover, as the prediction horizon becomes longer, modeling prediction uncertainties and multi-modal distributions over future sequences becomes a more challenging task. In this paper, we address this challenge by presenting a multi-modal probabilistic prediction approach. The proposed method is based on a generative model and is capable of jointly predicting sequential motions of each pair of interacting agents. Most importantly, our model is interpretable: it can explain the underlying logic and is therefore more reliable to use in real applications. A complicated real-world roundabout scenario is used to implement and examine the proposed method.
Variational Bayesian modelling of mixed-effects
This note is concerned with an accurate and computationally efficient variational Bayesian treatment of mixed-effects modelling. We focus on group studies, i.e. empirical studies that report multiple measurements acquired in multiple subjects. When approached from a Bayesian perspective, such mixed-effects models typically rely upon a hierarchical generative model of the data, whereby both within- and between-subject effects contribute to the overall observed variance. The ensuing VB scheme can be used to assess statistical significance at the group level and/or to capture inter-individual differences. Alternatively, it can be seen as an adaptive regularization procedure, which iteratively learns the corresponding within-subject priors from estimates of the group distribution of effects of interest (cf. so-called "empirical Bayes" approaches). We outline the mathematical derivation of the ensuing VB scheme, whose open-source implementation is available as part of the VBA toolbox.
Transferability of Operational Status Classification Models Among Different Wind Turbine Types
Trstanova, Z., Martinsson, A., Matthews, C., Jimenez, S., Leimkuhler, B., Van Delft, T., Wilkinson, M.
A detailed understanding of wind turbine performance status classification can improve operations and maintenance in the wind energy industry. Due to different engineering properties of wind turbines, the standard supervised learning models used for classification do not generalize across data sets obtained from different wind sites. We propose two methods to deal with the transferability of the trained models: first, data normalization in the form of power curve alignment, and second, a robust method based on convolutional neural networks and feature-space extension. We demonstrate the success of our methods on real-world data sets with industrial applications.
Keywords: Machine learning, classification, generalization, CNN, wind turbine, wind energy
1. Introduction
Classification of operational status is an important step for performance analysis of wind farms from data of SCADA (Supervisory Control and Data Acquisition) type.
The Binary Space Partitioning-Tree Process
Fan, Xuhui, Li, Bin, Sisson, Scott Anthony
The Mondrian process represents an elegant and powerful approach for space partition modelling. However, as it restricts the partitions to be axis-aligned, its modelling flexibility is limited. In this work, we propose a self-consistent Binary Space Partitioning (BSP)-Tree process to generalize the Mondrian process. The BSP-Tree process is an almost surely right continuous Markov jump process that allows uniformly distributed oblique cuts in a two-dimensional convex polygon. The BSP-Tree process can also be extended using a non-uniform probability measure to generate direction differentiated cuts. The process is also self-consistent, maintaining distributional invariance under a restricted subdomain. We use Conditional-Sequential Monte Carlo for inference using the tree structure as the high-dimensional variable. The BSP-Tree process's performance on synthetic data partitioning and relational modelling demonstrates clear inferential improvements over the standard Mondrian process and other related methods.