A high-bias, low-variance introduction to Machine Learning for physicists

arXiv.org Machine Learning

Machine Learning (ML) is one of the most exciting and dynamic areas of modern research and application. The purpose of this review is to provide an introduction to the core concepts and tools of machine learning in a manner easily understood and intuitive to physicists. The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, and generalization before moving on to more advanced topics in both supervised and unsupervised learning. Topics covered in the review include ensemble models, deep learning and neural networks, clustering and data visualization, energy-based models (including MaxEnt models and Restricted Boltzmann Machines), and variational methods. Throughout, we emphasize the many natural connections between ML and statistical physics. A notable aspect of the review is the use of Python notebooks to introduce modern ML/statistical packages to readers using physics-inspired datasets (the Ising Model and Monte-Carlo simulations of supersymmetric decays of proton-proton collisions). We conclude with an extended outlook discussing possible uses of machine learning for furthering our understanding of the physical world as well as open problems in ML where physicists maybe able to contribute. (Notebooks are available at https://physics.bu.edu/~pankajm/MLnotebooks.html )


BayesGrad: Explaining Predictions of Graph Convolutional Networks

arXiv.org Machine Learning

Recent advances in graph convolutional networks have significantly improved the performance of chemical predictions, raising a new research question: "how do we explain the predictions of graph convolutional networks?" A possible approach to answer this question is to visualize evidence substructures responsible for the predictions. For chemical property prediction tasks, the sample size of the training data is often small and/or a label imbalance problem occurs, where a few samples belong to a single class and the majority of samples belong to the other classes. This can lead to uncertainty related to the learned parameters of the machine learning model. To address this uncertainty, we propose BayesGrad, utilizing the Bayesian predictive distribution, to define the importance of each node in an input graph, which is computed efficiently using the dropout technique. We demonstrate that BayesGrad successfully visualizes the substructures responsible for the label prediction in the artificial experiment, even when the sample size is small. Furthermore, we use a real dataset to evaluate the effectiveness of the visualization. The basic idea of BayesGrad is not limited to graph-structured data and can be applied to other data types.


Why Do You Spread This Message? Understanding Users Sentiment in Social Media Campaigns

AAAI Conferences

Twitter has been increasingly used for spreading messages about campaigns. Such campaigns try to gain followers through their Twitter accounts, influence the followers and spread messages through them. In this paper, we explore the relationship between followers’ sentiment towards the cam-paign topic and their rate of retweeting of messages gener-ated by the campaign. Our analysis with followers of mul-tiple social-media campaigns found statistical significant correlations between such sentiment and retweeting rate. Based on our analysis, we have conducted an online inter-vention study among the followers of different social-media campaigns. Our study shows that targeting followers based on their sentiment towards the campaign can give higher re-tweet rate than a number of other baseline approaches.


Thompson Sampling Based Monte-Carlo Planning in POMDPs

AAAI Conferences

Monte-Carlo tree search (MCTS) has been drawing great interest in recent years for planning under uncertainty. One of the key challenges is the trade-off between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson sampling based MCTS that balances between cumulative and simple regrets. The proposed algorithm  Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D 2 NG-POMCP) treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. Bayesian method is used to model and infer the posterior distribution of these parameters by choosing the conjugate prior in the form of a combination of two Dirichlet and one NormalGamma distributions. Thompson sampling is exploited to guide the action selection in the search tree. Experimental results confirmed that our algorithm outperforms the state-of-the-art approaches on several common benchmark problems.


Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models

arXiv.org Machine Learning

In machine learning, a nonparametric forecasting algorithm for time series data has been proposed, called the kernel spectral hidden Markov model (KSHMM). In this paper, we propose a technique for short-term wind-speed prediction based on KSHMM. We numerically compared the performance of our KSHMMbased forecasting technique to other techniques with machine learning, using wind-speed data offered by the National Renewable Energy Laboratory. Our results demonstrate that, compared to these methods, the proposed technique offers comparable or better performance. Keywords: Wind-Speed Prediction, Kernel Methods, Kernel Mean Embedding, Spectral Learning, Hidden Markov Models. 1. Introduction Wind energy is one of the most attractive renewable energy sources.