Learning Graphical Models
Modeling Short Over-Dispersed Spike Count Data: A Hierarchical Parametric Empirical Bayes Framework
She, Qi, Jelfs, Beth, Chan, Rosa H. M.
In this letter, a Hierarchical Parametric Empirical Bayes model is proposed to model spike count data. We have integrated Generalized Linear Models (GLMs) and empirical Bayes theory to simultaneously provide three advantages: (1) a model of over-dispersion of spike count values; (2) reduced MSE in estimation when compared to using the maximum likelihood method for GLMs; and (3) an efficient alternative to inference with fully Bayes estimators. We apply the model to study both simulated data and experimental neural data from the retina. The simulation results indicate that the new model can estimate both the weights of connections among neural populations and the output firing rates (mean spike count) efficiently and accurately. The results from the retinal datasets show that the proposed model outperforms both standard Poisson and Negative Binomial GLMs in terms of the prediction log-likelihood of held-out datasets.
Need for DYNAMICAL Machine Learning: Bayesian exact recursive estimation
In my recent blog, Marrying Kalman Filtering & Machine Learning, we saw the merger of Bayesian exact recursive estimation (algorithm for which is Kalman Filter/Smoother in the linear, Gaussian case) and Machine Learning. We developed a solution called Kernel Projection Kalman Filter for business applications that require static or dynamical, dynamical or time-varying dynamical, linear or non-linear Machine Learning, i.e., pretty much all applications - therefore, Kernel Projection Kalman Filter is a "universal" solution . . . But who needs anything more than STATIC Machine Learning (ML)? Indeed, university courses in ML largely teach static ML. Given a set of inputs and outputs, find a static map between the two during supervised "Training" and use this static map for business purposes during "Operation" (which is called "Testing" during pre-operation evaluation).
A methodology for solving problems with DataScience for Internet of Things - Part One
Real-time systems differ in the way they perform analytics. Specifically, Real-time systems perform analytics on short time windows for Data Streams. Hence, the scope of Real Time analytics is a'window' which typically comprises of the last few time slots. Making Predictions on Real Time Data streams involves building an Offline model and applying it to a stream. Models incorporate one or more machine learning algorithms which are trained using the training Data.
Informative Planning and Online Learning with Sparse Gaussian Processes
Ma, Kai-Chieh, Liu, Lantao, Sukhatme, Gaurav S.
A big challenge in environmental monitoring is the spatiotemporal variation of the phenomena to be observed. To enable persistent sensing and estimation in such a setting, it is beneficial to have a time-varying underlying environmental model. Here we present a planning and learning method that enables an autonomous marine vehicle to perform persistent ocean monitoring tasks by learning and refining an environmental model. To alleviate the computational bottleneck caused by large-scale data accumulated, we propose a framework that iterates between a planning component aimed at collecting the most information-rich data, and a sparse Gaussian Process learning component where the environmental model and hyperparameters are learned online by taking advantage of only a subset of data that provides the greatest contribution. Our simulations with ground-truth ocean data shows that the proposed method is both accurate and efficient.
A Locally Adaptive Normal Distribution
Arvanitidis, Georgios, Hansen, Lars Kai, Hauberg, Søren
The multivariate normal density is a monotonic function of the distance to the mean, and its ellipsoidal shape is due to the underlying Euclidean metric. We suggest to replace this metric with a locally adaptive, smoothly changing (Riemannian) metric that favors regions of high local density. The resulting locally adaptive normal distribution (LAND) is a generalization of the normal distribution to the "manifold" setting, where data is assumed to lie near a potentially low-dimensional manifold embedded in $\mathbb{R}^D$. The LAND is parametric, depending only on a mean and a covariance, and is the maximum entropy distribution under the given metric. The underlying metric is, however, non-parametric. We develop a maximum likelihood algorithm to infer the distribution parameters that relies on a combination of gradient descent and Monte Carlo integration. We further extend the LAND to mixture models, and provide the corresponding EM algorithm. We demonstrate the efficiency of the LAND to fit non-trivial probability distributions over both synthetic data, and EEG measurements of human sleep.
Fast Learning of Clusters and Topics via Sparse Posteriors
Hughes, Michael C., Sudderth, Erik B.
Mixture models and topic models generate each observation from a single cluster, but standard variational posteriors for each observation assign positive probability to all possible clusters. This requires dense storage and runtime costs that scale with the total number of clusters, even though typically only a few clusters have significant posterior mass for any data point. We propose a constrained family of sparse variational distributions that allow at most $L$ non-zero entries, where the tunable threshold $L$ trades off speed for accuracy. Previous sparse approximations have used hard assignments ($L=1$), but we find that moderate values of $L>1$ provide superior performance. Our approach easily integrates with stochastic or incremental optimization algorithms to scale to millions of examples. Experiments training mixture models of image patches and topic models for news articles show that our approach produces better-quality models in far less time than baseline methods.
The Three Faces of Bayes
Last summer, I was at a conference having lunch with Hal Daume III when we got to talking about how "Bayesian" can be a funny and ambiguous term. It seems like the definition should be straightforward: "following the work of English mathematician Rev. Thomas Bayes," perhaps, or even "uses Bayes' theorem." But many methods bearing the reverend's name or using his theorem aren't even considered "Bayesian" by his most religious followers. Why is it that Bayesian networks, for example, aren't considered… y'know… Bayesian? As I've read more outside the fields of machine learning and natural language processing -- from psychometrics and environmental biology to hackers who dabble in data science -- I've noticed three broad uses of the term "Bayesian."
Markov Chain Monte Carlo Without all the Bullshit
I have a little secret: I don't like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated. This shows up when trying to read about Markov Chain Monte Carlo methods. Take, for example, the abstract to the Markov Chain Monte Carlo article in the Encyclopedia of Biostatistics. Markov chain Monte Carlo (MCMC) is a technique for estimating by simulation the expectation of a statistic in a complex model. Successive random selections form a Markov chain, the stationary distribution of which is the target distribution.
Hawkes Processes with Stochastic Excitations
Lee, Young, Lim, Kar Wai, Ong, Cheng Soon
We propose an extension to Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process. A modular inference procedure consisting of a combination between Gibbs and Metropolis Hastings steps is put forward. We recover expectation maximization as a special case. Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics.