Cyclades: Conflict-free Asynchronous Machine Learning

Neural Information Processing Systems

We present Cyclades, a general framework for parallelizing stochastic optimization algorithms in a shared memory setting. Cyclades is asynchronous during model updates, and requires no memory locking mechanisms, similar to Hogwild!-type algorithms. Unlike Hogwild!, Cyclades introduces no conflicts during parallel execution, and offers a black-box analysis for provable speedups across a large family of algorithms. Due to its inherent cache locality and conflict-free nature, our multi-core implementation of Cyclades consistently outperforms Hogwild!-type algorithms on sufficiently sparse datasets, leading to up to 40% speedup gains compared to Hogwild!, and up to 5× gains over asynchronous implementations of variance reduction algorithms.
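
The scheduling idea at the heart of the abstract can be made concrete with a small sketch. The Python fragment below is illustrative only, not the authors' implementation: it treats each sampled update as the set of model coordinates it touches, groups conflicting updates into connected components via union-find, and hands whole components to cores so that no locks are needed. All names (updates, batch, core) are our own.

    # Conflict-free scheduling sketch: updates that share a model coordinate
    # conflict; connected components of the conflict graph can run on
    # different cores without locking.
    from collections import defaultdict

    def connected_components(updates):
        """Group update indices whose coordinate sets overlap (union-find)."""
        parent = list(range(len(updates)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        def union(i, j):
            parent[find(i)] = find(j)

        owner = {}  # coordinate -> first update index seen touching it
        for idx, coords in enumerate(updates):
            for c in coords:
                if c in owner:
                    union(idx, owner[c])
                else:
                    owner[c] = idx

        groups = defaultdict(list)
        for idx in range(len(updates)):
            groups[find(idx)].append(idx)
        return list(groups.values())

    # Example: updates 0 and 1 share coordinate 2, so they form one component.
    batch = [{0, 2}, {2, 5}, {3}, {4, 6}]
    for core, component in enumerate(connected_components(batch)):
        print(f"core {core} runs updates {component} with no locks")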


Iterative Refinement of the Approximate Posterior for Directed Belief Networks

Neural Information Processing Systems

Variational methods that rely on a recognition network to approximate the posterior of directed graphical models offer better inference and learning than previous methods. Recent advances that exploit the capacity and flexibility in this approach have expanded what kinds of models can be trained. However, as a proposal for the posterior, the capacity of the recognition network is limited, which can constrain the representational power of the generative model and increase the variance of Monte Carlo estimates. To address these issues, we introduce an iterative refinement procedure for improving the approximate posterior of the recognition network and show that training with the refined posterior is competitive with state-of-the-art methods. The advantages of refinement are further evident in an increased effective sample size, which implies a lower variance of gradient estimates.
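
To make the refinement step concrete, here is a minimal sketch under a strong simplifying assumption of ours: a conjugate model p(z) = N(0, 1), p(x|z) = N(z, 1), whose ELBO gradients are closed-form and whose exact posterior is N(x/2, 1/2). A recognition network would supply the crude initial (mu, sigma); refinement then takes a few gradient-ascent steps on the ELBO, which in this toy case converges to the exact posterior. This is our illustration, not the paper's procedure.

    # Iterative posterior refinement on a toy conjugate model. The ELBO
    # gradients below are exact for p(z)=N(0,1), p(x|z)=N(z,1):
    #   d ELBO / d mu    = x - 2*mu
    #   d ELBO / d sigma = 1/sigma - 2*sigma
    def refine(x, mu, sigma, steps=100, lr=0.05):
        for _ in range(steps):
            mu += lr * (x - 2.0 * mu)
            sigma += lr * (1.0 / sigma - 2.0 * sigma)
        return mu, sigma

    x = 1.0
    mu, sigma = refine(x, mu=0.0, sigma=1.0)   # crude "encoder" proposal
    print(mu, sigma ** 2)                      # -> approx 0.5 and 0.5, the exact posterior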


Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods

Neural Information Processing Systems

In this paper, we consider a non-convex loss-minimization problem of learning Supervised PageRank models, which can account for features of nodes and edges. We propose gradient-based and random gradient-free methods to solve this problem. Our algorithms are based on the concept of an inexact oracle, and unlike the state-of-the-art gradient-based method, we provide theoretical convergence rate guarantees for both of them. Finally, we compare the performance of the proposed optimization methods with the state of the art on a ranking task.
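
The "random gradient-free" ingredient can be sketched with the standard two-point Gaussian-smoothing gradient estimator; the loss below is a toy stand-in, not the Supervised PageRank objective, and all names are ours.

    # Two-point gradient-free step: probe the loss along a random Gaussian
    # direction u and move against the resulting directional estimate.
    import numpy as np

    rng = np.random.default_rng(0)

    def gradient_free_step(f, x, mu=1e-4, lr=0.05):
        u = rng.standard_normal(x.shape)         # random probe direction
        g = (f(x + mu * u) - f(x)) / mu * u      # unbiased estimate of grad f (as mu -> 0)
        return x - lr * g

    f = lambda x: float(np.sum((x - 1.0) ** 2))  # toy smooth loss, minimized at 1
    x = np.zeros(5)
    for _ in range(500):
        x = gradient_free_step(f, x)
    print(x)                                     # drifts to ~[1, 1, 1, 1, 1]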


A Probabilistic Model of Social Decision Making based on Reward Maximization

Neural Information Processing Systems

A fundamental problem in cognitive neuroscience is how humans make decisions, act, and behave in relation to other humans. Here we adopt the hypothesis that when we are in an interactive social setting, our brains perform Bayesian inference of the intentions and cooperativeness of others using probabilistic representations. We employ the framework of partially observable Markov decision processes (POMDPs) to model human decision making in a social context, focusing specifically on the volunteer's dilemma in a version of the classic Public Goods Game. We show that the POMDP model explains both the behavior of subjects and the neural activity recorded using fMRI during the game. The decisions of subjects can be modeled across all trials using two interpretable parameters.
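
For readers unfamiliar with POMDPs, the core computation is the Bayesian belief update over hidden states: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) b(s). The sketch below shows this generic textbook update on a made-up two-type partner model; it is not the paper's Public Goods Game model.

    import numpy as np

    def belief_update(b, a, o, T, O):
        """b: belief over hidden states; T[a][s, s']: transition probs;
        O[a][s', o]: observation likelihoods."""
        predicted = T[a].T @ b                  # sum_s T(s'|s,a) b(s)
        unnormalized = O[a][:, o] * predicted   # weight by observation likelihood
        return unnormalized / unnormalized.sum()

    # Two hidden partner types (cooperative vs. free-riding) that rarely
    # switch; observing a contribution (o=0) shifts belief toward "cooperative".
    T = {0: np.array([[0.9, 0.1], [0.1, 0.9]])}
    O = {0: np.array([[0.8, 0.2], [0.3, 0.7]])}
    b = belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, O=O)
    print(b)   # ~[0.73, 0.27]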


Professor Forcing: A New Algorithm for Training Recurrent Networks

Neural Information Processing Systems

The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network's own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically we find that Professor Forcing acts as a regularizer, improving test likelihood on character level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps.
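
A schematic training step may help fix ideas. The sketch below is heavily simplified (a tiny GRU, greedy argmax instead of sampling in the free-running pass, a linear discriminator over hidden-state sequences), and every architecture here is a hypothetical stand-in, not the paper's; but it shows the two adversarial pieces: the discriminator learns to tell teacher-forced from free-running hidden dynamics, and the generator adds a fooling term to its usual likelihood loss.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyRNN(nn.Module):
        def __init__(self, vocab=10, hidden=32):
            super().__init__()
            self.emb = nn.Embedding(vocab, hidden)
            self.cell = nn.GRUCell(hidden, hidden)
            self.out = nn.Linear(hidden, vocab)

        def run(self, x, teacher_forced):
            # x: (batch, time) token ids; returns hidden-state sequence and NLL.
            h = x.new_zeros(x.shape[0], self.cell.hidden_size, dtype=torch.float)
            inp, states, nll = x[:, 0], [], 0.0
            for t in range(1, x.shape[1]):
                h = self.cell(self.emb(inp), h)
                states.append(h)
                logits = self.out(h)
                nll = nll + F.cross_entropy(logits, x[:, t])
                # feed ground truth (teacher forcing) or own prediction (free-running)
                inp = x[:, t] if teacher_forced else logits.argmax(-1)
            return torch.stack(states, dim=1), nll / (x.shape[1] - 1)

    rnn = TinyRNN()
    disc = nn.Sequential(nn.Flatten(), nn.Linear(7 * 32, 1))  # one logit per sequence
    opt_g = torch.optim.Adam(rnn.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

    x = torch.randint(0, 10, (4, 8))                  # fake batch of sequences
    h_tf, nll = rnn.run(x, teacher_forced=True)       # teacher-forced dynamics
    h_fr, _ = rnn.run(x, teacher_forced=False)        # free-running dynamics

    # Discriminator step: classify which regime produced the hidden states.
    ones, zeros = torch.ones(4, 1), torch.zeros(4, 1)
    d_loss = F.binary_cross_entropy_with_logits(disc(h_tf.detach()), ones) + \
             F.binary_cross_entropy_with_logits(disc(h_fr.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: likelihood plus a term making free-running dynamics
    # indistinguishable (to the discriminator) from teacher-forced ones.
    g_loss = nll + F.binary_cross_entropy_with_logits(disc(h_fr), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()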


Measuring the reliability of MCMC inference with bidirectional Monte Carlo

Neural Information Processing Systems

Markov chain Monte Carlo (MCMC) is one of the main workhorses of probabilistic inference, but it is notoriously hard to measure the quality of approximate posterior samples. This challenge is particularly salient in black box inference methods, which can hide details and obscure inference failures. In this work, we extend the recently introduced bidirectional Monte Carlo technique to evaluate MCMC-based posterior inference algorithms. By running annealed importance sampling (AIS) chains both from prior to posterior and vice versa on simulated data, we upper bound in expectation the symmetrized KL divergence between the true posterior distribution and the distribution of approximate samples. We integrate our method into two probabilistic programming languages, WebPPL and Stan, and validate it on several models and datasets.
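
The bounding logic can be seen on a toy conjugate model where log p(x) is known exactly (our assumptions: p(z) = N(0, 1), p(x|z) = N(z, 1), so the posterior is N(x/2, 1/2)). Forward AIS underestimates log p(x) in expectation; reverse AIS, initialized at an exact posterior sample (available here analytically, and on simulated data by construction), overestimates it, and the gap between the two bounds the symmetrized KL divergence in expectation. The code is our sketch, not the paper's WebPPL/Stan integration.

    import numpy as np

    rng = np.random.default_rng(0)
    x = 1.5
    loglik = lambda z: -0.5 * np.log(2 * np.pi) - 0.5 * (x - z) ** 2

    def ais(z, betas, mh_steps=5):
        """Annealed importance sampling along prior * lik**beta; returns log weight."""
        logw = 0.0
        for b0, b1 in zip(betas[:-1], betas[1:]):
            logw += (b1 - b0) * loglik(z)
            for _ in range(mh_steps):          # MH targeting N(0,1) * lik**b1
                prop = z + 0.5 * rng.standard_normal()
                logr = (-0.5 * prop ** 2 + b1 * loglik(prop)) \
                     - (-0.5 * z ** 2 + b1 * loglik(z))
                if np.log(rng.random()) < logr:
                    z = prop
        return logw

    betas = np.linspace(0.0, 1.0, 201)
    # Forward: start from the prior. Reverse: start from an exact posterior sample.
    fwd = np.mean([ais(rng.standard_normal(), betas) for _ in range(100)])
    rev = -np.mean([ais(rng.normal(x / 2, np.sqrt(0.5)), betas[::-1]) for _ in range(100)])
    true_logZ = -0.5 * np.log(4 * np.pi) - x ** 2 / 4
    print(fwd, true_logZ, rev)   # fwd <= true_logZ <= rev, in expectation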


Finite-Dimensional BFRY Priors and Variational Bayesian Inference for Power Law Models

Neural Information Processing Systems

Bayesian nonparametric methods based on the Dirichlet process (DP), gamma process and beta process have proven effective in capturing aspects of various datasets arising in machine learning. However, it is now recognized that such processes have their limitations in terms of the ability to capture power law behavior. As such, there is now considerable interest in models based on the Stable Process (SP), Generalized Gamma process (GGP) and Stable-beta process (SBP). In analogy to tractable processes such as the finite-dimensional Dirichlet process, we describe a class of random processes, which we call iid finite-dimensional BFRY processes, that enables one to begin to develop efficient posterior inference algorithms such as variational Bayes that readily scale to massive datasets. For illustrative purposes, we describe a simple variational Bayes algorithm for normalized SP mixture models, and demonstrate its usefulness with experiments on synthetic and real-world datasets.
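
As a loosely related illustration (our assumption, not taken from the paper's construction details): a BFRY(alpha) random variable can be sampled as X = G / U^(1/alpha) with G ~ Gamma(1 - alpha, 1) and U ~ Uniform(0, 1), which one can check recovers the BFRY density alpha / Gamma(1 - alpha) * x^(-alpha - 1) * (1 - e^(-x)). Normalizing iid draws gives a finite-dimensional random measure with heavy-tailed weights.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_bfry(alpha, size):
        # X = G / U**(1/alpha), G ~ Gamma(1 - alpha, 1), U ~ Uniform(0, 1).
        g = rng.gamma(1.0 - alpha, 1.0, size)
        u = rng.random(size)
        return g / u ** (1.0 / alpha)

    w = sample_bfry(alpha=0.5, size=1000)   # heavy-tailed atom weights
    p = w / w.sum()                         # normalized random-measure weights
    print(np.sort(p)[-5:])                  # a few atoms dominate: power law behavior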


If Ted Talks are getting shorter, what does that say about our attention spans?

The Guardian

Age: Ted started in 1984. And has Ted been talking ever since? I know, and they do the inspirational online talks. Correct, under the slogan "Ideas change everything". She was talking at the Hay festival, in Wales.


Voiceover artist calls on ScotRail to stop using her voice for AI announcements

BBC News

ReadSpeaker markets its products, including Iona, as an "AI voice generator," but it said all of its programmes are based on "human voice talent". The firm uses a text-to-speech model, which means a user can type anything and Iona will read it out loud. The technology learns using artificial intelligence, but AI needs something to learn from. In this instance, it is voice recordings of an accent or language it is trying to emulate. In response to the complaints, the tech firm said: "ReadSpeaker is aware of Ms Potter's concerns, and has comprehensively addressed these with Ms Potter's legal representative several times in the past."


Jasmine Crockett shares bizarre song clip calling herself 'leader of the future'

FOX News

Texas Rep. Jasmine Crockett attacked President Donald Trump's West Point address on MSNBC and called it proof of his unfitness as commander in chief. Rep. Jasmine Crockett, D-Texas, appears to be leaning in on her rising political stardom this week, briefly sharing what appeared to be a fan-made song that referred to the Democratic firebrand as the "leader of the future." "Jasmine Crockett, she rises with the dawn. Fighting for justice, her light will never be gone," the song went. "Infectious with passion, she'll never bow down."