Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

Neural Information Processing Systems

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this is the first scalable trust region natural gradient method for actor-critic methods. It is also a method that learns non-trivial tasks in continuous control as well as discrete control policies directly from raw pixel inputs. We tested our approach across discrete domains in Atari games as well as continuous domains in the MuJoCo environment. With the proposed methods, we are able to achieve higher rewards and a 2- to 3-fold improvement in sample efficiency on average, compared to previous state-of-the-art on-policy actor-critic methods.
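The efficiency of the Kronecker-factored approximation rests on a standard identity: if a layer's Fisher block is approximated as a Kronecker product F ≈ A ⊗ G, then F⁻¹ = A⁻¹ ⊗ G⁻¹, so preconditioning the gradient needs only two small inverses instead of one large one. A minimal numpy sketch of that identity (the matrices here are random stand-ins, not quantities from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """A random symmetric positive-definite matrix of size n."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A = random_spd(3)   # stand-in for the input-activation covariance factor
G = random_spd(4)   # stand-in for the output-gradient covariance factor

# K-FAC approximates a layer's 12x12 Fisher block as F ≈ A ⊗ G.
F = np.kron(A, G)

# Key identity: (A ⊗ G)^-1 = A^-1 ⊗ G^-1, so the 12x12 inverse
# reduces to one 3x3 and one 4x4 inverse.
F_inv_cheap = np.kron(np.linalg.inv(A), np.linalg.inv(G))
F_inv_direct = np.linalg.inv(F)
assert np.allclose(F_inv_cheap, F_inv_direct)

# Preconditioning a flattened weight gradient with F^-1 yields the
# approximate natural-gradient direction used by methods like ACKTR.
grad = rng.standard_normal(12)
nat_grad = F_inv_cheap @ grad
```

For a weight matrix of size n×m this turns an (nm)³ inversion cost into n³ + m³, which is what makes the approach scale to deep networks.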


Countering Feedback Delays in Multi-Agent Learning

Neural Information Processing Systems

We consider a model of game-theoretic learning based on online mirror descent (OMD) with asynchronous and delayed feedback information. Instead of focusing on specific games, we consider a broad class of continuous games defined by the general equilibrium stability notion, which we call λ-variational stability. Our first contribution is that, in this class of games, the actual sequence of play induced by OMD-based learning converges to Nash equilibria provided that the feedback delays faced by the players are synchronous and bounded. Subsequently, to tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, we propose a variant of OMD which we call delayed mirror descent (DMD), and which relies on the repeated leveraging of past information. With this modification, the algorithm converges to Nash equilibria with no feedback synchronicity assumptions and even when the delays grow superlinearly relative to the horizon of play.
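To make the delayed-feedback setting concrete, here is a toy entropic online mirror descent (exponentiated-gradient) loop on the simplex where every gradient arrives a fixed number of rounds late. The setup and names are illustrative, not the paper's DMD algorithm:

```python
import numpy as np

def delayed_omd(loss_vec, T=500, d=5, eta=0.1):
    """Entropic OMD on the simplex; each gradient is observed d rounds late."""
    n = len(loss_vec)
    x = np.full(n, 1.0 / n)           # start at the uniform distribution
    pending = []                       # gradients still "in flight"
    for t in range(T):
        pending.append(loss_vec)       # gradient of the linear loss <loss_vec, x>
        if t >= d:
            g = pending[t - d]         # feedback from round t - d arrives now
            x = x * np.exp(-eta * g)   # mirror (exponentiated-gradient) step
            x /= x.sum()               # project back onto the simplex
    return x

x = delayed_omd(np.array([0.3, 0.1, 0.6]))
# Despite the delay, play concentrates on the lowest-loss action (index 1).
assert x.argmax() == 1
```

With a fixed bounded delay the update is simply shifted in time; the harder regime the paper addresses is unbounded, asynchronous delays, which this sketch does not capture.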


Parallel Streaming Wasserstein Barycenters

Neural Information Processing Systems

Efficiently aggregating data from different sources is a challenging problem, particularly when samples from each source are distributed differently. These differences can be inherent to the inference task or present for other reasons: sensors in a sensor network may be placed far apart, affecting their individual measurements. Conversely, it is computationally advantageous to split Bayesian inference tasks across subsets of data, but data need not be identically distributed across subsets. One principled way to fuse probability distributions is via the lens of optimal transport: the Wasserstein barycenter is a single distribution that summarizes a collection of input measures while respecting their geometry. However, computing the barycenter scales poorly and requires discretization of all input distributions and the barycenter itself.
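As a point of intuition for what a Wasserstein barycenter fuses, the one-dimensional case has a closed form: the 2-Wasserstein barycenter's quantile function is the (weighted) average of the input quantile functions. A toy numpy sketch of that special case (not the paper's parallel streaming algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
# Empirical samples from two differently distributed "sources".
source_a = rng.normal(loc=-2.0, scale=1.0, size=10_000)
source_b = rng.normal(loc=+2.0, scale=1.0, size=10_000)

qs = np.linspace(0.001, 0.999, 999)
quant_a = np.quantile(source_a, qs)    # empirical inverse CDF of source A
quant_b = np.quantile(source_b, qs)    # empirical inverse CDF of source B

# 1-D W2 barycenter (equal weights): average the quantile functions.
bary_quant = 0.5 * (quant_a + quant_b)

# For N(-2, 1) and N(2, 1) the barycenter is approximately N(0, 1),
# so its median should sit near zero.
assert abs(bary_quant[len(qs) // 2]) < 0.1
```

Note how this differs from averaging densities, which would produce a bimodal mixture: the barycenter respects the geometry and yields a single unimodal summary.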


Learning Linear Dynamical Systems via Spectral Filtering

Neural Information Processing Systems

We present an efficient and practical algorithm for the online prediction of discrete-time linear dynamical systems with a symmetric transition matrix. We circumvent the non-convex optimization problem using improper learning: carefully overparameterize the class of LDSs by a polylogarithmic factor, in exchange for convexity of the loss functions. From this arises a polynomial-time algorithm with a near-optimal regret guarantee, with an analogous sample complexity bound for agnostic learning. Our algorithm is based on a novel filtering technique, which may be of independent interest: we convolve the time series with the eigenvectors of a certain Hankel matrix.
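The filtering step can be sketched in a few lines: build the fixed Hankel matrix with entries 2/((i+j)³ − (i+j)) for 1-based indices, take its top eigenvectors, and use them as filters on the input series. This shows only the feature construction; the convex prediction step on top of these features is omitted:

```python
import numpy as np

T, k = 100, 5                          # series length, number of filters
i = np.arange(1, T + 1)
S = i[:, None] + i[None, :]            # 1-based index sums i + j
Z = 2.0 / (S ** 3 - S)                 # Hankel matrix from the paper

eigvals, eigvecs = np.linalg.eigh(Z)   # Z is symmetric, so use eigh
filters = eigvecs[:, -k:]              # top-k eigenvectors serve as filters

x = np.sin(0.1 * np.arange(T))         # toy input time series
features = filters.T @ x               # convolve: one inner product per filter
assert features.shape == (k,)
```

The eigenvalues of this Hankel matrix decay rapidly, which is why a small, polylogarithmic number of filters suffices and the overparameterization stays cheap.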


Kernel functions based on triplet comparisons

Neural Information Processing Systems

Given only information in the form of similarity triplets "Object A is more similar to object B than to object C" about a data set, we propose two ways of defining a kernel function on the data set. While previous approaches construct a low-dimensional Euclidean embedding of the data set that reflects the given similarity triplets, we aim at defining kernel functions that correspond to high-dimensional embeddings. These kernel functions can subsequently be used to apply any kernel method to the data set.
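One natural construction in this spirit (illustrative, not necessarily the paper's exact definition) is to represent each object by the ±1 vector of its answers to all triplet queries "is A closer to B than to C?", and take inner products of these vectors as the kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((6, 2))   # hidden geometry generating the triplets
n = len(points)

def triplet_features(a):
    """±1 answer of object a to every ordered pair (b, c) with b != c."""
    feats = []
    for b in range(n):
        for c in range(n):
            if b == c:
                continue
            d_ab = np.linalg.norm(points[a] - points[b])
            d_ac = np.linalg.norm(points[a] - points[c])
            feats.append(1.0 if d_ab < d_ac else -1.0)
    return np.array(feats)

# The kernel matrix is built from triplet answers alone -- the coordinates
# of `points` are never exposed to the downstream kernel method.
Phi = np.stack([triplet_features(a) for a in range(n)])
K = Phi @ Phi.T

# A Gram matrix of explicit features is symmetric PSD by construction.
assert np.allclose(K, K.T)
assert np.linalg.eigvalsh(K).min() > -1e-8
```

Because K is a genuine Gram matrix, any kernel method (SVMs, kernel PCA, kernel clustering) can consume it directly, without ever constructing a Euclidean embedding of the objects.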


Inverse Filtering for Hidden Markov Models

Neural Information Processing Systems

This paper considers a number of related inverse filtering problems for hidden Markov models (HMMs). In particular, given a sequence of state posteriors and the system dynamics; i) estimate the corresponding sequence of observations, ii) estimate the observation likelihoods, and iii) jointly estimate the observation likelihoods and the observation sequence. We show how to avoid a computationally expensive mixed integer linear program (MILP) by exploiting the algebraic structure of the HMM filter using simple linear algebra operations, and provide conditions for when the quantities can be uniquely reconstructed. We also propose a solution to the more general case where the posteriors are noisily observed. Finally, the proposed inverse filtering algorithms are evaluated on real-world polysomnographic data used for automatic sleep segmentation.
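The algebraic structure being exploited is visible in the filter recursion itself: since the HMM filter update is π_k ∝ diag(b_k) Pᵀ π_{k−1}, consecutive posteriors and the transition matrix determine the observation-likelihood vector b_k up to scale by elementwise division. A toy two-state sketch of that inverse step (noise-free case, with made-up numbers):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])             # transition matrix (rows sum to 1)
b_true = np.array([0.7, 0.2])          # per-state likelihoods of observation y_k

pi_prev = np.array([0.6, 0.4])         # filter posterior at time k-1
pred = P.T @ pi_prev                   # one-step predicted state distribution
pi_next = b_true * pred
pi_next /= pi_next.sum()               # filter posterior at time k

# Inverse filtering step: divide out the prediction, then fix the
# unobservable overall scale of the likelihood vector.
b_recovered = pi_next / pred
b_recovered *= b_true.sum() / b_recovered.sum()

assert np.allclose(b_recovered, b_true)
```

In the noise-free case no integer programming is needed at all; the paper's harder contributions concern uniqueness conditions and the setting where the posteriors are only noisily observed.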


WNBA investigation finds no evidence of hateful comments toward Angel Reese

FOX News

The WNBA and the Indiana Fever announced that the allegations of "hateful comments" directed toward Angel Reese on May 17 were "not substantiated." Reese and her Chicago Sky faced the Fever and Caitlin Clark, and at one point, the two had to be separated after a flagrant foul by Clark against Reese. The association announced the next day that it would launch an investigation into the alleged comments.


If Ted Talks are getting shorter, what does that say about our attention spans?

The Guardian

Age: Ted started in 1984.
And has Ted been talking ever since? I know, and they do the inspirational online talks.
Correct, under the slogan "Ideas change everything".
She was talking at the Hay festival, in Wales.


Jasmine Crockett shares bizarre song clip calling herself 'leader of the future'

FOX News

Texas Rep. Jasmine Crockett attacked President Donald Trump's West Point address on MSNBC and called it proof of his unfitness as commander in chief. Rep. Jasmine Crockett, D-Texas, appears to be leaning in on her rising political stardom this week, briefly sharing what appeared to be a fan-made song that referred to the Democratic firebrand as the "leader of the future." "Jasmine Crockett, she rises with the dawn. Fighting for justice, her light will never be gone. Infectious with passion, she'll never bow down," the song went.


These robot cats have glowing eyes and artificial heartbeats – and could help reduce stress in children

The Guardian

At Springwood library in the Blue Mountains, a librarian appears with a cat carrier in each hand. About 30 children gather around in a semicircle. Inside each carrier, a pair of beaming, sci-fi-like eyes peer out at the expectant crowd. "That is the funniest thing ever," one child says. The preschoolers have just finished reading The Truck Cat by Deborah Frenkel and Danny Snell for the annual National Simultaneous Storytime.