Toward Multimodal Image-to-Image Translation

Neural Information Processing Systems

Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.
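
To make the latent-code mechanism concrete, here is a minimal PyTorch-style sketch of the core idea, assuming a toy generator G(x, z) and encoder E(y) of my own design (the paper's actual architectures, objectives, and hyperparameters differ): the generated output is pushed to recover the latent code that produced it, which discourages many-to-one collapse of the latent code.

```python
# Minimal sketch of the latent-code invertibility idea (not the paper's full model).
# G maps (input image, latent code z) -> output; E maps an output back to a latent code.
# Penalizing |E(G(x, z)) - z| discourages many-to-one collapse of the latent code.
import torch
import torch.nn as nn

LATENT_DIM = 8  # illustrative choice

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + LATENT_DIM, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, x, z):
        # Inject z by spatially tiling it and concatenating it with the input image.
        z_map = z.view(z.size(0), -1, 1, 1).expand(-1, -1, x.size(2), x.size(3))
        return self.net(torch.cat([x, z_map], dim=1))

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, LATENT_DIM))

    def forward(self, y):
        return self.net(y)

G, E = Generator(), Encoder()
x = torch.randn(4, 3, 64, 64)          # input images
z = torch.randn(4, LATENT_DIM)         # randomly sampled latent codes
y_fake = G(x, z)
latent_recovery_loss = (E(y_fake) - z).abs().mean()  # encourages output -> latent to be invertible
```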


Learning with Average Top-k Loss

Neural Information Processing Systems

The average top-k (ATk) loss remains a convex function of the individual losses, which leads to convex optimization problems that can be solved effectively with conventional gradient-based methods.
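
As a concrete illustration, here is a minimal sketch of the average top-k loss itself, assuming k and the per-example losses are given; this is not the paper's full learning formulation.

```python
# Minimal sketch of the average top-k (ATk) loss: the mean of the k largest
# per-example losses. For k = n it reduces to the average loss; for k = 1 it
# reduces to the maximum loss.
import numpy as np

def average_top_k_loss(individual_losses, k):
    losses = np.asarray(individual_losses, dtype=float)
    top_k = np.sort(losses)[-k:]          # k largest individual losses
    return top_k.mean()

losses = [0.1, 2.3, 0.4, 1.7, 0.05]
print(average_top_k_loss(losses, k=2))    # (2.3 + 1.7) / 2 = 2.0
```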


Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

Neural Information Processing Systems

Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy policy evaluation (OPE) algorithms use historical data collected from running a previous policy to evaluate a new policy, which provides a means for evaluating a policy without requiring it to ever be deployed. Importance sampling is a popular OPE method because it is robust to partial observability and works with continuous states and actions. However, the amount of historical data required by importance sampling can scale exponentially with the horizon of the problem: the number of sequential decisions that are made. We propose using policies over temporally extended actions, called options, and show that combining these policies with importance sampling can significantly improve performance for long-horizon problems. In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling. We further generalize these special cases to a general covariance testing rule that can be used to decide which weights to drop in an IS estimate, and derive a new IS algorithm called Incremental Importance Sampling that can provide significantly more accurate estimates for a broad class of domains.
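
For intuition about the exponential-in-horizon issue, here is a minimal sketch of plain trajectory-wise importance sampling; the options-based estimators, the covariance test, and Incremental Importance Sampling from the paper are not reproduced, and the helper names and toy policies below are illustrative assumptions.

```python
# Minimal sketch of trajectory-wise importance sampling for off-policy evaluation.
# Each trajectory's return is reweighted by the product of per-step probability ratios
# pi_e(a|s) / pi_b(a|s); with options, fewer decisions per trajectory mean fewer ratio
# factors, which is one reason options-based policies help for long horizons.
import numpy as np

def is_estimate(trajectories, pi_e, pi_b):
    """trajectories: list of (states, actions, rewards) tuples.
    pi_e, pi_b: functions (state, action) -> probability under the
    evaluation and behavior policies, respectively (assumed known)."""
    estimates = []
    for states, actions, rewards in trajectories:
        ratios = [pi_e(s, a) / pi_b(s, a) for s, a in zip(states, actions)]
        weight = np.prod(ratios)          # grows/shrinks multiplicatively with horizon
        estimates.append(weight * np.sum(rewards))
    return np.mean(estimates)

# Toy example: two-action steps, uniform behavior policy.
pi_b = lambda s, a: 0.5
pi_e = lambda s, a: 0.8 if a == 1 else 0.2
trajs = [([0, 0], [1, 0], [1.0, 0.0]), ([0, 0], [1, 1], [1.0, 1.0])]
print(is_estimate(trajs, pi_e, pi_b))
```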


U.K. envoy urges transatlantic tech alliance, cites China threat

The Japan Times

The U.S. and its allies across the Atlantic must forge a technology partnership and win the artificial intelligence race even as China makes steady advances, the U.K.'s envoy in Washington said. Ambassador Peter Mandelson warned of the consequences if China continues to get ahead in AI and other key technologies. "They will be able to do things which cascade down not just to their own country but everyone else's across the world," Mandelson said at an event hosted by the Atlantic Council in Washington on Tuesday. "There is nothing I fear more in this world than China winning the race for technological dominance."


Deconvolutional Paragraph Representation Learning

Neural Information Processing Systems

Learning latent representations from long text sequences is an important first step in many natural language processing applications. Recurrent Neural Networks (RNNs) have become a cornerstone for this challenging task.


Lockheed Martin CEO shares path to making Trump's 'Golden Dome' missile shield a reality

FOX News

Lockheed Martin CEO Jim Taiclet weighs in on the Trump administration's Golden Dome defense system announcement on 'Special Report.' Lockheed Martin CEO Jim Taiclet said President Donald Trump's proposed "Golden Dome" missile shield for the United States is a "fantastic vision" for the country as defense contractors work to implement the commander-in-chief's bold idea by the end of his term. "We'll be able to use the Golden Dome concept to make sure the country is increasingly protected against hypersonic threats," Taiclet said in an exclusive interview Tuesday on "Special Report." Trump unveiled his ambitious missile defense plan at the White House last week, saying it will be operational by the time he leaves office. The announcement comes as the United States faces growing threats from adversaries around the world who are making significant inroads in artificial intelligence and drone technology.


Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

Neural Information Processing Systems

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this is the first scalable trust region natural gradient method for actor-critic methods. It is also a method that learns non-trivial tasks in continuous control as well as discrete control policies directly from raw pixel inputs. We tested our approach across discrete domains in Atari games as well as continuous domains in the MuJoCo environment. With the proposed methods, we are able to achieve higher rewards and a 2- to 3-fold improvement in sample efficiency on average, compared to previous state-of-the-art on-policy actor-critic methods.
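
A rough sketch of the trust-region scaling applied to a natural-gradient direction follows, with a small explicit Fisher matrix standing in for the paper's K-FAC approximation; the step-size cap and constants here are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch of a trust-region step along a natural-gradient direction.
# The actual method approximates the Fisher matrix F with Kronecker factors (K-FAC);
# here a small explicit F stands in for that approximation.
import numpy as np

def trust_region_natural_step(grad, F, eta_max=0.25, delta=0.001):
    """Return a parameter update along the natural gradient F^{-1} grad whose
    predicted KL change (approx. 0.5 * d^T F d) is capped at delta."""
    nat_grad = np.linalg.solve(F, grad)       # natural gradient direction
    quad = float(nat_grad @ F @ nat_grad)     # = grad^T F^{-1} grad
    eta = min(eta_max, np.sqrt(2.0 * delta / (quad + 1e-12)))
    return -eta * nat_grad

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
F = A @ A.T + np.eye(5)                       # stand-in curvature (symmetric positive definite)
grad = rng.normal(size=5)
print(trust_region_natural_step(grad, F))
```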


Countering Feedback Delays in Multi-Agent Learning

Neural Information Processing Systems

We consider a model of game-theoretic learning based on online mirror descent (OMD) with asynchronous and delayed feedback information. Instead of focusing on specific games, we consider a broad class of continuous games defined by the general equilibrium stability notion, which we call λ-variational stability. Our first contribution is that, in this class of games, the actual sequence of play induced by OMD-based learning converges to Nash equilibria provided that the feedback delays faced by the players are synchronous and bounded. Subsequently, to tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, we propose a variant of OMD which we call delayed mirror descent (DMD), and which relies on the repeated leveraging of past information. With this modification, the algorithm converges to Nash equilibria with no feedback synchronicity assumptions and even when the delays grow superlinearly relative to the horizon of play.
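
A toy sketch of the delayed-feedback setting is given below, using entropic (exponential-weights) mirror descent on the simplex with gradients applied only once they arrive; this illustrates the setting and is not the paper's exact DMD update or its convergence conditions.

```python
# Toy sketch of mirror descent on the simplex (entropic regularizer) with delayed
# feedback: the gradient computed at round t arrives `delay` rounds later and is
# applied only once received. Illustration of the setting, not the paper's DMD.
import numpy as np

def delayed_mirror_descent(grad_fn, n_actions, T, delay, step=0.1):
    x = np.ones(n_actions) / n_actions
    pending = {}                                      # arrival_round -> list of gradients
    for t in range(T):
        g = grad_fn(x, t)
        pending.setdefault(t + delay, []).append(g)   # feedback for round t arrives later
        for g_late in pending.pop(t, []):             # apply whatever has arrived by now
            x = x * np.exp(-step * g_late)            # multiplicative-weights / entropic OMD step
            x = x / x.sum()
    return x

# Toy game: the gradient always favors action 0, so play concentrates there.
grad_fn = lambda x, t: np.array([0.0, 1.0, 1.0])
print(delayed_mirror_descent(grad_fn, n_actions=3, T=200, delay=5))
```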


Parallel Streaming Wasserstein Barycenters

Neural Information Processing Systems

Efficiently aggregating data from different sources is a challenging problem, particularly when samples from each source are distributed differently. These differences can be inherent to the inference task or present for other reasons: sensors in a sensor network may be placed far apart, affecting their individual measurements. Conversely, it is computationally advantageous to split Bayesian inference tasks across subsets of data, but data need not be identically distributed across subsets. One principled way to fuse probability distributions is via the lens of optimal transport: the Wasserstein barycenter is a single distribution that summarizes a collection of input measures while respecting their geometry. However, computing the barycenter scales poorly and requires discretization of all input distributions and the barycenter itself.
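
As a small illustration of the barycenter idea: in one dimension, the 2-Wasserstein barycenter of equally sized empirical samples with uniform weights reduces to averaging sorted samples (i.e., averaging quantile functions). This is only the 1-D special case, not the paper's parallel streaming algorithm.

```python
# Illustrative sketch: in one dimension, the W2 barycenter of empirical measures
# (equal sample sizes, uniform weights) is obtained by averaging the sorted samples,
# i.e., averaging quantile functions. The 1-D intuition only.
import numpy as np

def wasserstein_barycenter_1d(samples):
    """samples: list of 1-D arrays of equal length, one per input distribution."""
    sorted_samples = np.stack([np.sort(np.asarray(s, dtype=float)) for s in samples])
    return sorted_samples.mean(axis=0)    # barycenter support points (quantile average)

rng = np.random.default_rng(0)
a = rng.normal(loc=-2.0, scale=1.0, size=1000)
b = rng.normal(loc=+2.0, scale=1.0, size=1000)
bary = wasserstein_barycenter_1d([a, b])
print(bary.mean(), bary.std())            # roughly N(0, 1): means average, shape is preserved
```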


Learning Linear Dynamical Systems via Spectral Filtering

Neural Information Processing Systems

We present an efficient and practical algorithm for the online prediction of discrete-time linear dynamical systems with a symmetric transition matrix. We circumvent the non-convex optimization problem using improper learning: carefully overparameterize the class of LDSs by a polylogarithmic factor, in exchange for convexity of the loss functions. From this arises a polynomial-time algorithm with a near-optimal regret guarantee, with an analogous sample complexity bound for agnostic learning. Our algorithm is based on a novel filtering technique, which may be of independent interest: we convolve the time series with the eigenvectors of a certain Hankel matrix.
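
A brief sketch of the filtering step: take the top eigenvectors of a fixed Hankel matrix and use them as filters over the recent inputs, so that prediction becomes a linear (convex) regression on these features. The particular Hankel entries below follow the form I recall from the paper and should be treated as an assumption; the online learner and its regret analysis are omitted.

```python
# Sketch of the spectral filtering step: eigenvectors of a fixed Hankel matrix act as
# filters over the input series, turning LDS prediction into linear regression on the
# resulting features. Matrix entries are assumed to be 2 / ((i+j)^3 - (i+j)).
import numpy as np

def hankel_filters(T, k):
    idx = np.arange(1, T + 1)
    s = idx[:, None] + idx[None, :]
    Z = 2.0 / (s ** 3 - s)                         # symmetric Hankel matrix (assumed form)
    eigvals, eigvecs = np.linalg.eigh(Z)           # symmetric, so eigh is appropriate
    order = np.argsort(eigvals)[::-1][:k]          # keep the top-k eigenpairs
    return eigvals[order], eigvecs[:, order]

def spectral_features(x, filters):
    """Project the most recent T inputs onto each filter to get one feature per filter."""
    T, k = filters.shape
    window = x[-T:]                                # most recent T inputs
    return window @ filters                        # shape (k,)

T, k = 50, 10
_, filters = hankel_filters(T, k)
x = np.sin(0.3 * np.arange(200)) + 0.1 * np.random.default_rng(0).normal(size=200)
print(spectral_features(x, filters))
```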