Building Drones--for the Children?

The New Yorker

A couple of months ago, Vice-President J. D. Vance made an appearance in Washington at the American Dynamism summit, an annual event put on by the venture-capital firm Andreessen Horowitz. Members of Congress, startup founders, investors, and Defense Department officials sat in the audience. They gave Vance a standing ovation as he walked onstage, while Alabama's "Forty Hour Week (For a Livin')" played in the background. "You're here, I hope, because you love your country," Vance told the crowd. "You love its people, the opportunities that it's given you, and you recognize that building things--our capacity to create new innovation in the economy--cannot be a race to the bottom."


An interview with Larry Niven – Ringworld author and sci-fi legend

New Scientist

Larry Niven is one of the biggest names in the history of science fiction, and it was a privilege to interview him via Zoom at his home in Los Angeles recently. His 1970 novel Ringworld is the latest pick for the New Scientist Book Club, but he has also written a whole space-fleet-load of novels and short stories over the years, including my favourite sci-fi of all time, A World Out of Time. At 87 years of age, he is very much still writing. I spoke to him about Ringworld, his start in sci-fi, his favourite work over the years, his current projects and whether he thinks humankind will ever leave this solar system. This is an edited version of our conversation.



Munchausen Reinforcement Learning

Neural Information Processing Systems

Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based on temporal differences, replace the true value of a transiting state with their current estimate of this value. Yet, another estimate could be leveraged to bootstrap RL: the current policy. Our core contribution is a very simple idea: adding the scaled log-policy to the immediate reward. We show that slightly modifying Deep Q-Network (DQN) in this way yields an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns, or prioritized replay. To demonstrate the versatility of this idea, we also use it together with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, establishing a new state of the art with very few modifications to the original algorithm. To complement this empirical study, we provide strong theoretical insights into what happens under the hood: implicit Kullback-Leibler regularization and an increase of the action gap.
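A minimal sketch of the idea as stated in the abstract: the scaled (and clipped) log-policy of the taken action is added to the immediate reward, together with a soft, entropy-regularized bootstrap over the next state. The hyperparameter names (tau, alpha, l_0) and the tensor interface below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def munchausen_dqn_target(q_target_curr, q_target_next, action, reward, done,
                          gamma=0.99, tau=0.03, alpha=0.9, l_0=-1.0):
    """Sketch of a Munchausen-style DQN target (assumed interface).

    q_target_curr / q_target_next: target-network Q-values, shape [B, A].
    action: taken actions, shape [B]; reward, done: shape [B].
    """
    # Policy implied by the target network: softmax over Q-values at temperature tau.
    log_pi_curr = F.log_softmax(q_target_curr / tau, dim=-1)   # [B, A]
    log_pi_next = F.log_softmax(q_target_next / tau, dim=-1)   # [B, A]
    pi_next = log_pi_next.exp()

    # Munchausen bonus: scaled log-policy of the taken action, clipped to [l_0, 0].
    log_pi_a = log_pi_curr.gather(1, action.unsqueeze(1)).squeeze(1)
    munchausen_bonus = alpha * torch.clamp(log_pi_a, min=l_0, max=0.0)

    # Soft (entropy-regularized) bootstrap over the next state.
    soft_next_value = (pi_next * (q_target_next - tau * log_pi_next)).sum(dim=-1)

    return reward + munchausen_bonus + gamma * (1.0 - done) * soft_next_value
```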


A Appendix

Neural Information Processing Systems

A.1 Conventional Test-Time Augmentation

Center-Crop is the standard test-time augmentation for most computer vision tasks [56, 29, 5, 7, 18, 26, 52]. Center-Crop first resizes an image to a fixed size and then crops the central area to obtain a predefined input size. For ResNet-50 in the ImageNet experiments, we resize an image to 256 pixels and crop the central 224 pixels, in the same way as [18, 26, 52]. In the case of CIFAR, all images in the dataset are 32 by 32 pixels; we use the original images without any modification at test time. Horizontal-Flip is an ensemble method using the original image and the horizontally flipped image.
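A short sketch of the two conventional test-time augmentations described above, using standard torchvision transforms; the 256/224 sizes follow the ImageNet ResNet-50 setting in the text, and the model/tensor interface is an illustrative assumption.

```python
import torch
import torchvision.transforms as T

# Center-Crop: resize the shorter side to 256 pixels, then crop the central 224x224 region.
center_crop = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
])

def horizontal_flip_ensemble(model, image_tensor):
    """Average predictions over the original and horizontally flipped image (CHW tensor)."""
    batch = torch.stack([image_tensor, torch.flip(image_tensor, dims=[-1])])
    with torch.no_grad():
        logits = model(batch)
    return logits.mean(dim=0)
```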


Learning Loss for Test-Time Augmentation

Neural Information Processing Systems

Data augmentation has been actively studied for robust neural networks. Most recent data augmentation methods focus on augmenting datasets during the training phase. At the testing phase, simple transformations are still widely used for test-time augmentation. This paper proposes a novel instance-level test-time augmentation that efficiently selects suitable transformations for a test input. Our proposed method involves an auxiliary module that predicts the loss of each possible transformation given the input. Then, the transformations having lower predicted losses are applied to the input.
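A minimal sketch of the mechanism the abstract describes: an auxiliary module predicts, from a test input's features, the loss the classifier would incur under each candidate transformation, and the transformations with the lowest predicted losses are selected. The module sizes, feature interface, and selection helper below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LossPredictor(nn.Module):
    """Auxiliary head that maps input features to one predicted loss per transform."""
    def __init__(self, feature_dim, num_transforms):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(),
            nn.Linear(128, num_transforms))

    def forward(self, features):
        return self.head(features)  # [B, num_transforms]

def select_transforms(loss_predictor, features, transforms, k=1):
    """Return, for each instance, the k candidate transforms with the lowest predicted loss."""
    predicted_losses = loss_predictor(features)                  # [B, T]
    top_idx = predicted_losses.topk(k, largest=False).indices    # [B, k]
    return [[transforms[i] for i in row] for row in top_idx.tolist()]
```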


Efficient instance-aware test-time augmentation method resulting in significant gains over previous approaches

Neural Information Processing Systems

We would like to thank you for your thorough evaluation, helpful suggestions, and comments.

Figure 1: Comparison for the same 5-crop candidates on the clean ImageNet set using ResNet-50. Figure 2: Comparison for the same GPS transform candidates on the clean ImageNet set using ResNet-50.

We trained our loss predictor on the five crop areas; compared to the 5-crop ensemble, our method chooses one transform for each test instance. We also trained our loss predictor on the searched GPS policies to choose ones specific to each test instance. A detailed comparison will be included.

Test-time | Relative | Clean set | Corrupted set | Corrupted test-set
Center-Crop | 1 | 24.14 | 78.93 | 75.42


Offline Meta Reinforcement Learning - Identifiability Challenges and Effective Data Collection Strategies

Neural Information Processing Systems

Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of N conventional RL agents, trained on N different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution. In particular, while each conventional RL agent explored and exploited its own different task, the meta-agent must identify regularities in the data that lead to effective exploration/exploitation in the unseen task. Here, we take a Bayesian RL (BRL) view, and seek to learn a Bayes-optimal policy from the offline data. Building on the recent VariBAD BRL approach, we develop an off-policy BRL method that learns to plan an exploration strategy based on an adaptive neural belief estimate. However, learning to infer such a belief from offline data brings a new identifiability issue we term MDP ambiguity. We characterize the problem, and suggest resolutions via data collection and modification procedures. Finally, we evaluate our framework on a diverse set of domains, including difficult sparse reward tasks, and demonstrate learning of effective exploration behavior that is qualitatively different from the exploration used by any RL agent in the data. Our code is available online at https://github.com/Rondorf/BOReL.
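A rough sketch of the VariBAD-style ingredient described in the abstract: a recurrent encoder summarizes the transition history into an adaptive belief estimate, and the policy conditions on the current state together with that belief. All module names and dimensions are illustrative assumptions; the actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class BeliefEncoder(nn.Module):
    """Recurrent encoder that turns a transition history into a belief vector."""
    def __init__(self, obs_dim, act_dim, belief_dim=16):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim + 1, belief_dim, batch_first=True)

    def forward(self, obs_seq, act_seq, rew_seq):
        # obs_seq: [B, T, obs_dim], act_seq: [B, T, act_dim], rew_seq: [B, T]
        x = torch.cat([obs_seq, act_seq, rew_seq.unsqueeze(-1)], dim=-1)
        _, h = self.rnn(x)
        return h.squeeze(0)  # belief estimate, [B, belief_dim]

class BeliefConditionedPolicy(nn.Module):
    """Policy head that acts on the concatenation of state and belief."""
    def __init__(self, obs_dim, act_dim, belief_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + belief_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim))

    def forward(self, obs, belief):
        return self.net(torch.cat([obs, belief], dim=-1))
```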



Supplementary Material to "Sufficient dimension reduction for classification using principal optimal transport direction"

Neural Information Processing Systems

Without loss of generality, we assume S(B) = S. Hence, to prove Theorem 1, it is sufficient to show that S(B) = S(Σ) holds. To verify S(B) = S(Σ), we only need to show that the following two results, Statements (I) and (II), hold. We begin with Statement (I). This completes the proof of Statement (I). We then turn to Statement (II). This leads to a contradiction with (H.2), where the structure dimension is r.