Collaborating Authors

Chinese fishing 'militia' formations signal rising gray-zone pressure on Taiwan

FOX News

China's People's Armed Forces Maritime Militia deployed thousands of fishing vessels in coordinated formations that could disrupt global shipping lanes, analysts warn.


A New Study Details How Cats Almost Always Land on Their Feet

WIRED

The secret to this acrobatic skill lies in an extremely flexible part of the spine that allows cats to twist in the air and land safely. It's well established that when cats fall, they're able to land perfectly most of the time, nimbly maneuvering to right themselves before they hit the ground. Now, researchers at Japan's Yamaguchi University have advanced our understanding of this extraordinary ability, focusing on the mechanical properties of feline spines. What they found, as detailed in a recent study in the journal The Anatomical Record, is that those sure-footed landings are due in part to the fact that a cat's thoracic region is much more flexible than its lumbar region. While a cat's ability to rotate in the air without something to push against seems to defy the laws of physics, it's instead a complex righting maneuver.


On Learning Intrinsic Rewards for Policy Gradient Methods

Neural Information Processing Systems

In many sequential decision-making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al. that defines the optimal intrinsic reward function as one that, when used by an RL agent, achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead-search-based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient-based learning agents. We compare the performance of an augmented agent that uses our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for MuJoCo domains) with a baseline agent that uses the same policy learners but with only extrinsic rewards. Our results show improved performance on most but not all of the domains.
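
As a rough illustration of what "additive intrinsic rewards" could mean for a policy-gradient learner, the sketch below adds a weighted intrinsic term to the extrinsic reward before computing discounted returns. The names `mix_rewards` and `discounted_returns` and the weight `lam` are hypothetical, and the intrinsic values are hard-coded rather than learned; the paper's algorithm for learning the intrinsic reward is not reproduced here.

```python
def mix_rewards(extrinsic, intrinsic, lam=0.5):
    """Per-step reward seen by the policy learner: r_ex + lam * r_in.

    lam is an assumed mixing weight, not a value taken from the paper.
    """
    return [r_ex + lam * r_in for r_ex, r_in in zip(extrinsic, intrinsic)]


def discounted_returns(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# Toy episode: the task reward arrives only at the last step,
# while the (hard-coded) intrinsic reward gives earlier feedback.
extrinsic = [0.0, 0.0, 1.0]
intrinsic = [0.2, 0.4, 0.0]

mixed = mix_rewards(extrinsic, intrinsic, lam=0.5)
policy_returns = discounted_returns(mixed, gamma=1.0)    # drives the policy update
task_returns = discounted_returns(extrinsic, gamma=1.0)  # what the designer cares about
```

In this toy setup the policy learner would be updated from `policy_returns`, while success is still judged by `task_returns`, which mirrors the abstract's distinction between intrinsic and extrinsic rewards.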


A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Neural Information Processing Systems

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proofs. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.
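
The Fenchel-dual step can be written out as follows. This is a sketch that assumes the mean-variance objective takes the common form of expected return minus a variance penalty, with $R$ the return under policy $\pi_\theta$ and $\lambda > 0$ a risk weight; the symbols are illustrative and may differ from the paper's notation.

```latex
\begin{align*}
J_\lambda(\theta)
  &= \mathbb{E}[R] - \lambda\,\mathrm{Var}(R)
   = \mathbb{E}[R] - \lambda\,\mathbb{E}[R^2] + \lambda\,(\mathbb{E}[R])^2, \\
x^2 &= \max_{y \in \mathbb{R}} \,\bigl(2xy - y^2\bigr)
  \quad \text{(attained at } y = x\text{)}, \\
J_\lambda(\theta)
  &= \max_{y \in \mathbb{R}} \Bigl\{ (1 + 2\lambda y)\,\mathbb{E}[R]
     - \lambda\,\mathbb{E}[R^2] - \lambda y^2 \Bigr\}.
\end{align*}
```

Under these assumptions, the squared expectation $(\mathbb{E}[R])^2$, which is awkward to estimate without bias from single samples, is replaced by a term that is linear in $\mathbb{E}[R]$ for fixed $y$. The objective can then be ascended in two blocks, alternating between the auxiliary variable $y$ and the policy parameters $\theta$.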


Meta-Reinforcement Learning of Structured Exploration Strategies

Neural Information Processing Systems

Exploration is a fundamental challenge in reinforcement learning (RL). Many current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we study how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm - model agnostic exploration with structured noise (MAESN) - to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.
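
To make the contrast between structured stochasticity and random action-space noise concrete, here is a minimal toy sketch. It assumes a one-dimensional state, a hand-written linear policy, and a Gaussian latent distribution; `sample_latent`, `toy_policy`, `rollout_structured`, and `rollout_action_noise` are hypothetical names, and the gradient-based fast adaptation described in the abstract is not implemented.

```python
import random


def sample_latent(mu, sigma, rng):
    """Draw one latent vector from a Gaussian with per-dimension mean mu and std sigma."""
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]


def toy_policy(state, z):
    """Hypothetical deterministic policy conditioned on the state and the latent z."""
    return z[0] - 0.1 * state


def rollout_structured(mu, sigma, steps, rng):
    """Sample z once per episode and hold it fixed, so the exploration is temporally coherent."""
    z = sample_latent(mu, sigma, rng)
    state, actions = 0.0, []
    for _ in range(steps):
        action = toy_policy(state, z)
        actions.append(action)
        state += action  # toy dynamics: the action shifts the state
    return z, actions


def rollout_action_noise(sigma, steps, rng):
    """Baseline for contrast: fresh, independent Gaussian noise added to every action."""
    state, actions = 0.0, []
    for _ in range(steps):
        action = -0.1 * state + rng.gauss(0.0, sigma)
        actions.append(action)
        state += action
    return actions


rng = random.Random(0)
z, structured = rollout_structured(mu=[0.0], sigma=[1.0], steps=5, rng=rng)
noisy = rollout_action_noise(sigma=1.0, steps=5, rng=rng)
```

In this toy, every action in `structured` pushes in the direction chosen by the single latent draw, whereas the actions in `noisy` are perturbed independently at each step. In a fuller treatment, `mu` and `sigma` would be learned from prior tasks rather than fixed by hand.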


Learning to Decompose and Disentangle Representations for Video Prediction

Neural Information Processing Systems

Our goal is to predict future video frames given a sequence of input frames. Despite large amounts of video data, this remains a challenging task because of the high-dimensionality of video frames. We address this challenge by proposing the Decompositional Disentangled Predictive Auto-Encoder (DDPAE), a framework that combines structured probabilistic models and deep networks to automatically (i) decompose the high-dimensional video that we aim to predict into components, and (ii) disentangle each component to have low-dimensional temporal dynamics that are easier to predict. Crucially, with an appropriately specified generative model of video frames, our DDPAE is able to learn both the latent decomposition and disentanglement without explicit supervision. For the Moving MNIST dataset, we show that DDPAE is able to recover the underlying components (individual digits) and disentanglement (appearance and location) as we would intuitively do. We further demonstrate that DDPAE can be applied to the Bouncing Balls dataset involving complex interactions between multiple objects to predict the video frame directly from the pixels and recover physical states without explicit supervision.


Deep Functional Dictionaries: Learning Consistent Semantic Structures on 3D Models from Functions

Neural Information Processing Systems

Various 3D semantic attributes such as segmentation masks, geometric features, keypoints, and materials can be encoded as per-point probe functions on 3D geometries. Given a collection of related 3D shapes, we consider how to jointly analyze such probe functions over different shapes, and how to discover common latent structures using a neural network -- even in the absence of any correspondence information. Our network is trained on point cloud representations of shape geometry and associated semantic functions on that point cloud. These functions express a shared semantic understanding of the shapes but are not coordinated in any way. For example, in a segmentation task, the functions can be indicator functions of arbitrary sets of shape parts, with the particular combination involved not known to the network. Our network is able to produce a small dictionary of basis functions for each shape, a dictionary whose span includes the semantic functions provided for that shape. Even though our shapes have independent discretizations and no functional correspondences are provided, the network is able to generate latent bases, in a consistent order, that reflect the shared semantic structure among the shapes. We demonstrate the effectiveness of our technique in various segmentation and keypoint selection applications.


FPV drone slams into US military base in Iraq

Al Jazeera

Iraq's Iranian-backed Kataib Hezbollah has released drone video from an attack on the US's Victory Base near Baghdad International Airport. It's believed to be the first time the group has successfully used the FPV attack drone to skirt US defences.


Neural Voice Cloning with a Few Samples

Neural Information Processing Systems

Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which will be applied to a multi-speaker generative model. In terms of naturalness of the speech and similarity to the original speaker, both approaches can achieve good performance, even with a few cloning audios. While speaker adaptation can achieve slightly better naturalness and similarity, cloning time and required memory for the speaker encoding approach are significantly less, making it more favorable for low-resource deployment.


Exiled Iranian crown prince says he's ready to lead Iran 'as soon as the Islamic Republic falls'

FOX News

Iranian Crown Prince Reza Pahlavi said he is ready to lead Iran's transition as soon as the Islamic Republic falls, announcing plans for a transitional system under his leadership.