 Reinforcement Learning


Enhancing Text-based Reinforcement Learning Agents with Commonsense Knowledge

arXiv.org Artificial Intelligence

In this paper, we consider the recent trend of evaluating progress on reinforcement learning technology by using text-based environments and games as evaluation environments. This reliance on text brings advances in natural language processing into the ambit of these agents, with a recurring thread being the use of external knowledge to mimic and better human-level performance. We present one such instantiation of agents that use commonsense knowledge from ConceptNet to show promising performance on two text-based environments.


Salesforce's AI Economist taps reinforcement learning to generate optimal tax policies

#artificialintelligence

Salesforce today announced the AI Economist, a research environment designed to elucidate how economic design might be improved with techniques from the field of AI and machine learning. The goal is to help economists, governments, and others design tax policies that not only optimize productivity and conservation but also promote widespread, whole-country social equality. Studies have shown that income inequality gaps can negatively impact economic growth, economic opportunity, and even health. For example, over-taxation can discourage people from working, leading to lower productivity. But it's difficult to experiment with tax policies in the real world, at least in part because economic theory relies on stylized assumptions that are tough to validate, like people's sensitivity to taxes. The AI Economist, then, learns the best tax policies from simulations in which citizens and a government adapt and learn.


Improving Robustness via Risk Averse Distributional Reinforcement Learning

arXiv.org Machine Learning

One major obstacle that precludes the success of reinforcement learning in real-world applications is the lack of robustness of the trained policies, either to model uncertainties or to external disturbances. Robustness is critical when the policies are trained in simulation instead of in a real-world environment. In this work, we propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation. Our algorithm is based on the recently discovered distributional RL framework. We incorporate the CVaR risk measure in sample-based distributional policy gradients (SDPG) for learning risk-averse policies to achieve robustness against a range of system disturbances.
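As a rough illustration of the CVaR risk measure mentioned above, the sketch below estimates the conditional value at risk of a set of sampled returns as the mean of the worst alpha fraction of samples. It is not the SDPG algorithm from the abstract; the function name `cvar`, the lower-tail convention, and the rounding rule for the tail size are assumptions made for this example.

```python
import math


def cvar(returns, alpha):
    """Empirical CVaR: mean of the worst ceil(alpha * n) sampled returns.

    Assumption for this sketch: higher returns are better, so the "risky"
    tail is the lower tail of the return distribution.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)                      # worst returns first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the lower tail
    return sum(ordered[:k]) / k


# Hypothetical sampled returns from a return distribution.
samples = [4.0, -2.0, 10.0, 0.0]
print(cvar(samples, 0.5))   # mean of the two worst samples
print(cvar(samples, 1.0))   # alpha = 1 recovers the plain mean
```

A risk-averse objective would maximise this tail average rather than the plain mean, which penalises policies whose bad outcomes are very bad even when their average return is high.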


GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning

arXiv.org Artificial Intelligence

Automatic transistor sizing is a challenging problem in circuit design due to the large design space, complex performance trade-offs, and fast technological advancements. Although there has been plenty of work on transistor sizing targeting a single circuit, limited research has been done on transferring the knowledge from one circuit to another to reduce the re-design overhead. In this paper, we present GCN-RL Circuit Designer, leveraging reinforcement learning (RL) to transfer the knowledge between different technology nodes and topologies. Moreover, inspired by the simple fact that a circuit is a graph, we learn on the circuit topology representation with graph convolutional neural networks (GCN). The GCN-RL agent extracts features of the topology graph, whose vertices are transistors and whose edges are wires. Our learning-based optimization consistently achieves the highest Figures of Merit (FoM) on four different circuits compared with conventional black-box optimization methods (Bayesian Optimization, Evolutionary Algorithms), random search, and human expert designs. Experiments on transfer learning between five technology nodes and two circuit topologies demonstrate that RL with transfer learning can achieve much higher FoMs than methods without knowledge transfer. Our transferable optimization method makes transistor sizing and design porting more effective and efficient.
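The idea of treating a circuit as a graph can be sketched with a single graph-convolution step in the standard form ReLU(D^-1/2 (A + I) D^-1/2 H W). This is a minimal, dependency-free illustration and not the GCN-RL architecture from the paper; the toy two-transistor circuit, the feature values, and all function names are hypothetical.

```python
import math


def build_adjacency(num_transistors, wires):
    """Adjacency matrix with self-loops; vertices are transistors, edges are wires."""
    adj = [[0.0] * num_transistors for _ in range(num_transistors)]
    for i in range(num_transistors):
        adj[i][i] = 1.0                      # self-loop keeps each node's own features
    for u, v in wires:
        adj[u][v] = 1.0
        adj[v][u] = 1.0                      # wires are undirected
    return adj


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One graph-convolution step with symmetric degree normalisation and ReLU."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    hidden = matmul(matmul(norm, features), weight)
    return [[max(0.0, x) for x in row] for row in hidden]


# Toy circuit (hypothetical): two transistors joined by one wire,
# each with a single scalar feature.
adj = build_adjacency(2, [(0, 1)])
print(gcn_layer(adj, [[1.0], [3.0]], [[1.0]]))
```

After one step each transistor's embedding mixes its own feature with those of the transistors it is wired to, which is what lets a learned policy condition on circuit topology.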


Unsupervised Learning of KB Queries in Task Oriented Dialogs

arXiv.org Machine Learning

Task-oriented dialog (TOD) systems converse with users to accomplish a specific task. This task requires the system to query a knowledge base (KB) and use the retrieved results to fulfil user needs. Predicting the KB queries is crucial and can lead to severe under-performance if done incorrectly. KB queries are usually annotated in real-world datasets and are learnt using supervised approaches to achieve acceptable task completion. This need for query annotations prevents TOD systems from easily adapting to new domains. In this paper, we propose a novel problem of learning end-to-end TOD systems using dialogs that do not contain KB query annotations. Our approach first learns to predict the KB queries using reinforcement learning (RL) and then learns the end-to-end system using the predicted queries. However, predicting the correct query in TOD systems is uniquely plagued by correlated attributes, in which, due to data bias, certain attributes always occur together in the KB. This prevents the RL system from generalising, and accuracy suffers as a result. We propose Correlated Attributes Resilient RL (CARRL), a modification to the RL gradient estimation, which mitigates the problem of correlated attributes and predicts KB queries better than existing weakly supervised approaches. Finally, we compare the performance of our end-to-end system trained using predicted queries to a system trained using annotated gold queries.


Sim-to-Real Transfer with Incremental Environment Complexity for Reinforcement Learning of Depth-Based Robot Navigation

arXiv.org Artificial Intelligence

Transferring learning-based models to the real world remains one of the hardest problems in model-free control theory. Due to the cost of data collection on a real robot and the limited sample efficiency of Deep Reinforcement Learning algorithms, models are usually trained in a simulator, which theoretically provides an infinite amount of data. Despite offering unbounded trial-and-error runs, the reality gap between simulation and the physical world brings little guarantee about the policy behavior in real operation. Depending on the problem, expensive real-world fine-tuning and/or a complex domain randomization strategy may be required to produce a relevant policy. In this paper, a Soft Actor-Critic (SAC) training strategy using incremental environment complexity is proposed to drastically reduce the need for additional training in the real world. The application addressed is depth-based mapless navigation, where a mobile robot should reach a given waypoint in a cluttered environment with no prior mapping information. Experimental results in simulated and real environments are presented to assess quantitatively the efficiency of the proposed approach, which demonstrated a success rate twice as high as that of a naive strategy.
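One way to picture training with incremental environment complexity is a small curriculum scheduler that moves the agent to a harder environment once its recent success rate is high enough. The abstract does not describe the switching rule, so the windowed success-rate threshold, the class name `ComplexityCurriculum`, and all parameter values below are assumptions for illustration only.

```python
from collections import deque


class ComplexityCurriculum:
    """Advance to a harder environment when recent episodes mostly succeed.

    Hypothetical rule: once `window` episodes have been recorded at the
    current level and their success rate reaches `threshold`, move up one
    level and start a fresh window.
    """

    def __init__(self, num_levels, window=100, threshold=0.9):
        self.num_levels = num_levels
        self.threshold = threshold
        self.level = 0
        self.results = deque(maxlen=window)

    def record(self, success):
        """Log one episode outcome and return the level for the next episode."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.level < self.num_levels - 1:
            rate = sum(self.results) / len(self.results)
            if rate >= self.threshold:
                self.level += 1
                self.results.clear()       # judge the new level on its own episodes
        return self.level


# Usage with made-up episode outcomes.
curriculum = ComplexityCurriculum(num_levels=3, window=4, threshold=0.75)
for outcome in [True, True, False, True]:
    level = curriculum.record(outcome)
print(level)
```

In a training loop, the returned level would select which simulated environment to instantiate for the next episode, while the learning algorithm itself stays unchanged.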


Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

arXiv.org Artificial Intelligence

Learning a good representation is an essential component of deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings, where building a representation of the unknown environment is crucial to solving the tasks. Here we introduce Prediction of Bootstrap Latents (PBL), a simple and flexible self-supervised representation learning algorithm for multitask deep RL. PBL builds on multistep predictive representations of future observations, and focuses on capturing structured information about environment dynamics. Specifically, PBL trains its representation by predicting latent embeddings of future observations. These latent embeddings are themselves trained to be predictive of the aforementioned representations. These predictions form a bootstrapping effect, allowing the agent to learn more about the key aspects of the environment dynamics. In addition, by defining prediction tasks completely in latent space, PBL provides the flexibility of using multimodal observations involving pixel images, language instructions, rewards and more. We show in our experiments that PBL delivers across-the-board improved performance over state-of-the-art deep RL agents in the DMLab-30 and Atari-57 multitask settings.


Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships

arXiv.org Machine Learning

Embodied artificial intelligence (AI) tasks shift from tasks focusing on internet images to active settings involving embodied agents that perceive and act within 3D environments. In this paper, we investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes, where the navigation task aims to train an agent that can intelligently make a series of decisions to arrive at a pre-specified target location from any possible starting position based only on egocentric views. However, most navigation methods currently struggle with several challenging problems, such as data efficiency, automatic obstacle avoidance, and generalization. The generalization problem means that the agent cannot transfer navigation skills learned from previous experience to unseen targets and scenes. To address these issues, we incorporate two designs into the classic DRL framework: attention on a 3D knowledge graph (KG) and a target skill extension (TSE) module. On the one hand, our proposed method combines visual features and 3D spatial representations to learn the navigation policy. On the other hand, the TSE module is used to generate sub-targets, which allow the agent to learn from failures. Specifically, our 3D spatial relationships are encoded through the recently popular graph convolutional network (GCN). To reflect real-world settings, our work also considers open action and adds actionable targets to conventional navigation situations. These more difficult settings are applied to test whether the DRL agent really understands its task and its navigation environment and can carry out reasoning. Our experiments, performed in AI2-THOR, show that our model outperforms the baselines in both SR and SPL metrics and improves generalization ability across targets and scenes.
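The two reported metrics can be computed as follows, assuming SR denotes success rate and SPL denotes Success weighted by Path Length as commonly defined in the embodied navigation literature: the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is a success flag, l_i the shortest path length, and p_i the length of the path the agent actually took. The abstract does not spell out either definition, and the episode tuples below are made-up values.

```python
def success_rate(episodes):
    """Fraction of episodes that ended in success."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length over (success, shortest, taken) tuples."""
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # A successful but inefficient path earns partial credit.
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# Hypothetical episodes: (reached target?, shortest path length, agent path length)
episodes = [
    (True, 5.0, 10.0),   # success, but the path was twice the optimum
    (True, 4.0, 4.0),    # success along an optimal path
    (False, 3.0, 9.0),   # failure contributes zero
]
print(success_rate(episodes))
print(spl(episodes))
```

SPL is never larger than SR, since each successful episode contributes at most 1, so a gap between the two indicates that the agent reaches targets along unnecessarily long paths.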


Scaling your AI-powered Battlesnake with distributed reinforcement learning in Amazon SageMaker | Amazon Web Services

#artificialintelligence

Battlesnake is an AI competition in which you build AI-powered snakes. Battlesnake's rules are similar to those of the traditional Snake game. Your goal is to be the last surviving snake when competing against other snakes. Developers of all levels build snakes using techniques ranging from unique heuristic-based strategies to state-of-the-art deep reinforcement learning (RL) algorithms. You can use the SageMaker Battlesnake Starter Pack to build your own snake and compete in the Battlesnake arena.


AWS AI Tech Talk - Making Reinforcement Learning Practical w/ AWS DeepRacer

#artificialintelligence

Start date/time: May 1st, 10 AM PT / 1 PM ET. Description: Building machine learning-enabled products is hard for developers & data scientists; throw in a hardware component & the complexity increases exponentially. Lyndon Leggate introduces you to reinforcement learning & walks you through a step-by-step demonstration of how you, as a developer, can level up your RL skills through autonomous driving. Lyndon is a keen participant in the AWS DeepRacer league. Racing as Etaggel, he regularly placed in the top 10 during the 2019 league, is featured in DeepRacer TV & in May 2019 established the AWS DeepRacer Community.