Using Soft Actor-Critic for Low-Level UAV Control
Barros, Gabriel Moraes, Colombini, Esther Luna
Unmanned Aerial Vehicles (UAVs), or drones, have recently been used in several civil application domains, from organ delivery to remote locations to wireless network coverage. These platforms, however, are naturally unstable systems for which many different control approaches have been proposed. Generally based on classic and modern control, these algorithms require knowledge of the robot's dynamics. Recently, however, model-free reinforcement learning has been successfully used to control drones without any prior knowledge of the robot model. In this work, we present a framework for training the Soft Actor-Critic (SAC) algorithm for low-level control of a quadrotor in a go-to-target task. All experiments were conducted in simulation. The experiments show that SAC not only learns a robust policy but also copes with unseen scenarios. Videos from the simulations are available at https://www.youtube.com/watch?v=9z8vGs0Ri5g and the code at https://github.com/larocs/SAC_uav.
How AI Could Feed The World's Hungry While Sustaining The Planet
A new crop of startups are developing AI systems to tackle challenges from climate change to COVID-19. Artificial Intelligence is transforming the world at a rapid and accelerating pace, offering huge potential, but also posing social and economic challenges. Human beings are naturally fearful of machines – this is a constant. Technological advancements tend to outpace cultural shifts. It has taken the shock of a global pandemic to accelerate the uptake of many technologies that have been around for at least a decade.
The act of remembering: a study in partially observable reinforcement learning
Icarte, Rodrigo Toro, Valenzano, Richard, Klassen, Toryn Q., Christoffersen, Phillip, Farahmand, Amir-massoud, McIlraith, Sheila A.
Reinforcement Learning (RL) agents typically learn memoryless policies---policies that only consider the last observation when selecting actions. Learning memoryless policies is efficient and optimal in fully observable environments. However, some form of memory is necessary when RL agents are faced with partial observability. In this paper, we study a lightweight approach to tackle partial observability in RL. We provide the agent with an external memory and additional actions to control what, if anything, is written to the memory. At every step, the current memory state is part of the agent's observation, and the agent selects a tuple of actions: one action that modifies the environment and another that modifies the memory. When the external memory is sufficiently expressive, optimal memoryless policies yield globally optimal solutions. Unfortunately, previous attempts to use external memory in the form of binary memory have produced poor results in practice. Here, we investigate alternative forms of memory in support of learning effective memoryless policies. Our novel forms of memory outperform binary and LSTM-based memory in well-established partially observable domains.
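The interface described in this abstract (memory state included in the observation, and a tuple of actions, one for the environment and one for the memory) can be sketched as a small wrapper. This is a minimal illustration under stated assumptions, not the authors' implementation: `CueEnv`, `MemoryAugmentedEnv`, the single-slot "store the last observation" memory, and both policies are hypothetical names and choices made for this example.

```python
from typing import Any, Callable, Tuple


class CueEnv:
    """Hypothetical toy task: a cue is visible only at the first step and
    must be repeated as the action at the second step to earn a reward."""

    def __init__(self, cue: int) -> None:
        self.cue = cue
        self.t = 0

    def reset(self) -> Any:
        self.t = 0
        return self.cue  # the cue is observable only here

    def step(self, action: Any) -> Tuple[Any, float, bool]:
        self.t += 1
        if self.t == 1:
            return "blank", 0.0, False
        reward = 1.0 if action == self.cue else 0.0
        return "blank", reward, True


class MemoryAugmentedEnv:
    """Adds a one-slot external memory (an assumed, simplified memory form).
    Observations become (env_obs, memory); actions become (env_action, write)."""

    def __init__(self, env: CueEnv) -> None:
        self.env = env
        self.memory: Any = None
        self.last_obs: Any = None

    def reset(self) -> Tuple[Any, Any]:
        self.memory = None
        self.last_obs = self.env.reset()
        return self.last_obs, self.memory

    def step(self, action: Tuple[Any, bool]) -> Tuple[Tuple[Any, Any], float, bool]:
        env_action, write = action
        if write:  # memory action: store the observation just seen
            self.memory = self.last_obs
        obs, reward, done = self.env.step(env_action)
        self.last_obs = obs
        return (obs, self.memory), reward, done


def remembering_policy(aug_obs: Tuple[Any, Any]) -> Tuple[Any, bool]:
    """Memoryless over the augmented observation: it reads only its input."""
    obs, memory = aug_obs
    if obs != "blank":
        return 0, True  # arbitrary env action; write the cue to memory
    return memory, False  # replay the stored cue


def forgetful_policy(aug_obs: Tuple[Any, Any]) -> Tuple[Any, bool]:
    """Same as above but never writes, so the cue is lost."""
    _, memory = aug_obs
    return memory, False


def run_episode(cue: int, policy: Callable[[Tuple[Any, Any]], Tuple[Any, bool]]) -> float:
    env = MemoryAugmentedEnv(CueEnv(cue))
    aug_obs, total, done = env.reset(), 0.0, False
    while not done:
        aug_obs, reward, done = env.step(policy(aug_obs))
        total += reward
    return total
```

In this toy task the policy that uses the write action solves the problem while remaining a function of the current augmented observation only; the policy that never writes cannot recover the cue.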
Fairness in Machine Learning: A Survey
As Machine Learning technologies become increasingly used in contexts that affect citizens, companies as well as researchers need to be confident that their application of these methods will not have unexpected social implications, such as bias towards gender, ethnicity, and/or people with disabilities. There is significant literature on approaches to mitigate bias and promote fairness, yet the area is complex and hard to penetrate for newcomers to the domain. This article seeks to provide an overview of the different schools of thought and approaches to mitigating (social) biases and increasing fairness in the Machine Learning literature. It organises approaches into the widely accepted framework of pre-processing, in-processing, and post-processing methods, subcategorizing into a further 11 method areas. Although much of the literature emphasizes binary classification, a discussion of fairness in regression, recommender systems, unsupervised learning, and natural language processing is also provided along with a selection of currently available open source libraries. The article concludes by summarising open challenges articulated as four dilemmas for fairness research.
Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games
Huang, Shengyi, Ontañón, Santiago
HRL is especially popular in RTS games with combinatorial action spaces (Pang et al., 2019; Ye et al., 2020). The most closely related work is perhaps Scheduled Auxiliary Control (SAC-X) (Riedmiller et al., 2018), which is an HRL algorithm that trains auxiliary agents to perform primitive actions with shaped rewards and a main agent to schedule the use of auxiliary agents with sparse rewards. However, our approach differs in the treatment of the main agent. Instead of learning to schedule auxiliary agents, our main agent learns to act in the entire action space by taking action guidance from the auxiliary agents. Because our main agent learns in the full action space, our approach has two intuitive benefits. First, during policy evaluation our main agent does not have to commit to a particular auxiliary agent to perform actions for a fixed number of time steps, as is usually done in SAC-X. Second, learning in the full action space means the main agent is less likely to suffer from the definition of handcrafted sub-tasks, which could be incomplete or biased.
RODE: Learning Roles to Decompose Multi-Agent Tasks
Wang, Tonghan, Gupta, Tarun, Mahajan, Anuj, Peng, Bei, Whiteson, Shimon, Zhang, Chongjie
Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles. However, it is largely unclear how to efficiently discover such a set of roles. To solve this problem, we propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents. Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces. We further integrate information about action effects into the role policies to boost learning efficiency and policy generalization. By virtue of these advances, our method (1) outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark and (2) achieves rapid transfer to new environments with three times the number of agents. Demonstrative videos are available at https://sites.google.com/view/rode-marl .
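The first step described in this abstract, clustering actions by their effects so that each cluster becomes a restricted role action space, can be sketched with a plain k-means. This is an illustrative sketch only: the two-dimensional effect vectors below are hand-written stand-ins rather than learned representations, the action names are hypothetical, and k-means with first-k initialization is an assumed choice of clustering procedure.

```python
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two effect vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic initialization (first k points)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        assign = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign


def cluster_actions(effects: Dict[str, Sequence[float]], k: int) -> Dict[int, List[str]]:
    """Group action names into k role action spaces by effect similarity."""
    names = list(effects)
    assign = kmeans([effects[n] for n in names], k)
    roles: Dict[int, List[str]] = {}
    for name, role in zip(names, assign):
        roles.setdefault(role, []).append(name)
    return roles


# Hypothetical effect vectors: (displacement caused, damage caused).
EFFECTS = {
    "move_north": (1.0, 0.0),
    "attack_1": (0.0, 1.0),
    "move_south": (0.9, 0.1),
    "attack_2": (0.1, 0.9),
}

ROLES = cluster_actions(EFFECTS, k=2)
```

Here the movement actions and the attack actions end up in separate groups, so a role selector would choose between two roles and each role policy would act over two primitive actions instead of four.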
SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness
Ng, Nathan, Cho, Kyunghyun, Ghassemi, Marzyeh
Models that perform well on a training domain often fail to generalize to out-of-domain (OOD) examples. Data augmentation is a common method used to prevent overfitting and improve OOD generalization. However, in natural language, it is difficult to generate new examples that stay on the underlying data manifold. We introduce SSMBA, a data augmentation method for generating synthetic training examples by using a pair of corruption and reconstruction functions to move randomly on a data manifold. We investigate the use of SSMBA in the natural language domain, leveraging the manifold assumption to reconstruct corrupted text with masked language models. In experiments on robustness benchmarks across 3 tasks and 9 datasets, SSMBA consistently outperforms existing data augmentation methods and baseline models on both in-domain and OOD data, achieving gains of 0.8% accuracy on OOD Amazon reviews, 1.8% accuracy on OOD MNLI, and 1.4 BLEU on in-domain IWSLT14 German-English.
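The corruption-and-reconstruction loop described in this abstract can be sketched as follows. The abstract states that reconstruction is done with masked language models; to keep the example self-contained, `toy_reconstruct` is a hypothetical stand-in that samples from a tiny fixed vocabulary, and the function names, masking probability, and seed handling are all assumptions made for illustration.

```python
import random
from typing import Callable, List

MASK = "[MASK]"

Reconstructor = Callable[[List[str], random.Random], List[str]]


def corrupt(tokens: List[str], mask_prob: float, rng: random.Random) -> List[str]:
    """Corruption function: independently replace each token with a mask."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def toy_reconstruct(tokens: List[str], rng: random.Random) -> List[str]:
    """Stand-in for a masked language model (assumption): fill every mask
    with a word sampled from a tiny fixed vocabulary."""
    vocab = ["good", "great", "fine"]
    return [rng.choice(vocab) if tok == MASK else tok for tok in tokens]


def ssmba_augment(
    tokens: List[str],
    n: int,
    mask_prob: float = 0.3,
    seed: int = 0,
    reconstruct: Reconstructor = toy_reconstruct,
) -> List[List[str]]:
    """Generate n synthetic examples by corrupting, then reconstructing."""
    rng = random.Random(seed)
    return [reconstruct(corrupt(tokens, mask_prob, rng), rng) for _ in range(n)]


SENTENCE = "the movie was good".split()
AUGMENTED = ssmba_augment(SENTENCE, n=3, mask_prob=0.5, seed=0)
```

A real pipeline would pass a reconstruction function backed by a masked language model in place of `toy_reconstruct`; the surrounding loop would stay the same.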
Off-Policy Multi-Agent Decomposed Policy Gradients
Wang, Yihan, Han, Beining, Wang, Tonghan, Dong, Heng, Zhang, Chongjie
Multi-agent policy gradient (MAPG) methods have recently witnessed vigorous progress. However, there is a significant performance discrepancy between MAPG methods and state-of-the-art multi-agent value-based approaches. In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP). This method introduces the idea of value function decomposition into the multi-agent actor-critic framework. Based on this idea, DOP supports efficient off-policy learning and addresses the issue of centralized-decentralized mismatch and credit assignment in both discrete and continuous action spaces. We formally show that DOP critics have sufficient representational capability to guarantee convergence. In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that DOP significantly outperforms both state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms. Demonstrative videos are available at https://sites.google.com/view/dop-mapg/.
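To make the idea of value function decomposition concrete, the sketch below uses one simple form: a joint value written as a non-negatively weighted sum of per-agent values plus a bias. The abstract does not spell out the critic's exact form, so the linear form, the function names, and the numbers are assumptions for illustration. The example also checks a property that makes such decompositions attractive: with positive weights, each agent maximizing its own value yields the joint maximizer.

```python
from itertools import product
from typing import List, Sequence, Tuple


def decomposed_q(local_qs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Joint value as a non-negatively weighted sum of per-agent values."""
    if len(local_qs) != len(weights):
        raise ValueError("one weight per agent is required")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sum(w * q for w, q in zip(weights, local_qs)) + bias


def greedy_joint_action(q_tables: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Each agent independently picks the action with its highest local value."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def brute_force_joint_action(
    q_tables: Sequence[Sequence[float]], weights: Sequence[float], bias: float
) -> Tuple[int, ...]:
    """Search every joint action for the highest decomposed joint value."""
    joint_actions = product(*(range(len(q)) for q in q_tables))
    return max(
        joint_actions,
        key=lambda acts: decomposed_q(
            [q[a] for q, a in zip(q_tables, acts)], weights, bias
        ),
    )


# Hypothetical local values: agent 0 has two actions, agent 1 has three.
Q_TABLES: List[List[float]] = [[1.0, 3.0], [2.0, 0.5, 1.0]]
WEIGHTS = [0.5, 2.0]
BIAS = 0.25
```

Because each weight also acts as the sensitivity of the joint value to one agent's local value, a decomposition of this kind gives a direct handle on per-agent credit assignment.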
The future of life insurance: Reimagining the industry for the decade ahead
The global life insurance industry has seen significant changes over the past decade. Developing economies--predominantly emerging markets in Asia that were formerly small contributors--have become global growth drivers and now account for more than half of global premium growth (Exhibit 1) and 84 percent of individual annuities growth (Exhibit 2). The availability of data has skyrocketed, and insurers have made progress in advanced analytics and artificial intelligence. Digital and mobile advances have raised the bar on transparency and service quality: customers can now file claims and access agents, insurance quotes, and policy information with a few taps on a screen. The past decade has also introduced new challenges. Life insurers have not benefitted from the bull market (Exhibit 3). Global penetration fell to 3 percent, and premium growth within most developed markets, hovering just below 2 percent per year, struggled to match GDP.
Getting started with NLP.js
Ever wanted to build a chatbot and encountered some blockers along the way relating to data privacy or supported languages? Do you wish to reduce chatbot response time or run them without an active data connection? If that's the case or if you're just curious and want to learn more, give NLP.js a try. Natural Language Processing or NLP is a field combining linguistics and computing, as well as artificial intelligence. Correctly understanding natural language is critical for virtual assistants, chatbots, voice assistants, and a wide range of applications based on a voice or text interface with a machine.