AITopics

Chen, Changan, Majumder, Sagnik, Al-Halah, Ziad, Gao, Ruohan, Ramakrishnan, Santhosh Kumar, Grauman, Kristen

Audio-Visual Waypoints for Navigation

arXiv.org Artificial Intelligence, Aug-21-2020

In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations. We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements: 1) audio-visual waypoints that are dynamically set and learned end-to-end within the navigation policy, and 2) an acoustic memory that provides a structured, spatially grounded record of what the agent has heard as it moves. Both new ideas capitalize on the synergy of audio and visual data for revealing the geometry of an unmapped space. We demonstrate our approach on the challenging Replica environments of real-world 3D scenes. Our model improves the state of the art by a substantial margin, and our experiments reveal that learning the links between sights, sounds, and space is essential for audio-visual navigation.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2008.09622

Country: Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence, Aug-20-2020, 06:40:44 GMT

Reinforcement Learning with Quantum Variational Circuits

The general formulation of reinforcement learning can be described as an agent that interacts with an environment while attempting to maximize its reward. This is often formulated as a Markov Decision Process (MDP).
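For readers who want the formulation spelled out, one standard textbook way to write an MDP and the agent's objective is sketched below. The discounted-return form, the discount factor \gamma, and the symbol names are conventional assumptions, not details taken from the article.

```latex
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right],
\]
where $a_t \sim \pi(\cdot \mid s_t)$, $s_{t+1} \sim P(\cdot \mid s_t, a_t)$, and $0 \le \gamma < 1$.
```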

machine learning, quantum variational circuit, reinforcement learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Zhou, Tianze, Zhang, Fubiao, Wang, Chenfei

Multi-Agent Reinforcement Learning with Graph Clustering

In this paper, we introduce the group concept into multi-agent reinforcement learning (MARL). In this method, agents are divided into several groups, and each group completes a specific subtask so that agents can cooperate to complete the main task. Existing methods use a communication vector to exchange information between agents, which may lead to communication redundancy. To solve this problem, we propose a MARL method based on graph clustering. It allows agents to adaptively learn group features and replaces the communication operation. In our method, agent features are divided into two types: in-group features and individual features. They represent the commonalities and differences between agents, respectively. Based on the graph attention network (GAT), we introduce the graph clustering method as a penalty to optimize agent group features. These features are then used to generate individual Q-values. To overcome the consistency problem brought by GAT, we introduce a split loss to distinguish agent features. Our method is easy to convert into the CTDE framework using the Kullback-Leibler divergence. Empirical results are evaluated on a challenging set of StarCraft II micromanagement tasks. The results show that our method outperforms existing multi-agent reinforcement learning methods and that performance improves as the number of agents increases.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2008.08808

Country: Asia > China > Beijing > Beijing (0.05)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Expressing Diverse Human Driving Behavior with Probabilistic Rewards and Online Inference

Sun, Liting, Wu, Zheng, Ma, Hengbo, Tomizuka, Masayoshi

In human-robot interaction (HRI) systems, such as autonomous vehicles, understanding and representing human behavior are important. Human behavior is naturally rich and diverse. Cost/reward learning, as an efficient way to learn and represent human behavior, has been successfully applied in many domains. Most traditional inverse reinforcement learning (IRL) algorithms, however, cannot adequately capture the diversity of human behavior since they assume that all behavior in a given dataset is generated by a single cost function. In this paper, we propose a probabilistic IRL framework that directly learns a distribution of cost functions in a continuous domain. Evaluations on both synthetic data and real human driving data are conducted. Both the quantitative and subjective results show that our proposed framework can better express diverse human driving behaviors and can extract different driving styles that match what human participants interpret in our user study.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2008.08812

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Transportation > Ground > Road (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Jothimurugan, Kishor, Alur, Rajeev, Bastani, Osbert

A Composable Specification Language for Reinforcement Learning Tasks

Reinforcement learning is a promising approach for learning control policies for robot tasks. However, specifying complex tasks (e.g., with multiple objectives and safety constraints) can be challenging, since the user must design a reward function that encodes the entire task. Furthermore, the user often needs to manually shape the reward to ensure convergence of the learning algorithm. We propose a language for specifying complex control tasks, along with an algorithm that compiles specifications in our language into a reward function and automatically performs reward shaping. We implement our approach in a tool called SPECTRL, and show that it outperforms several state-of-the-art baselines.

machine learning, reinforcement learning, specification, (18 more...)

2008.09293

Country:

North America > United States > Pennsylvania (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Heuillet, Alexandre, Couthouis, Fabien, Díaz-Rodríguez, Natalia

Explainability in Deep Reinforcement Learning

During the past decade, Artificial Intelligence (AI), and by extension Machine Learning (ML), have seen an unprecedented rise in both industry and research. The progressive improvement of computer hardware, together with the need to process larger and larger amounts of data, made these underestimated techniques shine under a new light. Reinforcement Learning (RL) focuses on learning how to map situations to actions in order to maximize a numerical reward signal [102]. The learner is not told which actions to take, but instead must discover which actions are the most rewarding by trying them. Reinforcement learning addresses the problem of how agents should learn a policy that takes actions to maximize the cumulative reward through interaction with the environment [31]. Recent progress in Deep Learning (DL) for learning feature representations has significantly impacted RL, and the combination of both methods (known as deep RL) has led to remarkable results in many areas. Typically, RL is used to solve optimisation problems when the system has a very large number of states and has a complex stochastic structure. Notable examples include training agents to play Atari games based on raw pixels [75, 76], board games [96, 97], complex real-world robotics problems such as manipulation [8] or grasping [54], and other real-world applications such as resource management in computer clusters [72], network traffic signal control [9], chemical reaction optimization [117], or recommendation systems [116].
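The idea that a learner "must discover which actions are the most rewarding by trying them" can be illustrated with a minimal sketch. The example below is an epsilon-greedy agent on a toy multi-armed bandit; the function name `run_epsilon_greedy`, the reward means, and all parameter values are hypothetical choices for illustration and do not come from the surveyed paper.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, noise=0.1, seed=0):
    """Estimate the value of each action purely from trial and error."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running mean reward per action
    counts = [0] * n_actions       # how often each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment returns a noisy reward; the agent never sees true_means.
        reward = true_means[action] + rng.gauss(0.0, noise)
        counts[action] += 1
        # Incremental update of the running mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.1, 0.5, 0.9])
    best = max(range(len(estimates)), key=lambda a: estimates[a])
    print("estimated values:", [round(v, 2) for v in estimates])
    print("times each action was tried:", counts)
    print("action judged most rewarding:", best)
```

A bandit has a single state, so this shows only the exploration side of RL; a full RL problem adds state transitions and delayed rewards.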

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2008.06693

Country:

North America > United States (0.14)
Europe > France (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Papagiannis, Georgios, Li, Yunpeng

Imitation Learning with Sinkhorn Distances

arXiv.org Machine Learning, Aug-20-2020

Imitation learning algorithms have been interpreted as variants of divergence minimization problems. The ability to compare occupancy measures between experts and learners is crucial to their effectiveness in learning from demonstrations. In this paper, we present tractable solutions by formulating imitation learning as minimization of the Sinkhorn distance between occupancy measures. The formulation combines the valuable properties of optimal transport metrics in comparing non-overlapping distributions with a cosine distance cost defined in an adversarially learned feature space. This leads to a highly discriminative critic network and optimal transport plan that subsequently guide imitation learning. We evaluate the proposed approach using both the reward metric and the Sinkhorn distance metric on a number of MuJoCo experiments.
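To make the central quantity concrete, the following is a minimal sketch of a Sinkhorn distance between two small sets of feature vectors with a cosine distance cost. It uses the standard entropy-regularized Sinkhorn iterations with uniform marginals. The raw vectors stand in for the adversarially learned features described above, and the function names, the regularization value `reg`, and the iteration count are assumptions, not the authors' settings.

```python
import math


def cosine_cost(x, y):
    """Cosine distance 1 - cos(x, y); assumes both vectors are nonzero."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def sinkhorn_distance(expert, learner, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport cost between two point sets."""
    n, m = len(expert), len(learner)
    cost = [[cosine_cost(x, y) for y in learner] for x in expert]
    a = [1.0 / n] * n  # uniform weight on each expert sample
    b = [1.0 / m] * m  # uniform weight on each learner sample
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    distance = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return distance, plan


if __name__ == "__main__":
    expert = [[1.0, 0.0], [0.0, 1.0]]
    close_learner = [[0.0, 1.0], [1.0, 0.0]]    # same directions, reordered
    far_learner = [[-1.0, 0.0], [0.0, -1.0]]    # opposite directions
    print("close:", sinkhorn_distance(expert, close_learner)[0])
    print("far:  ", sinkhorn_distance(expert, far_learner)[0])
```

In an imitation learning setting, a quantity like this would be computed between expert and learner samples and minimized with respect to the learner's policy; that outer loop is not shown here.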

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2008.09167

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(23 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Perepu, Satheesh K., Balaji, Bala Shyamala, Tanneru, Hemanth Kumar, Kathari, Sudhakar, Pinnamaraju, Vivek Shankar

Reinforcement Learning based dynamic weighing of Ensemble Models for Time Series Forecasting

arXiv.org Machine Learning, Aug-20-2020

Ensemble models are powerful model-building tools developed to improve the accuracy of model predictions. They find applications in time series forecasting in varied scenarios, including but not limited to process industries, health care, and economics, where a single model might not provide optimal performance. It is known that if the models selected for data modelling are distinct (linear/non-linear, static/dynamic) and independent (minimally correlated), the accuracy of the predictions is improved. Various approaches suggested in the literature to weigh the ensemble models use a static set of weights. Because of this limitation, such approaches cannot effectively capture the dynamic changes or local features of the data. To address this issue, this work proposes a Reinforcement Learning (RL) approach that dynamically assigns and updates the weights of each model at different time instants, depending on the nature of the data and the individual model predictions. The RL method, implemented online, essentially learns to update the weights and reduce the errors as time progresses. Simulation studies on time series data showed that the dynamic weighted approach using RL learns the weights better than existing approaches. The accuracy of the proposed method is compared quantitatively with an existing approach of online Neural Network tuning through normalized mean square error (NMSE) values.
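The contrast between static and dynamic ensemble weights can be illustrated with a small sketch. Note that it does not implement the RL method described above: as a simpler stand-in, it uses a multiplicative (exponentiated-error) weight update after each observation. NMSE is taken here as mean squared error divided by the variance of the targets, which is one common convention; the abstract does not state the normalization used. All names and parameter values are hypothetical.

```python
import math


def nmse(y_true, y_pred):
    """Mean squared error normalized by the variance of the targets (assumed convention)."""
    n = len(y_true)
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean) ** 2 for t in y_true) / n
    return mse / variance


def dynamic_ensemble(y_true, model_preds, lr=5.0):
    """Combine model predictions with weights that are updated online.

    model_preds is a list with one prediction series per model.
    Stand-in update rule (not the RL agent from the abstract): after each
    target is observed, shrink each weight by exp(-lr * squared_error).
    """
    k = len(model_preds)
    weights = [1.0 / k] * k
    combined = []
    for t, target in enumerate(y_true):
        preds_t = [series[t] for series in model_preds]
        # Forecast with the weights available before the target is seen.
        combined.append(sum(w * p for w, p in zip(weights, preds_t)))
        # Penalize models in proportion to their squared error, then renormalize.
        weights = [w * math.exp(-lr * (p - target) ** 2)
                   for w, p in zip(weights, preds_t)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return combined, weights


if __name__ == "__main__":
    y = [math.sin(0.3 * t) for t in range(60)]
    good_model = [v + 0.05 for v in y]  # small constant bias
    poor_model = [v + 1.0 for v in y]   # large constant bias
    static = [0.5 * g + 0.5 * p for g, p in zip(good_model, poor_model)]
    dynamic, final_weights = dynamic_ensemble(y, [good_model, poor_model])
    print("NMSE, static equal weights:", nmse(y, static))
    print("NMSE, dynamic weights:     ", nmse(y, dynamic))
    print("final weights:", final_weights)
```

An RL formulation would replace the fixed update rule with a learned policy that chooses the weights from the observed data and model errors.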

machine learning, prediction, reinforcement learning, (15 more...)

2008.08878

Country:

Asia > India > Tamil Nadu > Chennai (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Asia > India > Jharkhand > Dhanbad (0.04)
Asia > India > Andhra Pradesh > Visakhapatnam (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Chen, Kehua, Wang, Hongcheng, Perez, Borja Valverde, Vezzaro, Luca, Wang, Aijie

Optimization of operation parameters towards sustainable WWTP based on deep reinforcement learning

arXiv.org Artificial Intelligence, Aug-19-2020

Large amounts of wastewater are produced today. Wastewater treatment plants (WWTPs) are designed to eliminate pollutants and alleviate environmental pollution resulting from human activities. However, the construction and operation of WWTPs still have negative impacts. WWTPs are complex to control and optimize because of high nonlinearity and variation. This study used a novel technique, multi-agent deep reinforcement learning (DRL), to optimize dissolved oxygen (DO) and dosage in a hypothetical WWTP. The reward function is specially designed in an LCA-based form to achieve sustainability optimization. Four scenarios are considered: baseline, LCA-oriented, cost-oriented, and effluent-oriented. The results show that optimization based on LCA has the lowest environmental impacts. The comparison of different SRT values indicates that a proper SRT can greatly reduce negative impacts. Notably, the retrofitting of WWTPs should consider environmental impacts in addition to cost. Moreover, the comparison between DRL and a genetic algorithm (GA) indicates that DRL can solve optimization problems effectively and has great extensibility. This work still has limitations and shortcomings, and future studies are required.

deep learning, neural network, optimization, (22 more...)

2008.10417

Country: Asia > China (0.95)

Genre: Research Report > New Finding (0.67)

Industry:

Water & Waste Management > Water Management > Water Supplies & Services (0.68)
Water & Waste Management > Water Management > Lifecycle > Treatment (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)