AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Data-driven deep reinforcement learning

RobohubDec-7-2019, 20:50:07 GMT

One of the primary factors behind the success of machine learning approaches in open world settings, such as image recognition and natural language processing, has been the ability of high-capacity deep neural network function approximators to learn generalizable models from large amounts of data. Deep reinforcement learning methods, however, require active online data collection, where the model actively interacts with its environment. This makes such methods hard to scale to complex real-world problems, where active data collection means that large datasets of experience must be collected for every experiment – this can be expensive and, for systems such as autonomous vehicles or robots, potentially unsafe. In a number of domains of practical interest, such as autonomous driving, robotics, and games, there exist plentiful amounts of previously collected interaction data which, consists of informative behaviours that are a rich source of prior information. Deep RL algorithms that can utilize such prior datasets will not only scale to real-world problems, but will also lead to solutions that generalize substantially better.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Robohub

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

RoboNet: A dataset for large-scale multi-robot learning

RobohubDec-7-2019, 18:12:58 GMT

Note that any GIF compression artifacts in this animation are not present in the dataset itself. After collecting a diverse dataset, we experimentally investigate how it can be used to enable general skill learning that transfers to new environments. First, we pre-train visual dynamics models on a subset of data from RoboNet, and then fine-tune them to work in an unseen test environment using a small amount of new data. The constructed test environments (one of which is visualized below) all include different lab settings, new cameras and viewpoints, held-out robots, and novel objects purchased after data collection concluded. Example test environment constructed in a new lab, with a temporary uncalibrated camera, and a new Baxter robot.

dataset, robonet, test environment, (15 more...)

Robohub

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)

Add feedback

Facebook taught an AI the 'theory of mind'

#artificialintelligenceDec-7-2019, 10:56:59 GMT

When it comes to competitive games, AI systems have already shown they can easily mop the floor with the best humanity has to offer. But life in the real world isn't a zero sum game like poker or Starcraft and we need AI to work with us, not against us. That's why a research team from Facebook taught an AI how to play the cooperative card game Hanabi (the Japanese word for fireworks), to gain a better understanding of how humans think. Specifically, the Facebook team set out to instill upon its AI system the theory of mind. "Theory of mind is this idea of understanding the beliefs and intentions of other agents or other players or humans," Noam Brown, a researcher at Facebook AI, told Engadget.

facebook, searcher, teammate, (17 more...)

#artificialintelligence

Country: North America > United States > Texas (0.05)

Industry: Leisure & Entertainment > Games > Computer Games (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Add feedback

Spectrum Management in Dynamic Spectrum Access: A Deep Reinforcement Learning Approach

#artificialintelligenceDec-7-2019, 05:28:00 GMT

Generally, in dynamic spectrum access (DSA) networks, co-operations and centralized control are unavailable and DSA users have to carry out wireless transmissions individually. DSA users have to know other users' behaviors by sensing and analyzing wireless environments, so that DSA users can adjust their parameters properly and carry out effective wireless transmissions. In this thesis, machine learning and deep learning technologies are leveraged in DSA network to enable appropriate and intelligent spectrum managements, including both spectrum access and power allocations. Accordingly, a novel spectrum management framework utilizing deep reinforcement learning is proposed, in which deep reinforcement learning is employed to accurately learn wireless environments and generate optimal spectrum management strategies to adapt to the variations of wireless environments. Due to the model-free nature of reinforcement learning, DSA users only need to directly interact with environments to obtain optimal strategies rather than relying on accurate channel estimations.

deep reinforcement learning approach, dynamic spectrum access, spectrum management framework, (4 more...)

#artificialintelligence

Industry:

Telecommunications (1.00)
Information Technology > Networks (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Add feedback

No-Regret Exploration in Goal-Oriented Reinforcement Learning

Tarbouriech, Jean, Garcelon, Evrard, Valko, Michal, Pirotta, Matteo, Lazaric, Alessandro

arXiv.org Machine LearningDec-7-2019

Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the so-called episodic setting or stochastic shortest path (SSP) problem, where an agent has to achieve a predefined goal state (e.g., the top of the hill) while maximizing the cumulative reward or minimizing the cumulative cost. Despite its popularity, most of the literature studying the exploration-exploitation dilemma either focused on different problems (i.e., fixed-horizon and infinite-horizon) or made the restrictive loop-free assumption (which implies that no same state can be visited twice during any episode). In this paper, we study the general SSP setting and introduce the algorithm UC-SSP whose regret scales as $\displaystyle \widetilde{O}(c_{\max}^{3/2} c_{\min}^{-1/2} D S \sqrt{ A D K})$ after $K$ episodes for any unknown SSP with $S$ non-terminal states, $A$ actions, an SSP-diameter of $D$ and positive costs in $[c_{\min}, c_{\max}]$. UC-SSP is thus the first learning algorithm with vanishing regret in the theoretically challenging setting of episodic RL.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1912.03517

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery

Yang, Jiachen, Borovikov, Igor, Zha, Hongyuan

arXiv.org Machine LearningDec-7-2019

Human players in professional team sports achieve high level coordination by dynamically choosing complementary skills and executing primitive actions to perform these skills. As a step toward creating intelligent agents with this capability for fully cooperative multi-agent settings, we propose a two-level hierarchical multi-agent reinforcement learning (MARL) algorithm with unsupervised skill discovery. Agents learn useful and distinct skills at the low level via independent Q-learning, while they learn to select complementary latent skill variables at the high level via centralized multi-agent training with an extrinsic team reward. The set of low-level skills emerges from an intrinsic reward that solely promotes the decodability of latent skill variables from the trajectory of a low-level skill, without the need for hand-crafted rewards for each skill. For scalable decentralized execution, each agent independently chooses latent skill variables and primitive actions based on local observations. Our overall method enables the use of general cooperative MARL algorithms for training high level policies and single-agent RL for training low level skills. Experiments on a stochastic high dimensional team game show the emergence of useful skills and cooperative team play. The interpretability of the learned skills show the promise of the proposed method for achieving human-AI cooperation in team sports games.

agent, latent skill, possession, (12 more...)

arXiv.org Machine Learning

1912.03558

Country: North America > United States > Massachusetts (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports (0.88)
Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Improving Network Automation and Security with Artificial Intelligence - IT Peer Network

#artificialintelligenceDec-6-2019, 16:28:18 GMT

Communication service providers (CommSPs) are already saving money and generating revenue from network transformation investments. There is an expectation these benefits will continue to increase as NFV functions scale across the various elements of the infrastructure--enterprise, radio access network, wireless core, cable and cloud. New 5G and edge computing use cases promise to deliver new revenue along with even more data that must be moved, stored, processed and analyzed. The industry is looking to Artificial Intelligence (AI) and Machine Learning (ML) to enable CommSPs to solve problems and unlock value for their own business operations and their customers. As an example, distributed AI based on reinforcement learning will play a key role in building automated and self-managed networks.

artificial intelligence, network automation and security, power consumption, (12 more...)

#artificialintelligence

Industry:

Information Technology (0.73)
Telecommunications (0.71)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)

Add feedback

Risk-Averse Trust Region Optimization for Reward-Volatility Reduction

Bisi, Lorenzo, Sabbioni, Luca, Vittori, Edoardo, Papini, Matteo, Restelli, Marcello

arXiv.org Machine LearningDec-6-2019

In real-world decision-making problems, for instance in the fields of finance, robotics or autonomous driving, keeping uncertainty under control is as important as maximizing expected returns. Risk aversion has been addressed in the reinforcement learning literature through risk measures related to the variance of returns. However, in many cases, the risk is measured not only on a long-term perspective, but also on the step-wise rewards (e.g., in trading, to ensure the stability of the investment bank, it is essential to monitor the risk of portfolio positions on a daily basis). In this paper, we define a novel measure of risk, which we call reward volatility, consisting of the variance of the rewards under the state-occupancy measure. We show that the reward volatility bounds the return variance so that reducing the former also constrains the latter. We derive a policy gradient theorem with a new objective function that exploits the mean-volatility relationship, and develop an actor-only algorithm. Furthermore, thanks to the linearity of the Bellman equations defined under the new objective function, it is possible to adapt the well-known policy gradient algorithms with monotonic improvement guarantees such as TRPO in a risk-averse manner. Finally, we test the proposed approach in two simulated financial environments.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1912.03193

Country:

Europe > Italy > Lombardy > Milan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Banking & Finance > Trading (1.00)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

VALAN: Vision and Language Agent Navigation

Lansing, Larry, Jain, Vihan, Mehta, Harsh, Huang, Haoshuo, Ie, Eugene

arXiv.org Machine LearningDec-6-2019

VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture. The framework facilitates the development and evaluation of embodied agents for solving grounded language understanding tasks, such as Vision-and-Language Navigation and Vision-and-Dialog Navigation, in photo-realistic environments, such as Matterport3D and Google StreetView. We have added a minimal set of abstractions on top of SEED RL allowing us to generalize the architecture to solve a variety of other RL problems. In this article, we will describe VALAN's software abstraction and architecture, and also present an example of using VALAN to design agents for instruction-conditioned indoor navigation.

agent, instruction, trajectory, (13 more...)

arXiv.org Machine Learning

1912.03241

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre:

Workflow (0.47)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Making Smart Homes Smarter: Optimizing Energy Consumption with Human in the Loop

Verma, Mudit, Bhambri, Siddhant, Buduru, Arun Balaji

arXiv.org Artificial IntelligenceDec-6-2019

Rapid advancements in the Internet of Things (IoT) have facilitated more efficient deployment of smart environment solutions for specific user requirement. With the increase in the number of IoT devices, it has become difficult for the user to control or operate every individual smart device into achieving some desired goal like optimized power consumption, scheduled appliance running time, etc. Furthermore, existing solutions to automatically adapt the IoT devices are not capable enough to incorporate the user behavior. This paper presents a novel approach to accurately configure IoT devices while achieving the twin objectives of energy optimization along with conforming to user preferences. Our work comprises of unsupervised clustering of devices' data to find the states of operation for each device, followed by probabilistically analyzing user behavior to determine their preferred states. Eventually, we deploy an online reinforcement learning (RL) agent to find the best device settings automatically. Results for three different smart homes' data-sets show the effectiveness of our methodology. To the best of our knowledge, this is the first time that a practical approach has been adopted to achieve the above mentioned objectives without any human interaction within the system.

consumption, domain state, power consumption, (15 more...)

arXiv.org Artificial Intelligence

1912.03298

Country:

Asia > India > NCT > New Delhi (0.04)
Asia > India > NCT > Delhi (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Energy (1.00)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback