AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Improving efficacy of AI models during times of business disruption

#artificialintelligenceNov-16-2020, 21:25:26 GMT

While most AI in use today can be classified as early-stage advanced analytics, some enterprises have built large data science teams to apply machine learning and deep learning algorithms to business processes. Many of these enterprises have built the necessary support infrastructure to train these algorithms on large data sets, deploying the resulting AI models to production to generate business insights. However, many consumer and business consumption patterns changed dramatically in 2020, causing these advanced AI models to fail or behave erratically. Many of these models that have been trained to make predictions based on historical data have not been able to deal with the data anomalies created by disruptive business conditions and changing preferences. Companies using AI for insights had a hard time making use of existing models in production.

ai model, algorithm, business disruption, (13 more...)

#artificialintelligence

Industry: Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Google Open Sourced this Architecture for Massively Scalable Reinforcement Learning Models

#artificialintelligenceNov-16-2020, 15:05:05 GMT

I recently started a new newsletter focus on AI education. TheSequence is a no-BS( meaning no hype, no news etc) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Deep reinforcement learning(DRL) is one of the fastest areas of research in the deep learning space. Responsible for some of the top milestones in the recent years of AI such as AlphaGo, Dota2 Five or Alpha Star, DRL seems to be the discipline that approximates human intelligence the closest.

actor, architecture, learner, (12 more...)

#artificialintelligence

Industry:

Leisure & Entertainment > Games (0.55)
Information Technology (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Add feedback

NLPGym -- A toolkit for evaluating RL agents on Natural Language Processing Tasks

Ramamurthy, Rajkumar, Sifa, Rafet, Bauckhage, Christian

arXiv.org Artificial IntelligenceNov-16-2020

Reinforcement learning (RL) has recently shown impressive performance in complex game AI and robotics tasks. To a large extent, this is thanks to the availability of simulated environments such as OpenAI Gym, Atari Learning Environment, or Malmo which allow agents to learn complex tasks through interaction with virtual environments. While RL is also increasingly applied to natural language processing (NLP), there are no simulated textual environments available for researchers to apply and consistently benchmark RL on NLP tasks. With the work reported here, we therefore release NLPGym, an open-source Python toolkit that provides interactive textual environments for standard NLP tasks such as sequence tagging, multi-label classification, and question answering. We also present experimental results for 6 tasks using different RL algorithms which serve as baselines for further research. The toolkit is published at https://github.com/rajcscw/nlp-gym

agent, computer game, upstream oil & gas, (21 more...)

arXiv.org Artificial Intelligence

2011.08272

Country:

North America > United States (0.46)
Asia > China (0.28)
Europe > Sweden > Skåne County > Malmö (0.24)

Genre: Research Report (0.40)

Industry:

Banking & Finance > Economy (0.93)
Education (0.89)
Leisure & Entertainment > Games > Computer Games (0.68)
Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Meta Automatic Curriculum Learning

Portelas, Rémy, Romac, Clément, Hofmann, Katja, Oudeyer, Pierre-Yves

arXiv.org Artificial IntelligenceNov-16-2020

A major challenge in the Deep RL (DRL) community is to train agents able to generalize their control policy over situations never seen in training. Training on diverse tasks has been identified as a key ingredient for good generalization, which pushed researchers towards using rich procedural task generation systems controlled through complex continuous parameter spaces. In such complex task spaces, it is essential to rely on some form of Automatic Curriculum Learning (ACL) to adapt the task sampling distribution to a given learning agent, instead of randomly sampling tasks, as many could end up being either trivial or unfeasible. Since it is hard to get prior knowledge on such task spaces, many ACL algorithms explore the task space to detect progress niches over time, a costly tabula-rasa process that needs to be performed for each new learning agents, although they might have similarities in their capabilities profiles. To address this limitation, we introduce the concept of Meta-ACL, and formalize it in the context of black-box RL learners, i.e. algorithms seeking to generalize curriculum generation to an (unknown) distribution of learners. In this work, we present AGAIN, a first instantiation of Meta-ACL, and showcase its benefits for curriculum generation over classical ACL in multiple simulated environments including procedurally generated parkour environments with learners of varying morphologies. Videos and code are available at https://sites.google.com/view/meta-acl .

alp-gmm, experiment, student, (16 more...)

arXiv.org Artificial Intelligence

2011.08463

Country:

Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Education > Educational Setting (0.93)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Reinforcement Learning of Graph Neural Networks for Service Function Chaining

Heo, DongNyeong, Lee, Doyoung, Kim, Hee-Gon, Park, Suhyun, Choi, Heeyoul

arXiv.org Artificial IntelligenceNov-16-2020

In the management of computer network systems, the service function chaining (SFC) modules play an important role by generating efficient paths for network traffic through physical servers with virtualized network functions (VNF). To provide the highest quality of services, the SFC module should generate a valid path quickly even in various network topology situations including dynamic VNF resources, various requests, and changes of topologies. The previous supervised learning method demonstrated that the network features can be represented by graph neural networks (GNNs) for the SFC task. However, the performance was limited to only the fixed topology with labeled data. In this paper, we apply reinforcement learning methods for training models on various network topologies with unlabeled data. In the experiments, compared to the previous supervised learning method, the proposed methods demonstrated remarkable flexibility in new topologies without re-designing and re-training, while preserving a similar level of performance.

network topology, node, topology, (13 more...)

arXiv.org Artificial Intelligence

2011.08406

Country: Asia > South Korea > Gyeongsangbuk-do > Pohang (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Dialog Simulation with Realistic Variations for Training Goal-Oriented Conversational Systems

Lin, Chien-Wei, Auvray, Vincent, Elkind, Daniel, Biswas, Arijit, Fazel-Zarandi, Maryam, Belgamwar, Nehal, Chandra, Shubhra, Zhao, Matt, Metallinou, Angeliki, Chung, Tagyoung, Zhu, Charlie Shucheng, Adhikari, Suranjit, Hakkani-Tur, Dilek

arXiv.org Artificial IntelligenceNov-16-2020

Goal-oriented dialog systems enable users to complete specific goals like requesting information about a movie or booking a ticket. Typically the dialog system pipeline contains multiple ML models, including natural language understanding, state tracking and action prediction (policy learning). These models are trained through a combination of supervised or reinforcement learning methods and therefore require collection of labeled domain specific datasets. However, collecting annotated datasets with language and dialog-flow variations is expensive, time-consuming and scales poorly due to human involvement. In this paper, we propose an approach for automatically creating a large corpus of annotated dialogs from a few thoroughly annotated sample dialogs and the dialog schema. Our approach includes a novel goal-sampling technique for sampling plausible user goals and a dialog simulation technique that uses heuristic interplay between the user and the system (Alexa), where the user tries to achieve the sampled goal. We validate our approach by generating data and training three different downstream conversational ML models. We achieve 18 ? 50% relative accuracy improvements on a held-out test set compared to a baseline dialog generation approach that only samples natural language and entity value variations from existing catalogs but does not generate any novel dialog flow variations. We also qualitatively establish that the proposed approach is better than the baseline. Moreover, several different conversational experiences have been built using this method, which enables customers to have a wide variety of conversations with Alexa.

dialog, seed dialog, variation, (17 more...)

arXiv.org Artificial Intelligence

2011.08243

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Santa Clara County > Sunnyvale (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.64)

Industry: Media > Film (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.87)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)

Add feedback

Hierarchical clustering in particle physics through reinforcement learning

Brehmer, Johann, Macaluso, Sebastian, Pappadopulo, Duccio, Cranmer, Kyle

arXiv.org Artificial IntelligenceNov-16-2020

Particle physics experiments often require the reconstruction of decay patterns through a hierarchical clustering of the observed final-state particles. We show that this task can be phrased as a Markov Decision Process and adapt reinforcement learning algorithms to solve it. In particular, we show that Monte-Carlo Tree Search guided by a neural policy can construct high-quality hierarchical clusterings and outperform established greedy and beam search baselines.

aaaa4hicpvlbb9y4fz7d3rbuld 1x3h1kiaxyzhm, latexit latexit sha1, osca zyfsiwf7 wy0yglfwcdcaenkaxscre5t2z utarou1xygu9to, (10 more...)

arXiv.org Artificial Intelligence

2011.08191

Country:

North America > United States > New York (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

ACDER: Augmented Curiosity-Driven Experience Replay

Li, Boyao, Lu, Tao, Li, Jiayi, Lu, Ning, Cai, Yinghao, Wang, Shuo

arXiv.org Artificial IntelligenceNov-16-2020

Exploration in environments with sparse feedback remains a challenging research problem in reinforcement learning (RL). When the RL agent explores the environment randomly, it results in low exploration efficiency, especially in robotic manipulation tasks with high dimensional continuous state and action space. In this paper, we propose a novel method, called Augmented Curiosity-Driven Experience Replay (ACDER), which leverages (i) a new goal-oriented curiosity-driven exploration to encourage the agent to pursue novel and task-relevant states more purposefully and (ii) the dynamic initial states selection as an automatic exploratory curriculum to further improve the sample-efficiency. Our approach complements Hindsight Experience Replay (HER) by introducing a new way to pursue valuable states. Experiments conducted on four challenging robotic manipulation tasks with binary rewards, including Reach, Push, Pick&Place and Multi-step Push. The empirical results show that our proposed method significantly outperforms existing methods in the first three basic tasks and also achieves satisfactory performance in multi-step robotic task learning.

agent, arxiv preprint arxiv, exploration, (12 more...)

arXiv.org Artificial Intelligence

2011.08027

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Time-Efficient Mars Exploration of Simultaneous Coverage and Charging with Multiple Drones

Chang, Yuan, Yan, Chao, Liu, Xingyu, Wang, Xiangke, Zhou, Han, Xiang, Xiaojia, Tang, Dengqing

arXiv.org Artificial IntelligenceNov-16-2020

This paper presents a time-efficient scheme for Mars exploration by the cooperation of multiple drones and a rover. To maximize effective coverage of the Mars surface in the long run, a comprehensive framework has been developed with joint consideration for limited energy, sensor model, communication range and safety radius, which we call TIME-SC2 (TIme-efficient Mars Exploration of Simultaneous Coverage and Charging). First, we propose a multi-drone coverage control algorithm by leveraging emerging deep reinforcement learning and design a novel information map to represent dynamic system states. Second, we propose a near-optimal charging scheduling algorithm to navigate each drone to an individual charging slot, and we have proven that there always exists feasible solutions. The attractiveness of this framework not only resides on its ability to maximize exploration efficiency, but also on its high autonomy that has greatly reduced the non-exploring time. Extensive simulations have been conducted to demonstrate the remarkable performance of TIME-SC2 in terms of time-efficiency, adaptivity and flexibility.

drone, exploration, information map, (13 more...)

arXiv.org Artificial Intelligence

2011.07759

Country:

North America > United States > New York (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Industry: Transportation (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)

Add feedback

2017

Neural Information Processing SystemsNov-15-2020, 16:00:02 GMT

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)

Add feedback