AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Feature Selection for Learning to Predict Outcomes of Compute Cluster Jobs with Application to Decision Support

Okanlawon, Adedolapo, Yang, Huichen, Bose, Avishek, Hsu, William, Andresen, Dan, Tanash, Mohammed

arXiv.org Artificial IntelligenceDec-14-2020

We present a machine learning framework and a new test bed for data mining from the Slurm Workload Manager for high-performance computing (HPC) clusters. The focus was to find a method for selecting features to support decisions: helping users decide whether to resubmit failed jobs with boosted CPU and memory allocations or migrate them to a computing cloud. This task was cast as both supervised classification and regression learning, specifically, sequential problem solving suitable for reinforcement learning. Selecting relevant features can improve training accuracy, reduce training time, and produce a more comprehensible model, with an intelligent system that can explain predictions and inferences. We present a supervised learning model trained on a Simple Linux Utility for Resource Management (Slurm) data set of HPC jobs using three different techniques for selecting features: linear regression, lasso, and ridge regression. Our data set represented both HPC jobs that failed and those that succeeded, so our model was reliable, less likely to overfit, and generalizable. Our model achieved an R^2 of 95\% with 99\% accuracy. We identified five predictors for both CPU and memory properties.

information, prediction, slurm data, (15 more...)

arXiv.org Artificial Intelligence

2012.07982

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Kansas > Riley County > Manhattan (0.04)

Genre: Research Report (0.65)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
(3 more...)

Add feedback

A Framework for Efficient Robotic Manipulation

Zhan, Albert, Zhao, Philip, Pinto, Lerrel, Abbeel, Pieter, Laskin, Michael

arXiv.org Artificial IntelligenceDec-14-2020

Abstract-- Data-efficient learning of manipulation policies from visual observations is an outstanding challenge for realrobot learning. While deep reinforcement learning (RL) algorithms have shown success learning policies from visual observations, they still require an impractical number of real-world data samples to learn effective policies. However, recent advances in unsupervised representation learning and data augmentation significantly improved the sample efficiency of training RL policies on common simulated benchmarks. Building on these advances, we present a Framework for Efficient Robotic Manipulation (FERM) that utilizes data augmentation and unsupervised learning to achieve extremely sample-efficient training of robotic manipulation policies with sparse rewards. We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels, such as reaching, picking, moving, pulling a large object, flipping a switch, and opening a drawer in just 15-50 minutes of real-world training time.

data augmentation, demonstration, learning, (12 more...)

arXiv.org Artificial Intelligence

2012.07975

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States > Missouri > St. Louis County > St. Louis (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Relative Variational Intrinsic Control

Baumli, Kate, Warde-Farley, David, Hansen, Steven, Mnih, Volodymyr

arXiv.org Artificial IntelligenceDec-14-2020

In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment. Existing skill learning methods use mutual information objectives to incentivize each skill to be diverse and distinguishable from the rest. However, if care is not taken to constrain the ways in which the skills are diverse, trivially diverse skill sets can arise. To ensure useful skill diversity, we propose a novel skill learning objective, Relative Variational Intrinsic Control (RVIC), which incentivizes learning skills that are distinguishable in how they change the agent's relationship to its environment. The resulting set of skills tiles the space of affordances available to the agent. We qualitatively analyze skill behaviors on multiple environments and show how RVIC skills are more useful than skills discovered by existing methods when used in hierarchical reinforcement learning.

agent, information, trajectory, (14 more...)

arXiv.org Artificial Intelligence

2012.07827

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games (0.69)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

C-Learning: Horizon-Aware Cumulative Accessibility Estimation

Naderian, Panteha, Loaiza-Ganem, Gabriel, Braviner, Harry J., Caterini, Anthony L., Cresswell, Jesse C., Li, Tong, Garg, Animesh

arXiv.org Machine LearningDec-14-2020

Multi-goal reaching is an important problem in reinforcement learning needed to achieve algorithmic generalization. Despite recent advances in this field, current algorithms suffer from three major challenges: high sample complexity, learning only a single way of reaching the goals, and difficulties in solving complex motion planning tasks. In order to address these limitations, we introduce the concept of cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon. We show that these functions obey a recurrence relation, which enables learning from offline interactions. We also prove that optimal cumulative accessibility functions are monotonic in the planning horizon. Additionally, our method can trade off speed and reliability in goal-reaching by suggesting multiple paths to a single goal depending on the provided horizon. We evaluate our approach on a set of multi-goal discrete and continuous control tasks. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality. Additional visualizations can be found at https://sites.google.com/view/learning-cae/.

c-learning, probability, q-learning, (11 more...)

arXiv.org Machine Learning

2011.12363

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Q-Learning Algorithm: From Explanation to Implementation

#artificialintelligenceDec-13-2020, 15:30:17 GMT

Well, let's recall some definitions and equations that we need for implementing the Q-Learning algorithm. In RL, we have an environment that we want to learn. For doing that, we build an agent who will interact with the environment through a trial-error process. At each time step t, the agent is at a certain state s_t and chooses an action a_t to perform. The environment runs the selected action and returns a reward to the agent.

agent, q-learning algorithm, q-value function, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

#144 - Michael Littman: Reinforcement Learning and the Future of AI

#artificialintelligenceDec-13-2020, 04:35:14 GMT

Michael Littman is a computer scientist at Brown University. Please support this podcast by checking out our sponsors: – SimpliSafe: https://simplisafe.com/lex SUPPORT & CONNECT: – Check out the sponsors above, it's the best way to support this podcast – Support on Patreon: https://www.patreon.com/lexfridman On some podcast players you should be able to click the timestamp to jump to that time.

michael littman, reinforcement learning, youtube, (4 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.43)

Add feedback

Combining Reinforcement Learning with Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problem

Zheng, Jiongzhi, He, Kun, Zhou, Jianrong, Jin, Yan, Li, Chu-min

arXiv.org Artificial IntelligenceDec-13-2020

We address the Traveling Salesman Problem (TSP), a famous NP-hard combinatorial optimization problem. And we propose a variable strategy reinforced approach, denoted as VSR-LKH, which combines three reinforcement learning methods (Q-learning, Sarsa and Monte Carlo) with the well-known TSP algorithm, called Lin-Kernighan-Helsgaun (LKH). VSR-LKH replaces the inflexible traversal operation in LKH, and lets the program learn to make choice at each search step by reinforcement learning. Experimental results on 111 TSP benchmarks from the TSPLIB with up to 85,900 cities demonstrate the excellent performance of the proposed method.

algorithm, reinforcement, vsr-lkh opt, (11 more...)

arXiv.org Artificial Intelligence

2012.04461

Country:

Asia > China (0.04)
Europe > France (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Rounding Up Machine Learning Developments From 2020

#artificialintelligenceDec-12-2020, 05:30:35 GMT

The year 2020 saw many exciting developments in machine learning. As the year 2020 comes to an end, here is a roundup of these innovations in various machine learning domains such as reinforcement learning, Natural Language Processing, ML frameworks such as Pytorch and TensorFlow, and more. Arm-based Graviton processors went mainstream in 2020, which utilize 30 billion transistors with 64-bit Arm cores built by Israeli-based engineering company Annapurna Labs. AWS recently acquired it for powering memory-intensive workloads like real-time big data analytics. It showed a 40% performance improvement emerging as an alternative to x86-based processors for machine learning, shifting the trend from the Intel-dominated cloud market to Arm-based Graviton processors.

algorithm, neural network, reinforcement, (14 more...)

#artificialintelligence

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Semi-supervised reward learning for offline reinforcement learning

Konyushkova, Ksenia, Zolna, Konrad, Aytar, Yusuf, Novikov, Alexander, Reed, Scott, Cabi, Serkan, de Freitas, Nando

arXiv.org Artificial IntelligenceDec-12-2020

In offline reinforcement learning (RL) agents are trained using a logged dataset. It appears to be the most natural route to attack real-life applications because in domains such as healthcare and robotics interactions with the environment are either expensive or unethical. Training agents usually requires reward functions, but unfortunately, rewards are seldom available in practice and their engineering is challenging and laborious. To overcome this, we investigate reward learning under the constraint of minimizing human reward annotations. We consider two types of supervision: timestep annotations and demonstrations. We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data. In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards. We further investigate the relationship between the quality of the reward model and the final policies. We notice, for example, that the reward models do not need to be perfect to result in useful policies.

annotation, reward model, supervision, (14 more...)

arXiv.org Artificial Intelligence

2012.06899

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Learning Multi-Arm Manipulation Through Collaborative Teleoperation

Tung, Albert, Wong, Josiah, Mandlekar, Ajay, Martín-Martín, Roberto, Zhu, Yuke, Fei-Fei, Li, Savarese, Silvio

arXiv.org Artificial IntelligenceDec-12-2020

Imitation Learning (IL) is a powerful paradigm to teach robots to perform manipulation tasks by allowing them to learn from human demonstrations collected via teleoperation, but has mostly been limited to single-arm manipulation. However, many real-world tasks require multiple arms, such as lifting a heavy object or assembling a desk. Unfortunately, applying IL to multi-arm manipulation tasks has been challenging -- asking a human to control more than one robotic arm can impose significant cognitive burden and is often only possible for a maximum of two robot arms. To address these challenges, we present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms and collect demonstrations for multi-arm tasks. Using MART, we collected demonstrations for five novel two and three-arm tasks from several geographically separated users. From our data we arrived at a critical insight: most multi-arm tasks do not require global coordination throughout its full duration, but only during specific moments. We show that learning from such data consequently presents challenges for centralized agents that directly attempt to model all robot actions simultaneously, and perform a comprehensive study of different policy architectures with varying levels of centralization on our tasks. Finally, we propose and evaluate a base-residual policy framework that allows trained policies to better adapt to the mixed coordination setting common in multi-arm manipulation, and show that a centralized policy augmented with a decentralized residual model outperforms all other models on our set of benchmark tasks. Additional results and videos at https://roboturk.stanford.edu/multiarm .

coordination, demonstration, multi-arm task, (13 more...)

arXiv.org Artificial Intelligence

2012.06738

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.24)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback