AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

On the Basis of Sex: A Review of Gender Bias in Machine Learning Applications

Feldman, Tal, Peake, Ashley

arXiv.org Artificial IntelligenceApr-6-2021

Machine Learning models have been deployed across almost every aspect of society, often in situations that affect the social welfare of many individuals. Although these models offer streamlined solutions to large problems, they may contain biases and treat groups or individuals unfairly. To our knowledge, this review is one of the first to focus specifically on gender bias in applications of machine learning. We first introduce several examples of machine learning gender bias in practice. We then detail the most widely used formalizations of fairness in order to address how to make machine learning models fairer. Specifically, we discuss the most influential bias mitigation algorithms as applied to domains in which models have a high propensity for gender discrimination. We group these algorithms into two overarching approaches -- removing bias from the data directly and removing bias from the model through training -- and we present representative examples of each. As society increasingly relies on artificial intelligence to help in decision-making, addressing gender biases present in these models is imperative. To provide readers with the tools to assess the fairness of machine learning models and mitigate the biases present in them, we discuss multiple open source packages for fairness in AI.

discrimination, fairness, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2104.02532

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)

Genre: Research Report (1.00)

Industry: Law > Civil Rights & Constitutional Law (0.88)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Intelligent Building Control Systems for Thermal Comfort and Energy-Efficiency: A Systematic Review of Artificial Intelligence-Assisted Techniques

Merabet, Ghezlane Halhoul, Essaaidi, Mohamed, Haddou, Mohamed Ben, Qolomany, Basheer, Qadir, Junaid, Anan, Muhammad, Al-Fuqaha, Ala, Abid, Mohamed Riduan, Benhaddou, Driss

arXiv.org Artificial IntelligenceApr-5-2021

Building operations represent a significant percentage of the total primary energy consumed in most countries due to the proliferation of Heating, Ventilation and Air-Conditioning (HVAC) installations in response to the growing demand for improved thermal comfort. Reducing the associated energy consumption while maintaining comfortable conditions in buildings are conflicting objectives and represent a typical optimization problem that requires intelligent system design. Over the last decade, different methodologies based on the Artificial Intelligence (AI) techniques have been deployed to find the sweet spot between energy use in HVAC systems and suitable indoor comfort levels to the occupants. This paper performs a comprehensive and an in-depth systematic review of AI-based techniques used for building control systems by assessing the outputs of these techniques, and their implementations in the reviewed works, as well as investigating their abilities to improve the energy-efficiency, while maintaining thermal comfort conditions. This enables a holistic view of (1) the complexities of delivering thermal comfort to users inside buildings in an energy-efficient way, and (2) the associated bibliographic material to assist researchers and experts in the field in tackling such a challenge. Among the 20 AI tools developed for both energy consumption and comfort control, functions such as identification and recognition patterns, optimization, predictive control. Based on the findings of this work, the application of AI technology in building control is a promising area of research and still an ongoing, i.e., the performance of AI-based control is not yet completely satisfactory. This is mainly due in part to the fact that these algorithms usually need a large amount of high-quality real-world data, which is lacking in the building or, more precisely, the energy sector.

comfort parameter, renewable energy, upstream oil & gas, (28 more...)

arXiv.org Artificial Intelligence

2104.02214

Country:

Asia > China (0.45)
North America > United States > Texas (0.27)
North America > United States > Michigan (0.27)
(15 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Construction & Engineering > HVAC (1.00)
Energy > Oil & Gas > Upstream (0.68)
Energy > Renewable > Solar (0.45)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Systems & Languages (1.00)
(12 more...)

Add feedback

Low Dose Helical CBCT denoising by using domain filtering with deep reinforcement learning

Kang, Wooram, Patwari, Mayank

arXiv.org Artificial IntelligenceApr-2-2021

Cone Beam Computed Tomography(CBCT) is a now known method to conduct CT imaging. Especially, The Low Dose CT imaging is one of possible options to protect organs of patients when conducting CT imaging. Therefore Low Dose CT imaging can be an alternative instead of Standard dose CT imaging. However Low Dose CT imaging has a fundamental issue with noises within results compared to Standard Dose CT imaging. Currently, there are lots of attempts to erase the noises. Most of methods with artificial intelligence have many parameters and unexplained layers or a kind of black-box methods. Therefore, our research has purposes related to these issues. Our approach has less parameters than usual methods by having Iterative learn-able bilateral filtering approach with Deep reinforcement learning. And we applied The Iterative learn-able filtering approach with deep reinforcement learning to sinograms and reconstructed volume domains. The method and the results of the method can be much more explainable than The other black box AI approaches. And we applied the method to Helical Cone Beam Computed Tomography(CBCT), which is the recent CBCT trend. We tested this method with on 2 abdominal scans(L004, L014) from Mayo Clinic TCIA dataset. The results and the performances of our approach overtake the results of the other previous methods.

cbct, focal spot, geometry, (10 more...)

arXiv.org Artificial Intelligence

2104.00889

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)

Halperin, Igor

arXiv.org Artificial IntelligenceApr-2-2021

This paper addresses distributional offline continuous-time reinforcement learning (DOCTR-L) with stochastic policies for high-dimensional optimal control. A soft distributional version of the classical Hamilton-Jacobi-Bellman (HJB) equation is given by a semilinear partial differential equation (PDE). This `soft HJB equation' can be learned from offline data without assuming that the latter correspond to a previous optimal or near-optimal policy. A data-driven solution of the soft HJB equation uses methods of Neural PDEs and Physics-Informed Neural Networks developed in the field of Scientific Machine Learning (SciML). The suggested approach, dubbed `SciPhy RL', thus reduces DOCTR-L to solving neural PDEs from data. Our algorithm called Deep DOCTR-L converts offline high-dimensional data into an optimal policy in one step by reducing it to supervised learning, instead of relying on value iteration or policy iteration methods. The method enables a computable approach to the quality control of obtained policies in terms of both their expected returns and uncertainties about their values.

equation, hjb equation, learning, (17 more...)

arXiv.org Artificial Intelligence

2104.0104

Country: North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Data Science: Supervised Machine Learning in Python

#artificialintelligenceApr-1-2021, 07:59:09 GMT

Data Science: Supervised Machine Learning in Python - Full Guide to Implementing Classic Machine Learning Algorithms in Python and with Scikit-Learn Created by Lazy Programmer Team, Lazy Programmer Inc. English [Auto], Spanish [Auto]Preview this Course - GET COUPON CODE In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on-par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game go using deep reinforcement learning. Machine learning is even being used to program self driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.

algorithm, learning, machine learning, (11 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Automobiles & Trucks (0.77)
Education > Educational Setting > Online (0.71)
Information Technology (0.71)
Leisure & Entertainment > Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)
(2 more...)

Add feedback

UAV-Assisted Communication in Remote Disaster Areas using Imitation Learning

Shamsoshoara, Alireza, Afghah, Fatemeh, Blasch, Erik, Ashdown, Jonathan, Bennis, Mehdi

arXiv.org Artificial IntelligenceApr-1-2021

The damage to cellular towers during natural and man-made disasters can disturb the communication services for cellular users. One solution to the problem is using unmanned aerial vehicles to augment the desired communication network. The paper demonstrates the design of a UAV-Assisted Imitation Learning (UnVAIL) communication system that relays the cellular users' information to a neighbor base station. Since the user equipment (UEs) are equipped with buffers with limited capacity to hold packets, UnVAIL alternates between different UEs to reduce the chance of buffer overflow, positions itself optimally close to the selected UE to reduce service time, and uncovers a network pathway by acting as a relay node. UnVAIL utilizes Imitation Learning (IL) as a data-driven behavioral cloning approach to accomplish an optimal scheduling solution. Results demonstrate that UnVAIL performs similar to a human expert knowledge-based planning in communication timeliness, position accuracy, and energy consumption with an accuracy of 97.52% when evaluated on a developed simulator to train the UAV.

packet, scenario, uav, (14 more...)

arXiv.org Artificial Intelligence

2105.12823

Country:

Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
North America > United States > South Carolina > Charleston County > Charleston (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > Arizona > Coconino County > Flagstaff (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Telecommunications (1.00)
Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(5 more...)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Residual Model Learning for Microrobot Control

Gruenstein, Joshua, Chen, Tao, Doshi, Neel, Agrawal, Pulkit

arXiv.org Artificial IntelligenceApr-1-2021

A majority of microrobots are constructed using compliant materials that are difficult to model analytically, limiting the utility of traditional model-based controllers. Challenges in data collection on microrobots and large errors between simulated models and real robots make current model-based learning and sim-to-real transfer methods difficult to apply. We propose a novel framework residual model learning (RML) that leverages approximate models to substantially reduce the sample complexity associated with learning an accurate robot model. We show that using RML, we can learn a model of the Harvard Ambulatory MicroRobot (HAMR) using just 12 seconds of passively collected interaction data. The learned model is accurate enough to be leveraged as "proxy-simulator" for learning walking and turning behaviors using model-free reinforcement learning algorithms. RML provides a general framework for learning from extremely small amounts of interaction data, and our experiments with HAMR clearly demonstrate that RML substantially outperforms existing techniques.

hamr-full, robot, transmission, (15 more...)

arXiv.org Artificial Intelligence

2104.00631

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (1.00)

Industry: Materials (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

Touch-based Curiosity for Sparse-Reward Tasks

Rajeswar, Sai, Ibrahim, Cyril, Surya, Nitin, Golemo, Florian, Vazquez, David, Courville, Aaron, Pinheiro, Pedro O.

arXiv.org Artificial IntelligenceApr-1-2021

Abstract--Robots in many real-world settings have access to force/torque sensors in their gripper and tactile sensing is often necessary in tasks that involve contact-rich motion. In this work, we leverage surprise from mismatches in touch feedback to guide exploration in hard sparse-reward reinforcement learning tasks. Our approach, Touch-based Curiosity (ToC), learns what visible objects interactions are supposed to "feel" like. We encourage exploration by rewarding interactions where the expectation and the experience don't match. In our proposed method, an initial task-independent exploration phase is followed by an on-task learning phase, in which the original interactions are relabeled with on-task rewards. We test our approach on a range of touchintensive robot arm tasks (e.g. In the former, the environment is often fully observable, and the reward is dense and well-defined. In the Recent works in RL have focused on curiosity-driven latter, a large amount of work is required to design useful exploration through prediction-based surprise [6, 45, 48]. While it may be possible to hand-craft dense formulation, a forward dynamics models predicts the future, and reward signals for many real-world tasks, we believe that it's if its prediction is incorrect when compared to the real future, a worthwhile endeavor to investigate learning methods that do the agent is surprised and is thus rewarded.

agent, exploration, interaction, (14 more...)

arXiv.org Artificial Intelligence

2104.00442

Country:

North America > United States > Virginia (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Playing Pong using Reinforcement Learning

#artificialintelligenceMar-31-2021, 09:00:10 GMT

In the 1970s, Pong was a very popular video arcade game. It is a 2D video game emulating table tennis, i.e. you got a bat (a rectangle) you can move vertically and try to hit a "ball" (a moving square). If the ball hits the bounding box of the game, it bounces back like a billiard ball. If you miss the ball, the opponent scores. A single-player adaptation Breakout came out later, where the ball had the ability to destroy some blocks on the top of the screen and the bat moved to the bottom of the screen.

actor, agent, reinforcement learning, (14 more...)

#artificialintelligence

Industry:

Leisure & Entertainment > Games > Computer Games (0.56)
Leisure & Entertainment > Sports (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos

Chen, Annie S., Nair, Suraj, Finn, Chelsea

arXiv.org Artificial IntelligenceMar-31-2021

We are motivated by the goal of generalist robots that can complete a wide range of tasks across many environments. Critical to this is the robot's ability to acquire some metric of task success or reward, which is necessary for reinforcement learning, planning, or knowing when to ask for help. For a general-purpose robot operating in the real world, this reward function must also be able to generalize broadly across environments, tasks, and objects, while depending only on on-board sensor observations (e.g. RGB images). While deep learning on large and diverse datasets has shown promise as a path towards such generalization in computer vision and natural language, collecting high quality datasets of robotic interaction at scale remains an open challenge. In contrast, "in-the-wild" videos of humans (e.g. YouTube) contain an extensive collection of people doing interesting tasks across a diverse range of settings. In this work, we propose a simple approach, Domain-agnostic Video Discriminator (DVD), that learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task, and can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos. We find that by leveraging diverse human datasets, this reward function (a) can generalize zero shot to unseen environments, (b) generalize zero shot to unseen tasks, and (c) can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.

deep learning, neural network, video, (19 more...)

arXiv.org Artificial Intelligence

2103.16817

Country:

North America > United States > Pennsylvania (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas (0.49)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback