AITopics

arXiv.org Artificial Intelligence Jul-1-2020

Exploring Exploration: Comparing Children with RL Agents in Unified Environments

Kosoy, Eliza, Collins, Jasmine, Chan, David M., Huang, Sandy, Pathak, Deepak, Agrawal, Pulkit, Canny, John, Gopnik, Alison, Hamrick, Jessica B.

Research in developmental psychology consistently shows that children explore the world thoroughly and efficiently and that this exploration allows them to learn. In turn, this early learning supports more robust generalization and intelligent behavior later in life. While much work has gone into developing methods for exploration in machine learning, artificial agents have not yet reached the high standard set by their human counterparts. In this work we propose using DeepMind Lab (Beattie et al., 2016) as a platform to directly compare child and agent behaviors and to develop new exploration techniques. We outline two ongoing experiments to demonstrate the effectiveness of a direct comparison, and outline a number of open research questions that we believe can be tested using this methodology.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2005.0288

Country:

North America > United States > Massachusetts (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Industry:

Leisure & Entertainment > Games > Computer Games (0.46)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

Student-Teacher Curriculum Learning via Reinforcement Learning: Predicting Hospital Inpatient Admission Location

el-Bouri, Rasheed, Eyre, David, Watkinson, Peter, Zhu, Tingting, Clifton, David

Accurate and reliable prediction of hospital admission location is important due to resource constraints and space availability in a clinical setting, particularly when dealing with patients who come from the emergency department. In this work we propose a student-teacher network via reinforcement learning to deal with this specific problem. A representation of the weights of the student network is treated as the state and is fed as an input to the teacher network. The teacher network's action is to select the most appropriate batch of data to train the student network on from a training set sorted according to entropy. By validating on three datasets, we show not only that our approach outperforms state-of-the-art methods on tabular data and performs competitively on image recognition, but also that novel curricula are learned by the teacher network. We demonstrate experimentally that the teacher network can actively learn about the student network and guide it to achieve better performance than if trained alone.
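The batch-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each training sample carries a class-probability vector, that "entropy" means the Shannon entropy of that vector, and that the teacher's action is simply an index into the resulting list of batches. The names `shannon_entropy`, `entropy_sorted_batches`, and `teacher_action` are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_sorted_batches(samples, batch_size):
    """Sort (sample_id, probs) pairs by entropy, then cut them into batches."""
    ordered = sorted(samples, key=lambda s: shannon_entropy(s[1]))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Toy data (assumed): four samples with two-class probability vectors.
samples = [
    ("a", [0.5, 0.5]),
    ("b", [1.0, 0.0]),
    ("c", [0.9, 0.1]),
    ("d", [0.7, 0.3]),
]
batches = entropy_sorted_batches(samples, batch_size=2)

# Hypothetical teacher action: pick batch 0, the lowest-entropy samples.
teacher_action = 0
chosen_ids = [sample_id for sample_id, _ in batches[teacher_action]]
```

In a full system the teacher would choose `teacher_action` from a representation of the student's weights; here it is fixed by hand so the sorting and batching are easy to inspect.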

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2007.01135

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Staffordshire (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Cassel, Asaf, Koren, Tomer

Bandit Linear Control

Reinforcement learning studies sequential decision-making problems where a learning agent repeatedly interacts with an environment and aims to improve her strategy over time based on the received feedback. One of the most fundamental tradeoffs in reinforcement learning theory is the exploration vs. exploitation tradeoff, which arises whenever the learner observes only partial feedback after each of her decisions, thus having to balance between exploring new strategies and exploiting those that are already known to perform well. The most basic and well-studied form of partial feedback is the so-called "bandit" feedback, where the learner only observes the cost of her chosen action on each decision round, while obtaining no information about the performance of other actions. Traditionally, the environment dynamics in reinforcement learning are modeled as a Markov Decision Process (MDP) with a finite number of possible states and actions. The MDP model has been studied and analyzed in numerous different settings and under various assumptions on the transition parameters, the nature of the reward functions, and the feedback model. Recently, a particular focus has been given to continuous state-action MDPs, and in particular, to a specific family of models in classic control where the state transition function is linear.
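To make "bandit" feedback concrete, here is a minimal sketch of a generic epsilon-greedy learner on a finite set of actions. It is not the algorithm of this paper, which concerns linear control; it only illustrates that the learner sees the cost of the action it chose and nothing else. The function name `run_epsilon_greedy`, the fixed per-action costs, and the parameter values are assumptions made for illustration.

```python
import random


def run_epsilon_greedy(costs, rounds, epsilon, seed=0):
    """Epsilon-greedy cost minimization under bandit feedback.

    costs[a] is the (here deterministic) cost of action a. The learner
    never reads costs[b] for an action b it did not play in that round.
    Assumes rounds >= len(costs).
    """
    rng = random.Random(seed)
    k = len(costs)
    estimates = [0.0] * k  # running mean of observed cost per action
    counts = [0] * k       # number of times each action was played
    for t in range(rounds):
        if t < k:
            arm = t  # play every action once to initialize its estimate
        elif rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniformly random action
        else:
            arm = min(range(k), key=lambda a: estimates[a])  # exploit
        observed_cost = costs[arm]  # bandit feedback: only this cost is seen
        counts[arm] += 1
        estimates[arm] += (observed_cost - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_epsilon_greedy([0.9, 0.2, 0.5], rounds=200, epsilon=0.1)
```

Setting `epsilon` to 0 removes exploration after the initial pass, which is enough here only because the toy costs are deterministic; with noisy costs the learner would need continued exploration to avoid locking onto a bad action.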

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2007.00759

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Tirinzoni, Andrea, Poiani, Riccardo, Restelli, Marcello

Sequential Transfer in Reinforcement Learning with a Generative Model

We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones. The availability of solutions to related problems poses a fundamental trade-off: whether to seek policies that are expected to achieve high (yet sub-optimal) performance in the new task immediately or whether to seek information to quickly identify an optimal solution, potentially at the cost of poor initial behavior. In this work, we focus on the second objective when the agent has access to a generative model of state-action pairs. First, given a set of solved tasks containing an approximation of the target one, we design an algorithm that quickly identifies an accurate solution by seeking the state-action pairs that are most informative for this purpose. We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge. Then, we show how to learn these approximate tasks sequentially by reducing our transfer setting to a hidden Markov model and employing spectral methods to recover its parameters. Finally, we empirically verify our theoretical findings in simple simulated domains.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2007.00722

Country:

Europe > Austria > Vienna (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Italy > Lombardy > Milan (0.04)

Genre:

Research Report (1.00)
Workflow (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Troussard, Martin, Pignat, Emmanuel, Kamalaruban, Parameswaran, Calinon, Sylvain, Cevher, Volkan

Interaction-limited Inverse Reinforcement Learning

Learning from Demonstrations (LfD) is an active research area that addresses the problem of learning how to perform a task by observing the demonstrations provided by an expert. This approach plays an important role in many real-life learning settings, including human-to-robot interaction [1, 2, 3, 4, 5]. Two popular approaches to LfD are (i) behavioral cloning, which directly mimics the expert behavior without understanding the objective [6], and (ii) inverse reinforcement learning (IRL), which infers the reward function (i.e., the objective of the task) explaining the expert behavior [7]. In this work, we focus on the IRL approach to LfD. Typically, the IRL learner assumes that the demonstrated expert behavior is optimal with respect to some reward function, even if the reward function cannot be specified explicitly as in typical reinforcement learning (RL).
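As a point of contrast with IRL, behavioral cloning in its simplest tabular form can be sketched as follows. This is an illustrative toy, not the method of this paper: it assumes discrete states and actions and copies the expert's most frequent action in each state, without ever estimating a reward function. The function name `behavioral_cloning` and the demonstration data are hypothetical.

```python
from collections import Counter, defaultdict


def behavioral_cloning(demonstrations):
    """Return a policy mapping each seen state to the expert's most frequent action.

    demonstrations is an iterable of (state, action) pairs. No reward is
    inferred; states absent from the demonstrations get no entry.
    """
    votes = defaultdict(Counter)
    for state, action in demonstrations:
        votes[state][action] += 1
    return {state: counter.most_common(1)[0][0] for state, counter in votes.items()}


# Toy expert demonstrations (assumed).
demos = [("s0", "right"), ("s0", "right"), ("s0", "left"), ("s1", "up")]
policy = behavioral_cloning(demos)
```

The resulting policy has nothing to say about states the expert never visited, which is one reason to infer a reward function instead, as IRL does.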

demonstration, machine learning, reinforcement learning, (13 more...)

2007.00425

Country:

Asia > Vietnam > Hanoi > Hanoi (0.05)
North America > United States > New Jersey > Hudson County > Secaucus (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bello, Kevin, Xu, Qiuling, Honorio, Jean

Fundamental Limits of Adversarial Learning

Robustness of machine learning methods is essential for modern practical applications. Given the arms race between attack and defense methods, one may be curious regarding the fundamental limits of any defense mechanism. In this work, we focus on the problem of learning from noise-injected data, where the existing literature falls short by either assuming a specific attack method or by over-specifying the learning problem. We shed light on the information-theoretic limits of adversarial learning without assuming a particular learning process or attacker. Finally, we apply our general bounds to a canonical set of non-trivial learning problems and provide examples of common types of attacks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2007.00289

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Industry: Education > Focused Education > Special Education (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

#artificialintelligence Jun-30-2020, 14:00:40 GMT

Reinforcement Learning: Scaling Personalized Marketing

Personalized marketing for retail consumers and account-based marketing for B2B customers now have proven value. Online interactions with customers generate large volumes of data that support granular learning about consumer behavior and the customization of product recommendations, messages, and content. The missing piece is a scalable, just-in-time way to gauge customer preferences and make product recommendations while visitors engage with websites. Deep reinforcement learning algorithms have been trained to the threshold where the conversion rates they achieve begin to match the costs of data analysis. The touchstone of reinforcement learning (RL) is that it experiments with multiple pathways to achieve an objective, whether acquiring customers or any other goal.

customer, machine learning, reinforcement learning, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

AIHub Jun-30-2020, 12:10:00 GMT

The ingredients of real world robotic reinforcement learning

Robots have been useful in environments that can be carefully controlled, such as those commonly found in industrial settings (e.g. assembly lines). However, in unstructured settings like the home, we need robotic systems that are adaptive to the diversity of the real world. Learning-based algorithms have the potential to enable robots to acquire complex behaviors adaptively in unstructured environments, by leveraging data collected from the environment. In particular, with reinforcement learning, robots learn novel behaviors through trial and error interactions. This is particularly important as we deploy robots in scenarios where the environment may not be known.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

#artificialintelligence Jun-30-2020, 08:45:42 GMT

Reinforcement Learning: A Brief Introduction to Rules and Applications

The brain of a human child is spectacularly capable. Even in a previously unknown situation, the brain makes a decision based on its primal knowledge. Depending on the outcome, it learns and remembers the best choices to make in that particular scenario. At a high level, this process of learning can be understood as a 'trial and error' process, where the brain tries to maximise the occurrence of positive outcomes.

artificial intelligence, machine learning, reinforcement learning, (4 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)