AITopics

Liang, Junchi, Boularias, Abdeslam

Learning Transition Models with Time-delayed Causal Relations

arXiv.org Machine LearningAug-4-2020

This paper introduces an algorithm for discovering implicit and delayed causal relations between events observed by a robot at arbitrary times, with the objective of improving data-efficiency and interpretability of model-based reinforcement learning (RL) techniques. The proposed algorithm initially predicts observations with the Markov assumption, and incrementally introduces new hidden variables to explain and reduce the stochasticity of the observations. The hidden variables are memory units that keep track of pertinent past events. Such events are systematically identified by their information gains. The learned transition and reward models are then used for planning. Experiments on simulated and real robotic tasks show that this method significantly improves over current RL techniques.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2008.01593

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

#artificialintelligenceAug-3-2020, 07:16:09 GMT

What's that? Reinforcement Learning in the Real-world?

Reinforcement Learning offers a distinctive way of solving the Machine Learning puzzle. It's sequential decision-making ability, and suitability to tasks requiring a trade-off between immediate and long-term returns are some components that make it desirable in settings where supervised-learning or unsupervised learning approaches would, in comparison, not fit as well. By having agents start with zero knowledge then learn qualitatively good behaviour through interaction with the environment, it's almost fair to say Reinforcement Learning (RL) is the closest thing we have to Artificial General Intelligence yet. We can see RL being used in robotics control, treatment design in healthcare, among others; but why aren't we boasting of many RL agents being scaled up to real-world production systems? There's a reason why games, like Atari, are such nice RL benchmarks -- they let us care only about maximizing the score and not worry about designing a reward function.

agent, artificial intelligence, upstream oil & gas, (20 more...)

#artificialintelligence

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforced Epidemic Control: Saving Both Lives and Economy

Song, Sirui, Zong, Zefang, Li, Yong, Liu, Xue, Yu, Yang

Saving lives or economy is a dilemma for epidemic control in most cities while smart-tracing technology raises people's privacy concerns. In this paper, we propose a solution for the life-or-economy dilemma that does not require private data. We bypass the private-data requirement by suppressing epidemic transmission through a dynamic control on inter-regional mobility that only relies on Origin-Designation (OD) data. We develop DUal-objective Reinforcement-Learning Epidemic Control Agent (DURLECA) to search mobility-control policies that can simultaneously minimize infection spread and maximally retain mobility. DURLECA hires a novel graph neural network, namely Flow-GNN, to estimate the virus-transmission risk induced by urban mobility. The estimated risk is used to support a reinforcement learning agent to generate mobility-control actions. The training of DURLECA is guided with a well-constructed reward function, which captures the natural trade-off relation between epidemic control and mobility retaining. Besides, we design two exploration strategies to improve the agent's searching efficiency and help it get rid of local optimums. Extensive experimental results on a real-world OD dataset show that DURLECA is able to suppress infections at an extremely low level while retaining 76\% of the mobility in the city. Our implementation is available at https://github.com/anyleopeace/DURLECA/.

immunology, mobility, upstream oil & gas, (20 more...)

2008.01257

Country:

North America (0.29)
Asia (0.29)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
(3 more...)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Getting to Know One Another: Calibrating Intent, Capabilities and Trust for Human-Robot Collaboration

Lee, Joshua, Fong, Jeffrey, Kok, Bing Cai, Soh, Harold

Common experience suggests that agents who know each other well are better able to work together. In this work, we address the problem of calibrating intention and capabilities in human-robot collaboration. In particular, we focus on scenarios where the robot is attempting to assist a human who is unable to directly communicate her intent. Moreover, both agents may have differing capabilities that are unknown to one another. We adopt a decision-theoretic approach and propose the TICC-POMDP for modeling this setting, with an associated online solver. Experiments show our approach leads to better team performance both in simulation and in a real-world study with human subjects.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2008.00699

Country:

Asia > Singapore (0.04)
North America > United States > Hawaii (0.04)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.47)
(3 more...)

Gros, Timo P., Höller, Daniel, Hoffmann, Jörg, Wolf, Verena

Tracking the Race Between Deep Reinforcement Learning and Imitation Learning -- Extended Version

Learning-based approaches for solving large sequential decision making problems have become popular in recent years. The resulting agents perform differently and their characteristics depend on those of the underlying learning approach. Here, we consider a benchmark planning problem from the reinforcement learning domain, the Racetrack, to investigate the properties of agents derived from different deep (reinforcement) learning approaches. We compare the performance of deep supervised learning, in particular imitation learning, to reinforcement learning for the Racetrack model. We find that imitation learning yields agents that follow more risky paths. In contrast, the decisions of deep reinforcement learning are more foresighted, i.e., avoid states in which fatal decisions are more likely. Our evaluations show that for this sequential decision making problem, deep reinforcement learning performs best in many aspects even though for imitation learning optimal decisions are considered.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2008.00766

Country: Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning to Play Two-Player Perfect-Information Games without Knowledge

Cohen-Solal, Quentin

In this paper, several techniques for learning game state evaluation functions by reinforcement are proposed. The first is a generalization of tree bootstrapping (tree learning): it is adapted to the context of reinforcement learning without knowledge based on non-linear functions. With this technique, no information is lost during the reinforcement learning process. The second is a modification of minimax with unbounded depth extending the best sequences of actions to the terminal states. This modified search is intended to be used during the learning process. The third is to replace the classic gain of a game (+1 / -1) with a reinforcement heuristic. We study particular reinforcement heuristics such as: quick wins and slow defeats ; scoring ; mobility or presence. The four is another variant of unbounded minimax, which plays the safest action instead of playing the best action. This modified search is intended to be used after the learning process. The five is a new action selection distribution. The conducted experiments suggest that these techniques improve the level of play. Finally, we apply these different techniques to design program-players to the game of Hex (size 11 and 13) surpassing the level of Mohex 2.0 with reinforcement learning from self-play without knowledge. At Hex size 11 (without swap), the program-player reaches the level of Mohex 3HNN.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2008.01188

Country:

Europe > France (0.04)
North America > United States > Massachusetts > Middlesex County > Natick (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Chess (0.47)
Leisure & Entertainment > Games > Go (0.46)
Leisure & Entertainment > Games > Hex (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

QPLEX: Duplex Dueling Multi-Agent Q-Learning

Wang, Jianhao, Ren, Zhizhou, Liu, Terry, Yu, Yang, Zhang, Chongjie

We explore value-based multi-agent reinforcement learning (MARL) in the popular paradigm of centralized training with decentralized execution (CTDE). CTDE requires the consistency of the optimal joint action selection with optimal individual action selections, which is called the IGM (Individual-Global-Max) principle. However, in order to achieve scalability, existing MARL methods either limit representation expressiveness of their value function classes or relax the IGM consistency, which may lead to poor policies or even divergence. This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), that takes a duplex dueling network architecture to factorize the joint value function. This duplex dueling architecture transforms the IGM principle to easily realized constraints on advantage functions and thus enables efficient value function learning. Theoretical analysis shows that QPLEX solves a rich class of tasks. Empirical experiments on StarCraft II unit micromanagement tasks demonstrate that QPLEX significantly outperforms state-of-the-art baselines in both online and offline task settings, and also reveal that QPLEX achieves high sample efficiency and can benefit from offline datasets without additional exploration.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2008.01062

Country:

Europe > Hungary (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Brown, Kyle, Driggs-Campbell, Katherine, Kochenderfer, Mykel J.

Modeling and Prediction of Human Driver Behavior: A Survey

We present a review and taxonomy of 200 models from the literature on driver behavior modeling. We begin by introducing a mathematical formulation based on the partially observable stochastic game, which serves as a common framework for comparing and contrasting different driver models. Our taxonomy is constructed around the core modeling tasks of state estimation, intention estimation, trait estimation, and motion prediction, and also discusses the auxiliary tasks of risk estimation, anomaly detection, behavior imitation and microscopic traffic simulation. Existing driver models are categorized based on the specific tasks they address and key attributes of their approach.

data mining, machine learning, reinforcement learning, (17 more...)

2006.08832

Country:

North America > United States > Illinois (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > Arizona (0.04)

Genre:

Overview (1.00)
Research Report (0.81)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
(9 more...)

Robertson, Zachary W., Walter, Matthew R.

Concurrent Training Improves the Performance of Behavioral Cloning from Observation

arXiv.org Machine LearningAug-3-2020

Learning from demonstration is widely used as an efficient way for robots to acquire new skills. However, it typically requires that demonstrations provide full access to the state and action sequences. In contrast, learning from observation offers a way to utilize unlabeled demonstrations (e.g., video) to perform imitation learning. One approach to this is behavioral cloning from observation (BCO). The original implementation of BCO proceeds by first learning an inverse dynamics model and then using that model to estimate action labels, thereby reducing the problem to behavioral cloning. However, existing approaches to BCO require a large number of initial interactions in the first step. Here, we provide a novel theoretical analysis of BCO, introduce a modification BCO*, and show that in the semi-supervised setting, BCO* can concurrently improve both its estimate for the inverse dynamics model and the expert policy. This result allows us to eliminate the dependence on initial interactions and dramatically improve the sample complexity of BCO. We evaluate the effectiveness of our algorithm through experiments on various benchmark domains. The results demonstrate that concurrent training not only improves over the performance of BCO but also results in performance that is competitive with state-of-the-art imitation learning methods such as GAIL and Value-Dice.

inverse dynamic model, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2008.01205

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)