AITopics

2102.11107

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (0.92)
Health & Medicine > Therapeutic Area > Neurology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Agarwal, Mridul, Ganguly, Bhargav, Aggarwal, Vaneet

Communication Efficient Parallel Reinforcement Learning

arXiv.org Artificial Intelligence, Feb-21-2021

We consider the problem where $M$ agents interact with $M$ identical and independent environments with $S$ states and $A$ actions using reinforcement learning for $T$ rounds. The agents share their data with a central server to minimize their regret. We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds. We provide such an algorithm, which runs at each agent, and prove that the total cumulative regret of $M$ agents is upper bounded as $\tilde{O}(DS\sqrt{MAT})$ for a Markov Decision Process with diameter $D$, number of states $S$, and number of actions $A$. The agents synchronize once the number of visits to any state-action pair exceeds a certain threshold. Using this, we obtain a bound of $O\left(MSA\log(MT)\right)$ on the total number of communication rounds. Finally, we evaluate the algorithm on multiple environments and demonstrate that it performs on par with an always-communicating version of the UCRL2 algorithm, with significantly less communication.
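The threshold-triggered synchronization described in the abstract above can be illustrated with a minimal sketch. The specific threshold used here (an agent requests a sync once its new visits to a state-action pair reach max(1, N)/M, where N is the count known at the last sync) is an assumption for illustration and is not stated in the abstract. The function name `count_sync_rounds` and the uniformly random visits are likewise hypothetical stand-ins for a real policy and environment.

```python
import random


def count_sync_rounds(num_agents, num_pairs, horizon, seed=0):
    """Count synchronization rounds under a hypothetical threshold rule."""
    rng = random.Random(seed)
    # Visit counts per state-action pair as known to the server at the last sync.
    global_counts = [0] * num_pairs
    # Visits each agent has made to each pair since the last sync.
    local_counts = [[0] * num_pairs for _ in range(num_agents)]
    sync_rounds = 0

    for _ in range(horizon):
        trigger = False
        for agent in range(num_agents):
            # Stand-in for the (state, action) pair this agent visits this round.
            pair = rng.randrange(num_pairs)
            local_counts[agent][pair] += 1
            # Assumed rule: sync once new visits reach a 1/M share of the known count.
            threshold = max(1, global_counts[pair]) / num_agents
            if local_counts[agent][pair] >= threshold:
                trigger = True

        if trigger:
            # All agents upload their new counts and reset their local tallies.
            for agent in range(num_agents):
                for pair in range(num_pairs):
                    global_counts[pair] += local_counts[agent][pair]
                    local_counts[agent][pair] = 0
            sync_rounds += 1

    return sync_rounds


if __name__ == "__main__":
    for horizon in (1000, 2000, 4000):
        print(horizon, count_sync_rounds(num_agents=4, num_pairs=6, horizon=horizon))
```

Under this assumed rule, the known counts must grow by a constant factor between syncs, so the number of synchronization rounds grows far more slowly than the horizon. That is the qualitative behavior the logarithmic communication bound in the abstract describes.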

agent, algorithm, equation

2102.1074

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Genre: Research Report (0.63)

Industry: Transportation (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Chhibber, Nalin, Law, Edith

Towards Teachable Conversational Agents

arXiv.org Artificial Intelligence, Feb-20-2021

The traditional process of building interactive machine learning systems can be viewed as a teacher-learner interaction scenario where the machine-learners are trained by one or more human-teachers. In this work, we explore the idea of using a conversational interface to investigate the interaction between human-teachers and interactive machine-learners. Specifically, we examine whether teachable AI agents can reliably learn from human-teachers through conversational interactions, and how this learning compares with traditional supervised learning algorithms. Results validate the concept of teachable conversational agents and highlight the factors relevant for the development of machine learning systems that intend to learn from conversational interactions.

agent, conversational interaction, interaction

2102.10387

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.83)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.74)

Spooner, Thomas, Vadori, Nelson, Ganesh, Sumitra

Causal Policy Gradients

arXiv.org Artificial Intelligence, Feb-20-2021

Policy gradient methods can solve complex tasks but often fail when the dimensionality of the action-space or the objective multiplicity grows very large. This occurs, in part, because the variance of score-based gradient estimators scales quadratically with the number of targets. In this paper, we propose a causal baseline which exploits independence structure encoded in a novel action-target influence network. Causal policy gradients (CPGs), which follow, provide a common framework for analysing key state-of-the-art algorithms, are shown to generalise traditional policy gradients, and yield a principled way of incorporating prior knowledge of a problem domain's generative processes. We provide an analysis of the proposed estimator and identify the conditions under which variance is guaranteed to improve. The algorithmic aspects of CPGs are also discussed, including optimal policy factorisations, their complexity, and the use of conditioning to efficiently scale to extremely large, concurrent tasks. The performance advantages for two variants of the algorithm are demonstrated on large-scale bandit and concurrent inventory management problems.

baseline, influence network, reinforcement learning

2102.10362

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.67)

Industry: Banking & Finance (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Data Science (0.67)

Girgis, Roger, Golemo, Florian, Codevilla, Felipe, D'Souza, Jim Aldon, Kahou, Samira Ebrahimi, Heide, Felix, Pal, Christopher

Latent Variable Nested Set Transformers & AutoBots

Humans have the innate ability to attend to the most relevant actors in their vicinity and can forecast how they may behave in the future. This ability will be crucial for the deployment of safety-critical agents such as robots or vehicles which interact with humans. We propose a theoretical framework for this problem setting based on autoregressively modelling sequences of nested sets, using latent variables to better capture multimodal distributions over future sets of sets. We present a new model architecture, which we call a Nested Set Transformer, that employs multi-head self-attention blocks over sets of sets; these serve as a form of social attention between the elements of the sets at every timestep. Our approach can produce a distribution over future trajectories for all agents under consideration, or focus upon the trajectory of an ego-agent. We validate the Nested Set Transformer in autonomous driving settings with a model we refer to as "AutoBot", where we model the trajectory of an ego-agent based on the sequential observations of key attributes of multiple agents in a scene. AutoBot produces results better than state-of-the-art published prior work on the challenging nuScenes vehicle trajectory modeling benchmark. We also examine the multi-agent prediction version of our model and jointly forecast an ego-agent's future trajectory along with the other agents in the scene. We validate the behavior of our proposed Nested Set Transformer for scene-level forecasting with a pedestrian trajectory dataset.

autobot, latent variable nested set transformer, trajectory

2104.00563

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.49)
Automobiles & Trucks (0.48)
Information Technology > Robotics & Automation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Green, Michael Cerny, Khalifa, Ahmed, Bontrager, Philip, Canaan, Rodrigo, Togelius, Julian

Game Mechanic Alignment Theory and Discovery

We present a new concept called Game Mechanic Alignment theory as a way to organize game mechanics through the lens of environmental rewards and intrinsic player motivations. By disentangling player and environmental influences, mechanics may be better identified for use in an automated tutorial generation system, which could tailor tutorials for a particular playstyle or player. In this paper, we apply this theory to several well-known games to demonstrate how designers can benefit from it, describe a methodology for estimating mechanic alignment, and apply this methodology to multiple games in the GVGAI framework. We discuss how effectively this estimation captures intrinsic/extrinsic rewards and how our theory could be used as an alternative to critical mechanic discovery methods for tutorial generation.

agent, julian togelius, mechanics

2102.10247

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > Canada > Quebec > Montreal (0.05)
North America > United States > New York > Richmond County > New York City (0.05)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Games > Computer Games (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Analytics and Machine Learning in Vehicle Routing Research

Bai, Ruibin, Chen, Xinan, Chen, Zhi-Long, Cui, Tianxiang, Gong, Shuhui, He, Wentao, Jiang, Xiaoping, Jin, Huan, Jin, Jiahuan, Kendall, Graham, Li, Jiawei, Lu, Zheng, Ren, Jianfeng, Weng, Paul, Xue, Ning, Zhang, Huayan

The Vehicle Routing Problem (VRP) is one of the most intensively studied combinatorial optimisation problems, for which numerous models and algorithms have been proposed. To tackle the complexities, uncertainties and dynamics involved in real-world VRP applications, Machine Learning (ML) methods have been used in combination with analytical approaches to enhance problem formulations and algorithmic performance across different problem-solving scenarios. However, the relevant papers are scattered across several traditional research fields with very different, sometimes confusing, terminologies. This paper presents a first, comprehensive review of hybrid methods that combine analytical techniques with ML tools in addressing VRPs. Specifically, we review the emerging research streams on ML-assisted VRP modelling and ML-assisted VRP optimisation. We conclude that ML can be beneficial in enhancing VRP modelling and in improving the performance of algorithms for both online and offline VRP optimisation. Finally, challenges and future opportunities of VRP research are discussed.

algorithm, vehicle, vehicle routing problem

2102.10012

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)
Research Report > Promising Solution (0.45)

Industry: Transportation > Freight & Logistics Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Duminy, Nicolas, Nguyen, Sao Mai, Zhu, Junshuai, Duhaut, Dominique, Kerdreux, Jerome

Intrinsically Motivated Open-Ended Multi-Task Learning Using Transfer Learning to Discover Task Hierarchy

In open-ended continuous environments, robots need to learn multiple parameterised control tasks in hierarchical reinforcement learning. We hypothesise that the most complex tasks can be learned more easily by transferring knowledge from simpler tasks, and faster by adapting the complexity of the actions to the task. We propose a task-oriented representation of complex actions, called procedures, to learn online task relationships and unbounded sequences of action primitives to control the different observables of the environment. Combining goal-babbling with imitation learning, and active learning with transfer of knowledge based on intrinsic motivation, our algorithm self-organises its learning process. At any given time, it chooses a task to focus on, as well as what, how, when and from whom to transfer knowledge. We show with a simulation and a real industrial robot arm, in cross-task and cross-learner transfer settings, that task composition is key to tackling highly complex tasks. Task decomposition is also efficiently transferred across different embodied learners and by active imitation, where the robot requests only a small number of demonstrations and the adequate type of information. The robot learns and exploits task dependencies so as to learn tasks of every complexity.

intrinsically motivated open-ended multi-task learning, learner, learning

doi: 10.3390/app11030975

2102.09854

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.52)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Deep Latent Competition: Learning to Race Using Visual Control Policies in Latent Space

Schwarting, Wilko, Seyde, Tim, Gilitschenski, Igor, Liebenwein, Lucas, Sander, Ryan, Karaman, Sertac, Rus, Daniela

Learning competitive behaviors in multi-agent settings such as racing requires long-term reasoning about potential adversarial interactions. This paper presents Deep Latent Competition (DLC), a novel reinforcement learning algorithm that learns competitive visual control policies through self-play in imagination. The DLC agent imagines multi-agent interaction sequences in the compact latent space of a learned world model that combines a joint transition function with opponent viewpoint prediction. Imagined self-play reduces costly sample generation in the real world, while the latent representation enables planning to scale gracefully with observation dimensionality. We demonstrate the effectiveness of our algorithm in learning competitive behaviors on a novel multi-agent racing benchmark that requires planning from image observations. Code and videos are available at https://sites.google.com/view/deep-latent-competition.

agent, learning, prediction

2102.09812

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.05)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Li, Wenjie, Barik, Adarsh, Honorio, Jean

A Simple Unified Framework for High Dimensional Bandit Problems

arXiv.org Machine Learning, Feb-18-2021

Stochastic high dimensional bandit problems with low dimensional structure are useful in different applications such as online advertising and drug discovery. In this work, we propose a simple unified algorithm for such problems and present a general analysis framework for the regret upper bound of our algorithm. We show that under some mild unified assumptions, our algorithm can be applied to different high dimensional bandit problems. Our framework utilizes the low dimensional structure to guide the parameter estimation in the problem. As a result, our algorithm achieves the best regret bounds in the LASSO bandit, better bounds in the low-rank matrix bandit and the group sparse matrix bandit, as well as a novel bound in the multi-agent LASSO bandit.

algorithm, bandit problem, inequality


2102.09626

Country:

North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)