AITopics

1902.0167

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
South America > Argentina (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Teng, Ervin, Iannucci, Bob

Learning to Learn in Simulation

arXiv.org Artificial Intelligence, Feb-5-2019

Deep learning often requires the manual collection and annotation of a training set. On robotic platforms, can we partially automate this task by training the robot to be curious, i.e., to seek out beneficial training information in the environment? In this work, we address the problem of curiosity as it relates to online, real-time, human-in-the-loop training of an object detection algorithm onboard a drone, where motion is constrained to two dimensions. We use a 3D simulation environment and deep reinforcement learning to train a curiosity agent to, in turn, train the object detection model. This agent could have one of two conflicting objectives: train as quickly as possible, or train with minimal human input. We outline a reward function that allows the curiosity agent to learn either of these objectives, while taking into account some of the physical characteristics of the drone platform on which it is meant to run. In addition, we show that we can weigh the importance of achieving these objectives by adjusting a parameter in the reward function.
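
One way to picture the adjustable trade-off described in this abstract is a per-step reward with a single weight. The sketch below is a hypothetical illustration, not the authors' reward function: the names `curiosity_reward`, `detector_gain`, `human_cost_weight`, and the constant `time_penalty` are all assumptions introduced here.

```python
def curiosity_reward(detector_gain, asked_human, human_cost_weight, time_penalty=0.01):
    """Hypothetical per-step reward for a curiosity agent.

    detector_gain: improvement in detector accuracy this step (assumed signal).
    asked_human: True if the agent requested a human label this step.
    human_cost_weight: trade-off parameter; 0 favors fast training,
        larger values favor minimal human input.
    time_penalty: small constant cost per step (assumed) that rewards speed.
    """
    reward = detector_gain - time_penalty
    if asked_human:
        reward -= human_cost_weight
    return reward


# Same step, two settings of the trade-off parameter.
fast = curiosity_reward(0.05, asked_human=True, human_cost_weight=0.0)
frugal = curiosity_reward(0.05, asked_human=True, human_cost_weight=0.1)
print(fast > frugal)  # asking a human is cheaper under the "fast" setting
```

With the weight at zero, a human query costs nothing beyond the time penalty; raising it makes the same query unattractive, which is the kind of behavior shift a single reward parameter can produce.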

artificial intelligence, machine learning, reinforcement learning

1902.01569

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Artificial Intelligence, Feb-5-2019

Learning to Schedule Communication in Multi-agent Reinforcement Learning

Kim, Daewoo, Moon, Sangwoo, Hostallero, David, Kang, Wan Ju, Lee, Taeyoung, Son, Kyunghwan, Yi, Yung

Many real-world reinforcement learning tasks require multiple agents to make sequential decisions while interacting with one another, and well-coordinated actions among the agents are crucial to achieving the target goal. One way to accelerate the coordination effect is to enable multiple agents to communicate with each other in a distributed manner and behave as a group. In this paper, we study a practical scenario in which (i) the communication bandwidth is limited and (ii) the agents share the communication medium so that only a restricted number of agents are able to simultaneously use the medium, as in the state-of-the-art wireless networking standards. This calls for a certain form of communication scheduling. In that regard, we propose a multi-agent deep reinforcement learning framework, called SchedNet, in which agents learn how to schedule themselves, how to encode the messages, and how to select actions based on received messages. SchedNet is capable of deciding which agents should be entitled to broadcast their (encoded) messages, by learning the importance of each agent's partially observed information. We evaluate SchedNet against multiple baselines under two different applications, namely, cooperative communication and navigation, and predator-prey. Our experiments show a non-negligible performance gap, ranging from 32% to 43%, between SchedNet and other mechanisms, such as those without communication and those with vanilla scheduling methods, e.g., round robin.
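
The contrast between importance-based scheduling and the round-robin baseline named in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not SchedNet itself: the importance weights are plain floats standing in for values a learned network would produce, and the function names are hypothetical.

```python
def round_robin_schedule(num_agents, k, step):
    """Baseline: rotate through agents, k at a time, ignoring observations."""
    start = (step * k) % num_agents
    return sorted((start + i) % num_agents for i in range(k))


def top_k_schedule(importance, k):
    """Pick the k agents with the largest importance weights.

    `importance` stands in for weights a learned network would output;
    here it is just a list of floats (an assumption for illustration).
    """
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])


weights = [0.1, 0.9, 0.3, 0.7]                 # hypothetical per-agent importance
print(top_k_schedule(weights, k=2))            # [1, 3]
print(round_robin_schedule(4, k=2, step=1))    # [2, 3]
```

Round robin gives every agent the medium equally often regardless of what it observed, while the weight-based rule hands the limited slots to whichever agents currently report the most important information.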

artificial intelligence, machine learning, reinforcement learning

1902.01554

Country: Asia (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)

Tennenholtz, Guy, Mannor, Shie

The Natural Language of Actions

arXiv.org Artificial Intelligence, Feb-4-2019

We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning. Representing actions in a vector space helps reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We show how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural compatible behavior. We then use these for augmenting state representations as well as improving function approximation of Q-values. We visualize and test action embeddings in three domains including a drawing task, a high dimensional navigation task, and the large action space domain of StarCraft II.
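
The idea that an action's context in demonstrations can place similar actions close together can be illustrated with simple co-occurrence counts. The sketch below is a stand-in technique (count vectors compared by cosine similarity), not the Act2Vec training procedure; the demonstration data and the function names are hypothetical.

```python
import math
from collections import defaultdict


def context_vectors(trajectories, window=1):
    """Count, for each action, which actions appear within `window` steps of it."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for i, action in enumerate(traj):
            lo, hi = max(0, i - window), min(len(traj), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[action][traj[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical demonstrations: "left" and "right" occur in the same contexts.
demos = [
    ["forward", "left", "forward", "jump", "stop"],
    ["forward", "right", "forward", "jump", "stop"],
]
vecs = context_vectors(demos)
print(cosine(vecs["left"], vecs["right"]))  # identical contexts
print(cosine(vecs["left"], vecs["jump"]))   # partially overlapping contexts
```

Actions that are interchangeable in the demonstrations end up with matching context vectors, which is the property a learned embedding would exploit to share value estimates across similar actions.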

machine learning, natural language, reinforcement learning

1902.01119

Country:

North America > United States > Texas > Bee County (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Chaplot, Devendra Singh, Lee, Lisa, Salakhutdinov, Ruslan, Parikh, Devi, Batra, Dhruv

Embodied Multimodal Multitask Learning

arXiv.org Machine Learning, Feb-4-2019

Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for different multimodal tasks, such as semantic goal navigation and embodied question answering. In this paper, we propose a multitask model capable of jointly learning these multimodal tasks, and transferring knowledge of words and their grounding in visual objects across the tasks. The proposed model uses a novel Dual-Attention unit to disentangle the knowledge of words in the textual representations and visual concepts in the visual representations, and align them with each other. This disentangled task-invariant alignment of representations facilitates grounding and knowledge transfer across both tasks. We show that the proposed model outperforms a range of baselines on both tasks in simulated 3D environments. We also show that this disentanglement of representations makes our model modular and interpretable, and allows for transfer to instructions containing new words by leveraging object detectors.

instruction, pillar, representation

1902.01385

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)

Tasfi, Norman, Capretz, Miriam

Dynamic Planning Networks

arXiv.org Machine Learning, Feb-4-2019

We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning that combines model-based and model-free aspects for online planning. Our architecture learns to dynamically construct plans using a learned state-transition model by selecting and traversing between simulated states and actions to maximize information before acting. In contrast to model-free methods, model-based planning lets the agent efficiently test action hypotheses without performing costly trial-and-error in the environment. DPN learns to efficiently form plans by expanding a single action-conditional state transition at a time instead of exhaustively evaluating each action, reducing the required number of state transitions during planning by up to 96%. We observe various emergent planning patterns used to solve environments, including classical search methods such as breadth-first and depth-first search. DPN shows improved data efficiency, performance, and generalization to new and unseen domains in comparison to several baselines.

agent, arxiv preprint arxiv, state-transition model

1812.1124

Country:

North America > Canada > Ontario > Middlesex County > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.66)

arXiv.org Artificial Intelligence, Feb-4-2019

Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning

Juliani, Arthur, Khalifa, Ahmed, Berges, Vincent-Pierre, Harper, Jonathan, Henry, Hunter, Crespi, Adam, Togelius, Julian, Lange, Danny

The rapid pace of research in Deep Reinforcement Learning has been driven by the presence of fast and challenging simulation environments. These environments often take the form of games, with tasks ranging from simple board games, to classic home console games, to modern strategy games. We propose a new benchmark called Obstacle Tower: a high visual fidelity, 3D, third-person, procedurally generated game environment. An agent in the Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other similar benchmarks such as the ALE, evaluation of agent performance in Obstacle Tower is based on an agent's ability to perform well on unseen instances of the environment. In this paper we outline the environment and provide a set of initial baseline results produced by current state-of-the-art Deep RL methods as well as human players. In all cases these algorithms fail to produce agents capable of performing anywhere near human level on a set of evaluations designed to test both memorization and generalization ability. As such, we believe that the Obstacle Tower has the potential to serve as a helpful Deep RL benchmark now and into the future.

artificial intelligence, machine learning, reinforcement learning

1902.01378

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial Intelligence, Feb-4-2019

Intelligent Traffic Signal Control: Using Reinforcement Learning with Partial Detection

Zhang, Rusheng, Ishikawa, Akihiro, Wang, Wenli, Striner, Benjamin, Tonguz, Ozan

Intelligent Transportation Systems (ITS) have attracted the attention of researchers and the general public alike as a means to alleviate traffic congestion. Recently, the maturity of wireless technology has enabled a cost-efficient way to achieve ITS by detecting vehicles using Vehicle to Infrastructure (V2I) communications. Traditional ITS algorithms, in most cases, assume that every vehicle is observed, such as by a camera or a loop detector, but a V2I implementation would detect only those vehicles with wireless communications capability. We examine a family of transportation systems, which we will refer to as "Partially Detected Intelligent Transportation Systems". An algorithm that can act well under a small detection rate is highly desirable due to the gradual penetration rates of the underlying wireless technologies, such as Dedicated Short Range Communications (DSRC). Artificial Intelligence (AI) techniques for Reinforcement Learning (RL) are suitable tools for finding such an algorithm because they can use varied inputs and do not require explicit analytic understanding or modeling of the underlying system dynamics. In this paper, we report an RL algorithm for partially observable ITS based on DSRC. The performance of this system is studied under different car flows, detection rates, and topologies of the road network. Our system is able to efficiently reduce the average waiting time of vehicles at an intersection, even with a low detection rate.

artificial intelligence, machine learning, reinforcement learning

1807.01628

Country:

Asia (1.00)
Europe (0.68)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning, Feb-3-2019

Study of Robust Distributed Diffusion RLS Algorithms with Side Information for Adaptive Networks

Yu, Y., Zhao, H., de Lamare, R. C., Zakharov, Y., Lu, L.

This work develops robust diffusion recursive least squares algorithms to mitigate the performance degradation often experienced in networks of agents in the presence of impulsive noise. The first algorithm minimizes an exponentially weighted least-squares cost function subject to a time-dependent constraint on the squared norm of the intermediate update at each node. A recursive strategy for computing the constraint is proposed using side information from the neighboring nodes to further improve the robustness. We also analyze the mean-square convergence behavior of the proposed algorithm. The second proposed algorithm is a modification of the first one based on the dichotomous coordinate descent iterations. Its performance is similar to that of the first; however, its complexity is significantly lower, especially when the input regressors of agents have a shift structure, and it is well suited to practical implementation. Simulations show the superiority of the proposed algorithms over previously reported techniques in various impulsive noise scenarios.
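
For readers unfamiliar with the building block named in this abstract, the sketch below shows the standard exponentially weighted recursive least squares (RLS) update at a single node. It is plain textbook RLS, without the norm constraint, diffusion step, or impulsive-noise handling that the paper proposes; the system `w_true`, the forgetting factor, and the initialization are assumptions chosen for illustration.

```python
import random


def rls_update(w, P, x, d, lam=0.99):
    """One exponentially weighted RLS step for a single node.

    w: current weight estimate (list of floats)
    P: current inverse correlation matrix estimate (list of lists)
    x: input regressor, d: desired output, lam: forgetting factor
    """
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    k = [Px[i] / denom for i in range(n)]                 # gain vector
    err = d - sum(w[i] * x[i] for i in range(n))          # a priori error
    w_new = [w[i] + k[i] * err for i in range(n)]
    # P <- (P - k x^T P) / lam; x^T P equals Px^T because P stays symmetric
    P_new = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return w_new, P_new


random.seed(0)
w_true = [0.5, -1.5]                       # hypothetical system to identify
w, P = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
for _ in range(200):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    d = sum(a * b for a, b in zip(w_true, x))   # noiseless for clarity
    w, P = rls_update(w, P, x, d)
print(w)  # close to w_true
```

Because the error term enters the weight update linearly, a single large outlier in `d` can throw the estimate far off, which is the sensitivity to impulsive noise that robust variants aim to limit.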

algorithm, ieee transaction, signal processing

1902.01005

Country:

Asia > China > Sichuan Province > Chengdu (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > Canada > Alberta (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Telecommunications (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)

Raffin, Antonin, Hill, Ashley, Traoré, René, Lesort, Timothée, Díaz-Rodríguez, Natalia, Filliat, David

Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics

arXiv.org Machine Learning, Feb-3-2019

Scaling end-to-end reinforcement learning to control real robots from vision presents a series of challenges, in particular in terms of sample efficiency. In contrast to end-to-end learning, state representation learning can help learn a compact, efficient, and relevant representation of states that speeds up policy learning, reduces the number of samples needed, and is easier to interpret. We evaluate several state representation learning methods on goal-based robotics tasks and propose a new unsupervised model that stacks representations and combines the strengths of several of these approaches. This method encodes all the relevant features, performs on par with or better than end-to-end learning, and is robust to hyperparameter changes.

decoupling feature extraction, representation, state representation

1901.08651

Country:

Europe > France (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Data Science > Data Mining > Feature Extraction (0.43)