AITopics

1809.02869

Country:

North America > United States (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

arXiv.org Artificial Intelligence Sep-8-2018

Optimal and Low-Complexity Dynamic Spectrum Access for RF-Powered Ambient Backscatter System with Online Reinforcement Learning

Van Huynh, Nguyen, Hoang, Dinh Thai, Nguyen, Diep N., Dutkiewicz, Eryk, Niyato, Dusit, Wang, Ping

Ambient backscatter has been introduced with a wide range of applications for low power wireless communications. In this article, we propose an optimal and low-complexity dynamic spectrum access framework for an RF-powered ambient backscatter system. Under the dynamics of the ambient signals, we first adopt the Markov decision process (MDP) framework to obtain the optimal policy for the secondary transmitter, aiming to maximize the system throughput. However, the MDP-based optimization requires complete knowledge of environment parameters, e.g., the probability of a channel being idle and the probability of a successful packet transmission, that may not be practical to obtain. To cope with such incomplete knowledge of the environment, we develop a low-complexity online reinforcement learning algorithm that allows the secondary transmitter to "learn" from its decisions and then attain the optimal policy. Simulation results show that the proposed learning algorithm not only efficiently deals with the dynamics of the environment, but also improves the average throughput by up to 50% and reduces the blocking probability and delay by up to 80% compared with conventional methods.
Dynamic spectrum access (DSA) has been considered a promising solution to improve the utilization of radio spectrum [2]. As DSA standard frameworks, the Federal Communications Commission and the European Telecommunications Standardization Institute have recently proposed Spectrum Access Systems (SAS) and Licensed Shared Access (LSA) respectively [3]. In both SAS and LSA, spectrum users are prioritized at different levels/tiers (e.g., there are three types of users with a decreasing order of priority: Incumbent Users (IUs), Priority Access Licensees (PALs), and General Authorized Access (GAAs)). Without loss of generality, in this work, we refer to users with higher priority as IUs and users with lower priority as secondary users (SUs). DSA harvests under-utilized spectrum chunks by allowing an SU to dynamically access (temporarily) idle spectrum bands/whitespaces to transmit data.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1809.02753

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Electrical Industrial Apparatus (1.00)
Energy > Energy Storage (0.95)
Telecommunications (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Machine Learning Sep-7-2018

Unity: A General Platform for Intelligent Agents

Juliani, Arthur, Berges, Vincent-Pierre, Vckay, Esh, Gao, Yuan, Henry, Hunter, Mattar, Marwan, Lange, Danny

Recent advances in Deep Reinforcement Learning and Robotics have been driven by the presence of increasingly realistic and complex simulation environments. Many of the existing platforms, however, provide either unrealistic visuals, inaccurate physics, low task complexity, or a limited capacity for interaction among artificial agents. Furthermore, many platforms lack the ability to flexibly configure the simulation, hence turning the simulation environment into a black-box from the perspective of the learning system. Here we describe a new open source toolkit for creating and interacting with simulation environments using the Unity platform: Unity ML-Agents Toolkit. By taking advantage of Unity as a simulation platform, the toolkit enables the development of learning environments which are rich in sensory and physical complexity, provide compelling cognitive challenges, and support dynamic multi-agent interaction. We detail the platform design, communication protocol, set of example environments, and variety of training scenarios made possible via the toolkit.

machine learning, platform, reinforcement learning, (17 more...)

1809.02627

Country:

Europe > Sweden > Skåne County > Malmö (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Combes, Remi Tachet des, Bachman, Philip, van Seijen, Harm

Learning Invariances for Policy Generalization

arXiv.org Artificial Intelligence Sep-7-2018

The grey rectangle starts on the left of the screen and can be moved with two actions, "Right" and "Jump". The goal of this game is to reach the right of the screen while avoiding the white obstacle. There is only one specific distance (measured in number of pixels) to the obstacle where the agent has to choose the action "Jump" in order to pass over the obstacle. If jumping is chosen at any other point, the agent will inevitably crash into the obstacle. A reward of 1 is granted anytime the agent moves one pixel to the right (even in the air). The episode terminates if the agent reaches the right of the screen or touches the obstacle. We build a set of related tasks by varying two factors: the floor height and the position of the obstacle on the floor. The resulting set contains 1271 tasks. We use 6 of those for training and evaluate the generalization performance as the fraction of the remaining 1265 tasks the agent can solve.
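The task rules described above can be illustrated with a small one-dimensional sketch. This is not the authors' environment: the class name `JumpTask`, the helper `run_episode`, and all sizes (screen width, obstacle position, jump distance, jump length) are assumptions chosen for illustration, floor height is ignored, and a mistimed jump is simplified to an immediate crash on the obstacle.

```python
from dataclasses import dataclass

RIGHT, JUMP = 0, 1


@dataclass
class JumpTask:
    """1-D abstraction of the obstacle task; every size here is an assumption."""
    screen_width: int = 20   # x position that counts as the right of the screen
    obstacle_x: int = 10     # pixel occupied by the obstacle
    jump_distance: int = 3   # the single distance from which a jump clears the obstacle
    jump_length: int = 6     # pixels travelled in the air by a well-timed jump
    x: int = 0
    done: bool = False

    def step(self, action):
        """Apply one action and return (reward, done); reward is 1 per pixel moved right."""
        if self.done:
            raise RuntimeError("episode has ended")
        if action == JUMP and self.x < self.obstacle_x:
            if self.obstacle_x - self.x == self.jump_distance:
                moved = self.jump_length             # well-timed jump lands past the obstacle
            else:
                moved = self.obstacle_x - self.x     # mistimed jump ends on the obstacle
        else:
            moved = 1                                # "Right", or a harmless jump after the obstacle
        new_x = min(self.x + moved, self.screen_width)
        reward = new_x - self.x
        self.x = new_x
        # The episode ends on touching the obstacle or reaching the right edge.
        self.done = self.x == self.obstacle_x or self.x >= self.screen_width
        return reward, self.done


def run_episode(task, policy):
    """Roll out a policy (a function of the task state) and return the total reward."""
    total = 0
    while not task.done:
        reward, _ = task.step(policy(task))
        total += reward
    return total
```

Under these assumptions, a policy that jumps only when `task.obstacle_x - task.x == task.jump_distance` reaches the right edge, while a policy that always moves right ends the episode on the obstacle. A family of related tasks could be built by varying `obstacle_x` and, in a richer model, the floor height.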

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1809.02591

Country:

North America > Canada > Quebec > Montreal (0.05)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

arXiv.org Artificial Intelligence Sep-7-2018

Improving On-policy Learning with Statistical Reward Accumulation

Deng, Yubin, Yu, Ke, Lin, Dahua, Tang, Xiaoou, Loy, Chen Change

Deep reinforcement learning has obtained significant breakthroughs in recent years. Most methods in deep-RL achieve good results via the maximization of the reward signal provided by the environment, typically in the form of discounted cumulative returns. Such reward signals represent the immediate feedback of a particular action performed by an agent. However, tasks with sparse reward signals are still challenging to on-policy methods. In this paper, we introduce an effective characterization of past reward statistics (which can be seen as long-term feedback signals) to supplement this immediate reward feedback. In particular, value functions are learned with multi-critics supervision, enabling complex value functions to be more easily approximated in on-policy learning, even when the reward signals are sparse. We also introduce a novel exploration mechanism called "hot-wiring" that can give a boost to seemingly trapped agents. We demonstrate the effectiveness of our advantage actor multi-critic (A2MC) method across the discrete domains in Atari games as well as continuous domains in the MuJoCo environments. A video demo is provided at https://youtu.be/zBmpf3Yz8tc.
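For readers unfamiliar with the term, the discounted cumulative return mentioned in the abstract can be computed with a backward pass over an episode's rewards. The sketch below shows only that standard quantity, not the paper's reward-statistics characterization or the A2MC method; the function name `discounted_returns` and the default discount factor are assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

With a sparse reward sequence such as `[0, 0, 1]`, the early returns are small and depend entirely on the single late reward, which hints at why sparse signals are hard for methods that rely on this quantity alone.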

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1809.02387

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence Sep-7-2018

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

Huang, Qiuyuan, Gan, Zhe, Celikyilmaz, Asli, Wu, Dapeng, Wang, Jianfeng, He, Xiaodong

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.

great time, machine learning, reinforcement learning, (16 more...)

1805.08191

Country: North America (0.46)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Zahavy, Tom, Haroush, Matan, Merlis, Nadav, Mankowitz, Daniel J., Mannor, Shie

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is sometimes easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.
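The general idea of restricting action selection to non-eliminated actions can be sketched as follows. This is a simplified illustration, not the AE-DQN architecture: it assumes the elimination model outputs a per-action probability of being invalid and applies a fixed threshold, and the names `select_action`, `elim_probs`, and `threshold` are hypothetical.

```python
def select_action(q_values, elim_probs, threshold=0.5):
    """Pick the highest-valued action among those not predicted to be invalid.

    q_values:   estimated action values, one per action
    elim_probs: predicted probability that each action is invalid (assumed input)
    threshold:  actions at or above this probability are eliminated (assumed rule)
    """
    allowed = [a for a, p in enumerate(elim_probs) if p < threshold]
    if not allowed:
        # Fall back to the full action set if every action was eliminated.
        allowed = list(range(len(q_values)))
    return max(allowed, key=lambda a: q_values[a])
```

For example, with `q_values = [1.0, 5.0, 3.0]` and `elim_probs = [0.1, 0.9, 0.2]`, the highest-valued action is eliminated and the choice is made between the remaining two.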

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1809.02121

Country:

Europe > United Kingdom > England > Greater London > London (0.14)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games (1.00)
Energy > Power Industry (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Blondé, Lionel, Kalousis, Alexandros

Sample-Efficient Imitation Learning via Generative Adversarial Nets

Recent work in imitation learning articulates its formulation around the GAIL architecture, relying on the adversarial training procedure introduced in GANs. Albeit successful at generating behaviours similar to those demonstrated to the agent, GAIL suffers from a high sample complexity in the number of interactions it has to carry out in the environment in order to achieve satisfactory performance. In this work, we dramatically shrink the number of interactions with the environment by leveraging an off-policy actor-critic architecture. Additionally, employing deterministic policy gradients allows us to treat the learned reward as a differentiable node in the computational graph, while preserving the model-free nature of our approach. Our experiments span a variety of continuous control tasks.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1809.02064

Country:

Europe > Switzerland (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Leibfried, Felix, Tutunov, Rasul, Vrancx, Peter, Bou-Ammar, Haitham

Model-Based Stabilisation of Deep Reinforcement Learning

Though successful in high-dimensional domains, deep reinforcement learning exhibits high sample complexity and suffers from stability issues, as reported by researchers and practitioners in the field. These problems hinder the application of such algorithms in real-world and safety-critical scenarios. In this paper, we take steps towards stable and efficient reinforcement learning by following a model-based approach that is known to reduce agent-environment interactions. Namely, our method augments deep Q-networks (DQNs) with model predictions for transitions, rewards, and termination flags. Having the model at hand, we then conduct a rigorous theoretical study of our algorithm and show, for the first time, convergence to a stationary point. En route, we provide a counter-example showing that 'vanilla' DQNs can diverge, confirming practitioners' and researchers' experiences. Our proof is novel in its own right and can be extended to other forms of deep reinforcement learning. In particular, we believe exploiting the relation between reinforcement (with deep function approximators) and online learning can serve as a recipe for future proofs in the domain. Finally, we validate our theoretical results in 20 games from the Atari benchmark. Our results show that following the proposed model-based learning approach not only ensures convergence but leads to a reduction in sample complexity and superior performance.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1809.01906

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > New Finding (0.54)

Industry:

Leisure & Entertainment > Games (0.69)
Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Nabi, Razieh, Malinsky, Daniel, Shpitser, Ilya

Learning Optimal Fair Policies

We consider the problem of learning optimal policies from observational data in a way that satisfies certain fairness criteria. The issue of fairness arises where some covariates used in decision making are sensitive features, or are correlated with sensitive features. Nabi and Shpitser (2018) formalized fairness in the context of regression problems as constraining the causal effects of sensitive features along certain disallowed causal pathways. The existence of these causal effects may be called retrospective unfairness in the sense of already being present in the data before analysis begins, and may be due to discriminatory practices or the biased way in which variables are defined or recorded. In the context of learning policies, what we call prospective bias, i.e., the inappropriate dependence of learned policies on sensitive features, is also possible. In this paper, we use methods from causal and semiparametric inference to learn optimal policies in a way that addresses both retrospective bias in the data, and prospective bias due to the policy. In addition, our methods appropriately address statistical bias due to model misspecification and confounding bias, which are important in the estimation of path-specific causal effects from observational data. We apply our methods to both synthetic data and real criminal justice data.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

1809.02244

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine (1.00)
Law > Criminal Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)