AITopics

2012.13026

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence Dec-23-2020

Commission Fee is not Enough: A Hierarchical Reinforced Framework for Portfolio Management

Wang, Rundong, Wei, Hongxin, An, Bo, Feng, Zhouyan, Yao, Jun

Portfolio management via reinforcement learning is at the forefront of fintech research, which explores how to optimally reallocate a fund into different financial assets over the long term by trial-and-error. Existing methods are impractical since they usually assume each reallocation can be finished immediately and thus ignore price slippage as part of the trading cost. To address these issues, we propose a hierarchical reinforced stock trading system for portfolio management (HRPM). Concretely, we decompose the trading process into a hierarchy of portfolio management over trade execution and train the corresponding policies. The high-level policy gives portfolio weights at a lower frequency to maximize the long-term profit and invokes the low-level policy to sell or buy the corresponding shares within a short time window at a higher frequency to minimize the trading cost. We train the two levels of policies via a pre-training scheme and an iterative training scheme for data efficiency. Extensive experimental results in the U.S. market and the China market demonstrate that HRPM achieves significant improvement against many state-of-the-art approaches.
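The two-level split described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the learned high-level policy is replaced by a fixed weight vector passed in by the caller, and the learned low-level execution policy is replaced by an even split of each order across time steps. Commission and slippage are not modelled.

```python
from typing import Dict, List


def target_share_changes(
    weights: Dict[str, float],
    holdings: Dict[str, float],
    prices: Dict[str, float],
    cash: float,
) -> Dict[str, float]:
    """High level: turn target portfolio weights into share changes per asset."""
    # Total portfolio value = cash plus the market value of current holdings.
    value = cash + sum(holdings.get(a, 0.0) * prices[a] for a in prices)
    changes = {}
    for asset, price in prices.items():
        target_shares = weights.get(asset, 0.0) * value / price
        # Positive means buy, negative means sell.
        changes[asset] = target_shares - holdings.get(asset, 0.0)
    return changes


def split_order(shares: float, n_slices: int) -> List[float]:
    """Low level stand-in (assumption): spread one order evenly over n_slices steps."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    return [shares / n_slices] * n_slices


if __name__ == "__main__":
    changes = target_share_changes(
        weights={"A": 0.5, "B": 0.5},
        holdings={"A": 10.0, "B": 0.0},
        prices={"A": 10.0, "B": 20.0},
        cash=100.0,
    )
    for asset, delta in changes.items():
        print(asset, split_order(delta, n_slices=5))
```

In a learned system the even split would be replaced by a policy that chooses each slice from market observations, which is where the trading-cost objective from the abstract would enter.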

low-level policy, portfolio weight, trading cost, (14 more...)

2012.1262

Country:

Asia > China (0.26)
North America > United States (0.25)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence Dec-23-2020

Augmenting Policy Learning with Routines Discovered from a Demonstration

Zhao, Zelin, Gan, Chuang, Wu, Jiajun, Guo, Xiaoxiao, Tenenbaum, Joshua B.

Humans can abstract prior knowledge from very little data and use it to boost skill learning. In this paper, we propose routine-augmented policy learning (RAPL), which discovers routines composed of primitive actions from a single demonstration and uses discovered routines to augment policy learning. To discover routines from the demonstration, we first abstract routine candidates by identifying grammar over the demonstrated action trajectory. Then, the best routines measured by length and frequency are selected to form a routine library. We propose to learn policy simultaneously at primitive-level and routine-level with discovered routines, leveraging the temporal structure of routines. Our approach enables imitating expert behavior at multiple temporal scales for imitation learning and promotes reinforcement learning exploration. Extensive experiments on Atari games demonstrate that RAPL improves the state-of-the-art imitation learning method SQIL and reinforcement learning method A2C. Further, we show that discovered routines can generalize to unseen levels and difficulties on the CoinRun benchmark.
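The selection of routines "measured by length and frequency" can be illustrated with a simplified sketch. It swaps the grammar-based candidate step described in the abstract for plain n-gram counting, and it assumes a score of length times frequency; the function name, the score, and the parameters `max_len` and `top_k` are all hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def discover_routines(
    actions: Sequence[str], max_len: int = 4, top_k: int = 3
) -> List[Tuple[str, ...]]:
    """Rank repeated action subsequences by length * frequency (assumed score)."""
    counts: Counter = Counter()
    # Count every contiguous subsequence of length 2..max_len (overlaps included).
    for n in range(2, max_len + 1):
        for i in range(len(actions) - n + 1):
            counts[tuple(actions[i:i + n])] += 1
    # Keep only subsequences that occur more than once in the demonstration.
    repeated = {seq: c for seq, c in counts.items() if c > 1}
    ranked = sorted(
        repeated,
        key=lambda seq: (len(seq) * repeated[seq], len(seq)),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    demo = ["up", "up", "fire", "up", "up", "fire", "left"]
    print(discover_routines(demo, top_k=1))
```

Each returned tuple could then be exposed to the agent as one extra macro-action alongside the primitive actions, which is the sense in which a routine library augments the action space.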

agent, demonstration, learning, (13 more...)

2012.12469

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Games > Computer Games (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

QVMix and QVMix-Max: Extending the Deep Quality-Value Family of Algorithms to Cooperative Multi-Agent Reinforcement Learning

Leroy, Pascal, Ernst, Damien, Geurts, Pierre, Louppe, Gilles, Pisane, Jonathan, Sabatelli, Matthia

This paper introduces four new algorithms that can be used for tackling multi-agent reinforcement learning (MARL) problems occurring in cooperative settings. All algorithms are based on the Deep Quality-Value (DQV) family of algorithms, a set of techniques that have proven to be successful when dealing with single-agent reinforcement learning problems (SARL). The key idea of DQV algorithms is to jointly learn an approximation of the state-value function $V$, alongside an approximation of the state-action value function $Q$. We follow this principle and generalise these algorithms by introducing two fully decentralised MARL algorithms (IQV and IQV-Max) and two algorithms that are based on the centralised training with decentralised execution training paradigm (QVMix and QVMix-Max). We compare our algorithms with state-of-the-art MARL techniques on the popular StarCraft Multi-Agent Challenge (SMAC) environment. We show competitive results when QVMix and QVMix-Max are compared to well-known MARL techniques such as QMIX and MAVEN and show that QVMix can even outperform them on some of the tested environments, being the algorithm which performs best overall. We hypothesise that this is due to the fact that QVMix suffers less from the overestimation bias of the $Q$ function.
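The key idea stated in this abstract, learning $V$ alongside $Q$, can be sketched in tabular form. The sketch assumes the commonly described DQV target, in which both estimates regress toward r + gamma * V(s'); it uses lookup tables instead of deep networks, omits target networks and replay, and covers only the single-agent case, not the mixing used in QVMix. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def dqv_update(V, Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular DQV-style step: V and Q share the target r + gamma * V(s')."""
    # Compute the target before either table is modified.
    target = r if done else r + gamma * V[s_next]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    V[s] += alpha * (target - V[s])
    return target


if __name__ == "__main__":
    V = defaultdict(float)
    Q = defaultdict(float)
    V["s1"] = 1.0
    dqv_update(V, Q, "s0", "a", r=1.0, s_next="s1", done=False, alpha=0.5, gamma=0.5)
    print(V["s0"], Q[("s0", "a")])
```

Because the bootstrap term comes from $V$ rather than from a maximum over $Q$, this form avoids taking a max over noisy action values, which is consistent with the authors' hypothesis about reduced overestimation bias.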

agent, algorithm, qvmix-max, (12 more...)

2012.12062

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Belgium > Wallonia > Liège Province > Liège (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Distributed Q-Learning with State Tracking for Multi-agent Networked Control

Wang, Hang, Lin, Sen, Jafarkhani, Hamid, Zhang, Junshan

This paper studies distributed Q-learning for the Linear Quadratic Regulator (LQR) in a multi-agent network. The existing results often assume that agents can observe the global system state, which may be infeasible in large-scale systems due to privacy concerns or communication constraints. In this work, we consider a setting with unknown system models and no centralized coordinator. We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for agents. Specifically, we assume that agents maintain local estimates of the global state based on their local information and communications with neighbors. At each step, every agent updates its local estimate of the global state, based on which it solves an approximate Q-factor locally through policy iteration. Assuming decaying injected excitation noise during the policy evaluation, we prove that the local estimation converges to the true global state, and establish the convergence of the proposed distributed ST-based Q-learning algorithm. The experimental studies corroborate our theoretical results by showing that our proposed method achieves comparable performance with the centralized case.

agent, controller, q-learning, (16 more...)

2012.12383

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Energy > Power Industry (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Núñez-Molina, Carlos, Nikolov, Vladislav, Vellido, Ignacio, Fernández-Olivares, Juan

Goal Reasoning by Selecting Subgoals with Deep Q-Learning

In this work, we propose a goal reasoning method which learns to select subgoals with Deep Q-Learning in order to decrease the load of a planner when faced with scenarios with tight time restrictions, such as online execution systems. We have designed a CNN-based goal selection module and trained it on a standard video game environment, testing it on different games (planning domains) and levels (planning problems) to measure its generalization abilities. When comparing its performance with a satisficing planner, the results obtained show both approaches are able to find plans of good quality, but our method greatly decreases planning time. We conclude our approach can be successfully applied to different types of domains (games), and shows good generalization properties when evaluated on new levels (problems) of the same game (domain).

architecture, deep q-learning, subgoal, (12 more...)

2012.12335

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ilahi, Inaam, Usama, Muhammad, Farooq, Muhammad Omer, Janjua, Muhammad Umar, Qadir, Junaid

Intelligent Resource Allocation in Dense LoRa Networks using Deep Reinforcement Learning

The anticipated increase in the number of IoT devices in the coming years motivates the development of efficient algorithms that can help in their effective management while keeping the power consumption low. In this paper, we propose LoRaDRL and provide a detailed performance evaluation. We propose a multi-channel scheme for LoRaDRL. We perform extensive experiments, and our results demonstrate that the proposed algorithm not only significantly improves long-range wide area network (LoRaWAN)'s packet delivery ratio (PDR) but is also able to support mobile end-devices (EDs) while ensuring lower power consumption. Most previous works focus on proposing different MAC protocols for improving the network capacity. We show that through the use of LoRaDRL, we can achieve the same efficiency with ALOHA while moving the complexity from EDs to the gateway, thus making the EDs simpler and cheaper. Furthermore, we test the performance of LoRaDRL under large-scale frequency jamming attacks and show its adaptiveness to the changes in the environment. We show that LoRaDRL's output improves the performance of state-of-the-art techniques, resulting, in some cases, in an improvement of more than 500% in terms of PDR compared to learning-based techniques.

gateway, lora network, loradrl, (16 more...)

2012.11867

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Kouzehgar, Maryam, Meghjani, Malika, Bouffanais, Roland

Multi-Agent Reinforcement Learning for Dynamic Ocean Monitoring by a Swarm of Buoys

arXiv.org Artificial Intelligence Dec-21-2020

The autonomous marine environmental monitoring problem traditionally encompasses an area coverage problem which can only be effectively carried out by a multi-robot system. In this paper, we focus on robotic swarms that are typically operated and controlled by means of simple swarming behaviors obtained from a subtle, yet ad hoc combination of bio-inspired strategies. We propose a novel and structured approach for area coverage using multi-agent reinforcement learning (MARL) which effectively deals with the non-stationarity of environmental features. Specifically, we propose two dynamic area coverage approaches: (1) swarm-based MARL, and (2) coverage-range-based MARL. The former is trained using the multi-agent deep deterministic policy gradient (MADDPG) approach, whereas a modified version of MADDPG is introduced for the latter with a reward function that intrinsically leads to a collective behavior. Both methods are tested and validated with different geometric shaped regions with equal surface area (square vs. rectangle), yielding acceptable area coverage and benefiting from the structured learning in non-stationary environments. Both approaches are advantageous compared to a naïve swarming method. However, coverage-range-based MARL outperforms the swarm-based MARL with stronger convergence features in learning criteria and higher spreading of agents for area coverage.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

doi: 10.1109/IEEECONF38699.2020.9389128

2012.11641

Country:

Asia > Singapore > Central Region > Singapore (0.04)
North America > United States (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Amiri, Sorour E., Adhikari, Bijaya, Wenskovitch, John, Rodriguez, Alexander, Dowling, Michelle, North, Chris, Prakash, B. Aditya

NetReAct: Interactive Learning for Network Summarization

arXiv.org Artificial Intelligence Dec-21-2020

Generating useful network summaries is a challenging and important problem with several applications like sensemaking, visualization, and compression. However, most of the current work in this space does not take human feedback into account while generating summaries. Consider an intelligence analysis scenario, where the analyst is exploring a similarity network between documents. The analyst can express her agreement/disagreement with the visualization of the network summary via iterative feedback, e.g. closing or moving documents ("nodes") together. How can we use this feedback to improve the network summary quality? In this paper, we present NetReAct, a novel interactive network summarization algorithm which supports the visualization of networks induced by text corpora to perform sensemaking. NetReAct incorporates human feedback with reinforcement learning to summarize and visualize document networks. Using scenarios from two datasets, we show how NetReAct is successful in generating high-quality summaries and visualizations that reveal hidden patterns better than other non-trivial baselines.

netreact, relevant document, visualization, (16 more...)

2012.11821

Country:

North America > United States > Virginia (0.04)
North America > United States > Iowa (0.04)
North America > United States > District of Columbia > Washington (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety (0.46)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)

Zakershahrak, Mehrdad, Ghodratnama, Samira

Are We On The Same Page? Hierarchical Explanation Generation for Planning Tasks in Human-Robot Teaming using Reinforcement Learning

arXiv.org Artificial Intelligence Dec-21-2020

Providing explanations is considered an imperative ability for an AI agent in a human-robot teaming framework. The right explanation provides the rationale behind an AI agent's decision making. However, to maintain the human teammate's cognitive demand to comprehend the provided explanations, prior works have focused on providing explanations in a specific order or intertwining the explanation generation with plan execution. These approaches, however, do not consider the degree of detail they share throughout the provided explanations. In this work, we argue that the explanations, especially the complex ones, should be abstracted to be aligned with the level of detail the teammate desires, in order to maintain the cognitive load of the recipient. The challenge here is to learn a hierarchical model of explanations and details the agent requires to yield the explanations as an objective. Moreover, the agent needs to follow a high-level plan in a task domain such that the agent can transfer learned teammate preferences to a scenario where lower-level control policies are different, while the high-level plan remains the same. Results confirmed our hypothesis that the process of understanding an explanation was a dynamic hierarchical process. The human preference that reflected this aspect corresponded exactly to creating and employing abstraction for knowledge assimilation hidden deeper in our cognitive process. We showed that hierarchical explanations achieved better task performance and behavior interpretability while reducing cognitive load. These results shed light on designing explainable agents utilizing reinforcement learning and planning across various domains.

agent, explanation, robot, (12 more...)

2012.11792

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Arizona > Maricopa County > Tempe (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)