LearningGroup: A Real-Time Sparse Training on FPGA via Learnable Weight Grouping for Multi-Agent Reinforcement Learning

Yang, Je, Kim, JaeUk, Kim, Joo-Young

arXiv.org Artificial Intelligence 

Abstract--Multi-agent reinforcement learning (MARL) is a powerful technology for constructing interactive artificial intelligence systems in various applications such as multi-robot control and self-driving cars. Supervised models and single-agent reinforcement learning actively exploit network pruning, but it is unclear how pruning will work in multi-agent reinforcement learning with its cooperative and interactive characteristics. [...] MARL, which are 7.13× higher and 12.43× more energy efficient [...]. Most importantly, the accelerator shows speedup up to 12.52× for [...]

I. Introduction

[...] learning, known for solving long-term decision-making problems effectively. It aims to train the action policy, which describes how an agent should take actions based on the feedback from the given environment to maximize cumulative rewards. Recently, deep reinforcement learning (DRL), which utilizes a deep neural network (DNN) as the action policy, has been proposed [1]-[4]. Although DRL stands out in various domains such as industrial control and robotics [5]-[7], all of these are limited to a single agent. Other significant applications have started to employ interaction between multiple agents, for instance, analysis of language communication and the network [...]. Hence, extending DRL to many agents is critical for developing intelligent systems where agents can interact with each other or even with people.

[...] MARL requires up to 942.9 GFLOPS for effective real-time [...]. In addition, as the MARL system is [...]. Current CPU- and GPU-based systems cannot meet the above requirements due to the lack of computing units, high power consumption, or low utilization for small batch sizes. Instead, FPGA is emerging as a new solution for [...]. For example, the Xilinx U280 acceleration card provides robust computing potential through 9,024 DSPs over 41MB of on-chip BRAM while showing less power consumption than GPU. In addition, the reconfigurability of FPGA allows the optimization of irregular data access and parallelism with a customized compact data format, where these hardware overheads occur in network pruning to handle computation-bound applications. In this paper, we propose an FPGA-based acceleration system named LearningGroup, to yield high performance for [...]
