Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks

Tan, Xiang, Zhou, Li, Wang, Haijun, Sun, Yuli, Zhao, Haitao, Seet, Boon-Chong, Wei, Jibo, Leung, Victor C. M.

Jun-17-2021–arXiv.org Artificial Intelligence

This work has been submitted to the IEEE for possible publication. This work was supported in part by the National Natural Science Foundation of China under Grant 6193000305. X. Tan, L. Zhou, Y. Sun, H. Wang, H. Zhao and J. Wei are all with the College of Electronic Science and Technology, National University of Defense Technology, Changsha, 410073, China (E-mail: {tanxiang, zhouli2035, haijunwang14, sunyuli19, haitaozhao, wjbhw}@nudt.edu.cn). Boon-Chong Seet is with the Department of Electrical and Electronic Engineering, Auckland University of Technology, Auckland 1142, New Zealand (E-mail: boon-chong.seet@aut.ac.nz). Victor C. M. Leung is with Shenzhen University, Shenzhen, China and the University of British Columbia, Vancouver, Canada (E-mail: vleung@ieee.org).

Abstract: With the development of 5G and the Internet of Things, a large number of wireless devices need to share the limited spectrum resources. Dynamic spectrum access (DSA) is a promising paradigm to remedy the inefficient spectrum utilization brought about by the historical command-and-control approach to spectrum allocation. In this paper, we investigate the distributed DSA problem for multiple users in a typical multi-channel cognitive radio network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and we propose a centralized off-line training and distributed on-line execution framework based on cooperative multi-agent reinforcement learning (MARL). We employ the deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user. The ultimate goal is to learn a cooperative strategy that maximizes the sum throughput of the cognitive radio network in a distributed fashion, without exchanging coordination information between cognitive users. Simulation results show that the proposed algorithm converges quickly and achieves near-optimal performance.

The future network is evolving into the Internet of Everything.
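
To make the cooperative objective in the abstract concrete, the following is a minimal sketch of a per-slot sum-throughput reward. It is not the authors' system model: the collision rule (a transmission succeeds only when exactly one cognitive user picks a channel that no primary user occupies), the unit throughput per success, the `SILENT` action code, and the function names `ack` and `sum_throughput` are all assumptions made for illustration. Each user sees only its own ACK, which is one way partial observability can arise, while the team reward is the sum over all users.

```python
from typing import Sequence

SILENT = -1  # hypothetical action code: the user does not transmit this slot


def ack(user: int, actions: Sequence[int], primary_busy: Sequence[bool]) -> int:
    """Return 1 if `user` transmitted successfully in this slot, else 0."""
    channel = actions[user]
    if channel == SILENT:
        return 0
    # Assumed collision model: success needs a channel free of primary
    # traffic and chosen by exactly one cognitive user.
    alone = sum(1 for a in actions if a == channel) == 1
    return int(alone and not primary_busy[channel])


def sum_throughput(actions: Sequence[int], primary_busy: Sequence[bool]) -> int:
    """Shared team reward: number of successful transmissions in the slot."""
    return sum(ack(u, actions, primary_busy) for u in range(len(actions)))


if __name__ == "__main__":
    actions = [0, 1, 1, SILENT]          # four users, three channels
    primary_busy = [False, False, True]  # channel 2 is held by a primary user
    print([ack(u, actions, primary_busy) for u in range(4)])  # [1, 0, 0, 0]
    print(sum_throughput(actions, primary_busy))              # 1
```

In this example users 1 and 2 collide on channel 1, so only user 0 contributes to the team reward. A cooperative policy would aim to spread users across free channels without any message exchange between them.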

algorithm, cognitive user, time slot


Country:
- Oceania > New Zealand
  - North Island > Auckland Region > Auckland (0.44)
- North America
  - United States
    - Virginia > Arlington County
      - Arlington (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.24)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > China
  - Guangdong Province > Shenzhen (0.44)

Genre:
- Research Report (1.00)
- Overview (0.67)

Industry:
- Media (0.68)
- Leisure & Entertainment (0.46)
- Information Technology > Smart Houses & Appliances (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (1.00)
