
Collaborating Authors: Huang, Libo


Online Policy Distillation with Decision-Attention

arXiv.org Artificial Intelligence

Policy Distillation (PD) has become an effective method to improve deep reinforcement learning tasks. The core idea of PD is to distill policy knowledge from a teacher agent to a student agent. However, the teacher-student framework requires a well-trained teacher model, which is computationally expensive. In the light of online knowledge distillation, we study knowledge transfer between different policies that can learn diverse knowledge from the same environment. In this work, we propose Online Policy Distillation (OPD) with Decision-Attention (DA), an online learning framework in which different policies operate in the same environment, learn different perspectives of it, and transfer knowledge to each other to obtain better performance together. In the absence of a well-performing teacher policy, the group-derived targets play a key role in transferring group knowledge to each student policy. However, naive aggregation functions tend to make the student policies homogenize quickly. To address this challenge, we introduce the Decision-Attention module into the online policy distillation framework. The Decision-Attention module generates a distinct set of weights for each policy to measure the importance of group members. We conduct experiments on the Atari platform with several reinforcement learning algorithms, including PPO and DQN. Across different tasks, our method outperforms independently trained policies for both PPO and DQN. This suggests that OPD-DA transfers knowledge between different policies well and helps agents obtain more reward.
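A minimal sketch of what such attention-based aggregation of peer policies could look like is given below, assuming the group target for each policy is a weighted mixture of the group members' action distributions. The similarity scoring (dot products between distributions, normalized with a softmax) and all function names are illustrative assumptions, not the paper's exact Decision-Attention formulation.

```python
# Illustrative sketch (NumPy): attention-weighted group targets for online
# policy distillation. The weighting scheme is an assumption, not necessarily
# the paper's exact Decision-Attention module.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_targets(action_probs, temperature=1.0):
    """action_probs: (n_policies, n_actions) action distributions for one state.

    For each policy i, score every group member j by the similarity of their
    action distributions, turn the scores into attention weights with a softmax,
    and form policy i's distillation target as the weighted mixture of the
    group's distributions. Giving each policy its own weight vector is what
    keeps the students from collapsing onto a single averaged target.
    """
    scores = action_probs @ action_probs.T            # (n, n) pairwise similarity
    weights = softmax(scores / temperature, axis=1)   # one weight vector per policy
    targets = weights @ action_probs                  # (n, n_actions) per-policy targets
    return targets, weights

# Toy usage: three policies over four actions.
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.20, 0.50, 0.20, 0.10],
                  [0.25, 0.25, 0.25, 0.25]])
targets, weights = group_targets(probs)
print(weights)   # each row: how much policy i attends to each group member
print(targets)   # each row: the group-derived target distribution for policy i
```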


A Survey on Causal Reinforcement Learning

arXiv.org Artificial Intelligence

While Reinforcement Learning (RL) achieves tremendous success in sequential decision-making problems across many domains, it still faces the key challenges of data inefficiency and lack of interpretability. Interestingly, many researchers have recently leveraged insights from the causality literature, producing a flourishing body of work that unifies the merits of causality and addresses the challenges of RL. It is therefore of great necessity and significance to collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL methods, and investigate the potential of causality for RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance or not. We further analyze each category in terms of the formalization of different models, including the Markov Decision Process (MDP), Partially Observed Markov Decision Process (POMDP), Multi-Armed Bandit (MAB), and Dynamic Treatment Regime (DTR). Moreover, we summarize evaluation metrics and open-source resources, and we discuss emerging applications along with promising prospects for the future development of CRL.


TJ4DRadSet: A 4D Radar Dataset for Autonomous Driving

arXiv.org Artificial Intelligence

The next-generation high-resolution automotive radar (4D radar) provides additional elevation measurements and denser point clouds, and thus has great potential for 3D sensing in autonomous driving. In this paper, we introduce TJ4DRadSet, a dataset with 4D radar point clouds for autonomous driving research. The dataset was collected in a variety of driving scenarios and contains 7757 synchronized frames in 44 consecutive sequences, well annotated with 3D bounding boxes and track IDs. We provide a 4D-radar-based 3D object detection baseline for the dataset to demonstrate the effectiveness of deep learning methods on 4D radar point clouds. Autonomous driving technology [1] has recently received much attention.
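For readers unfamiliar with this kind of annotation, the sketch below shows one plausible in-memory representation of a single annotated 4D radar frame. All field names and per-point attributes are illustrative assumptions, not TJ4DRadSet's actual schema or devkit API.

```python
# Hypothetical representation of one annotated 4D radar frame; field names are
# illustrative assumptions, not the TJ4DRadSet schema.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Box3D:
    center: np.ndarray   # (x, y, z) box centre in metres
    size: np.ndarray     # (length, width, height) in metres
    yaw: float           # heading angle in radians
    label: str           # object class, e.g. "Car"
    track_id: int        # identity kept consistent across a sequence

@dataclass
class RadarFrame:
    # (N, C) point cloud: x, y, z (the 4D radar adds elevation), plus per-point
    # attributes such as Doppler velocity or RCS, depending on the sensor.
    points: np.ndarray
    boxes: List[Box3D]   # 3D bounding-box annotations for this frame
    sequence_id: int     # which of the consecutive sequences the frame belongs to
    frame_index: int     # position of the frame within its sequence
```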


Lifelong Generative Learning via Knowledge Reconstruction

arXiv.org Artificial Intelligence

Generative models often suffer from catastrophic forgetting when they are used to learn multiple tasks sequentially, i.e., lifelong generative learning. Although there have been some attempts to tackle this problem, they suffer from high time consumption or error accumulation. In this work, we develop an efficient and effective lifelong generative model based on the variational autoencoder (VAE). Unlike generative adversarial networks, the VAE is efficient to train, providing natural benefits with few resources. We derive a lifelong generative model by extending the intrinsic reconstruction character of the VAE to the retention of historical knowledge. Further, we devise a feedback strategy on the reconstructed data to alleviate error accumulation. Experiments on lifelong generation tasks over MNIST, FashionMNIST, and SVHN verify the efficacy of our approach, with results comparable to the state of the art.
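The following is a minimal PyTorch sketch of the general idea of retaining historical knowledge through a VAE's own generation and reconstruction, with a simple feedback filter on the replayed data. The tiny architecture, the replay-from-prior step, and the error threshold are all assumptions for illustration, not the paper's exact method.

```python
# Minimal sketch (PyTorch): VAE-based lifelong generative learning with replayed
# data and a feedback filter; an illustration of the general idea only.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Small fully connected VAE used only for illustration."""
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.z_dim = z_dim
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu, self.logvar = nn.Linear(h_dim, z_dim), nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return torch.sigmoid(self.dec(z)), mu, logvar

def vae_loss(x, recon, mu, logvar):
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

def train_task(model, old_model, loader, epochs=1, lr=1e-3, max_err=0.05):
    """Train on the current task while replaying data produced by the previous model."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            x = x.view(x.size(0), -1)
            if old_model is not None:
                with torch.no_grad():
                    z = torch.randn(x.size(0), old_model.z_dim)
                    replay = torch.sigmoid(old_model.dec(z))   # pseudo-data for past tasks
                    rec, _, _ = old_model(replay)              # feedback: re-reconstruct it
                    err = F.mse_loss(rec, replay, reduction="none").mean(dim=1)
                    replay = replay[err < max_err]             # drop poorly reconstructed samples
                if replay.size(0) > 0:
                    x = torch.cat([x, replay], dim=0)          # mix old and new knowledge
            recon, mu, logvar = model(x)
            loss = vae_loss(x, recon, mu, logvar)
            opt.zero_grad(); loss.backward(); opt.step()
    return copy.deepcopy(model).eval()  # frozen copy serves as old_model for the next task
```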


Fast Convolution based on Winograd Minimum Filtering: Introduction and Development

arXiv.org Artificial Intelligence

Convolutional Neural Networks (CNNs) have been widely used in various fields and play an important role. Convolution operators are the fundamental components of convolutional neural networks and also the most time-consuming part of network training and inference. In recent years, researchers have proposed several fast convolution algorithms, including FFT-based convolution and Winograd convolution. Among them, Winograd convolution significantly reduces the number of multiplications in convolution and also takes up less memory than FFT convolution. Therefore, Winograd convolution has quickly become the first choice for fast convolution implementations. At present, there is no systematic survey of Winograd convolution algorithms. This article aims to fill this gap and provide a detailed reference for follow-up researchers. It summarizes the development of Winograd convolution from three aspects: algorithm expansion, algorithm optimization, and implementation and application, and finally offers a brief outlook on possible future directions.
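As a concrete illustration of the minimal-filtering idea, the sketch below checks the standard 1D Winograd F(2,3) transform, which computes two outputs of a 3-tap filter from a 4-element input tile with 4 multiplications instead of the 6 used by direct convolution, against a direct implementation. The transform matrices are the standard F(2,3) ones; the code is a reference sketch, not an optimized implementation.

```python
# Reference sketch: 1D Winograd minimal filtering F(2,3). Two outputs of a 3-tap
# filter from a 4-element input tile with 4 multiplications (the elementwise
# product) instead of the 6 used by direct convolution.
import numpy as np

# Standard F(2,3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])                # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # output transform

def winograd_f23(d, g):
    """d: input tile of length 4, g: filter of length 3 -> two outputs."""
    return AT @ ((G @ g) * (BT @ d))            # 4 multiplies in the Hadamard product

def direct(d, g):
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])

d = np.random.randn(4)
g = np.random.randn(3)
assert np.allclose(winograd_f23(d, g), direct(d, g))
```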