Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making

Kim, Jeonghye, Lee, Suyoung, Kim, Woojun, Sung, Youngchul

arXiv.org Artificial Intelligence 

The recent success of Transformer in natural language processing has sparked its use in various domains. In offline reinforcement learning (RL), Decision Transformer (DT) is emerging as a promising model based on Transformer. However, we discovered that the attention module of DT is not appropriate for capturing the inherent local dependence pattern in RL trajectories modeled as a Markov decision process. To overcome the limitations of DT, we propose a novel action sequence predictor, named Decision ConvFormer (DC), based on the architecture of MetaFormer, which is a general structure that processes multiple entities in parallel and captures the interrelationships among them. DC employs local convolution filtering as the token mixer and can effectively capture the inherent local associations of the RL dataset. In extensive experiments, DC achieved state-of-the-art performance across various standard RL benchmarks while requiring fewer resources. Furthermore, we show that DC better understands the underlying meaning in data and exhibits enhanced generalization capability.

Transformer (Vaswani et al., 2017) has proved successful in various domains, including natural language processing (NLP) (Brown et al., 2020; Chowdhery et al., 2022) and computer vision (Liu et al., 2021; Hatamizadeh et al., 2023). Transformer is a special instance of a more abstract structure referred to as MetaFormer (Yu et al., 2022), which is a general architecture that takes multiple entities in parallel, understands their interrelationship, and extracts important features for addressing specific tasks while minimizing information loss. As shown in Figure 1, a MetaFormer is composed of blocks, where each block contains normalizations, a token mixer, residual connections, and a feedforward network. Among these components, the token mixer plays a crucial role in information exchange among multiple input entities.
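The block structure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pre-normalization layout, a depthwise causal 1D convolution as the local token mixer, a ReLU feedforward network, and arbitrary sizes (T, D, K, H). All function names are hypothetical.

```python
import math
import random

def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def causal_conv_mixer(tokens, filters):
    """Token mixer: depthwise causal convolution along the time axis.

    tokens:  list of T vectors, each of length D
    filters: list of D filters, each of length K (one filter per channel)
    Output token t only reads tokens t-K+1 .. t (assumed causal window).
    """
    T, D, K = len(tokens), len(tokens[0]), len(filters[0])
    out = []
    for t in range(T):
        row = []
        for d in range(D):
            acc = 0.0
            for k in range(K):
                src = t - k
                if src >= 0:  # positions before the sequence start act as zero padding
                    acc += filters[d][k] * tokens[src][d]
            row.append(acc)
        out.append(row)
    return out

def feedforward(x, w1, w2):
    """Per-token two-layer MLP with ReLU; w1 is H x D, w2 is D x H."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def metaformer_block(tokens, filters, w1, w2):
    """One block: (norm -> token mixer -> residual), then (norm -> FFN -> residual)."""
    normed = [layer_norm(x) for x in tokens]
    mixed = causal_conv_mixer(normed, filters)
    tokens = [[a + b for a, b in zip(x, m)] for x, m in zip(tokens, mixed)]
    out = []
    for x in tokens:
        f = feedforward(layer_norm(x), w1, w2)
        out.append([a + b for a, b in zip(x, f)])
    return out

# Hypothetical sizes: T tokens, D channels, filter width K, hidden width H.
rng = random.Random(0)
T, D, K, H = 8, 4, 3, 16
tokens = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
filters = [[rng.gauss(0, 0.5) for _ in range(K)] for _ in range(D)]
w1 = [[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(H)]
w2 = [[rng.gauss(0, 0.1) for _ in range(H)] for _ in range(D)]
out = metaformer_block(tokens, filters, w1, w2)
```

In this sketch the token mixer is the only component that exchanges information between tokens; normalization and the feedforward network act on each token separately. Because the filter has width K, perturbing one input token can change at most K output tokens of a single block, which is the kind of local dependence the abstract contrasts with global attention.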
