Research on Short-Video Platform User Decision-Making via Multimodal Temporal Modeling and Reinforcement Learning
Wang, Jinmeiyang; Dong, Jing; Zhou, Li
arXiv.org Artificial Intelligence
This paper proposes MT-DQN, a model that integrates a Transformer, a Temporal Graph Neural Network (TGNN), and a Deep Q-Network (DQN) to address two challenges in short-video environments: predicting user behavior and optimizing recommendation strategies. In experiments, MT-DQN consistently outperforms traditional concatenation-based multimodal models such as Concat-Modal, improving F1-score by 10.97% and NDCG@5 by 8.3% on average. Compared with the classic reinforcement learning model Vanilla-DQN, MT-DQN reduces MSE by 34.8% and MAE by 26.5%. However, deploying MT-DQN in real-world scenarios remains challenging because of its computational cost and its sensitivity to latency during online inference; we plan to address these issues through future architectural optimization.
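The abstract reports its results in four standard metrics (F1-score, NDCG@5, MSE, MAE) plus relative improvements. The sketch below shows the textbook definitions of these metrics so the reported numbers are easier to interpret. It is not the authors' evaluation code: the abstract does not specify the F1 variant (binary is assumed here), the NDCG gain function (linear gain is assumed; some work uses 2^rel - 1), or whether "reduces MSE by 34.8%" is relative or absolute (relative is assumed). All function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items, in ranked order (linear gain assumed)."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 gets log2(2) = 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 (assumed variant): harmonic mean of precision and recall for label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is defined as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which `model` lowers an error metric relative to `baseline`."""
    return 100.0 * (baseline - model) / baseline
```

Under the relative-reduction assumption, a drop in MSE from a hypothetical 1.000 (baseline) to 0.652 corresponds to `relative_reduction(1.0, 0.652)`, i.e. about 34.8%; the same reported percentage says nothing about the absolute error level of either model.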
Sep-17-2025