Research on Short-Video Platform User Decision-Making via Multimodal Temporal Modeling and Reinforcement Learning

Sep-17-2025, arXiv.org Artificial Intelligence

This paper proposes MT-DQN, a model that integrates a Transformer, a Temporal Graph Neural Network (TGNN), and a Deep Q-Network (DQN) to address the challenges of predicting user behavior and optimizing recommendation strategies in short-video environments. Experiments show that MT-DQN consistently outperforms traditional concatenation-based models such as Concat-Modal, with an average F1-score improvement of 10.97% and an average NDCG@5 improvement of 8.3%. Compared to the classic reinforcement learning model Vanilla-DQN, MT-DQN reduces MSE by 34.8% and MAE by 26.5%. Nonetheless, we also recognize challenges in deploying MT-DQN in real-world scenarios, such as its computational cost and latency sensitivity during online inference, which we plan to address through future architectural optimization.
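
For readers unfamiliar with the ranking metric quoted above, the following is a minimal sketch of how NDCG@5 is commonly computed. It assumes the standard formulation with linear gain and a log2 position discount; the abstract does not say which variant the authors used, and the function names `dcg_at_k` and `ndcg_at_k` are illustrative, not taken from the paper.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items.

    Assumes linear gain and a log2(rank + 1) discount with 1-based ranks.
    """
    return sum(
        rel / math.log2(position + 2)  # position is 0-based, so rank + 1 = position + 2
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=5):
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all: define NDCG as 0
    return dcg_at_k(relevances, k) / ideal


# Hypothetical relevance labels for one user's ranked recommendation list.
ranked_relevance = [0, 1, 1, 0, 1, 0]
score = ndcg_at_k(ranked_relevance, k=5)
```

A score of 1.0 means the relevant videos already occupy the top positions; placing them lower in the top five reduces the score through the logarithmic discount.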

large language model, machine learning, reinforcement learning

Country:
- Asia > China (0.28)
- North America (0.28)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Information Technology (1.00)
- Banking & Finance (0.67)
- Leisure & Entertainment > Games (0.67)
- Energy > Renewable (0.67)
- Health & Medicine > Therapeutic Area
  - Neurology (0.67)
- Education > Educational Setting
  - Online (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Personal Assistant Systems (0.93)
  - Natural Language > Large Language Model (0.93)
  - Machine Learning
    - Statistical Learning (1.00)
    - Reinforcement Learning (1.00)
    - Neural Networks > Deep Learning (1.00)
