Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles
He, Lewei, Shi, Tianyu, Huang, Pengran, Chen, Bingzhi, Chen, Qianglong, Pan, Jiahui
Long-context processing with large language models (LLMs) remains challenging because of implementation complexity, training efficiency, and data sparsity. To address this, we propose a new paradigm, Online Long-context Processing (OLP), for handling documents of unlimited length, a setting that typically arises when receiving and organizing information from diverse streaming media such as automated news reporting, live e-commerce, and viral short videos. Moreover, amid the explosive growth in the number of available LLMs, selecting the most suitable model for outstanding performance, affordable price, and short response delay poses a dilemma. In view of this, we also develop Role Reinforcement Learning (Role-RL), which automatically deploys different LLMs in their respective roles within the OLP pipeline according to their measured performance. Extensive experiments on our OLP-MINI dataset show that OLP with the Role-RL framework achieves an average recall rate of 93.2% on the OLP benchmark while reducing LLM cost by 79.4%. The code and dataset are publicly available at: https://anonymous.4open.science/r/Role-RL.
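
For intuition, the performance-driven role assignment described in the abstract could be sketched as a simple bandit-style router: each pipeline role tracks a running reward estimate per candidate LLM that trades off recall against cost and latency, and routes work to the best-scoring model with occasional exploration. This is a minimal illustrative sketch, not the authors' implementation; the class name, role names, model names, and reward weighting are all assumptions.

    # Hypothetical sketch of Role-RL-style role assignment (illustrative only,
    # not the paper's actual training procedure).
    import random
    from collections import defaultdict

    class RoleRouter:
        def __init__(self, roles, models, epsilon=0.1):
            self.epsilon = epsilon  # exploration rate
            self.models = models
            # running mean reward and trial count for every (role, model) pair
            self.value = {r: defaultdict(float) for r in roles}
            self.count = {r: defaultdict(int) for r in roles}

        def pick(self, role):
            # epsilon-greedy: mostly exploit the best-scoring model for this role
            if random.random() < self.epsilon:
                return random.choice(self.models)
            return max(self.models, key=lambda m: self.value[role][m])

        def update(self, role, model, recall, cost, latency,
                   w_cost=0.5, w_lat=0.1):
            # assumed reward: recall penalized by price and response delay
            reward = recall - w_cost * cost - w_lat * latency
            self.count[role][model] += 1
            n = self.count[role][model]
            # incremental mean update of the reward estimate
            self.value[role][model] += (reward - self.value[role][model]) / n

Under these assumptions, a caller would invoke pick(role) for each incoming document chunk and update(...) after scoring the chosen model's output, so that over time each role settles on the LLM with the best observed performance-to-cost trade-off.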
arXiv.org Artificial Intelligence
Sep-26-2024