crm
- Asia > Singapore (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Computing Optimal Nash Equilibria in Multiplayer Games
… There are other approaches (e.g., …) … if all team members play strategies according to an NE minimizing the adversary's utility, … Eq. (1c) ensures that binary variable … This space is represented by Eq. (1), which involves nonlinear terms in Eq. (1a). … Section 3.4 shows that our techniques can significantly reduce the time … The procedure of CRM is shown in Algorithm 2, which is illustrated in Appendix A. … A collection N of subsets of players is a binary collection if: 1. { {i} | i ∈ N } ⊆ N; … Eqs. (1b)-(1g), (3), and (4) is the space of NEs. Example 1 provides an example of N.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Malaysia (0.04)
- Africa > Madagascar (0.04)
Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
Zhang, Zheng, Shan, Ziwei, Song, Kaitao, Li, Yexin, Ren, Kan
Process Reward Models (PRMs) have emerged as a promising approach to enhance the reasoning capabilities of large language models (LLMs) by guiding their step-by-step reasoning toward a final answer. However, existing PRMs either treat each reasoning step in isolation, failing to capture inter-step dependencies, or struggle to align process rewards with the final outcome. Consequently, the reward signal fails to respect temporal causality in sequential reasoning and faces ambiguous credit assignment. These limitations make downstream models vulnerable to reward hacking and lead to suboptimal performance. In this work, we propose Conditional Reward Modeling (CRM) that frames LLM reasoning as a temporal process leading to a correct answer. The reward of each reasoning step is not only conditioned on the preceding steps but also explicitly linked to the final outcome of the reasoning trajectory. Further, through this consistent probabilistic modeling, the rewards produced by CRM enable more reliable cross-sample comparison. Experiments across Best-of-N sampling, beam search and reinforcement learning demonstrate that CRM consistently outperforms existing reward models, offering a principled framework for enhancing LLM reasoning. In particular, CRM is more robust to reward hacking and delivers stable downstream improvements without relying on verifiable rewards derived from ground truth. Recent advances in enhancing reasoning abilities have significantly improved the performance of large language models (LLMs) (Snell et al., 2025; Jaech et al., 2024), where models derive final answers through explicit step-by-step reasoning.
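A minimal sketch of the selection setting the abstract describes: Best-of-N sampling guided by step rewards, where each step's reward is conditioned on the preceding steps of its trajectory. The `score_step` function is a hypothetical placeholder, not the paper's learned CRM; a real conditional reward model would condition on the full prefix and be trained to link step rewards to the final outcome.

```python
# Hypothetical sketch of Best-of-N selection with prefix-conditioned step rewards.
# `score_step` is an assumed toy stand-in for a learned conditional reward model.

def score_step(prefix, step):
    """Toy conditional reward: a length-based placeholder that also
    depends (trivially) on the prefix. A real CRM would be learned."""
    return 1.0 / (1 + abs(len(step) - 20)) + 0.01 * len(prefix)

def trajectory_reward(steps):
    """Sum step rewards along a trajectory, each conditioned on its prefix."""
    total = 0.0
    for i, step in enumerate(steps):
        total += score_step(steps[:i], step)
    return total

def best_of_n(candidates):
    """Pick the candidate trajectory (a list of reasoning steps)
    with the highest aggregated reward."""
    return max(candidates, key=trajectory_reward)
```

Because the rewards come from one consistent scoring function, trajectories of different samples remain directly comparable, which is the property the abstract highlights for cross-sample comparison.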
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.05)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
User-centric Subjective Leaderboard by Customizable Reward Modeling
Jia, Qi, Song, Xiujie, Zhang, Zicheng, Guo, Yijin, Zhang, Kaiwei, Chen, Zijian, Zhai, Guangtao
Existing benchmarks for large language models (LLMs) predominantly focus on assessing their capabilities through verifiable tasks. Such objective and static benchmarks offer limited utility for practical LLM selection, making it difficult for users to find suitable models for their individual needs. To bridge this gap, we present the first User-Centric Subjective Leaderboard (USL), which provides a preference-driven, dynamic ranking of LLMs across diverse real-world scenarios. Our work is built upon a thorough investigation of real human preference data, involving more than 10K subjective queries. Our investigation reveals significant diversity and contradictions in human preferences, which limit the effectiveness of state-of-the-art reward models. To address this, we introduce Customizable Reward Models (CRMs). With only 4B parameters, our CRM surpasses the performance of leading models such as GPT-4.1 and Gemini-2.5-pro, showing exceptional generalization capabilities across new topics and criteria. The USL, powered by CRMs, exhibits strong negative correlations with contradictory preferences.
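The preference-driven ranking idea can be sketched with a toy win-rate computation over pairwise judgments. This is an illustrative assumption, not the USL's actual ranking method: `preferences` and `rank_models` are hypothetical names, and a real leaderboard would use reward-model scores rather than raw pairwise counts.

```python
from collections import defaultdict

def rank_models(preferences):
    """Preference-driven ranking sketch: `preferences` is a list of
    (winner, loser) pairs from subjective queries; models are ranked
    by their win rate over all comparisons they appear in."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for winner, loser in preferences:
        wins[winner] += 1
        totals[winner] += 1
        totals[loser] += 1
    return sorted(totals, key=lambda m: wins[m] / totals[m], reverse=True)
```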
Bidirectional Knowledge Distillation for Enhancing Sequential Recommendation with Large Language Models
Wu, Jiongran, Liu, Jiahao, Li, Dongsheng, Zhang, Guangping, Han, Mingzhe, Gu, Hansu, Zhang, Peng, Shang, Li, Lu, Tun, Gu, Ning
Large language models (LLMs) have demonstrated exceptional performance in understanding and generating semantic patterns, making them promising candidates for sequential recommendation tasks. However, when combined with conventional recommendation models (CRMs), LLMs often face challenges related to high inference costs and static knowledge transfer methods. In this paper, we propose a novel mutual distillation framework, LLMD4Rec, that fosters dynamic and bidirectional knowledge exchange between LLM-centric and CRM-based recommendation systems. Unlike traditional unidirectional distillation methods, LLMD4Rec enables iterative optimization by alternately refining both models, enhancing the semantic understanding of CRMs and enriching LLMs with collaborative signals from user-item interactions. By leveraging sample-wise adaptive weighting and aligning output distributions, our approach eliminates the need for additional parameters while ensuring effective knowledge transfer. Extensive experiments on real-world datasets demonstrate that LLMD4Rec significantly improves recommendation accuracy across multiple benchmarks without increasing inference costs. This method provides a scalable and efficient solution for combining the strengths of both LLMs and CRMs in sequential recommendation systems.
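The bidirectional distillation described above can be illustrated with a toy loss over discrete output distributions: each direction of knowledge transfer is a KL term, and a sample-wise weight shifts emphasis between distilling LLM-to-CRM and CRM-to-LLM. This is a hedged sketch of the general idea, not LLMD4Rec's actual objective; `mutual_distill_loss` and its weighting scheme are illustrative assumptions.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions given as lists
    of probabilities (each summing to 1, q strictly positive)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_distill_loss(p_llm, p_crm, weight):
    """Illustrative sample-wise weighted bidirectional distillation loss:
    `weight` emphasizes distilling the LLM's distribution into the CRM,
    (1 - weight) the reverse direction."""
    return weight * kl(p_llm, p_crm) + (1 - weight) * kl(p_crm, p_llm)
```

Aligning output distributions this way, rather than matching intermediate features, is what lets such a scheme avoid introducing extra parameters, as the abstract notes.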
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Jiangsu Province > Yancheng (0.04)