Deng, Yanchen
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
Wang, Chaojie, Deng, Yanchen, Lv, Zhiyi, Liang, Zeng, He, Jujie, Yan, Shuicheng, An, Bo
Large Language Models (LLMs) have demonstrated impressive capabilities in many natural language tasks. However, the auto-regressive generation process makes LLMs prone to producing errors, hallucinations and inconsistent statements when performing multi-step reasoning. In this paper, by casting multi-step reasoning of LLMs as a heuristic search problem, we aim to alleviate this pathology by introducing Q*, a general, versatile and agile framework for guiding the decoding process of LLMs with deliberative planning. By learning a plug-and-play Q-value model as a heuristic function for estimating expected future rewards, Q* can effectively guide LLMs to select the most promising next reasoning step without fine-tuning them for the current task, thereby avoiding significant computational overhead and the potential risk of performance degradation on other tasks. Extensive experiments on GSM8K, MATH and MBPP demonstrate the superiority of our method and its contribution to improving the reasoning performance of existing open-source LLMs.
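The idea of a plug-and-play Q-value model steering step-by-step decoding can be illustrated with a small best-first search sketch. This is an illustration only, not the paper's implementation: `generate_candidate_steps`, `q_value` and `is_terminal` are hypothetical stand-ins for the LLM's step proposals, the learned Q-value heuristic, and the termination check, and the sketch omits the aggregated path utility that the paper's A*-style formulation combines with the heuristic.

```python
import heapq

def q_star_search(question, generate_candidate_steps, q_value, is_terminal, max_expansions=100):
    """Best-first search over reasoning traces guided by a learned Q-value heuristic.

    Each state is a partial reasoning trace (a tuple of steps). States are expanded
    in order of their estimated expected future reward, so the most promising next
    reasoning step is explored first, without fine-tuning the underlying LLM.
    """
    frontier = [(-q_value(question, ()), ())]  # max-heap via negated scores
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_score, trace = heapq.heappop(frontier)
        if is_terminal(trace):
            return trace  # most promising complete reasoning trace found
        for step in generate_candidate_steps(question, trace):
            new_trace = trace + (step,)
            heapq.heappush(frontier, (-q_value(question, new_trace), new_trace))
    return None
```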
Exploring Leximin Principle for Fair Core-Selecting Combinatorial Auctions: Payment Rule Design and Implementation
Cheng, Hao, Kong, Shufeng, Deng, Yanchen, Liu, Caihua, Wu, Xiaohu, An, Bo, Wang, Chongjun
Core-selecting combinatorial auctions (CAs) restrict the auction result to the core, so that no coalition of bidders could improve their utilities by engaging in collusion. The minimum-revenue-core (MRC) rule is a widely used core-selecting payment rule that maximizes the total utility of all bidders. However, the MRC rule can suffer from severe unfairness since it ignores individuals' utilities. To address this limitation, we explore the leximin principle to achieve fairness in core-selecting CAs, since the leximin principle prefers to maximize the utility of the worst-off bidder. We theoretically analyze the resulting bidder-leximin-optimal (BLO) payment rule and provide an effective algorithm to compute the BLO outcome. Moreover, extensive experiments show that our algorithm returns fairer utility distributions and is faster than existing algorithms for core-selecting payment rules.
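As a quick illustration of the leximin principle referenced above (not the paper's BLO algorithm), the sketch below compares two utility distributions: the leximin-preferred one maximizes the smallest utility first, then the second smallest, and so on.

```python
def leximin_prefers(u, v):
    """Return True if utility vector u is leximin-preferred to v.

    Leximin compares the ascending-sorted utility vectors lexicographically:
    it first favours the distribution whose worst-off bidder is better off,
    then the second worst-off, and so on.
    """
    for a, b in zip(sorted(u), sorted(v)):
        if a != b:
            return a > b
    return False  # identical sorted utilities: neither is strictly preferred

# Both distributions have total utility 10, but the first is fairer:
print(leximin_prefers([4, 3, 3], [8, 1, 1]))  # True: the worst-off gets 3 instead of 1
```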
Pretrained Cost Model for Distributed Constraint Optimization Problems
Deng, Yanchen, Kong, Shufeng, An, Bo
Distributed Constraint Optimization Problems (DCOPs) are an important subclass of combinatorial optimization problems in which information and control are distributed among multiple autonomous agents. Machine Learning (ML) has been widely applied to combinatorial optimization by learning effective heuristics. However, existing ML-based heuristic methods often do not generalize across different search algorithms. Most importantly, these methods usually require full knowledge of the problem to be solved, which makes them unsuitable for distributed settings where centralization is unrealistic due to geographical limitations or privacy concerns. To address the generality issue, we propose a novel directed acyclic graph representation schema for DCOPs and leverage Graph Attention Networks (GATs) to embed the graph representations. Our model, GAT-PCM, is pretrained offline with optimally labelled data to construct effective heuristics that boost a broad range of DCOP algorithms in which evaluating the quality of a partial assignment is critical, such as local search or backtracking search. Furthermore, to enable decentralized model inference, we propose a distributed embedding schema for GAT-PCM in which each agent exchanges only embedded vectors, and we show its soundness and complexity. Finally, we demonstrate the effectiveness of our model by combining it with a local search or a backtracking search algorithm. Extensive empirical evaluations indicate that the GAT-PCM-boosted algorithms significantly outperform state-of-the-art methods on various benchmarks. The pretrained model is available at https://github.com/dyc941126/GAT-PCM.
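To make the intended use of a pretrained cost model concrete, here is a minimal sketch of how a learned evaluator of partial assignments could drive greedy value selection in a local-search-style DCOP solver. The `predicted_cost` callable is a hypothetical stand-in for GAT-PCM inference; the actual model, training data and distributed embedding schema are described in the paper and repository.

```python
def greedy_value_selection(variable, domain, partial_assignment, predicted_cost):
    """Pick the value for `variable` whose extended partial assignment the
    learned cost model predicts to be cheapest.

    `predicted_cost(assignment)` maps a partial assignment (dict of
    variable -> value) to an estimated cost of its best completion.
    """
    best_value, best_cost = None, float("inf")
    for value in domain:
        candidate = dict(partial_assignment, **{variable: value})
        cost = predicted_cost(candidate)
        if cost < best_cost:
            best_value, best_cost = value, cost
    return best_value
```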
An Ant-Based Algorithm to Solve Distributed Constraint Optimization Problems
Chen, Ziyu, Wu, Tengfei, Deng, Yanchen, Zhang, Cheng (Chongqing University)
As an important population-based metaheuristic, ant colony optimization (ACO) has been successfully applied to various combinatorial optimization problems. However, much existing work on ACO focuses on solving centralized problems. In this paper, we present ACO_DCOP, a novel algorithm that harnesses the power of ant colonies to solve Distributed Constraint Optimization Problems (DCOPs). In ACO_DCOP, a new mechanism that captures local benefits is proposed to compute heuristic factors, and a new method that considers the cost structure of DCOPs is proposed to compute pheromone deltas appropriately. Moreover, a pipelining technique is introduced to make full use of agents' computational capacity and improve efficiency. In our theoretical analysis, we prove that ACO_DCOP is an anytime algorithm. Our empirical evaluation indicates that ACO_DCOP finds solutions of equal or significantly higher quality than state-of-the-art DCOP algorithms.
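For readers unfamiliar with how ants build solutions, the following generic ant-colony sketch shows the standard probabilistic value selection from pheromone and heuristic factors, plus a cost-based pheromone update. It illustrates ACO in general and is not ACO_DCOP itself, which computes heuristic factors from local benefits and pheromone deltas from the DCOP cost structure in a distributed manner.

```python
import random

def choose_value(domain, pheromone, heuristic, alpha=1.0, beta=2.0):
    """Standard ACO value selection: the probability of choosing a value is
    proportional to pheromone^alpha * heuristic^beta."""
    weights = [(pheromone[v] ** alpha) * (heuristic[v] ** beta) for v in domain]
    total = sum(weights)
    return random.choices(domain, weights=[w / total for w in weights])[0]

def deposit_pheromone(pheromone, solution, cost, evaporation=0.1):
    """Evaporate existing pheromone, then deposit an amount that grows as the
    solution cost shrinks (a generic 1/cost delta, not the DCOP-specific rule)."""
    for v in pheromone:
        pheromone[v] *= (1.0 - evaporation)
    for v in solution:
        pheromone[v] += 1.0 / (1.0 + cost)
    return pheromone
```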