Search
SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation
Ren, Yanwei, Zhang, Haotian, Wu, Fuxiang, Qiu, Jiayan, Huang, Jiaxing, Yu, Baosheng, Liu, Liu
Enhancing large language models by simply scaling up datasets has begun to yield diminishing returns, shifting the spotlight to data quality. Monte Carlo Tree Search (MCTS) has emerged as a powerful technique for generating high-quality chain-of-thought data, yet conventional approaches typically retain only the top-scoring trajectory from the search tree, discarding sibling nodes that often contain valuable partial insights, recurrent error patterns, and alternative reasoning strategies. This unconditional rejection of non-optimal reasoning branches may waste vast amounts of informative data in the whole search tree. We propose SIGMA (Sibling Guided Monte Carlo Augmentation), a novel framework that reintegrates these discarded sibling nodes to refine LLM reasoning. SIGMA forges semantic links among sibling nodes along each search path and applies a two-stage refinement: a critique model identifies overlooked strengths and weaknesses across the sibling set, and a revision model conducts text-based backpropagation to refine the top-scoring trajectory in light of this comparative feedback. By recovering and amplifying the underutilized but valuable signals from non-optimal reasoning branches, SIGMA substantially improves reasoning trajectories. On the challenging MA TH benchmark, our SIGMA-tuned 7B model achieves 54.92% accuracy using only 30K samples, outperforming state-of-the-art models trained on 590K samples. This result highlights that our sibling-guided optimization not only significantly reduces data usage but also significantly boosts LLM reasoning.
An Expansion-Based Approach for Quantified Integer Programming
Hartisch, Michael, Chew, Leroy
Quantified Integer Programming (QIP) bridges multiple domains by extending Quantified Boolean Formulas (QBF) to incorporate general integer variables and linear constraints while also generalizing Integer Programming through variable quantification. As a special case of Quantified Constraint Satisfaction Problems (QCSP), QIP provides a versatile framework for addressing complex decision-making scenarios. Additionally, the inclusion of a linear objective function enables QIP to effectively model multistage robust discrete linear optimization problems, making it a powerful tool for tackling uncertainty in optimization. While two primary solution paradigms exist for QBF -- search-based and expansion-based approaches -- only search-based methods have been explored for QIP and QCSP. We introduce an expansion-based approach for QIP using Counterexample-Guided Abstraction Refinement (CEGAR), adapting techniques from QBF. We extend this methodology to tackle multistage robust discrete optimization problems with linear constraints and further embed it in an optimization framework, enhancing its applicability. Our experimental results highlight the advantages of this approach, demonstrating superior performance over existing search-based solvers for QIP in specific instances. Furthermore, the ability to model problems using linear constraints enables notable performance gains over state-of-the-art expansion-based solvers for QBF.
Non-linear Multi-objective Optimization with Probabilistic Branch and Bound
Huang, Hao, Zabinsky, Zelda B.
MOPBnB(so) evaluates a noisy function exactly once at any solution and uses neighboring solutions to estimate the objective functions, in contrast to a variant that uses multiple replications at a solution to estimate the objective functions. A finite-time performance analysis for deterministic multi-objective problems provides a bound on the probability that MOPBnB(so) captures the Pareto optimal set. Asymptotic convergence of MOPBnB(so) on stochastic problems is derived, in that the algorithm captures the Pareto optimal set and the estimations converge to the true objective function values. Numerical results reveal that the variant with multiple replications is extremely intensive in terms of computational resources compared to MOPBnB(so). In addition, numerical results show that MOPBnB(so) outperforms a genetic algorithm NSGA-II on test problems. Keywords: global optimization; multiple objectives; branch and bound; stochastic optimization; estimation 1 Introduction Multiple objectives generally exist for practical problems, and providing solutions to multi-objective problems is more challenging than for single objective problems (Miettinen, 2012).
Bandit based Dynamic Candidate Edge Selection in Solving Traveling Salesman Problems
Wang, Long, Zheng, Jiongzhi, Xiong, Zhengda, Li, ChuMin, He, Kun
Algorithms designed for routing problems typically rely on high-quality candidate edges to guide their search, aiming to reduce the search space and enhance the search efficiency. However, many existing algorithms, like the classical Lin-Kernighan-Helsgaun (LKH) algorithm for the Traveling Salesman Problem (TSP), often use predetermined candidate edges that remain static throughout local searches. This rigidity could cause the algorithm to get trapped in local optima, limiting its potential to find better solutions. To address this issue, we propose expanding the candidate sets to include other promising edges, providing them an opportunity for selection. Specifically, we incorporate multi-armed bandit models to dynamically select the most suitable candidate edges in each iteration, enabling LKH to make smarter choices and lead to improved solutions. Extensive experiments on multiple TSP benchmarks show the excellent performance of our method. Moreover, we employ this bandit-based method to LKH-3, an extension of LKH tailored for solving various TSP variant problems, and our method also significantly enhances LKH-3's performance across typical TSP variants.
Latent Guided Sampling for Combinatorial Optimization
Surendran, Sobihan, Fermanian, Adeline, Corff, Sylvain Le
Combinatorial Optimization (CO) consists of finding the best solution from a discrete set of possibilities by optimizing a given objective function subject to constraints. It has widespread applications across various domains, including vehicle routing (Veres and Moussa, 2019), production planning (Dolgui et al., 2019), and drug discovery (Liu et al., 2017). However, its NP-hard nature and the complexity of many problem variants make solving CO problems highly challenging. Traditional heuristic methods (e.g., (Kirkpatrick et al., 1983; Glover, 1989; Mladenovi c and Hansen, 1997)) rely on hand-crafted rules to guide the search, providing near-optimal solutions with significantly lower computational costs. Inspired by the success of deep learning in computer vision (Krizhevsky et al., 2012; He et al., 2016) and natural language processing (Vaswani et al., 2017; Devlin, 2018), recent years have seen a surge in learning-based Neural Combinatorial Optimization (NCO) approaches for solving CO problems, including the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). 1
A Generic Branch-and-Bound Algorithm for $\ell_0$-Penalized Problems with Supplementary Material
Elvira, Clรฉment, Guyard, Thรฉo, Herzet, Cรฉdric
We present a generic Branch-and-Bound procedure designed to solve L0-penalized optimization problems. Existing approaches primarily focus on quadratic losses and construct relaxations using "Big-M" constraints and/or L2-norm penalties. In contrast, our method accommodates a broader class of loss functions and allows greater flexibility in relaxation design through a general penalty term, encompassing existing techniques as special cases. We establish theoretical results ensuring that all key quantities required for the Branch-and-Bound implementation admit closed-form expressions under the general blanket assumptions considered in our work. Leveraging this framework, we introduce El0ps, an open-source Python solver with a plug-and-play workflow that enables user-defined losses and penalties in L0-penalized problems. Through extensive numerical experiments, we demonstrate that El0ps achieves state-of-the-art performance on classical instances and extends computational feasibility to previously intractable ones.
Verification-Guided Falsification for Safe RL via Explainable Abstraction and Risk-Aware Exploration
Le, Tuan, Shefin, Risal, Gupta, Debashis, Le, Thai, Alqahtani, Sarra
Ensuring the safety of reinforcement learning (RL) policies in high-stakes environments requires not only formal verification but also interpretability and targeted falsification. While model checking provides formal guarantees, its effectiveness is limited by abstraction quality and the completeness of the underlying trajectory dataset. We propose a hybrid framework that integrates (1) explainability, (2) model checking, and (3) risk-guided falsification to achieve both rigor and coverage. Our approach begins by constructing a human-interpretable abstraction of the RL policy using Comprehensible Abstract Policy Summarization (CAPS). This abstract graph, derived from offline trajectories, is both verifier-friendly, semantically meaningful, and can be used as input to Storm probabilistic model checker to verify satisfaction of temporal safety specifications. If the model checker identifies a violation, it will return an interpretable counterexample trace by which the policy fails the safety requirement. However, if no violation is detected, we cannot conclude satisfaction due to potential limitation in the abstraction and coverage of the offline dataset. In such cases, we estimate associated risk during model checking to guide a falsification strategy that prioritizes searching in high-risk states and regions underrepresented in the trajectory dataset. We further provide PAC-style guarantees on the likelihood of uncovering undetected violations. Finally, we incorporate a lightweight safety shield that switches to a fallback policy at runtime when such a risk exceeds a threshold, facilitating failure mitigation without retraining.
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
Chen, Pengtao, Zeng, Xianfang, Zhao, Maosen, Ye, Peng, Shen, Mingzhu, Cheng, Wei, Yu, Gang, Chen, Tao
While Diffusion Transformers (DiTs) have achieved breakthroughs in video generation, this long sequence generation task remains constrained by the quadratic complexity of attention mechanisms, resulting in significant inference latency. Through detailed analysis of attention maps in Video Diffusion Transformer (vDiT), we identify three recurring sparsity patterns: diagonal, multi-diagonal, and vertical-stripe structures. And even 3-6\% attention heads can be skipped. Crucially, these patterns exhibit strong layer-depth and head-position correlations but show limited dependence on the input content. Leveraging these findings, we propose Sparse-vDiT, a sparsity acceleration framework for vDiT comprising: 1) Pattern-optimized sparse kernels that replace dense attention with computationally efficient implementations for each identified sparsity pattern. 2) An offline sparse diffusion search algorithm that selects the optimal sparse computation strategy per layer and head via hardware-aware cost modeling. After determining the optimal configuration, we fuse heads within the same layer that share the same attention strategy, enhancing inference efficiency. Integrated into state-of-the-art vDiT models (CogVideoX1.5, HunyuanVideo, and Wan2.1), Sparse-vDiT achieves 2.09$\times$, 2.38$\times$, and 1.67$\times$ theoretical FLOP reduction, and actual inference speedups of 1.76$\times$, 1.85$\times$, and 1.58$\times$, respectively, while maintaining high visual fidelity, with PSNR values reaching 24.13, 27.09, and 22.59. Our work demonstrates that latent structural sparsity in vDiTs can be systematically exploited for long video synthesis.
Solving the Pod Repositioning Problem with Deep Reinforced Adaptive Large Neighborhood Search
The Pod Repositioning Problem (PRP) in Robotic Mobile Fulfillment Systems (RMFS) involves selecting optimal storage locations for pods returning from pick stations. This work presents an improved solution method that integrates Adaptive Large Neighborhood Search (ALNS) with Deep Reinforcement Learning (DRL). A DRL agent dynamically selects destroy and repair operators and adjusts key parameters such as destruction degree and acceptance thresholds during the search. Specialized heuristics for both operators are designed to reflect PRP-specific characteristics, including pod usage frequency and movement costs. Computational results show that this DRL-guided ALNS outperforms traditional approaches such as cheapest-place, fixed-place, binary integer programming, and static heuristics. The method demonstrates strong solution quality and illustrating the benefit of learning-driven control within combinatorial optimization for warehouse systems.
EALG: Evolutionary Adversarial Generation of Language Model-Guided Generators for Combinatorial Optimization
Duan, Ruibo, Liu, Yuxin, Dong, Xinyao, Fan, Chenglin
Generating challenging instances is crucial for the evaluation and advancement of combinatorial optimization solvers. In this work, we introduce EALG (Evolutionary Adversarial Generation of Language Model-Guided Generators), a novel framework that automates the co-evolution of optimization problem instances and their corresponding heuristic solvers using large language models (LLMs). EALG leverages a mutation-based adversarial approach that dynamically evolves instance generation procedures to create increasingly difficult problems, while simultaneously synthesizing adaptive heuristic algorithms through interactions with LLMs guided by algorithmic structure. Unlike existing approaches that focus solely on static benchmark creation or manual solver design, EALG provides a seamless pipeline from instance generation to solver synthesis. Experimental results demonstrate that EALG generates significantly harder instances than current benchmarks, and its synthesized solvers generalize effectively across a broad spectrum of combinatorial tasks. This work explores a new paradigm for combinatorial optimization that integrates instance generation with solver design, resulting in state-of-the-art performance.