Energy
A Proof of Theorem 1 Proof. null null null null null null loss
We consider four types of benchmarks in our experiments, i.e., random COPs, scale-free networks, Therefore, to be fair and practical, we do not consider existing DNN-based BP variants. We compare our DABP with the following state-of-the-art COP solvers: (1) DBP with a damping factor of 0.9 and its splitting constraint factor graph version (DBP-SCFG) with a splitting ratio of 0.95 [ However, it can be extremely tedious and time-consuming to tune the damping factor. Figure 9: Convergence rates under different iteration limits ( | X | = 100) 4 size (i.e., 5) and the constraint functions are highly structured, which allows effective pruning and It also can be concluded that our DABP converges much faster than DBP and DBP-SCFG. To demonstrate the necessity of heterogeneous hyperparameters of Eq. (6), we conduct extensive Figure 1-12 present the results on solution quality. Table 1 presents the GPU memory footprint of our DABP .
Sampling by averaging: A multiscale approach to score estimation
Cordero-Encinar, Paula, Duncan, Andrew B., Reich, Sebastian, Akyildiz, O. Deniz
We introduce a novel framework for efficient sampling from complex, unnormalised target distributions by exploiting multiscale dynamics. Traditional score-based sampling methods either rely on learned approximations of the score function or involve computationally expensive nested Markov chain Monte Carlo (MCMC) loops. In contrast, the proposed approach leverages stochastic averaging within a slow-fast system of stochastic differential equations (SDEs) to estimate intermediate scores along a diffusion path without training or inner-loop MCMC. Two algorithms are developed under this framework: MultALMC, which uses multiscale annealed Langevin dynamics, and MultCDiff, based on multiscale controlled diffusions for the reverse-time Ornstein-Uhlenbeck process. Both overdamped and underdamped variants are considered, with theoretical guarantees of convergence to the desired diffusion path. The framework is extended to handle heavy-tailed target distributions using Student's t-based noise models and tailored fast-process dynamics. Empirical results across synthetic and real-world benchmarks, including multimodal and high-dimensional distributions, demonstrate that the proposed methods are competitive with existing samplers in terms of accuracy and efficiency, without the need for learned models.
Bayesian Optimization with Expected Improvement: No Regret and the Choice of Incumbent
Wang, Jingyi, Wang, Haowei, Ng, Szu Hui, Petra, Cosmin G.
Expected improvement (EI) is one of the most widely used acquisition functions in Bayesian optimization (BO). Despite its proven empirical success in applications, the cumulative regret upper bound of EI remains an open question. In this paper, we analyze the classic noisy Gaussian process expected improvement (GP-EI) algorithm. We consider the Bayesian setting, where the objective is a sample from a GP. Three commonly used incumbents, namely the best posterior mean incumbent (BPMI), the best sampled posterior mean incumbent (BSPMI), and the best observation incumbent (BOI) are considered as the choices of the current best value in GP-EI. We present for the first time the cumulative regret upper bounds of GP-EI with BPMI and BSPMI. Importantly, we show that in both cases, GP-EI is a no-regret algorithm for both squared exponential (SE) and Matérn kernels. Further, we present for the first time that GP-EI with BOI either achieves a sublinear cumulative regret upper bound or has a fast converging noisy simple regret bound for SE and Matérn kernels. Our results provide theoretical guidance to the choice of incumbent when practitioners apply GP-EI in the noisy setting. Numerical experiments are conducted to validate our findings.
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis
Zhao, Yufeng, Liu, Junnan, Liu, Hongwei, Zhu, Dongsheng, Shen, Yuan, Zhang, Songyang, Chen, Kai
Large Language Models (LLMs) have made significant strides in reasoning tasks through methods like chain-of-thought (CoT) reasoning. However, they often fall short in tasks requiring precise computations. Tool-Integrated Reasoning (TIR) has emerged as a solution by incorporating external tools into the reasoning process. Nevertheless, the generalization of TIR in improving the reasoning ability of LLM is still unclear. Additionally, whether TIR has improved the model's reasoning behavior and helped the model think remains to be studied. We introduce ReasonZoo, a comprehensive benchmark encompassing nine diverse reasoning categories, to evaluate the effectiveness of TIR across various domains. Additionally, we propose two novel metrics, Performance-Aware Cost (PAC) and Area Under the Performance-Cost Curve (AUC-PCC), to assess reasoning efficiency. Our empirical evaluation demonstrates that TIR-enabled models consistently outperform their non-TIR counterparts in both mathematical and non-mathematical tasks. Furthermore, TIR enhances reasoning efficiency, as evidenced by improved PAC and AUC-PCC, indicating reduced overthinking and more streamlined reasoning. These findings underscore the domain-general benefits of TIR and its potential to advance LLM capabilities in complex reasoning tasks.