Optimization
Unrolled Graph Neural Networks for Constrained Optimization
Hadou, Samar, Ribeiro, Alejandro
In this paper, we unroll the dynamics of the dual ascent (DA) algorithm in two coupled graph neural networks (GNNs) to solve constrained optimization problems. The two networks interact with each other at the layer level to find a saddle point of the Lagrangian. The primal GNN finds a stationary point for a given dual multiplier, while the dual network iteratively refines its estimates to reach an optimal solution. We force the primal and dual networks to mirror the dynamics of the DA algorithm by imposing descent and ascent constraints. We propose a joint training scheme that alternates between updating the primal and dual networks. Our numerical experiments demonstrate that our approach yields near-optimal near-feasible solutions and generalizes well to out-of-distribution (OOD) problems.
GRPOformer: Advancing Hyperparameter Optimization via Group Relative Policy Optimization
Guo, Haoxin, Pan, Jiawen, Zhai, Weixin
Hyperparameter optimization (HPO) plays a critical role in improving model performance. Transformer-based HPO methods have shown great potential; however, existing approaches rely heavily on large-scale historical optimization trajectories and lack effective reinforcement learning (RL) techniques, thereby limiting their efficiency and performance improvements. Inspired by the success of Group Relative Policy Optimization (GRPO) in large language models (LLMs), we propose GRPOformer -- a novel hyperparameter optimization framework that integrates reinforcement learning (RL) with Transformers. In GRPOformer, Transformers are employed to generate new hyperparameter configurations from historical optimization trajectories, while GRPO enables rapid trajectory construction and optimization strategy learning from scratch. Moreover, we introduce Policy Churn Regularization (PCR) to enhance the stability of GRPO training. Experimental results on OpenML demonstrate that GRPOformer consistently outperforms baseline methods across diverse tasks, offering new insights into the application of RL for HPO.
Differential Privacy for Euclidean Jordan Algebra with Applications to Private Symmetric Cone Programming
Song, Zhao, Xue, Jianfei, Zhang, Lichen
In this paper, we study differentially private mechanisms for functions whose outputs lie in a Euclidean Jordan algebra. Euclidean Jordan algebras capture many important mathematical structures and form the foundation of linear programming, second-order cone programming, and semidefinite programming. Our main contribution is a generic Gaussian mechanism for such functions, with sensitivity measured in $\ell_2$, $\ell_1$, and $\ell_\infty$ norms. Notably, this framework includes the important case where the function outputs are symmetric matrices, and sensitivity is measured in the Frobenius, nuclear, or spectral norm. We further derive private algorithms for solving symmetric cone programs under various settings, using a combination of the multiplicative weights update method and our generic Gaussian mechanism. As an application, we present differentially private algorithms for semidefinite programming, resolving a major open question posed by [Hsu, Roth, Roughgarden, and Ullman, ICALP 2014].
The Complexity of Finding Local Optima in Contrastive Learning
Yan, Jingming, Luo, Yiyuan, Chatziafratis, Vaggos, Panageas, Ioannis, Shahkar, Parnian, Stavroulakis, Stelios
Contrastive learning is a powerful technique for discovering meaningful data representations by optimizing objectives based on $\textit{contrastive information}$, often given as a set of weighted triplets $\{(x_i, y_i^+, z_{i}^-)\}_{i = 1}^m$ indicating that an "anchor" $x_i$ is more similar to a "positive" example $y_i$ than to a "negative" example $z_i$. The goal is to find representations (e.g., embeddings in $\mathbb{R}^d$ or a tree metric) where anchors are placed closer to positive than to negative examples. While finding $\textit{global}$ optima of contrastive objectives is $\mathsf{NP}$-hard, the complexity of finding $\textit{local}$ optima -- representations that do not improve by local search algorithms such as gradient-based methods -- remains open. Our work settles the complexity of finding local optima in various contrastive learning problems by proving $\mathsf{PLS}$-hardness in discrete settings (e.g., maximize satisfied triplets) and $\mathsf{CLS}$-hardness in continuous settings (e.g., minimize Triplet Loss), where $\mathsf{PLS}$ (Polynomial Local Search) and $\mathsf{CLS}$ (Continuous Local Search) are well-studied complexity classes capturing local search dynamics in discrete and continuous optimization, respectively. Our results imply that no polynomial time algorithm (local search or otherwise) can find a local optimum for various contrastive learning problems, unless $\mathsf{PLS}\subseteq\mathsf{P}$ (or $\mathsf{CLS}\subseteq \mathsf{P}$ for continuous problems). Even in the unlikely scenario that $\mathsf{PLS}\subseteq\mathsf{P}$ (or $\mathsf{CLS}\subseteq \mathsf{P}$), our reductions imply that there exist instances where local search algorithms need exponential time to reach a local optimum, even for $d=1$ (embeddings on a line).
HyP-ASO: A Hybrid Policy-based Adaptive Search Optimization Framework for Large-Scale Integer Linear Programs
Xu, Ning, Zhang, Junkai, Wu, Yang, Ye, Huigen, Xu, Hua, Xu, Huiling, Zhang, Yifan
Directly solving large-scale Integer Linear Programs (ILPs) using traditional solvers is slow due to their NP-hard nature. While recent frameworks based on Large Neighborhood Search (LNS) can accelerate the solving process, their performance is often constrained by the difficulty in generating sufficiently effective neighborhoods. To address this challenge, we propose HyP-ASO, a hybrid policy-based adaptive search optimization framework that combines a customized formula with deep Reinforcement Learning (RL). The formula leverages feasible solutions to calculate the selection probabilities for each variable in the neighborhood generation process, and the RL policy network predicts the neighborhood size. Extensive experiments demonstrate that HyP-ASO significantly outperforms existing LNS-based approaches for large-scale ILPs. Additional experiments show it is lightweight and highly scalable, making it well-suited for solving large-scale ILPs.
Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization
Yu, Jiahao, Cheng, Zelei, Wu, Xian, Xing, Xinyu
Software engineering presents complex, multi-step challenges for Large Language Models (LLMs), requiring reasoning over large codebases and coordinated tool use. The difficulty of these tasks is exemplified by benchmarks like SWE-bench, where current LLMs still struggle to resolve real-world issues. A promising approach to enhance performance is test-time scaling (TTS), but its gains are heavily dependent on the diversity of model outputs. While standard alignment methods such as Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO) are effective at aligning model outputs with human preferences, this process can come at the cost of reduced diversity, limiting the effectiveness of TTS. Additionally, existing preference optimization algorithms are typically designed for single-turn tasks and do not fully address the complexities of multi-turn reasoning and tool integration required for interactive coding agents. To bridge this gap, we introduce EntroPO, an entropy-enhanced framework that adapts existing preference optimization algorithms to the multi-turn, tool-assisted setting. EntroPO augments the preference objective to explicitly preserve policy entropy and generalizes learning to optimize over multi-turn interactions rather than single-turn responses. We validate EntroPO by fine-tuning a diverse suite of models from different families and sizes (up to 106B parameters). To maximize performance gains from TTS, we further propose a hybrid best-trajectory selection scheme combining a learned verifier model with model free approaches. On the \swebench leaderboard, our approach establishes new state-of-the-art results among open-weight models. A 30B parameter model trained with EntroPO ranks 1st on \lite and 4th on \verified on the open-weight leaderboard, surpassed only by models with over 10x more parameters(\eg$>$350B).
Differentially Private Synthetic Graphs Preserving Triangle-Motif Cuts
We study the problem of releasing a differentially private (DP) synthetic graph $G'$ that well approximates the triangle-motif sizes of all cuts of any given graph $G$, where a motif in general refers to a frequently occurring subgraph within complex networks. Non-private versions of such graphs have found applications in diverse fields such as graph clustering, graph sparsification, and social network analysis. Specifically, we present the first $(\varepsilon,ฮด)$-DP mechanism that, given an input graph $G$ with $n$ vertices, $m$ edges and local sensitivity of triangles $\ell_{3}(G)$, generates a synthetic graph $G'$ in polynomial time, approximating the triangle-motif sizes of all cuts $(S,V\setminus S)$ of the input graph $G$ up to an additive error of $\tilde{O}(\sqrt{m\ell_{3}(G)}n/\varepsilon^{3/2})$. Additionally, we provide a lower bound of $ฮฉ(\sqrt{mn}\ell_{3}(G)/\varepsilon)$ on the additive error for any DP algorithm that answers the triangle-motif size queries of all $(S,T)$-cut of $G$. Finally, our algorithm generalizes to weighted graphs, and our lower bound extends to any $K_h$-motif cut for any constant $h\geq 2$.
Were Residual Penalty and Neural Operators All We Needed for Solving Optimal Control Problems?
Lundqvist, Oliver G. S., Oliveira, Fabricio
Were Residual Penalty and Neural Operators All We Needed for Solving Optimal Control Problems? Abstract-- Neural networks have been used to solve optimal control problems, typically by training neural networks using a combined loss function that considers data, differential equation residuals, and objective costs. We show that including cost functions in the training process is unnecessary, advocating for a simpler architecture and streamlined approach by decoupling the optimal control problem from the training process. Thus, our work shows that a simple neural operator architecture, such as DeepONet, coupled with an unconstrained optimization routine, can solve multiple optimal control problems with a single physics-informed training phase and a subsequent optimization phase. We achieve this by adding a penalty term based on the differential equation residual to the cost function and computing gradients with respect to the control using automatic differentiation through the trained neural operator within an iterative optimization routine. Our results show acceptable accuracy for practical applications and potential computational savings for more complex and higher-dimensional problems. I. INTRODUCTION An optimal control problem is an optimization problem in which the system dynamics are described by differential equations, either ordinary differential equations (ODEs) or partial differential equations (PDEs), that explicitly depend on a control input.
Generative Diffusion Models for Resource Allocation in Wireless Networks
Uslu, Yigit Berkay, Hadou, Samar, Bidokhti, Shirin Saeedi, Ribeiro, Alejandro
Abstract--This paper proposes a supervised training algorithm for learning stochastic resource allocation policies with generative diffusion models (GDMs). We formulate the allocation problem as the maximization of an ergodic utility function subject to ergodic Quality of Service (QoS) constraints. Given samples from a stochastic expert policy that yields a near-optimal solution to the constrained optimization problem, we train a GDM policy to imitate the expert and generate new samples from the optimal distribution. We achieve near-optimal performance through the sequential execution of the generated samples. T o enable generalization to a family of network configurations, we parameterize the backward diffusion process with a graph neural network (GNN) architecture.
Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms
Greenberg, Ido, Sielski, Piotr, Linsenmaier, Hugo, Gandham, Rajesh, Mannor, Shie, Fender, Alex, Chechik, Gal, Meirom, Eli
Vehicle Routing Problems (VRP) are an extension of the Traveling Salesperson Problem and are a fundamental NP - hard challenge in combinatorial optimization. Solving VRP in real - time at large scale has become critical in numerous applications, from growing markets like last - mile delivery to emerging use - cases like interactive logistics planning. In many applications, one has to repeatedly solv e VRP instances dr a wn from the same distribution, yet current state - of - the - art solvers treat each instance on its own without leveraging previous examples . We introduce a n optimization framework where a reinforcement learning agent is trained on prior instances and quickly generate s initial solutions, which are then further optimized by a genetic algorithm. This framework, Evolutionary Algorithm with Reinforcement Learning Initialization ( EARLI), consistently outperforms current state - of - the - art solvers across various time budgets . For example, EARLI handles vehicle routing with 500 locations within one second, 10x faster than current solvers for the same solution quality, enabling real - time and interactive routing at scale . EARLI can generalize to new data, as we demonstrate on real e - commerce delivery data of a previously unseen city . By combin ing reinforcement learning and genetic algorithms, o ur hybrid framework takes a step forward to closer interdisciplinary collaboration between AI and optimization communities towards real - time optimization in diverse domains .