lagrangian multiplier
Provably Data-driven Lagrangian Relaxation for Mixed Integer Linear Programming
Le, Tung Quoc, Nguyen, Anh Tuan, Nguyen, Viet Anh
Lagrangian Relaxation (LR) is a powerful technique for solving large-scale Mixed Integer Linear Programming (MILP), particularly those with decomposable structures, such as vehicle routing or unit commitment problems. By relaxing the coupling constraints, LR enables parallel subproblem solving and often yields tighter dual bounds than standard linear programming relaxations, which is crucial for efficient branch-and-bound pruning. While recent empirical work has shown promising results using machine learning to predict these multipliers, a theoretical understanding of such methods remains an open question. In this work, we bridge this gap by analyzing the problem of learning LR through the lens of Data-driven Algorithm Design, i.e., a statistical learning problem over a distribution of problem instances. Our contributions are as follows: first, we derive a generalization bound of $\mathcal{O}(s^{1.5}/\sqrt{N})$ for the learned multipliers, where $s$ is the number of coupling constraints and $N$ is the sample size. Second, we provide a minimax lower-bound of $ฮฉ(s/\sqrt{N})$, proving that a linear dependency is unavoidable. Third, we constructively close this theoretical gap by proving that Stochastic Gradient Ascent (SGA) with averaging achieves the minimax optimal rate $ฮ(s/\sqrt{N})$. Finally, we extend our framework to the learning-to-warm-start setting, proving that it achieves a fast, minimax-optimal rate of $ฮ(s/N)$ and establishing a theoretical advantage over direct multiplier prediction.
Distributed primal-dual algorithm for constrained multi-agent reinforcement learning under coupled policies
Dai, Pengcheng, Wang, He, Wang, Dongming, Yu, Wenwu
In this work, we investigate constrained multi-agent reinforcement learning (CMARL), where agents collaboratively maximize the sum of their local objectives while satisfying individual safety constraints. We propose a framework where agents adopt coupled policies that depend on both local states and parameters, as well as those of their $ฮบ_p$-hop neighbors, with $ฮบ_p>0$ denoting the coupling distance. A distributed primal-dual algorithm is further developed under this framework, wherein each agent has access only to state-action pairs within its $2ฮบ_p$-hop neighborhood and to reward information within its $ฮบ+ 2ฮบ_p$-hop neighborhood, with $ฮบ> 0$ representing the truncation distance. Moreover, agents are not permitted to directly share their true policy parameters or Lagrange multipliers. Instead, each agent constructs and maintains local estimates of these variables for other agents and employs such estimates to execute its policy. Additionally, these estimates are further updated and exchanged exclusively through an independent, time-varying networks, which enhances the overall system security. We establish that, with high probability, our algorithm can achieve an $ฮต$-first-order stationary convergence with an approximation error of $\mathcal{O}(ฮณ^{\frac{ฮบ+1}{ฮบ_{p}}})$ for discount factor $ฮณ\in(0,1)$. Finally, simulations in GridWorld environment are conducted to demonstrate the effectiveness of the proposed algorithm.
Supplement to ' Autoencoders that don't overfit towards the Identity '
Eq. 1 in the paper (re-stated in Eq. 2 below), and show that it is equal to the objective function in the Theorem in the paper (see Eq. 8 below) up to the factor In the following, we provide the detailed steps. We start by re-stating Eq. 1 in the paper 1 n null null nullA The details are outlined in Sections 2.2 and 2.3 below. See Eq. 1 above for the definitions of X, multiplied by the dropout-probability p, and q = 1 p. In line 6, we change the sum over the m columns back to matrix notation. Finally, in line 8, we used the substitutions from Eq. 1 as to obtain In lines 11 and 12, the squared loss is expanded into its four terms.
Dual Signal Decomposition of Stochastic Time Series
The decomposition of a stochastic time series into three component series representing a dual signal - namely, the mean and dispersion - while isolating noise is presented. The decomposition is performed by applying machine learning techniques to fit the dual signal. Machine learning minimizes the loss function which compromises between fitting the original time series and penalizing irregularities of the dual signal. The latter includes terms based on the first and second order derivatives along time. To preserve special patterns, weighting of the regularization components of the loss function has been introduced based on Statistical Process Control methodology. The proposed decomposition can be applied as a smoothing algorithm against the mean and dispersion of the time series. By isolating noise, the proposed decomposition can be seen as a denoising algorithm. Two approaches of the learning process have been considered: sequential and jointly. The former approach learns the mean signal first and then dispersion. The latter approach fits the dual signal jointly. Jointly learning can uncover complex relationships for the time series with heteroskedasticity. Learning has been set by solving the direct non-linear unconstrained optimization problem or by applying neural networks that have sequential or twin output architectures. Tuning of the loss function hyperparameters focuses on the isolated noise to be a stationary stochastic process without autocorrelation properties. Depending on the applications, the hyperparameters of the learning can be tuned towards either the discrete states by stepped signal or smoothed series. The decomposed dual signal can be represented on the 2D space and used to learn inherent structures, to forecast both mean and dispersion, or to analyze cross effects in case of multiple time series.
MEP-Net: Generating Solutions to Scientific Problems with Limited Knowledge by Maximum Entropy Principle
Yang, Wuyue, Peng, Liangrong, Li, Guojie, Hong, Liu
Maximum entropy principle (MEP) offers an effective and unbiased approach to inferring unknown probability distributions when faced with incomplete information, while neural networks provide the flexibility to learn complex distributions from data. This paper proposes a novel neural network architecture, the MEP-Net, which combines the MEP with neural networks to generate probability distributions from moment constraints. We also provide a comprehensive overview of the fundamentals of the maximum entropy principle, its mathematical formulations, and a rigorous justification for its applicability for non-equilibrium systems based on the large deviations principle. Through fruitful numerical experiments, we demonstrate that the MEP-Net can be particularly useful in modeling the evolution of probability distributions in biochemical reaction networks and in generating complex distributions from data.
Learning to Handle Complex Constraints for Vehicle Routing Problems
Bi, Jieyi, Ma, Yining, Zhou, Jianan, Song, Wen, Cao, Zhiguang, Wu, Yaoxin, Zhang, Jie
Vehicle Routing Problems (VRPs) can model many real-world scenarios and often involve complex constraints. While recent neural methods excel in constructing solutions based on feasibility masking, they struggle with handling complex constraints, especially when obtaining the masking itself is NP-hard. In this paper, we propose a novel Proactive Infeasibility Prevention (PIP) framework to advance the capabilities of neural methods towards more complex VRPs. Our PIP integrates the Lagrangian multiplier as a basis to enhance constraint awareness and introduces preventative infeasibility masking to proactively steer the solution construction process. Moreover, we present PIP-D, which employs an auxiliary decoder and two adaptive strategies to learn and predict these tailored masks, potentially enhancing performance while significantly reducing computational costs during training. To verify our PIP designs, we conduct extensive experiments on the highly challenging Traveling Salesman Problem with Time Window (TSPTW), and TSP with Draft Limit (TSPDL) variants under different constraint hardness levels. Notably, our PIP is generic to boost many neural methods, and exhibits both a significant reduction in infeasible rate and a substantial improvement in solution quality.
Best-of-Both-Worlds Policy Optimization for CMDPs with Bandit Feedback
Stradi, Francesco Emanuele, Lunghi, Anna, Castiglioni, Matteo, Marchesi, Alberto, Gatti, Nicola
We study online learning in constrained Markov decision processes (CMDPs) in which rewards and constraints may be either stochastic or adversarial. In such settings, Stradi et al.(2024) proposed the first best-of-both-worlds algorithm able to seamlessly handle stochastic and adversarial constraints, achieving optimal regret and constraint violation bounds in both cases. This algorithm suffers from two major drawbacks. First, it only works under full feedback, which severely limits its applicability in practice. Moreover, it relies on optimizing over the space of occupancy measures, which requires solving convex optimization problems, an highly inefficient task. In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with bandit feedback. Specifically, when the constraints are stochastic, the algorithm achieves $\widetilde{\mathcal{O}}(\sqrt{T})$ regret and constraint violation, while, when they are adversarial, it attains $\widetilde{\mathcal{O}}(\sqrt{T})$ constraint violation and a tight fraction of the optimal reward. Moreover, our algorithm is based on a policy optimization approach, which is much more efficient than occupancy-measure-based methods.
Hierarchical Federated ADMM
Azimi-Abarghouyi, Seyed Mohammad, Bastianello, Nicola, Johansson, Karl H., Fodor, Viktoria
Abstract--In this paper, we depart from the widely-used gradient descent-based hierarchical federated learning (FL) algorithms to develop a novel hierarchical FL framework based on the alternating direction method of multipliers (ADMM). Within this framework, we propose two novel FL algorithms, which both use ADMM in the top layer: one that employs ADMM in the lower layer and another that uses the conventional gradient descentbased approach. The proposed framework enhances privacy, and experiments demonstrate the superiority of the proposed algorithms compared to the conventional algorithms in terms Figure 1: Modular schematic of FL algorithms of learning convergence and accuracy. This conventional approach reveals the entire knowledge of the model by directly transmitting its Federated learning (FL) is gaining popularity for tasks parameters. Additionally, FL accelerates method of multipliers (ADMM) [18] within a single client set the learning process by allowing parallel computations across has been proposed [19], [20].