Optimization
Demand Response Optimization MILP Framework for Microgrids with DERs
Babu, K. Victor Sam Moses, Chakraborty, Pratyush, Pal, Mayukha
The integration of renewable energy sources in microgrids introduces significant operational challenges due to their intermittent nature and the mismatch between generation and demand patterns. Effective demand response (DR) strategies are crucial for maintaining system stability and economic efficiency, particularly in microgrids with high renewable penetration. This paper presents a comprehensive mixed-integer linear programming (MILP) framework for optimizing DR operations in a microgrid with solar generation and battery storage systems. The framework incorporates load classification, dynamic price thresholding, and multi-period coordination for optimal DR event scheduling. Analysis across seven distinct operational scenarios demonstrates consistent peak load reduction of 10\% while achieving energy cost savings ranging from 13.1\% to 38.0\%. The highest performance was observed in scenarios with high solar generation, where the framework achieved 38.0\% energy cost reduction through optimal coordination of renewable resources and DR actions. The results validate the framework's effectiveness in managing diverse operational challenges while maintaining system stability and economic efficiency.
Scalable Bilevel Loss Balancing for Multi-Task Learning
Xiao, Peiyao, Dong, Chaosheng, Zou, Shaofeng, Ji, Kaiyi
In recent years, Multi-Task Learning (MTL) has received increasing attention for its ability to predict multiple tasks simultaneously using a single model, thereby reducing computational overhead. This versatility has enabled a wide range of applications, including autonomous driving (Chen et al., 2018), recommendation systems (Wang et al., 2020), and natural language processing (Zhang et al., 2022). Typically, research in MTL follows two main schemes. Scalarization-based methods, such as linear scalarization, reduce MTL to a scalar optimization problem by using an averaged or weighted sum of loss functions as the objective. Due to its simplicity and scalability, it became the prominent approach in the early studies (Caruana, 1997). However, it often causes performance degradation compared with single-task learning due to the gradient conflict (Yu et al., 2020; Liu et al., 2021a). Gradient conflict arises from two main reasons: 1) gradients point in different directions and 2) gradient magnitudes vary significantly. As a result, the final update gradient may either be offset or dominated by the largest gradient (Liu et al., 2021b). To mitigate this issue, various gradient manipulation methods have been developed to find balanced and fair solutions via seeking a better conflict-aware update direction (Désidéri, 2012; Liu et al., 2021a; Ban & Ji,
Integrated Optimization and Game Theory Framework for Fair Cost Allocation in Community Microgrids
Babu, K. Victor Sam Moses, Chakraborty, Pratyush, Pal, Mayukha
Fair cost allocation in community microgrids remains a significant challenge due to the complex interactions between multiple participants with varying load profiles, distributed energy resources, and storage systems. Traditional cost allocation methods often fail to adequately address the dynamic nature of participant contributions and benefits, leading to inequitable distribution of costs and reduced participant satisfaction. This paper presents a novel framework integrating multi-objective optimization with cooperative game theory for fair and efficient microgrid operation and cost allocation. The proposed approach combines mixed-integer linear programming for optimal resource dispatch with Shapley value analysis for equitable benefit distribution, ensuring both system efficiency and participant satisfaction. The framework was validated using real-world data across six distinct operational scenarios, demonstrating significant improvements in both technical and economic performance. Results show peak demand reductions ranging from 7.8% to 62.6%, solar utilization rates reaching 114.8% through effective storage integration, and cooperative gains of up to $1,801.01 per day. The Shapley value-based allocation achieved balanced benefit-cost distributions, with net positions ranging from -16.0% to +14.2% across different load categories, ensuring sustainable participant cooperation.
Strong bounds for large-scale Minimum Sum-of-Squares Clustering
Croella, Anna Livia, Piccialli, Veronica, Sudoso, Antonio M.
Clustering is a fundamental technique in data analysis and machine learning, used to group similar data points together. Among various clustering methods, the Minimum Sum-of-Squares Clustering (MSSC) is one of the most widely used. MSSC aims to minimize the total squared Euclidean distance between data points and their corresponding cluster centroids. Due to the unsupervised nature of clustering, achieving global optimality is crucial, yet computationally challenging. The complexity of finding the global solution increases exponentially with the number of data points, making exact methods impractical for large-scale datasets. Even obtaining strong lower bounds on the optimal MSSC objective value is computationally prohibitive, making it difficult to assess the quality of heuristic solutions. We address this challenge by introducing a novel method to validate heuristic MSSC solutions through optimality gaps. Our approach employs a divide-and-conquer strategy, decomposing the problem into smaller instances that can be handled by an exact solver. The decomposition is guided by an auxiliary optimization problem, the "anticlustering problem", for which we design an efficient heuristic. Computational experiments demonstrate the effectiveness of the method for large-scale instances, achieving optimality gaps below 3% in most cases while maintaining reasonable computational times. These results highlight the practicality of our approach in assessing feasible clustering solutions for large datasets, bridging a critical gap in MSSC evaluation.
Keep your distance: learning dispersed embeddings on $\mathbb{S}_d$
Tokarchuk, Evgeniia, Bakker, Hua Chang, Niculae, Vlad
Learning well-separated features in high-dimensional spaces, such as text or image embeddings, is crucial for many machine learning applications. Achieving such separation can be effectively accomplished through the dispersion of embeddings, where unrelated vectors are pushed apart as much as possible. By constraining features to be on a hypersphere, we can connect dispersion to well-studied problems in mathematics and physics, where optimal solutions are known for limited low-dimensional cases. However, in representation learning we typically deal with a large number of features in high-dimensional space, and moreover, dispersion is usually traded off with some other task-oriented training objective, making existing theoretical and numerical solutions inapplicable. Therefore, it is common to rely on gradient-based methods to encourage dispersion, usually by minimizing some function of the pairwise distances. In this work, we first give an overview of existing methods from disconnected literature, making new connections and highlighting similarities. Next, we introduce some new angles. We propose to reinterpret pairwise dispersion using a maximum mean discrepancy (MMD) motivation. We then propose an online variant of the celebrated Lloyd's algorithm, of K-Means fame, as an effective alternative regularizer for dispersion on generic domains. Finally, we derive a novel dispersion method that directly exploits properties of the hypersphere. Our experiments show the importance of dispersion in image classification and natural language processing tasks, and how algorithms exhibit different trade-offs in different regimes.
Matrix Completion with Graph Information: A Provable Nonconvex Optimization Approach
Wang, Yao, Yang, Yiyang, Wang, Kaidong, Gao, Shanxing, Liao, Xiuwu
We consider the problem of matrix completion with graphs as side information depicting the interrelations between variables. The key challenge lies in leveraging the similarity structure of the graph to enhance matrix recovery. Existing approaches, primarily based on graph Laplacian regularization, suffer from several limitations: (1) they focus only on the similarity between neighboring variables, while overlooking long-range correlations; (2) they are highly sensitive to false edges in the graphs and (3) they lack theoretical guarantees regarding statistical and computational complexities. To address these issues, we propose in this paper a novel graph regularized matrix completion algorithm called GSGD, based on preconditioned projected gradient descent approach. We demonstrate that GSGD effectively captures the higher-order correlation information behind the graphs, and achieves superior robustness and stability against the false edges. Theoretically, we prove that GSGD achieves linear convergence to the global optimum with near-optimal sample complexity, providing the first theoretical guarantees for both recovery accuracy and efficacy in the perspective of nonconvex optimization. Our numerical experiments on both synthetic and real-world data further validate that GSGD achieves superior recovery accuracy and scalability compared with several popular alternatives.
Improving Existing Optimization Algorithms with LLMs
Sartori, Camilo Chacón, Blum, Christian
The integration of Large Language Models (LLMs) into optimization has created a powerful synergy, opening exciting research opportunities. This paper investigates how LLMs can enhance existing optimization algorithms. Using their pre-trained knowledge, we demonstrate their ability to propose innovative heuristic variations and implementation strategies. To evaluate this, we applied a non-trivial optimization algorithm, Construct, Merge, Solve and Adapt (CMSA) -- a hybrid metaheuristic for combinatorial optimization problems that incorporates a heuristic in the solution construction phase. Our results show that an alternative heuristic proposed by GPT-4o outperforms the expert-designed heuristic of CMSA, with the performance gap widening on larger and denser graphs. Project URL: https://imp-opt-algo-llms.surge.sh/
Joint Transmit and Pinching Beamforming for PASS: Optimization-Based or Learning-Based?
Xu, Xiaoxia, Mu, Xidong, Liu, Yuanwei, Nallanathan, Arumugam
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed. PASS consists of multiple waveguides spanning over thousands of wavelength, which equip numerous low-cost dielectric particles, named pinching antennas (PAs), to radiate signals into free space. The positions of PAs can be reconfigured to change both the large-scale path losses and phases of signals, thus facilitating the novel pinching beamforming design. A sum rate maximization problem is formulated, which jointly optimizes the transmit and pinching beamforming to adaptively achieve constructive signal enhancement and destructive interference mitigation. To solve this highly coupled and nonconvex problem, both optimization-based and learning-based methods are proposed. 1) For the optimization-based method, a majorization-minimization and penalty dual decomposition (MM-PDD) algorithm is developed, which handles the nonconvex complex exponential component using a Lipschitz surrogate function and then invokes PDD for problem decoupling. 2) For the learning-based method, a novel Karush-Kuhn-Tucker (KKT)-guided dual learning (KDL) approach is proposed, which enables KKT solutions to be reconstructed in a data-driven manner by learning dual variables. Following this idea, a KDL-Tranformer algorithm is developed, which captures both inter-PA/inter-user dependencies and channel-state-information (CSI)-beamforming dependencies by attention mechanisms. Simulation results demonstrate that: i) The proposed PASS framework significantly outperforms conventional massive multiple input multiple output (MIMO) system even with a few PAs. ii) The proposed KDL-Transformer can improve over 30% system performance than MM-PDD algorithm, while achieving a millisecond-level response on modern GPUs.
A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems
Gong, Linxiao, Yang, Hao, Fang, Gaoyun, Ju, Bobo, Guo, Juncen, Zhu, Xiaoguang, Wang, Yan, Hu, Xiping, Sun, Peng, Boukerche, Azzedine
The explosive growth of video data has driven the development of distributed video analytics in cloud-edge-terminal collaborative (CETC) systems, enabling efficient video processing, real-time inference, and privacy-preserving analysis. Among multiple advantages, CETC systems can distribute video processing tasks and enable adaptive analytics across cloud, edge, and terminal devices, leading to breakthroughs in video surveillance, autonomous driving, and smart cities. In this survey, we first analyze fundamental architectural components, including hierarchical, distributed, and hybrid frameworks, alongside edge computing platforms and resource management mechanisms. Building upon these foundations, edge-centric approaches emphasize on-device processing, edge-assisted offloading, and edge intelligence, while cloud-centric methods leverage powerful computational capabilities for complex video understanding and model training. Our investigation also covers hybrid video analytics incorporating adaptive task offloading and resource-aware scheduling techniques that optimize performance across the entire system. Beyond conventional approaches, recent advances in large language models and multimodal integration reveal both opportunities and challenges in platform scalability, data protection, and system reliability. Future directions also encompass explainable systems, efficient processing mechanisms, and advanced video analytics, offering valuable insights for researchers and practitioners in this dynamic field.
Learning Theory for Kernel Bilevel Optimization
Khoury, Fares El, Pauwels, Edouard, Vaiter, Samuel, Arbel, Michael
Bilevel optimization has emerged as a technique for addressing a wide range of machine learning problems that involve an outer objective implicitly determined by the minimizer of an inner problem. In this paper, we investigate the generalization properties for kernel bilevel optimization problems where the inner objective is optimized over a Reproducing Kernel Hilbert Space. This setting enables rich function approximation while providing a foundation for rigorous theoretical analysis. In this context, we establish novel generalization error bounds for the bilevel problem under finite-sample approximation. Our approach adopts a functional perspective, inspired by (Petrulionyte et al., 2024), and leverages tools from empirical process theory and maximal inequalities for degenerate $U$-processes to derive uniform error bounds. These generalization error estimates allow to characterize the statistical accuracy of gradient-based methods applied to the empirical discretization of the bilevel problem.