Optimization
Robust Optimization with Diffusion Models for Green Security
Kong, Lingkai, Wang, Haichuan, Pan, Yuqi, Kim, Cheol Woo, Song, Mingxiao, Nguyen, Alayna, Wang, Tonghan, Xu, Haifeng, Tambe, Milind
In green security, defenders must forecast adversarial behavior, such as poaching, illegal logging, and illegal fishing, to plan effective patrols. These behavior are often highly uncertain and complex. Prior work has leveraged game theory to design robust patrol strategies to handle uncertainty, but existing adversarial behavior models primarily rely on Gaussian processes or linear models, which lack the expressiveness needed to capture intricate behavioral patterns. To address this limitation, we propose a conditional diffusion model for adversary behavior modeling, leveraging its strong distribution-fitting capabilities. To the best of our knowledge, this is the first application of diffusion models in the green security domain. Integrating diffusion models into game-theoretic optimization, however, presents new challenges, including a constrained mixed strategy space and the need to sample from an unnormalized distribution to estimate utilities. To tackle these challenges, we introduce a mixed strategy of mixed strategies and employ a twisted Sequential Monte Carlo (SMC) sampler for accurate sampling. Theoretically, our algorithm is guaranteed to converge to an epsilon equilibrium with high probability using a finite number of iterations and samples. Empirically, we evaluate our approach on both synthetic and real-world poaching datasets, demonstrating its effectiveness.
Universal Semantic Embeddings of Chemical Elements for Enhanced Materials Inference and Discovery
Jia, Yunze, Xian, Yuehui, Xu, Yangyang, Dang, Pengfei, Ding, Xiangdong, Sun, Jun, Zhou, Yumei, Xue, Dezhen
We present a framework for generating universal semantic embeddings of chemical elements to advance materials inference and discovery. This framework leverages ElementBERT, a domain - specific BERT - based natural language processing model trained on 1.29 million abstracts of alloy - related scientific papers, to capture latent knowledge and contextual relationships specific to alloys. These semantic embeddings serve as robust elemental descriptors, consistently outperforming traditional empirical descriptors with significant improvements across multiple downstream tasks . These include predicting mechanical and transformation properties, classifying phase structures, and optimizing materials properties via Bayesian optimization. Applications to titanium alloys, high - entropy alloys, and shape memory alloys demonstrate up to 23% gains in prediction accuracy. Our results show that ElementBERT surpasses general - purpose BERT variants by encoding specialized alloy knowledge. By bridging contextual insights from scientific literature with quantitative inference, our framework accelerates the discovery and optimization of advanced materials, with potential applications extending beyond alloys to other material classes.
On-the-fly Preference Alignment via Principle-Guided Decoding
Zhu, Mingye, Liu, Yi, Zhang, Lei, Guo, Junbo, Mao, Zhendong
With the rapidly expanding landscape of large language models, aligning model generations with human values and preferences is becoming increasingly important. Popular alignment methods, such as Reinforcement Learning from Human Feedback, have shown significant success in guiding models with greater control. However, these methods require considerable computational resources, which is inefficient, and substantial collection of training data to accommodate the diverse and pluralistic nature of human preferences, which is impractical. These limitations significantly constrain the scope and efficacy of both task-specific and general preference alignment methods. In this work, we introduce On-the-fly Preference Alignment via Principle-Guided Decoding (OPAD) to directly align model outputs with human preferences during inference, eliminating the need for fine-tuning. Our approach involves first curating a surrogate solution to an otherwise infeasible optimization problem and then designing a principle-guided reward function based on this surrogate. The final aligned policy is derived by maximizing this customized reward, which exploits the discrepancy between the constrained policy and its unconstrained counterpart. OPAD directly modifies the model's predictions during inference, ensuring principle adherence without incurring the computational overhead of retraining or fine-tuning. Experiments show that OPAD achieves competitive or superior performance in both general and personalized alignment tasks, demonstrating its efficiency and effectiveness compared to state-of-the-art baselines.
Explainable Distributed Constraint Optimization Problems
Rachmut, Ben, Vasileiou, Stylianos Loukas, Weinstein, Nimrod Meir, Zivan, Roie, Yeoh, William
The Distributed Constraint Optimization Problem (DCOP) formulation is a powerful tool to model cooperative multi-agent problems that need to be solved distributively. A core assumption of existing approaches is that DCOP solutions can be easily understood, accepted, and adopted, which may not hold, as evidenced by the large body of literature on Explainable AI. In this paper, we propose the Explainable DCOP (X-DCOP) model, which extends a DCOP to include its solution and a contrastive query for that solution. We formally define some key properties that contrastive explanations must satisfy for them to be considered as valid solutions to X-DCOPs as well as theoretical results on the existence of such valid explanations. To solve X-DCOPs, we propose a distributed framework as well as several optimizations and suboptimal variants to find valid explanations. We also include a human user study that showed that users, not surprisingly, prefer shorter explanations over longer ones. Our empirical evaluations showed that our approach can scale to large problems, and the different variants provide different options for trading off explanation lengths for smaller runtimes. Thus, our model and algorithmic contributions extend the state of the art by reducing the barrier for users to understand DCOP solutions, facilitating their adoption in more real-world applications.
EHOP: A Dataset of Everyday NP-Hard Optimization Problems
Duchnowski, Alex, Pavlick, Ellie, Koller, Alexander
We introduce the dataset of Everyday Hard Optimization Problems (EHOP), a collection of NP-hard optimization problems expressed in natural language. EHOP includes problem formulations that could be found in computer science textbooks, versions that are dressed up as problems that could arise in real life, and variants of well-known problems with inverted rules. We find that state-of-the-art LLMs, across multiple prompting strategies, systematically solve textbook problems more accurately than their real-life and inverted counterparts. We argue that this constitutes evidence that LLMs adapt solutions seen during training, rather than leveraging reasoning abilities that would enable them to generalize to novel problems.
Deep Learning for VWAP Execution in Crypto Markets: Beyond the Volume Curve
Volume-Weighted Average Price (VWAP) is arguably the most prevalent benchmark for trade execution as it provides an unbiased standard for comparing performance across market participants. However, achieving VWAP is inherently challenging due to its dependence on two dynamic factors, volumes and prices. Traditional approaches typically focus on forecasting the market's volume curve, an assumption that may hold true under steady conditions but becomes suboptimal in more volatile environments or markets such as cryptocurrency where prediction error margins are higher. In this study, I propose a deep learning framework that directly optimizes the VWAP execution objective by bypassing the intermediate step of volume curve prediction. Leveraging automatic differentiation and custom loss functions, my method calibrates order allocation to minimize VWAP slippage, thereby fully addressing the complexities of the execution problem. My results demonstrate that this direct optimization approach consistently achieves lower VWAP slippage compared to conventional methods, even when utilizing a naive linear model presented in arXiv:2410.21448. They validate the observation that strategies optimized for VWAP performance tend to diverge from accurate volume curve predictions and thus underscore the advantage of directly modeling the execution objective. This research contributes a more efficient and robust framework for VWAP execution in volatile markets, illustrating the potential of deep learning in complex financial systems where direct objective optimization is crucial. Although my empirical analysis focuses on cryptocurrency markets, the underlying principles of the framework are readily applicable to other asset classes such as equities.
Multi-Scale and Multi-Objective Optimization for Cross-Lingual Aspect-Based Sentiment Analysis
Wu, Chengyan, Ma, Bolei, Deng, Ningyuan, He, Yanqing, Xue, Yun
Aspect-based sentiment analysis (ABSA) is a sequence labeling task that has garnered growing research interest in multilingual contexts. However, recent studies lack more robust feature alignment and finer aspect-level alignment. In this paper, we propose a novel framework, Multi-Scale and Multi-Objective optimization (MSMO) for cross-lingual ABSA. During multi-scale alignment, we achieve cross-lingual sentence-level and aspect-level alignment, aligning features of aspect terms in different contextual environments. Specifically, we introduce code-switched bilingual sentences into the language discriminator and consistency training modules to enhance the model's robustness. During multi-objective optimization, we design two optimization objectives: supervised training and consistency training, aiming to enhance cross-lingual semantic alignment. To further improve model performance, we incorporate distilled knowledge of the target language into the model. Results show that MSMO significantly enhances cross-lingual ABSA by achieving state-of-the-art performance across multiple languages and models.
An Adaptive Data-Enabled Policy Optimization Approach for Autonomous Bicycle Control
Persson, Niklas, Zhao, Feiran, Kaheni, Mojtaba, Dรถrfler, Florian, Papadopoulos, Alessandro V.
--This paper presents a unified control framework that integrates a Feedback Linearization (FL) controller in the inner loop with an adaptive Data-Enabled Policy Optimization (DeePO) controller in the outer loop to balance an autonomous bicycle. While the FL controller stabilizes and partially linearizes the inherently unstable and nonlinear system, its performance is compromised by unmodeled dynamics and time-varying characteristics. T o overcome these limitations, the DeePO controller is introduced to enhance adaptability and robustness. The initial control policy of DeePO is obtained from a finite set of offline, persistently exciting input and state data. T o improve stability and compensate for system nonlinearities and disturbances, a robustness-promoting regularizer refines the initial policy, while the adaptive section of the DeePO framework is enhanced with a forgetting factor to improve adaptation to time-varying dynamics. The proposed DeePO+FL approach is evaluated through simulations and real-world experiments on an instrumented autonomous bicycle. Results demonstrate its superiority over the FL-only approach, achieving more precise tracking of the reference lean angle and lean rate. N autonomous bicycle is a bicycle, equipped with electric motors, sensors, algorithms, and control systems that allow the bicycle to navigate and operate without human intervention. Autonomous bicycles are an exciting area of research and development with numerous potential applications that can improve transportation, safety, and efficiency. In bicycle-sharing systems, autonomous bicycles can enhance the user experience by autonomously traveling to a person who has requested one, eliminating the need for individuals to walk toward the bicycle [1]. Additionally, autonomous bicycles can streamline fleet management by enabling bicycles to autonomously navigate to charging stations for recharging. This eliminates the need for operators to manually collect, load, and transport bicycles to charging stations, making the process more efficient. This work was supported by the Knowledge Foundation (KKS) with grant "M alardalen University Automation Research Center (MARC)", n. 20240011. Papadopoulos are with the Division of Intelligent Future Technologies, M alardalen University, 721 23 V aster as, Sweden. F. Zhao and F. D orfler are with the Department of Information Technology and Electrical Engineering, ETH Zurich, 8092 Zurich, Switzerland.
Physics-Aware Robotic Palletization with Online Masking Inference
Zhang, Tianqi, Wu, Zheng, Chen, Yuxin, Wang, Yixiao, Liang, Boyuan, Moura, Scott, Tomizuka, Masayoshi, Ding, Mingyu, Zhan, Wei
-- The efficient planning of stacking boxes, especially in the online setting where the sequence of item arrivals is unpredictable, remains a critical challenge in modern warehouse and logistics management. Existing solutions often address box size variations, but overlook their intrinsic and physical properties, such as density and rigidity, which are crucial for real-world applications. We use reinforcement learning (RL) to solve this problem by employing action space masking to direct the RL policy toward valid actions. Unlike previous methods that rely on heuristic stability assessments which are difficult to assess in physical scenarios, our framework utilizes online learning to dynamically train the action space mask, eliminating the need for manual heuristic design. Extensive experiments demonstrate that our proposed method outperforms existing state-of-the-arts. Furthermore, we deploy our learned task planner in a real-world robotic palletizer, validating its practical applicability in operational settings. I. INTRODUCTION In modern warehouse and logistics management, stacking boxes continues to be a common challenge. In the past, due to the smaller scale of trade and lower efficiency requirements, workers could rely on their experience to decide how each box should be placed. However, with the globalization of trade, there is a growing need for fast and stable box stacking, and a good solution for this is robotic palletization [1] [2].
New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition
Saad, El Mehdi, Lee, Weicheng, Orabona, Francesco
We study fundamental limits of first-order stochastic optimization in a range of nonconvex settings, including L-smooth functions satisfying Quasar-Convexity (QC), Quadratic Growth (QG), and Restricted Secant Inequalities (RSI). While the convergence properties of standard algorithms are well-understood in deterministic regimes, significantly fewer results address the stochastic case, where only unbiased and noisy gradients are available. We establish new lower bounds on the number of noisy gradient queries to minimize these classes of functions, also showing that they are tight (up to a logarithmic factor) in all the relevant quantities characterizing each class. Our approach reformulates the optimization task as a function identification problem, leveraging divergence composition arguments to construct a challenging subclass that leads to sharp lower bounds. Furthermore, we present a specialized algorithm in the one-dimensional setting that achieves faster rates, suggesting that certain dimensional thresholds are intrinsic to the complexity of non-convex stochastic optimization.