Optimization
Automatic Generation of Aerobatic Flight in Complex Environments via Diffusion Models
Zhong, Yuhang, Zhao, Anke, Wu, Tianyue, Zhang, Tingrui, Gao, Fei
Performing striking aerobatic flight in complex environments requires manually designing key maneuvers in advance, which becomes intricate and time-consuming as the trajectory horizon grows. This paper presents a novel framework that leverages diffusion models to automate and scale up aerobatic trajectory generation. Our key innovation is the decomposition of complex maneuvers into aerobatic primitives, which are short frame sequences that act as building blocks, featuring critical aerobatic behaviors for tractable trajectory synthesis. The model learns aerobatic primitives using historical trajectory observations as dynamic priors to ensure motion continuity, with additional conditional inputs (target waypoints and optional action constraints) integrated to enable user-editable trajectory generation. During model inference, classifier guidance is incorporated with batch sampling to achieve obstacle avoidance. Additionally, the generated outcomes are refined through post-processing with spatial-temporal trajectory optimization to ensure dynamical feasibility. Extensive simulations and real-world experiments have validated the key component designs of our method, demonstrating its feasibility for deployment on real drones to achieve long-horizon aerobatic flight.
Latent Bayesian Optimization via Autoregressive Normalizing Flows
Lee, Seunghun, Park, Jinyoung, Chu, Jaewon, Yoon, Minseo, Kim, Hyunwoo J.
Bayesian Optimization (BO) has been recognized for its effectiveness in optimizing expensive and complex objective functions. Recent advancements in Latent Bayesian Optimization (LBO) have shown promise by integrating generative models such as variational autoencoders (VAEs) to manage the complexity of high-dimensional and structured data spaces. However, existing LBO approaches often suffer from the value discrepancy problem, which arises from the reconstruction gap between input and latent spaces. To address this issue, we propose Normalizing Flow-based Bayesian Optimization (NF-BO), which utilizes a normalizing flow as a generative model to establish a one-to-one encoding function from the input space to the latent space, along with its left-inverse decoding function, eliminating the reconstruction gap. Specifically, we introduce SeqFlow, an autoregressive normalizing flow for sequence data. In addition, we develop a new candidate sampling strategy that dynamically adjusts the exploration probability for each token based on its importance. Through extensive experiments, our NF-BO method demonstrates superior performance in molecule generation tasks, significantly outperforming both traditional and recent LBO approaches.
Bayesian optimization (BO) (Kushner, 1962; 1964) has been broadly applied across various areas such as chemical design (Wang & Dowling, 2022), material science (Ament et al., 2021), and hyperparameter optimization (Wu et al., 2019). BO aims to probabilistically optimize an expensive, black-box objective function using a surrogate model to find an optimal solution with minimal cost. Although BO is effective in continuous spaces, its application to a discrete input space remains challenging (Oh et al., 2019; Deshwal & Doppa, 2021).
Latent Bayesian Optimization (LBO) (Gómez-Bombarelli et al., 2018; Tripp et al., 2020) addresses this challenge by performing BO in a lower-dimensional latent space learned by a generative model such as Variational AutoEncoders (VAEs) (Kingma & Welling, 2014). LBO performs optimization in a continuous space by mapping the discrete input into a continuous latent space with the VAEs (Kusner et al., 2017; Jin et al., 2018; Samanta et al., 2019). However, the reconstruction of a VAE is not always perfect, which leads to the value discrepancy problem: a sample encoded as an embedding in the latent space may not decode back to the same sample in the input space.
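The point that an invertible flow leaves no reconstruction gap can be illustrated with a toy autoregressive affine flow. This is a minimal sketch and not SeqFlow: it works on a short continuous vector rather than token sequences, and the conditioner `_shift_and_log_scale`, the functions `encode` and `decode`, and all constants are hypothetical choices made for illustration.

```python
import numpy as np


def _shift_and_log_scale(prefix):
    """Toy conditioner (assumption): depends only on already-processed coordinates."""
    total = float(np.sum(prefix)) if len(prefix) else 0.0
    return 0.5 * total, 0.1 * np.tanh(total)


def encode(x):
    """Map x to z coordinate by coordinate; each step is an invertible affine map."""
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    for i in range(len(x)):
        shift, log_scale = _shift_and_log_scale(x[:i])
        z[i] = (x[i] - shift) * np.exp(-log_scale)
    return z


def decode(z):
    """Invert encode sequentially: x[:i] is already reconstructed when x[i] is needed."""
    z = np.asarray(z, dtype=float)
    x = np.empty_like(z)
    for i in range(len(z)):
        shift, log_scale = _shift_and_log_scale(x[:i])
        x[i] = z[i] * np.exp(log_scale) + shift
    return x


x = np.array([0.3, -1.2, 2.0, 0.7])
x_rec = decode(encode(x))
max_error = float(np.max(np.abs(x - x_rec)))  # zero up to floating-point rounding
```

Because every step is an exact affine bijection, decoding an encoded sample returns the original sample, which is the property a lossy VAE decoder does not guarantee.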
Mathematical Programming Models for Exact and Interpretable Formulation of Neural Networks
Ataei, Masoud, Hasaj, Edrin, Gipp, Jacob, Forouzi, Sepideh
This paper presents a unified mixed-integer programming framework for training sparse and interpretable neural networks. We develop exact formulations for both fully connected and convolutional architectures by modeling nonlinearities such as ReLU activations through binary variables and encoding structural sparsity via filter- and layer-level pruning constraints. The resulting models integrate parameter learning, architecture selection, and structural regularization within a single optimization problem, yielding globally optimal solutions with respect to a composite objective that balances prediction accuracy, weight sparsity, and architectural compactness. The mixed-integer programming formulation accommodates piecewise-linear operations, including max pooling and activation gating, and permits precise enforcement of logic-based or domain-specific constraints. By incorporating considerations of interpretability, sparsity, and verifiability directly into the training process, the proposed framework bridges a range of research areas including explainable artificial intelligence, symbolic reasoning, and formal verification.
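One common way to model a ReLU with a binary variable is the big-M encoding. The sketch below assumes a pre-activation value a with known bounds lower < 0 < upper and checks which outputs y satisfy y >= a, y >= 0, y <= a - lower * (1 - z), y <= upper * z for z in {0, 1}. It is a brute-force illustration of that textbook encoding, not necessarily the exact formulation used in the paper, and the function name and bounds are hypothetical.

```python
def feasible_outputs(a, lower, upper, tol=1e-12):
    """Return (z, y_min, y_max) for each binary z that admits a feasible y.

    Constraints (big-M encoding of y = max(0, a), assuming lower < 0 < upper):
        y >= a,  y >= 0,  y <= a - lower * (1 - z),  y <= upper * z
    """
    results = []
    for z in (0, 1):
        y_min = max(a, 0.0)
        y_max = min(a - lower * (1 - z), upper * z)
        if y_min <= y_max + tol:
            results.append((z, y_min, y_max))
    return results


active = feasible_outputs(1.5, lower=-2.0, upper=3.0)     # only z = 1, y = 1.5
inactive = feasible_outputs(-1.0, lower=-2.0, upper=3.0)  # only z = 0, y = 0.0
```

In both cases the feasible interval for y collapses to the single point max(0, a), so the linear constraints plus one binary variable reproduce the ReLU exactly within the assumed bounds.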
Causal-Copilot: An Autonomous Causal Analysis Agent
Wang, Xinyue, Zhou, Kun, Wu, Wenyi, Singh, Har Simrat, Nan, Fang, Jin, Songyao, Philip, Aryan, Patnaik, Saloni, Zhu, Hou, Singh, Shivam, Prashant, Parjanya, Shen, Qian, Huang, Biwei
Causal analysis plays a foundational role in scientific discovery and reliable decision-making, yet it remains largely inaccessible to domain experts due to its conceptual and algorithmic complexity. This disconnect between causal methodology and practical usability presents a dual challenge: domain experts are unable to leverage recent advances in causal learning, while causal researchers lack broad, real-world deployment to test and refine their methods. To address this, we introduce Causal-Copilot, an autonomous agent that operationalizes expert-level causal analysis within a large language model framework. Causal-Copilot automates the full pipeline of causal analysis for both tabular and time-series data -- including causal discovery, causal inference, algorithm selection, hyperparameter optimization, result interpretation, and generation of actionable insights. It supports interactive refinement through natural language, lowering the barrier for non-specialists while preserving methodological rigor. By integrating over 20 state-of-the-art causal analysis techniques, our system fosters a virtuous cycle -- expanding access to advanced causal methods for domain experts while generating rich, real-world applications that inform and advance causal theory. Empirical evaluations demonstrate that Causal-Copilot achieves superior performance compared to existing baselines, offering a reliable, scalable, and extensible solution that bridges the gap between theoretical sophistication and real-world applicability in causal analysis. A live interactive demo of Causal-Copilot is available at https://causalcopilot.com/.
Traffic Adaptive Moving-window Service Patrolling for Real-time Incident Management during High-impact Events
Lei, Haozhe, Yang, Ya-Ting, Li, Tao, Bian, Zilin, Zuo, Fan, Rangan, Sundeep, Ozbay, Kaan
Haozhe Lei (a), Ya-Ting Yang (a), Tao Li (a), Zilin Bian (b, *), Fan Zuo (b), Sundeep Rangan (a), and Kaan Ozbay (b)
(a) Department of Electrical and Computer Engineering, New York University, United States of America
(b) Department of Civil and Urban Engineering, New York University, United States of America
(*) Corresponding author
Keywords: High-impact event management, service patrol, dynamic programming, adaptive graph
This paper presents the Traffic Adaptive Moving-window Patrolling Algorithm (TAMPA), designed to improve real-time incident management during major events like sports tournaments and concerts. Such events significantly stress transportation networks, requiring efficient and adaptive patrol solutions. Using dynamic programming, the algorithm continuously adjusts patrol strategies within short planning windows, effectively balancing immediate response and efficient routing. Theoretical analyses ensure performance remains closely aligned with optimal solutions. Simulation results from an urban traffic network demonstrate TAMPA's superior performance, showing improvements of approximately 87.5% over stationary methods and 114.2% over random strategies. Future work includes enhancing adaptability and incorporating digital twin technology for improved predictive accuracy, particularly relevant for events like the 2026 FIFA World Cup at MetLife Stadium.
1 Introduction
1.1 Motivation
Organizing high-impact events, such as sports tournaments, festivals, and concerts, presents substantial social, economic, and transportation challenges. These events can place immense pressure on transportation infrastructure, security protocols, and public services, particularly in regions that are already congested and economically vital, such as the New York-New Jersey (NYNJ) or Los Angeles (LA) metropolitan areas.
Both regions attract large, diverse crowds of tourists from across state lines and around the world, further complicating the logistics of special event management. A prime example of this challenge is the hosting of mega-events, such as the FIFA World Cup (Ardemagni, 2022) and the Olympics (Government, 2022; Harrison, 2021).
Wasserstein Distributionally Robust Regret Optimization
Fiechtner, Lukas-Benedikt, Blanchet, Jose
Distributionally Robust Optimization (DRO) is a popular framework for decision-making under uncertainty, but its adversarial nature can lead to overly conservative solutions. To address this, we study ex-ante Distributionally Robust Regret Optimization (DRRO), focusing on Wasserstein-based ambiguity sets which are popular due to their links to regularization and machine learning. We provide a systematic analysis of Wasserstein DRRO, paralleling known results for Wasserstein DRO. Under smoothness and regularity conditions, we show that Wasserstein DRRO coincides with Empirical Risk Minimization (ERM) up to first-order terms, and exactly so in convex quadratic settings. We revisit the Wasserstein DRRO newsvendor problem, where the loss is the maximum of two linear functions of demand and decision. Extending [25], we show that the regret can be computed by maximizing two one-dimensional concave functions. For more general loss functions involving the maximum of multiple linear terms in multivariate random variables and decision vectors, we prove that computing the regret and thus also the DRRO policy is NP-hard. We then propose a convex relaxation for these more general Wasserstein DRRO problems and demonstrate its strong empirical performance. Finally, we provide an upper bound on the optimality gap of our relaxation and show it improves over recent alternatives.
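To make the newsvendor loss and the notion of regret concrete, the sketch below evaluates both under a fixed empirical demand sample. It does not implement the Wasserstein worst case studied in the paper; the cost parameters `h` (overage) and `b` (underage), the demand values, and the function names are all hypothetical.

```python
import numpy as np


def newsvendor_loss(x, demand, h=1.0, b=3.0):
    """Loss as the maximum of two linear functions of demand and decision."""
    return np.maximum(h * (x - demand), b * (demand - x))


def expected_loss(x, demand, h=1.0, b=3.0):
    return float(np.mean(newsvendor_loss(x, demand, h, b)))


def best_decision(demand, h=1.0, b=3.0):
    """The empirical expected loss is convex and piecewise linear with
    breakpoints at the samples, so a minimizer lies at a sample point."""
    candidates = np.unique(demand)
    losses = [expected_loss(c, demand, h, b) for c in candidates]
    return float(candidates[int(np.argmin(losses))])


def regret(x, demand, h=1.0, b=3.0):
    """Expected loss of x minus the best achievable expected loss."""
    return expected_loss(x, demand, h, b) - expected_loss(best_decision(demand, h, b), demand, h, b)


demand = np.array([4.0, 7.0, 9.0, 12.0, 15.0])
x_star = best_decision(demand)
regret_at_optimum = regret(x_star, demand)
regret_elsewhere = regret(5.0, demand)
```

A regret-based formulation takes the worst case of this difference over an ambiguity set of distributions, rather than the worst case of the expected loss alone.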
Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints
Yang, Ming, Li, Gang, Hu, Quanqi, Lin, Qihang, Yang, Tianbao
Constrained optimization with multiple functional inequality constraints has significant applications in machine learning. This paper examines a crucial subset of such problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop algorithmic designs. To overcome these challenges, we introduce a novel single-loop penalty-based stochastic algorithm. Following the classical exact penalty method, our approach employs a hinge-based penalty, which permits the use of a constant penalty parameter, enabling us to achieve a state-of-the-art complexity for finding an approximate Karush-Kuhn-Tucker (KKT) solution. We further extend our algorithm to address finite-sum coupled compositional objectives, which are prevalent in artificial intelligence applications, establishing improved complexity over existing approaches. Finally, we validate our method through experiments on fair learning with receiver operating characteristic (ROC) fairness constraints and continual learning with non-forgetting constraints.
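The idea of a hinge-based exact penalty with a constant penalty parameter can be sketched on a one-dimensional toy problem: minimize (x - 2)^2 subject to x - 1 <= 0, replaced by the penalized objective (x - 2)^2 + rho * max(0, x - 1). This is a deterministic single-loop subgradient method for illustration only, not the stochastic algorithm proposed in the paper; the problem, `rho`, the step-size schedule, and the function name are assumptions.

```python
import math


def penalized_subgradient_descent(x0=0.0, rho=5.0, steps=2000, base_step=0.1):
    """Single loop: each iteration takes one subgradient step on
    f(x) + rho * max(0, g(x)) with f(x) = (x - 2)^2 and g(x) = x - 1."""
    x = x0
    for t in range(steps):
        grad_f = 2.0 * (x - 2.0)
        # Subgradient of the hinge max(0, g(x)): 1 when the constraint is violated, else 0.
        grad_hinge = 1.0 if (x - 1.0) > 0.0 else 0.0
        step = base_step / math.sqrt(t + 1)
        x -= step * (grad_f + rho * grad_hinge)
    return x


x_final = penalized_subgradient_descent()
constraint_violation = max(0.0, x_final - 1.0)
```

Because rho = 5 exceeds the magnitude of the objective's slope at the constrained optimum x = 1 (which is 2), the penalized problem has the same minimizer, so the penalty parameter does not need to grow over the iterations.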
Some Optimizers are More Equal: Understanding the Role of Optimizers in Group Fairness
Kolahdouzi, Mojtaba, Gunes, Hatice, Etemad, Ali
We study whether and how the choice of optimization algorithm can impact group fairness in deep neural networks. Through stochastic differential equation analysis of optimization dynamics in an analytically tractable setup, we demonstrate that the choice of optimization algorithm indeed influences fairness outcomes, particularly under severe imbalance. Furthermore, we show that when comparing two categories of optimizers, adaptive methods and stochastic methods, RMSProp (from the adaptive category) has a higher likelihood of converging to fairer minima than SGD (from the stochastic category). Building on this insight, we derive two new theoretical guarantees showing that, under appropriate conditions, RMSProp exhibits fairer parameter updates and improved fairness in a single optimization step compared to SGD. We then validate these findings through extensive experiments on three publicly available datasets, namely CelebA, FairFace, and MS-COCO, across different tasks such as facial expression recognition, gender classification, and multi-label classification, using various backbones. Considering multiple fairness definitions including equalized odds, equal opportunity, and demographic parity, adaptive optimizers like RMSProp and Adam consistently outperform SGD in terms of group fairness, while maintaining comparable predictive accuracy. Our results highlight the role of adaptive updates as a crucial yet overlooked mechanism for promoting fair outcomes.
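For readers who want the two update rules side by side, the sketch below applies one SGD step and one RMSProp step to a gradient whose two coordinates differ greatly in magnitude. It only illustrates the standard update formulas and does not reproduce the paper's analysis or its fairness results; the gradient values and hyperparameters are hypothetical.

```python
import numpy as np


def sgd_step(theta, grad, lr=0.1):
    """Plain SGD: the step is proportional to the raw gradient."""
    return theta - lr * grad


def rmsprop_step(theta, grad, v, lr=0.1, beta=0.9, eps=1e-8):
    """RMSProp: divide each coordinate by a running root-mean-square of its gradient."""
    v = beta * v + (1.0 - beta) * grad ** 2
    return theta - lr * grad / (np.sqrt(v) + eps), v


theta = np.zeros(2)
grad = np.array([10.0, 0.1])  # hypothetical: one large and one small gradient coordinate

sgd_delta = sgd_step(theta, grad) - theta
rms_theta, v = rmsprop_step(theta, grad, np.zeros(2))
rms_delta = rms_theta - theta
```

In this first step the SGD update keeps the 100-to-1 ratio between coordinates, while the RMSProp update has almost equal magnitude in both, which is the per-coordinate normalization that distinguishes adaptive methods.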
A mean teacher algorithm for unlearning of language models
One of the goals of language model unlearning is to reduce memorization of selected text instances while retaining the model's general abilities. Despite various proposed methods, reducing memorization of large datasets without noticeable degradation in model utility remains challenging. In this paper, we investigate the mean teacher algorithm (Tarvainen & Valpola, 2017), a simple proximal optimization method from continual learning literature that gradually modifies the teacher model. We show that the mean teacher can approximate a trajectory of a slow natural gradient descent (NGD), which inherently seeks low-curvature updates that are less likely to degrade the model utility. While slow NGD can suffer from vanishing gradients, we introduce a new unlearning loss called "negative log-unlikelihood" (NLUL) that avoids this problem. We show that the combination of mean teacher and NLUL improves some metrics on the MUSE benchmarks (Shi et al., 2024).
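In the mean teacher scheme of Tarvainen & Valpola, the teacher's parameters are an exponential moving average of the student's parameters. The sketch below shows only that averaging step, with a fixed student vector for clarity; in practice the student is also updated by gradient steps. The `decay` value and parameter vectors are hypothetical, and the unlearning loss (NLUL) is not shown.

```python
import numpy as np


def update_teacher(teacher, student, decay=0.99):
    """Move the teacher a small fraction of the way toward the student."""
    return decay * teacher + (1.0 - decay) * student


teacher = np.zeros(3)
student = np.ones(3)  # held fixed here purely for illustration
for _ in range(100):
    teacher = update_teacher(teacher, student)

expected = 1.0 - 0.99 ** 100  # closed form for a fixed student and a zero-initialized teacher
```

A decay close to 1 makes the teacher change slowly, which matches the abstract's description of a method that gradually modifies the teacher model.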
Optimizing Multi-Gateway LoRaWAN via Cloud-Edge Collaboration and Knowledge Distillation
For large-scale multi-gateway LoRaWAN networks, this study proposes HEAT-LDL (HEAT-Local Distill Lyapunov), a cloud-edge collaborative resource allocation and decision-making method based on edge intelligence that realizes collaborative decision-making between gateways and terminal nodes. HEAT-LDL combines the Actor-Critic architecture and the Lyapunov optimization method to achieve intelligent downlink control and gateway load balancing. When the signal quality is good, the network server uses the HEAT algorithm to schedule the terminal nodes. To improve the efficiency of autonomous decision-making at the terminal nodes, HEAT-LDL performs cloud-edge knowledge distillation of the HEAT teacher model on the terminal node side. When the downlink decision instruction is lost, the terminal node uses the student model and an edge decider based on prior knowledge and local history to make collaborative autonomous decisions. Simulation experiments show that, compared with the best results among all compared algorithms, HEAT-LDL improves the packet success rate and energy efficiency by 20.5% and 88.1%, respectively.