
 Optimization


Proactive Constrained Policy Optimization with Preemptive Penalty

arXiv.org Artificial Intelligence

Safe Reinforcement Learning (RL) often faces significant issues such as constraint violations and instability, necessitating the use of constrained policy optimization, which seeks optimal policies while ensuring adherence to specific constraints like safety. Typically, constrained optimization problems are addressed by the Lagrangian method, a post-violation remedial approach that may result in oscillations and overshoots. Motivated by this, we propose a novel method named Proactive Constrained Policy Optimization (PCPO) that incorporates a preemptive penalty mechanism. This mechanism integrates barrier terms into the objective function, imposing a cost as the policy nears the constraint boundary. In addition, we introduce a constraint-aware intrinsic reward to guide boundary-aware exploration, which is activated only when the policy approaches the constraint boundary. We establish theoretical upper and lower bounds for the duality gap and the performance of the PCPO update, shedding light on the method's convergence characteristics. Additionally, to enhance the optimization performance, we adopt a policy iteration approach. An interesting finding is that PCPO demonstrates significant stability in experiments. Experimental results indicate that the PCPO framework provides a robust solution for policy optimization under constraints, with important implications for future research and practical applications.
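
The preemptive penalty idea can be illustrated with a small sketch. The abstract does not give the exact barrier form, so everything below is an assumption made for illustration: a log-barrier that is zero while the expected cost stays below a fraction of the limit and grows without bound as the cost approaches the limit. The names `preemptive_penalty`, `penalized_objective`, `margin`, and `weight` are hypothetical and not taken from the paper.

```python
import math


def preemptive_penalty(cost, limit, margin=0.2, weight=1.0):
    """Illustrative log-barrier that activates only near the constraint boundary.

    Assumes limit > 0 and 0 < margin < 1. The penalty is 0 while
    cost <= (1 - margin) * limit, rises continuously from 0 above that
    threshold, and is infinite once the limit is reached or exceeded.
    """
    threshold = (1.0 - margin) * limit
    if cost <= threshold:
        return 0.0  # far from the boundary: no penalty
    if cost >= limit:
        return math.inf  # at or past the boundary
    # Normalized slack: 1 at the threshold, approaching 0 at the limit.
    slack = (limit - cost) / (limit - threshold)
    return -weight * math.log(slack)


def penalized_objective(reward, cost, limit, margin=0.2, weight=1.0):
    """Reward minus the preemptive penalty (hypothetical surrogate objective)."""
    return reward - preemptive_penalty(cost, limit, margin, weight)


if __name__ == "__main__":
    for c in (10.0, 21.0, 24.0, 24.9):
        print(c, round(penalized_objective(100.0, c, limit=25.0), 3))
```

Unlike a Lagrangian multiplier, which only grows after a violation has been observed, a penalty of this shape already pushes back while the policy is still feasible. That is the contrast the abstract draws between proactive and post-violation approaches.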


Robustly Learning Monotone Single-Index Models

arXiv.org Artificial Intelligence

We consider the basic problem of learning Single-Index Models with respect to the square loss under the Gaussian distribution in the presence of adversarial label noise. Our main contribution is the first computationally efficient algorithm for this learning task, achieving a constant factor approximation, that succeeds for the class of all monotone activations with bounded moment of order $2 + \zeta$, for $\zeta > 0$. This class in particular includes all monotone Lipschitz functions and even discontinuous functions like (possibly biased) halfspaces. Prior work for the case of unknown activation either does not attain constant factor approximation or succeeds for a substantially smaller family of activations. The main conceptual novelty of our approach lies in developing an optimization framework that steps outside the boundaries of usual gradient methods and instead identifies a useful vector field to guide the algorithm updates by directly leveraging the problem structure, properties of Gaussian spaces, and regularity of monotone functions.


Reliable and Real-Time Highway Trajectory Planning via Hybrid Learning-Optimization Frameworks

arXiv.org Artificial Intelligence

Autonomous highway driving presents a high collision risk due to fast-changing environments and limited reaction time, necessitating reliable and efficient trajectory planning. This paper proposes a hybrid trajectory planning framework that integrates the adaptability of learning-based methods with the formal safety guarantees of optimization-based approaches. The framework features a two-layer architecture: an upper layer employing a graph neural network (GNN) trained on real-world highway data to predict human-like longitudinal velocity profiles, and a lower layer utilizing path optimization formulated as a mixed-integer quadratic programming (MIQP) problem. The primary contribution is the lower-layer path optimization model, which introduces a linear approximation of discretized vehicle geometry to substantially reduce computational complexity, while enforcing strict spatiotemporal non-overlapping constraints to formally guarantee collision avoidance throughout the planning horizon. Experimental results demonstrate that the planner generates highly smooth, collision-free trajectories in complex real-world emergency scenarios, achieving success rates exceeding 97% with average planning times of 54 ms, thereby confirming real-time capability. The trajectory planning module plays a central role in ensuring driving safety in the modern autonomous driving system. It generates an optimal continuous trajectory for autonomous vehicles (AVs) over a future time horizon based on environmental information. This environmental information is provided by the perception module, which performs multi-sensor data fusion and feature extraction to produce real-time structured data through object detection and semantic segmentation. The control system then executes the planned trajectory by minimizing the deviation between the actual and intended vehicle behavior.
Highway scenarios are constrained within structured environments characterized by high-speed operation, low-curvature roadways, and standardized traffic regulations, typically involving only rule-compliant motorized vehicles.


Challenges in Applying Variational Quantum Algorithms to Dynamic Satellite Network Routing

arXiv.org Artificial Intelligence

The advent of large-scale Low Earth Orbit (LEO) satellite constellations, spearheaded by initiatives such as SpaceX's Starlink, Amazon's Project Kuiper, and OneWeb, is poised to revolutionize global connectivity (Saeed et al., 2020). By deploying thousands of interconnected satellites, these networks promise to deliver high-speed, low-latency internet access to every corner of the globe, including remote and underserved regions (Reddy et al., 2023). However, the very characteristics that enable this new paradigm, namely the massive scale and high orbital velocity of the satellites, introduce unprecedented challenges in network management (Hu, 2023). The network topology is in a constant state of flux, with inter-satellite links (ISLs) being established and terminated on a timescale of seconds, creating a highly dynamic and complex operational environment (Bhattacharjee et al., 2024). At the heart of managing these constellations lies the network routing problem: determining the optimal path for data packets to travel from a source to a destination (Zhang et al., 2025; Chen et al., 2021). In this dynamic context, the routing problem is far more complex than in terrestrial networks. It must account for time-varying latencies, intermittent link availability, and vast state spaces.


Agentic-AI based Mathematical Framework for Commercialization of Energy Resilience in Electrical Distribution System Planning and Operation

arXiv.org Artificial Intelligence

The increasing vulnerability of electrical distribution systems to extreme weather events and cyber threats necessitates the development of economically viable frameworks for resilience enhancement. While existing approaches focus primarily on technical resilience metrics and enhancement strategies, there remains a significant gap in establishing market-driven mechanisms that can effectively commercialize resilience features while optimizing their deployment through intelligent decision-making. Moreover, traditional optimization approaches for distribution network reconfiguration often fail to dynamically adapt to both normal and emergency conditions. This paper introduces a novel framework integrating dual-agent Proximal Policy Optimization (PPO) with market-based mechanisms, achieving an average resilience score of 0.85 ± 0.08 over 10 test episodes. The proposed architecture leverages a dual-agent PPO scheme, where a strategic agent selects optimal DER-driven switching configurations, while a tactical agent fine-tunes individual switch states and grid preferences under budget and weather constraints. These agents interact within a custom-built dynamic simulation environment that models stochastic calamity events, budget limits, and resilience-cost trade-offs. A comprehensive reward function is designed that balances resilience enhancement objectives with market profitability (with up to 200x reward incentives, resulting in 85% of actions during calamity steps selecting configurations with 4 DERs), incorporating factors such as load recovery speed, system robustness, and customer satisfaction. Over 10 test episodes, the framework achieved a benefit-cost ratio of 0.12 ± 0.01, demonstrating sustainable market incentives for resilience investment. This framework creates sustainable market incentives


LRTuckerRep: Low-rank Tucker Representation Model for Multi-dimensional Data Completion

arXiv.org Artificial Intelligence

Multi-dimensional data completion is a critical problem in computational sciences, particularly in domains such as computer vision and signal processing. Existing methods typically leverage either global low-rank approximations or local smoothness regularization, but each suffers from notable limitations: low-rank methods are computationally expensive and may disrupt intrinsic data structures, while smoothness-based approaches often require extensive manual parameter tuning and exhibit poor generalization. In this paper, we propose a novel Low-Rank Tucker Representation (LRTuckerRep) model that unifies global and local prior modeling within a Tucker decomposition. To efficiently solve the resulting nonconvex optimization problem, we develop two iterative algorithms with provable convergence guarantees. Extensive experiments on multi-dimensional image inpainting and traffic data imputation demonstrate that LRTuckerRep achieves superior completion accuracy and robustness under high missing rates compared to baselines. In the era of big data and artificial intelligence, multidimensional data with complex structures is increasingly prevalent across diverse domains, including computer vision, signal processing, and scientific computing. Tensor representations depict complex structural information from multidimensional data, which plays an important role in image science [1] and signal processing [2]. However, multidimensional data collected in practical applications suffers from degradation and information loss, affecting image enhancement quality and traffic prediction accuracy.


A Survey on Deep Multi-Task Learning in Connected Autonomous Vehicles

arXiv.org Artificial Intelligence

Connected autonomous vehicles (CAVs) must simultaneously perform multiple tasks, such as object detection, semantic segmentation, depth estimation, trajectory prediction, motion prediction, and behaviour prediction, to ensure safe and reliable navigation in complex environments. Vehicle-to-everything (V2X) communication enables cooperative driving among CAVs, thereby mitigating the limitations of individual sensors, reducing occlusions, and improving perception over long distances. Traditionally, these tasks are addressed using distinct models, which leads to high deployment costs, increased computational overhead, and challenges in achieving real-time performance. Multi-task learning (MTL) has recently emerged as a promising solution that enables the joint learning of multiple tasks within a single unified model. This offers improved efficiency and resource utilization. To the best of our knowledge, this survey is the first comprehensive review focused on MTL in the context of CAVs. We begin with an overview of CAVs and MTL to provide foundational background. We then explore the application of MTL across key functional modules, including perception, prediction, planning, control, and multi-agent collaboration. Finally, we discuss the strengths and limitations of existing methods, identify key research gaps, and provide directions for future research aimed at advancing MTL methodologies for CAV systems.


A Dual Optimization View to Empirical Risk Minimization with f-Divergence Regularization

arXiv.org Machine Learning

The dual formulation of empirical risk minimization with f-divergence regularization (ERM-fDR) is introduced. The solution of the dual optimization problem to the ERM-fDR is connected to the notion of normalization function introduced as an implicit function. This dual approach leverages the Legendre-Fenchel transform and the implicit function theorem to provide a nonlinear ODE expression to the normalization function. Furthermore, the nonlinear ODE expression and its properties provide a computationally efficient method to calculate the normalization function of the ERM-fDR solution under a mild condition. Empirical risk minimization (ERM) [1]-[6] is often posed as an optimization problem regularized by a statistical distance between the probability measure to be optimized and a given reference measure [7]-[13].
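
As a worked special case, consider the relative-entropy choice $f(x) = x \log x$, for which the normalization has a closed form. The notation below is an assumption introduced here for illustration and is not taken from the paper: $\mathsf{L}(\theta)$ is the empirical risk, $Q$ the reference measure, $\lambda > 0$ the regularization factor, and $\beta$ the multiplier of the constraint that $P$ integrates to one.

```latex
\begin{align}
  &\min_{P \ll Q} \; \int \mathsf{L}(\theta)\,\mathrm{d}P(\theta)
    + \lambda \int f\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}(\theta)\right)\mathrm{d}Q(\theta)
    \quad \text{s.t.} \quad \int \frac{\mathrm{d}P}{\mathrm{d}Q}(\theta)\,\mathrm{d}Q(\theta) = 1, \\
  % Pointwise stationarity of the Lagrangian in g = dP/dQ:
  &\mathsf{L}(\theta) + \lambda f'\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}(\theta)\right) + \beta = 0
    \;\Longrightarrow\;
    \frac{\mathrm{d}P}{\mathrm{d}Q}(\theta)
    = \left(f'\right)^{-1}\!\left(-\frac{\beta + \mathsf{L}(\theta)}{\lambda}\right), \\
  % Relative entropy: f'(x) = \log x + 1, so (f')^{-1}(y) = e^{y - 1}.
  &\frac{\mathrm{d}P}{\mathrm{d}Q}(\theta)
    = \exp\!\left(-\frac{\beta + \mathsf{L}(\theta)}{\lambda} - 1\right),
  \qquad
  \beta = \lambda \left( \log \int \exp\!\left(-\frac{\mathsf{L}(\nu)}{\lambda}\right)\mathrm{d}Q(\nu) - 1 \right), \\
  &\frac{\mathrm{d}P}{\mathrm{d}Q}(\theta)
    = \frac{\exp\!\left(-\mathsf{L}(\theta)/\lambda\right)}
           {\int \exp\!\left(-\mathsf{L}(\nu)/\lambda\right)\mathrm{d}Q(\nu)}.
\end{align}
```

In this case the solution is the familiar Gibbs measure and $\beta$ is available explicitly. For a general $f$ the normalization is only defined implicitly through the unit-mass constraint, which is presumably where the implicit function theorem and the ODE characterization mentioned in the abstract become useful.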


Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study

arXiv.org Artificial Intelligence

A knowledge graph (KG) represents a network of entities and illustrates relationships between them. KGs are used for various applications, including semantic search and discovery, reasoning, decision-making, natural language processing, machine learning, and recommendation systems. Triple (subject-relation-object) extraction from text is the fundamental building block of KG construction and has been widely studied, from early benchmarks such as ACE 2002 to more recent ones such as WebNLG 2020, REBEL and SynthIE. While the use of large language models (LLMs) is explored for KG construction, handcrafting reasonable task-specific prompts for LLMs is a labour-intensive exercise and can be brittle due to subtle changes in the LLMs employed. Recent work in NLP tasks (e.g. autonomy generation) uses automatic prompt optimization/engineering to address this challenge by generating optimal or near-optimal task-specific prompts given input-output examples. This empirical study explores the application of automatic prompt optimization for the triple extraction task using experimental benchmarking. We evaluate different settings by changing (a) the prompting strategy, (b) the LLM being used for prompt optimization and task execution, (c) the number of canonical relations in the schema (schema complexity), (d) the length and diversity of input text, (e) the metric used to drive the prompt optimization, and (f) the dataset being used for training and testing. We evaluate three different automatic prompt optimizers, namely DSPy, APE, and TextGrad, and use two different triple extraction datasets, SynthIE and REBEL. Through rigorous empirical evaluation, our main contribution highlights that automatic prompt optimization techniques can generate reasonable prompts comparable to human-written ones for triple extraction. In turn, these optimized prompts achieve improved results, particularly with increasing schema complexity and text size.


EmbedGrad: Gradient-Based Prompt Optimization in Embedding Space for Large Language Models

arXiv.org Artificial Intelligence

Effectively adapting powerful pretrained foundation models to diverse tasks remains a key challenge in AI deployment. Current approaches primarily follow two paradigms: discrete optimization of text prompts through prompt engineering, or continuous adaptation via additional trainable parameters. Both exhibit limitations: discrete methods lack refinement precision while parameter-based techniques increase complexity and reduce interpretability. To address these constraints, we propose EmbedGrad, a novel framework that optimizes text prompt embeddings through gradient-based refinement. Our approach uniquely decouples training from deployment: during optimization, labeled examples guide precise embedding adjustments while preserving semantic meaning; during inference, only optimized embeddings integrate with user queries. This enables fine-grained calibration impossible in text space, such as enhancing the reasoning capability of prompts like "please reason step by step". Comprehensive evaluations across mathematical reasoning, sentiment analysis, and causal judgment tasks demonstrate EmbedGrad's effectiveness: optimizing this reasoning prompt for Qwen2.5-Math-1.5B increased accuracy from 14.74% to 58.96% on mathematical problems. Consistent improvements were observed across model scales (0.5B-14B) and all tasks, with particularly significant gains for smaller models on complex problems like causal judgment. By bridging prompt engineering and parameter efficiency without architectural changes, our work establishes embedding refinement as a powerful new paradigm for task adaptation.
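
The core idea, updating only the prompt embedding by gradient descent while the model stays frozen, can be shown with a toy sketch. This is not the EmbedGrad implementation: the "model" below is a frozen logistic scorer standing in for an LLM, the prompt and query embeddings are simply added, and all names (`predict`, `tune_prompt`, `log_loss`) are hypothetical.

```python
import math


def predict(weights, prompt, query):
    """Frozen toy model: sigmoid of w . (prompt + query)."""
    z = sum(w * (p + q) for w, p, q in zip(weights, prompt, query))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, prompt, examples):
    """Mean binary cross-entropy over (query, label) pairs."""
    total = 0.0
    for query, label in examples:
        y = predict(weights, prompt, query)
        total -= label * math.log(y) + (1 - label) * math.log(1.0 - y)
    return total / len(examples)


def tune_prompt(weights, prompt, examples, lr=0.5, steps=200):
    """Gradient descent on the prompt embedding only; weights are never updated."""
    prompt = list(prompt)
    for _ in range(steps):
        grad = [0.0] * len(prompt)
        for query, label in examples:
            # For sigmoid + cross-entropy, d(loss)/dz = prediction - label,
            # and dz/d(prompt[i]) = weights[i].
            err = predict(weights, prompt, query) - label
            for i, w in enumerate(weights):
                grad[i] += err * w / len(examples)
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt


if __name__ == "__main__":
    weights = [1.0, -2.0]   # frozen "model" parameters
    start = [-3.0, 1.0]     # deliberately poor initial prompt embedding
    data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 0.5], 1)]
    tuned = tune_prompt(weights, start, data)
    print(log_loss(weights, start, data), log_loss(weights, tuned, data))
```

At inference time only the tuned embedding would be combined with new queries, mirroring the training/deployment split described in the abstract. In a real LLM the prompt embeddings would be prepended to the query token embeddings and gradients would come from automatic differentiation.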