Goto

Collaborating Authors

 shortest path problem



Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem

arXiv.org Artificial Intelligence

In this paper we propose two algorithms in the tabular setting and an algorithm for the function approximation setting for the Stochastic Shortest Path (SSP) problem. SSP problems form an important class of problems in Reinforcement Learning (RL), as other types of cost-criteria in RL can be formulated in the setting of SSP. We show asymptotic almost-sure convergence for all our algorithms. We observe superior performance of our tabular algorithms compared to other well-known convergent RL algorithms. We further observe reliable performance of our function approximation algorithm compared to other algorithms in the function approximation setting.


APULSE: A Scalable Hybrid Algorithm for the RCSPP on Large-Scale Dense Graphs

arXiv.org Artificial Intelligence

Abstract--The resource-constrained shortest path problem (RCSPP) is a fundamental NP-hard optimization challenge with broad applications, from network routing to autonomous navigation. This problem involves finding a path that minimizes a primary cost subject to a budget on a secondary resource. While various RCSPP solvers exist, they often face critical scalability limitations when applied to the large, dense graphs characteristic of complex, real-world scenarios, making them impractical for time-critical planning. This challenge is particularly acute in domains like mission planning for unmanned ground vehicles (UGVs), which demand solutions on large-scale terrain graphs. This paper introduces APULSE, a hybrid label-setting algorithm designed to efficiently solve the RCSPP on such challenging graphs. APULSE integrates a best-first search guided by an A* heuristic with aggressive, Pulse-style pruning mechanisms and a time-bucketing strategy for effective state-space reduction. The results demonstrate that APULSE consistently finds near-optimal solutions while being orders of magnitude faster and more robust, particularly on large problem instances where competing methods fail. This superior scalability establishes APULSE as an effective solution for RCSPP in complex, large-scale environments, enabling capabilities such as interactive decision support and dynamic replanning. HE Resource-Constrained Shortest Path Problem (RC-SPP) is a fundamental NP-hard optimization challenge with broad applications, from network routing and logistics to autonomous navigation [1].


ARGUS: A Framework for Risk-Aware Path Planning in Tactical UGV Operations

arXiv.org Artificial Intelligence

This thesis presents the development of ARGUS, a framework for mission planning for Unmanned Ground Vehicles (UGVs) in tactical environments. The system is designed to translate battlefield complexity and the commander's intent into executable action plans. To this end, ARGUS employs a processing pipeline that takes as input geospatial terrain data, military intelligence on existing threats and their probable locations, and mission priorities defined by the commander. Through a set of integrated modules, the framework processes this information to generate optimized trajectories that balance mission objectives against the risks posed by threats and terrain characteristics. A fundamental capability of ARGUS is its dynamic nature, which allows it to adapt plans in real-time in response to unforeseen events, reflecting the fluid nature of the modern battlefield. The system's interoperability were validated in a practical exercise with the Portuguese Army, where it was successfully demonstrated that the routes generated by the model can be integrated and utilized by UGV control systems. The result is a decision support tool that not only produces an optimal trajectory but also provides the necessary insights for its execution, thereby contributing to greater effectiveness and safety in the employment of autonomous ground systems.


Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback

arXiv.org Machine Learning

We study online learning in finite-horizon episodic Markov decision processes (MDPs) under the challenging aggregate bandit feedback model, where the learner observes only the cumulative loss incurred in each episode, rather than individual losses at each state-action pair. While prior work in this setting has focused exclusively on worst-case analysis, we initiate the study of best-of-both-worlds (BOBW) algorithms that achieve low regret in both stochastic and adversarial environments. We propose the first BOBW algorithms for episodic tabular MDPs with aggregate bandit feedback. In the case of known transitions, our algorithms achieve $O(\log T)$ regret in stochastic settings and ${O}(\sqrt{T})$ regret in adversarial ones. Importantly, we also establish matching lower bounds, showing the optimality of our algorithms in this setting. We further extend our approach to unknown-transition settings by incorporating confidence-based techniques. Our results rely on a combination of FTRL over occupancy measures, self-bounding techniques, and new loss estimators inspired by recent advances in online shortest path problems. Along the way, we also provide the first individual-gap-dependent lower bounds and demonstrate near-optimal BOBW algorithms for shortest path problems with bandit feedback.


Contextual Linear Optimization with Bandit Feedback

Neural Information Processing Systems

We show a fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of the optimization estimate, and we develop computationally tractable surrogate losses.



Challenges in Applying Variational Quantum Algorithms to Dynamic Satellite Network Routing

arXiv.org Artificial Intelligence

The advent of large-scale Low Earth Orbit (LEO) satellite constellations, spearheaded by initiatives such as SpaceX's Starlink, Amazon's Project Kuiper, and OneWeb, is poised to revolutionize global connectivity Saeed et al. (2020). By deploying thousands of interconnected satellites, these networks promise to deliver high-speed, low-latency internet access to every corner of the globe, including remote and underserved regions Reddy et al. (2023). However, the very characteristics that enable this new paradigm - namely, the massive scale and high orbital velocity of the satellites - introduce unprecedented challenges in network management Hu (2023). The network topology is in a constant state of flux, with inter-satellite links (ISLs) being established and terminated on a timescale of seconds, creating a highly dynamic and complex operational environment Bhattacharjee et al. (2024). At the heart of managing these constellations lies the network routing problem: determining the optimal path for data packets to travel from a source to a destination Zhang et al. (2025); Chen et al. (2021). In this dynamic context, the routing problem is far more complex than in terrestrial networks. It must account for time-varying latencies, intermittent link availability, and vast state spaces.


Robust Route Planning for Sidewalk Delivery Robots

arXiv.org Artificial Intelligence

Sidewalk delivery robots are a promising solution for urban freight distribution, reducing congestion compared to trucks and providing a safer, higher-capacity alternative to drones. However, unreliable travel times on sidewalks due to pedestrian density, obstacles, and varying infrastructure conditions can significantly affect their efficiency. This study addresses the robust route planning problem for sidewalk robots, explicitly accounting for travel time uncertainty due to varying sidewalk conditions. Optimization is integrated with simulation to reproduce the effect of obstacles and pedestrian flows and generate realistic travel times. The study investigates three different approaches to derive uncertainty sets, including budgeted, ellipsoidal, and support vector clustering (SVC)-based methods, along with a distributionally robust method to solve the shortest path (SP) problem. A realistic case study reproducing pedestrian patterns in Stockholm's city center is used to evaluate the efficiency of robust routing across various robot designs and environmental conditions. The results show that, when compared to a conventional SP, robust routing significantly enhances operational reliability under variable sidewalk conditions. The Ellipsoidal and DRSP approaches outperform the other methods, yielding the most efficient paths in terms of average and worst-case delay. Sensitivity analyses reveal that robust approaches consistently outperform the conventional SP, particularly for sidewalk delivery robots that are wider, slower, and have more conservative navigation behaviors. These benefits are even more pronounced in adverse weather conditions and high pedestrian congestion scenarios.


OPO: Making Decision-Focused Data Acquisition Decisions

arXiv.org Artificial Intelligence

We propose a model for making data acquisition decisions for variables in contextual stochastic optimisation problems. Data acquisition decisions are typically treated as separate and fixed. We explore problem settings in which the acquisition of contextual variables is costly and consequently constrained. The data acquisition problem is often solved heuristically for proxy objectives such as coverage. The more intuitive objective is the downstream decision quality as a result of data acquisition decisions. The whole pipeline can be characterised as an optimise-then-predict-then-optimise (OPO) problem. Analogously, much recent research has focused on how to integrate prediction and optimisation (PO) in the form of decision-focused learning. We propose leveraging differentiable optimisation to extend the integration to data acquisition. We solve the data acquisition problem with well-defined constraints by learning a surrogate linear objective function. We demonstrate an application of this model on a shortest path problem for which we first have to set a drone reconnaissance strategy to capture image segments serving as inputs to a model that predicts travel costs. We ablate the problem with a number of training modalities and demonstrate that the differentiable optimisation approach outperforms random search strategies.