AITopics

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.65)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.45)

Neural Information Processing Systems Jun-10-2026, 02:22:12 GMT

Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback

We study online learning in finite-horizon episodic Markov decision processes (MDPs) under the challenging aggregate bandit feedback model, where the learner observes only the cumulative loss incurred in each episode, rather than individual losses at each state-action pair. While prior work in this setting has focused exclusively on worst-case analysis, we initiate the study of best-of-both-worlds (BOBW) algorithms that achieve low regret in both stochastic and adversarial environments. We propose the first BOBW algorithms for episodic tabular MDPs with aggregate bandit feedback. In the case of known transitions, our algorithms achieve $O(\log T)$ regret in stochastic settings and $O(\sqrt{T})$ regret in adversarial ones. Importantly, we also establish matching lower bounds, showing the optimality of our algorithms in this setting. We further extend our approach to unknown-transition settings by incorporating confidence-based techniques. Our results rely on a combination of FTRL over occupancy measures, self-bounding techniques, and new loss estimators inspired by recent advances in online shortest path problems. Along the way, we also provide the first individual-gap-dependent lower bounds and demonstrate near-optimal BOBW algorithms for shortest path problems with bandit feedback.
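The feedback model described in this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it only simulates one episode and returns the single aggregate number a learner would see. The function `run_episode`, the dictionary layouts for `policy`, `transitions` and `losses`, and the toy two-state MDP are all assumptions made for illustration.

```python
import random


def run_episode(policy, transitions, losses, horizon, rng, start_state=0):
    """Roll out one episode and return only the aggregate (summed) loss.

    Hypothetical data layout:
      policy[h][s]          -> action taken at step h in state s
      transitions[(s, a)]   -> list of (next_state, probability) pairs
      losses[(h, s, a)]     -> per-step loss, never shown to the learner
    """
    state = start_state
    total = 0.0
    for h in range(horizon):
        action = policy[h][state]
        # The individual loss is incurred but not revealed.
        total += losses[(h, state, action)]
        next_states, probs = zip(*transitions[(state, action)])
        state = rng.choices(next_states, weights=probs, k=1)[0]
    # Under aggregate bandit feedback this sum is the only observation.
    return total


# Toy MDP (assumed): two states, two actions, the action chosen becomes the next state.
horizon = 3
transitions = {(s, a): [(a, 1.0)] for s in (0, 1) for a in (0, 1)}
losses = {
    (h, s, a): 0.25 * a + 0.5 * s
    for h in range(horizon)
    for s in (0, 1)
    for a in (0, 1)
}
policy = [{0: 1, 1: 0} for _ in range(horizon)]

aggregate = run_episode(policy, transitions, losses, horizon, random.Random(0))
print(aggregate)
```

The learner in this sketch receives one scalar per episode and has to attribute it to the visited state-action pairs itself, which is what separates this setting from standard bandit feedback with per-step losses.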

artificial intelligence, machine learning, proceedings

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Neural Information Processing Systems Feb-9-2026, 23:13:24 GMT

b64a70760bb75e3ecfd1ad86d8f10c88-Paper.pdf

generalizability, graph, neural network

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > Canada (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Guin, Soumyajit, Bhatnagar, Shalabh

Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem

arXiv.org Artificial Intelligence Dec-3-2025

In this paper we propose two algorithms in the tabular setting and an algorithm for the function approximation setting for the Stochastic Shortest Path (SSP) problem. SSP problems form an important class of problems in Reinforcement Learning (RL), as other types of cost-criteria in RL can be formulated in the setting of SSP. We show asymptotic almost-sure convergence for all our algorithms. We observe superior performance of our tabular algorithms compared to other well-known convergent RL algorithms. We further observe reliable performance of our function approximation algorithm compared to other algorithms in the function approximation setting.

artificial intelligence, machine learning, reinforcement learning

2508.13963

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Soares, Nuno, Grilo, António

APULSE: A Scalable Hybrid Algorithm for the RCSPP on Large-Scale Dense Graphs

arXiv.org Artificial Intelligence Dec-2-2025

The resource-constrained shortest path problem (RCSPP) is a fundamental NP-hard optimization challenge with broad applications, from network routing to autonomous navigation. This problem involves finding a path that minimizes a primary cost subject to a budget on a secondary resource. While various RCSPP solvers exist, they often face critical scalability limitations when applied to the large, dense graphs characteristic of complex, real-world scenarios, making them impractical for time-critical planning. This challenge is particularly acute in domains like mission planning for unmanned ground vehicles (UGVs), which demand solutions on large-scale terrain graphs. This paper introduces APULSE, a hybrid label-setting algorithm designed to efficiently solve the RCSPP on such challenging graphs. APULSE integrates a best-first search guided by an A* heuristic with aggressive, Pulse-style pruning mechanisms and a time-bucketing strategy for effective state-space reduction. The results demonstrate that APULSE consistently finds near-optimal solutions while being orders of magnitude faster and more robust, particularly on large problem instances where competing methods fail. This superior scalability establishes APULSE as an effective solution for RCSPP in complex, large-scale environments, enabling capabilities such as interactive decision support and dynamic replanning. The Resource-Constrained Shortest Path Problem (RCSPP) is a fundamental NP-hard optimization challenge with broad applications, from network routing and logistics to autonomous navigation [1].
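To make the problem statement concrete, the following is a minimal sketch of a plain label-setting search for the RCSPP with dominance pruning. It is not APULSE: it has no A* heuristic, no Pulse-style bounds and no time-bucketing. The function name `rcspp`, the adjacency-list layout and the toy graph are assumptions for illustration, and edge costs and resources are assumed to be non-negative.

```python
import heapq


def rcspp(graph, source, target, budget):
    """Return (cost, resource, path) for a min-cost path with resource <= budget.

    graph[u] is a list of (v, cost, resource) edges with non-negative values.
    Returns None when no path fits within the budget.
    """
    # Each label is (cost, resource, node, path); the heap pops cheapest cost first.
    heap = [(0, 0, source, (source,))]
    # Resource values of labels already expanded at each node.
    settled = {}
    while heap:
        cost, res, node, path = heapq.heappop(heap)
        if node == target:
            return cost, res, list(path)
        # Labels expanded earlier at this node have cost <= the current cost,
        # so one of them dominates if its resource use is also no larger.
        if any(r <= res for r in settled.get(node, [])):
            continue
        settled.setdefault(node, []).append(res)
        for nxt, edge_cost, edge_res in graph.get(node, []):
            if res + edge_res <= budget:  # prune extensions that exceed the budget
                heapq.heappush(
                    heap, (cost + edge_cost, res + edge_res, nxt, path + (nxt,))
                )
    return None


# Toy graph (assumed): a cheap but resource-hungry route and an expensive frugal one.
graph = {
    "s": [("a", 1, 5), ("b", 4, 1)],
    "a": [("t", 1, 5)],
    "b": [("t", 4, 1)],
}

print(rcspp(graph, "s", "t", budget=10))
print(rcspp(graph, "s", "t", budget=5))
print(rcspp(graph, "s", "t", budget=1))
```

With a generous budget the search returns the cheap route through `a`; tightening the budget forces the costlier route through `b`, and a budget below every route's resource use yields `None`.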

algorithm, artificial intelligence, planning & scheduling

2511.18236

Country: Europe > Portugal > Lisbon > Lisbon (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Government > Military (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.94)

Soares, Nuno, Grilo, António

ARGUS: A Framework for Risk-Aware Path Planning in Tactical UGV Operations

arXiv.org Artificial Intelligence Nov-12-2025

This thesis presents the development of ARGUS, a framework for mission planning for Unmanned Ground Vehicles (UGVs) in tactical environments. The system is designed to translate battlefield complexity and the commander's intent into executable action plans. To this end, ARGUS employs a processing pipeline that takes as input geospatial terrain data, military intelligence on existing threats and their probable locations, and mission priorities defined by the commander. Through a set of integrated modules, the framework processes this information to generate optimized trajectories that balance mission objectives against the risks posed by threats and terrain characteristics. A fundamental capability of ARGUS is its dynamic nature, which allows it to adapt plans in real-time in response to unforeseen events, reflecting the fluid nature of the modern battlefield. The system's interoperability was validated in a practical exercise with the Portuguese Army, where it was successfully demonstrated that the routes generated by the model can be integrated and utilized by UGV control systems. The result is a decision support tool that not only produces an optimal trajectory but also provides the necessary insights for its execution, thereby contributing to greater effectiveness and safety in the employment of autonomous ground systems.

algorithm, artificial intelligence, planning & scheduling

2511.07565

Country: Europe > Portugal (0.14)

Genre: Research Report (0.82)

Industry: Government > Military > Army (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

arXiv.org Machine Learning Oct-28-2025

Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback

Ito, Shinji, Jamieson, Kevin, Luo, Haipeng, Maiti, Arnab, Tsuchiya, Taira

We study online learning in finite-horizon episodic Markov decision processes (MDPs) under the challenging aggregate bandit feedback model, where the learner observes only the cumulative loss incurred in each episode, rather than individual losses at each state-action pair. While prior work in this setting has focused exclusively on worst-case analysis, we initiate the study of best-of-both-worlds (BOBW) algorithms that achieve low regret in both stochastic and adversarial environments. We propose the first BOBW algorithms for episodic tabular MDPs with aggregate bandit feedback. In the case of known transitions, our algorithms achieve $O(\log T)$ regret in stochastic settings and $O(\sqrt{T})$ regret in adversarial ones. Importantly, we also establish matching lower bounds, showing the optimality of our algorithms in this setting. We further extend our approach to unknown-transition settings by incorporating confidence-based techniques. Our results rely on a combination of FTRL over occupancy measures, self-bounding techniques, and new loss estimators inspired by recent advances in online shortest path problems. Along the way, we also provide the first individual-gap-dependent lower bounds and demonstrate near-optimal BOBW algorithms for shortest path problems with bandit feedback.

algorithm, artificial intelligence, machine learning

2510.17103

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.45)

Neural Information Processing Systems Oct-9-2025, 22:25:15 GMT

Contextual Linear Optimization with Bandit Feedback

We show a fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of the optimization estimate, and we develop computationally tractable surrogate losses.

misspecification, misspecified degree 2, reg

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Neural Information Processing Systems Aug-15-2025, 23:43:31 GMT

b64a70760bb75e3ecfd1ad86d8f10c88-Paper.pdf

generalizability, graph, neural network

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Do, Phuc Hao, Le, Tran Duc

Challenges in Applying Variational Quantum Algorithms to Dynamic Satellite Network Routing

arXiv.org Artificial Intelligence Aug-7-2025

The advent of large-scale Low Earth Orbit (LEO) satellite constellations, spearheaded by initiatives such as SpaceX's Starlink, Amazon's Project Kuiper, and OneWeb, is poised to revolutionize global connectivity (Saeed et al., 2020). By deploying thousands of interconnected satellites, these networks promise to deliver high-speed, low-latency internet access to every corner of the globe, including remote and underserved regions (Reddy et al., 2023). However, the very characteristics that enable this new paradigm - namely, the massive scale and high orbital velocity of the satellites - introduce unprecedented challenges in network management (Hu, 2023). The network topology is in a constant state of flux, with inter-satellite links (ISLs) being established and terminated on a timescale of seconds, creating a highly dynamic and complex operational environment (Bhattacharjee et al., 2024). At the heart of managing these constellations lies the network routing problem: determining the optimal path for data packets to travel from a source to a destination (Zhang et al., 2025; Chen et al., 2021). In this dynamic context, the routing problem is far more complex than in terrestrial networks. It must account for time-varying latencies, intermittent link availability, and vast state spaces.

artificial intelligence, machine learning, reinforcement learning

2508.04288

Genre: Research Report > New Finding (0.93)

Industry:

Aerospace & Defense (0.68)
Transportation (0.55)
Telecommunications (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)