AITopics

2502.20068

Country:

Asia > China > Shaanxi Province > Xi'an (0.24)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence Feb-27-2025

ColorDynamic: Generalizable, Scalable, Real-time, End-to-end Local Planner for Unstructured and Dynamic Environments

Xin, Jinghao, Liang, Zhichao, Zhang, Zihuan, Wang, Peng, Li, Ning

Deep Reinforcement Learning (DRL) has demonstrated potential in addressing robotic local planning problems, yet its efficacy remains constrained in highly unstructured and dynamic environments. To address these challenges, this study proposes the ColorDynamic framework. First, an end-to-end DRL formulation is established, which maps raw sensor data directly to control commands, thereby ensuring compatibility with unstructured environments. Under this formulation, a novel network, Transqer, is introduced. The Transqer enables online DRL learning from temporal transitions, substantially enhancing decision-making in dynamic scenarios. To facilitate scalable training of Transqer with diverse data, an efficient simulation platform E-Sparrow, along with a data augmentation technique leveraging symmetric invariance, are developed. Comparative evaluations against state-of-the-art methods, alongside assessments of generalizability, scalability, and real-time performance, were conducted to validate the effectiveness of ColorDynamic. Results indicate that our approach achieves a success rate exceeding 90% while exhibiting real-time capacity (1.2-1.3 ms per planning). Additionally, ablation studies were performed to corroborate the contributions of individual components. Building on this, the OkayPlan-ColorDynamic (OPCD) navigation system is presented, with simulated and real-world experiments demonstrating its superiority and applicability in complex scenarios. The codebase and experimental demonstrations have been open-sourced on our website to facilitate reproducibility and further research.

colordynamic, obstacle, robot, (16 more...)

2502.19892

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
(4 more...)

Foresti, Alberto, Franzese, Giulio, Michiardi, Pietro

INFO-SEDD: Continuous Time Markov Chains as Scalable Information Metrics Estimators

arXiv.org Artificial Intelligence Feb-27-2025

Information-theoretic quantities play a crucial role in understanding non-linear relationships between random variables and are widely used across scientific disciplines. However, estimating these quantities remains an open problem, particularly in the case of high-dimensional discrete distributions. Current approaches typically rely on embedding discrete data into a continuous space and applying neural estimators originally designed for continuous distributions, a process that may not fully capture the discrete nature of the underlying data. We consider Continuous-Time Markov Chains (CTMCs), stochastic processes on discrete state-spaces which have gained popularity due to their generative modeling applications. In this work, we introduce INFO-SEDD, a novel method for estimating information-theoretic quantities of discrete data, including mutual information and entropy. Our approach requires the training of a single parametric model, offering significant computational and memory advantages. Additionally, it seamlessly integrates with pretrained networks, allowing for efficient reuse of pretrained generative models. To evaluate our approach, we construct a challenging synthetic benchmark. Our experiments demonstrate that INFO-SEDD is robust and outperforms neural competitors that rely on embedding techniques. Moreover, we validate our method on a real-world task: estimating the entropy of an Ising model. Overall, INFO-SEDD outperforms competing methods and shows scalability to high-dimensional scenarios, paving the way for new applications where estimating MI between discrete distributions is the focus. The promising results in this complex, high-dimensional scenario highlight INFO-SEDD as a powerful new estimator in the toolkit for information-theoretical analysis.
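For orientation, the quantities named in this abstract (entropy and mutual information of discrete data) can be illustrated with a naive plug-in estimator built from empirical frequencies. This is a minimal sketch and is not INFO-SEDD: it is the simple counting baseline, which is known to degrade as the number of possible states grows relative to the sample size. The function names `entropy` and `mutual_information` are hypothetical, chosen for this example only.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in entropy (in nats) from the empirical frequencies of the samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Plug-in mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    joint = list(zip(xs, ys))
    return entropy(xs) + entropy(ys) - entropy(joint)


# Two toy paired samples (assumed data, for illustration only).
copied = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])       # Y is a copy of X
unrelated = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])    # empirically independent
```

When Y is a copy of a fair binary X, the estimate equals log 2 nats; when the empirical joint factorizes, it is zero up to floating-point error.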

estimator, information, mutual information, (14 more...)

2502.19183

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
(2 more...)

Son, Jaehyeon, Lee, Soochan, Kim, Gunhee

Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning

Recent studies have shown that Transformers can perform in-context reinforcement learning (RL) by imitating existing RL algorithms, enabling sample-efficient adaptation to unseen tasks without parameter updates. However, these models also inherit the suboptimal behaviors of the RL algorithms they imitate. This issue primarily arises due to the gradual update rule employed by those algorithms. Model-based planning offers a promising solution to this limitation by allowing the models to simulate potential outcomes before taking action, providing an additional mechanism to deviate from the suboptimal behavior. Rather than learning a separate dynamics model, we propose Distillation for In-Context Planning (DICP), an in-context model-based RL framework where Transformers simultaneously learn environment dynamics and improve policy in-context. We evaluate DICP across a range of discrete and continuous environments, including Darkroom variants and Meta-World. Our results show that DICP achieves state-of-the-art performance while requiring significantly fewer environment interactions than baselines, which include both model-free counterparts and existing meta-RL methods.

Since the introduction of Transformers (Vaswani et al., 2017), their versatility in handling diverse tasks has been widely recognized across various domains (Brown et al., 2020; Dosovitskiy et al., 2021; Bubeck et al., 2023). A key aspect of their success is in-context learning (Brown et al., 2020), which enables models to acquire knowledge rapidly without explicit parameter updates through gradient descent. Recently, this capability has been explored in reinforcement learning (RL) (Chen et al., 2021; Schulman et al., 2017; Lee et al., 2022; Reed et al., 2022), where acquiring skills in a sample-efficient manner is crucial. This line of research naturally extends to meta-RL, which focuses on leveraging prior knowledge to quickly adapt to novel tasks. In this context, Laskin et al. (2023) introduce Algorithm Distillation (AD), an in-context RL approach where Transformers sequentially model the entire learning histories of a specific RL algorithm across various tasks. The goal is for the models to replicate the exploration-exploitation behaviors of the source RL algorithm, enabling them to tackle novel tasks purely in-context.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2502.19009

Country: Asia (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.46)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Assessing Autonomous Inspection Regimes: Active Versus Passive Satellite Inspection

Aurand, Joshua, Pang, Christopher, Mokhtar, Sina, Lei, Henry, Cutlip, Steven, Phillips, Sean

This paper addresses the problem of satellite inspection, where one or more satellites (inspectors) are tasked with imaging or inspecting a resident space object (RSO) due to potential malfunctions or anomalies. Inspection strategies are often reduced to a discretized action space with predefined waypoints, facilitating tractability in both classical optimization and machine learning based approaches. However, this discretization can lead to suboptimal guidance in certain scenarios. This study presents a comparative simulation to explore the tradeoffs of passive versus active strategies in multi-agent missions. Key factors considered include RSO dynamic mode, state uncertainty, unmodeled entrance criteria, and inspector motion types. The evaluation is conducted with a focus on fuel utilization and surface coverage. Building on a Monte-Carlo based evaluator of passive strategies and a reinforcement learning framework for training active inspection policies, this study investigates conditions under which passive strategies, such as Natural Motion Circumnavigation (NMC), may perform comparably to active strategies like Reinforcement Learning based waypoint transfers.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

doi: 10.2514/6.2025-0755

2502.19556

Country:

North America > United States > Rocky Mountains (0.04)
North America > United States > Montana (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Energy (0.94)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning

Li, Xinran, Wang, Xiaolu, Bai, Chenjia, Zhang, Jun

In cooperative multi-agent reinforcement learning (MARL), well-designed communication protocols can effectively facilitate consensus among agents, thereby enhancing task performance. Moreover, in large-scale multi-agent systems commonly found in real-world applications, effective communication plays an even more critical role due to the escalated challenge of partial observability compared to smaller-scale setups. In this work, we endeavor to develop a scalable communication protocol for MARL. Unlike previous methods that focus on selecting optimal pairwise communication links, a task that becomes increasingly complex as the number of agents grows, we adopt a global perspective on communication topology design. Specifically, we propose utilizing the exponential topology to enable rapid information dissemination among agents by leveraging its small-diameter and small-size properties. This approach leads to a scalable communication protocol, named ExpoComm. To fully unlock the potential of exponential graphs as communication topologies, we employ memory-based message processors and auxiliary tasks to ground messages, ensuring that they reflect global information and benefit decision-making. Extensive experiments on large-scale cooperative benchmarks, including MAgent and Infrastructure Management Planning, demonstrate the superior performance and robust zero-shot transferability of ExpoComm compared to existing communication strategies. The code is publicly available at https://github.com/LXXXXR/ExpoComm.
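The small-diameter and small-size properties mentioned in this abstract can be seen in a minimal sketch of a static exponential graph, assuming the common construction in which agent i sends to agents (i + 2^k) mod n for every power of two below n. The exact topology used by ExpoComm may differ (for example, a time-varying one-peer variant), and the function names here are hypothetical.

```python
from collections import deque


def exponential_neighbors(i, n):
    """Peers that agent i sends to: (i + 2**k) mod n for every 2**k < n."""
    peers = []
    hop = 1
    while hop < n:
        peers.append((i + hop) % n)
        hop *= 2
    return peers


def diameter(n):
    """Largest shortest-path length (in hops) over all ordered agent pairs."""
    worst = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in exponential_neighbors(u, n):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


peers_of_zero = exponential_neighbors(0, 16)  # 4 outgoing links per agent
hops_needed = diameter(16)                    # worst-case hops between any two agents
```

With 16 agents, each agent keeps only 4 outgoing links, yet a message can reach any other agent within 4 hops. Both numbers grow logarithmically with the number of agents, rather than linearly as in a fully connected or ring topology respectively.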

agent, communication, topology, (12 more...)

2502.19717

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Virginia (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Industry: Telecommunications (0.48)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Risk-aware Integrated Task and Motion Planning for Versatile Snake Robots under Localization Failures

Jasour, Ashkan, Daddi, Guglielmo, Endo, Masafumi, Vaquero, Tiago S., Paton, Michael, Strub, Marlin P., Corpino, Sabrina, Ingham, Michel, Ono, Masahiro, Thakker, Rohan

Snake robots enable mobility through extreme terrains and confined environments in terrestrial and space applications. However, robust perception and localization for snake robots remain an open challenge due to the proximity of the sensor payload to the ground coupled with a limited field of view. To address this issue, we propose Blind-motion with Intermittently Scheduled Scans (BLISS) which combines proprioception-only mobility with intermittent scans to be resilient against both localization failures and collision risks. BLISS is formulated as an integrated Task and Motion Planning (TAMP) problem that leads to a Chance-Constrained Hybrid Partially Observable Markov Decision Process (CC-HPOMDP), known to be computationally intractable due to the curse of history. Our novelty lies in reformulating CC-HPOMDP as a tractable, convex Mixed Integer Linear Program. This allows us to solve BLISS-TAMP significantly faster and jointly derive optimal task-motion plans. Simulations and hardware experiments on the EELS snake robot show our method achieves over an order of magnitude computational improvement compared to state-of-the-art POMDP planners and more than 50% better navigation time optimality versus classical two-stage planners.

motion planning, robot, task and motion planning, (15 more...)

2502.1969

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > Italy (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.50)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

Gu, Shangding, Shi, Laixi, Wen, Muning, Jin, Ming, Mazumdar, Eric, Chi, Yuejie, Wierman, Adam, Spanos, Costas

Driven by inherent uncertainty and the sim-to-real gap, robust reinforcement learning (RL) seeks to improve resilience against the complexity and variability in agent-environment sequential interactions. Despite the existence of a large number of RL benchmarks, there is a lack of standardized benchmarks for robust RL. Current robust RL policies often focus on a specific type of uncertainty and are evaluated in distinct, one-off environments. In this work, we introduce Robust-Gymnasium, a unified modular benchmark designed for robust RL that supports a wide variety of disruptions across all key RL components: agents' observed state and reward, agents' actions, and the environment. Offering over sixty diverse task environments spanning control and robotics, safe RL, and multi-agent RL, it provides an open-source and user-friendly tool for the community to assess current methods and foster the development of robust RL algorithms. In addition, we benchmark existing standard and robust RL algorithms within this framework, uncovering significant deficiencies in each and offering new insights.

algorithm, arxiv preprint arxiv, reinforcement, (15 more...)

2502.19652

Country:

North America > United States > Virginia (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (0.46)
Media > Television (0.46)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Planning with Linear Temporal Logic Specifications: Handling Quantifiable and Unquantifiable Uncertainty

Yu, Pian, Li, Yong, Parker, David, Kwiatkowska, Marta

This work studies the planning problem for robotic systems under both quantifiable and unquantifiable uncertainty. The objective is to enable the robotic systems to optimally fulfill high-level tasks specified by Linear Temporal Logic (LTL) formulas. To capture both types of uncertainty in a unified modelling framework, we utilise Markov Decision Processes with Set-valued Transitions (MDPSTs). We introduce a novel solution technique for the optimal robust strategy synthesis of MDPSTs with LTL specifications. To improve efficiency, our work leverages limit-deterministic Büchi automata (LDBAs) as the automaton representation for LTL to take advantage of their efficient constructions. To tackle the inherent nondeterminism in MDPSTs, which presents a significant challenge for reducing the LTL planning problem to a reachability problem, we introduce the concept of a Winning Region (WR) for MDPSTs. Additionally, we propose an algorithm for computing the WR over the product of the MDPST and the LDBA. Finally, a robust value iteration algorithm is invoked to solve the reachability problem. We validate the effectiveness of our approach through a case study involving a mobile robot operating in the hexagonal world, demonstrating promising efficiency gains.
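The final step described in this abstract, robust value iteration for a reachability problem, can be sketched on a toy model. The sketch assumes the usual worst-case reading of set-valued transitions: each action yields a probability distribution over sets of successor states (the quantifiable part), and an adversary then picks one state from the sampled set (the unquantifiable part). It is not the paper's algorithm: it omits the LTL product and the Winning Region computation, and the data layout, function name, and example model are all hypothetical.

```python
def robust_reachability(transitions, targets, n_states, tol=1e-10, max_iter=10_000):
    """Worst-case probability of reaching `targets` from each state.

    transitions[s][a] is a list of (probability, successor_set) branches.
    States without an entry are treated as absorbing non-target states.
    """
    value = [1.0 if s in targets else 0.0 for s in range(n_states)]
    for _ in range(max_iter):
        new = list(value)
        for s, actions in transitions.items():
            if s in targets:
                continue
            # Agent maximizes over actions; adversary minimizes within each set.
            new[s] = max(
                sum(p * min(value[t] for t in succ) for p, succ in branches)
                for branches in actions.values()
            )
        delta = max(abs(a - b) for a, b in zip(new, value))
        value = new
        if delta < tol:
            break
    return value


# Toy model (assumed): state 2 is the target, state 3 is an absorbing failure state.
toy = {
    0: {"go": [(1.0, {1})], "jump": [(0.5, {2}), (0.5, {3})]},
    1: {"go": [(0.8, {2}), (0.2, {2, 3})]},
}
values = robust_reachability(toy, targets={2}, n_states=4)
```

In the toy model, action "go" from state 1 reaches the target with probability 0.8 in the worst case, because the adversary resolves the set {2, 3} to the failure state. State 0 therefore prefers "go" (value 0.8) over "jump" (value 0.5).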

acc, mdpst, probability, (15 more...)

2502.19603

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Asia > Vietnam > Hanoi > Hanoi (0.04)
Asia > China (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Consistent Amortized Clustering via Generative Flow Networks

Chelly, Irit, Uziel, Roy, Freifeld, Oren, Pakman, Ari

Neural models for amortized probabilistic clustering yield samples of cluster labels given a set-structured input, while avoiding lengthy Markov chain runs and the need for explicit data likelihoods. Existing methods that label each data point sequentially, like the Neural Clustering Process, often lead to cluster assignments highly dependent on the data order. Alternatively, methods that sequentially create full clusters do not provide assignment probabilities. In this paper, we introduce GFNCP, a novel framework for amortized clustering. GFNCP is formulated as a Generative Flow Network with a shared energy-based parametrization of policy and reward. We show that the flow matching conditions are equivalent to consistency of the clustering posterior under marginalization, which in turn implies order invariance. GFNCP also outperforms existing methods in clustering performance on both synthetic and real-world data.

dataset, gfncp, international conference, (14 more...)

2502.19337

Country:

Asia > Middle East > Jordan (0.04)
Asia > Thailand (0.04)
Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)