AITopics

This paper proposes a family of graph metrics for measuring distances between graphs of different sizes. The proposed metric family defines a general form of the graph generalised optimal sub-pattern assignment (GOSPA) metric and is also proved to satisfy the metric properties. Similarly to the graph GOSPA metric, the proposed graph GOSPA metric family also penalises the node attribute costs for assigned nodes between the two graphs, and the number of unassigned nodes. However, the proposed family of metrics provides more general penalties for edge mismatches than the graph GOSPA metric. This paper also shows that the graph GOSPA metric family can be approximately computed using linear programming. Simulation experiments are performed to illustrate the characteristics of the proposed graph GOSPA metric family with different choices of hyperparameters. The benefits of the proposed graph GOSPA metric family for classification tasks are also shown on real-world datasets.

artificial intelligence, graph gospa, machine learning, (12 more...)

2506.17316

Country: Europe (0.93)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Memery, Sean, Denamganai, Kevin, Kapron-King, Anna, Subr, Kartic

xInv: Explainable Optimization of Inverse Problems

Inverse problems are central to a wide range of fields, including healthcare, climate science, and agriculture. They involve the estimation of inputs, typically via iterative optimization, to some known forward model so that it produces a desired outcome. Despite considerable development in the explainability and interpretability of forward models, the iterative optimization of inverse problems remains largely cryptic to domain experts. We propose a methodology to produce explanations, from traces produced by an optimizer, that are interpretable by humans at the abstraction of the domain. The central idea in our approach is to instrument a differentiable simulator so that it emits natural language events during its forward and backward passes. In a post-process, we use a Language Model to create an explanation from the list of events. We demonstrate the effectiveness of our approach with an illustrative optimization problem and an example involving the training of a neural network.

large language model, machine learning, natural language, (19 more...)

2506.11056

Genre:

Workflow (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Learning to Reason under Off-Policy Guidance

Yan, Jianhao, Li, Yafu, Hu, Zican, Wang, Zhi, Cui, Ganqu, Qu, Xiaoye, Cheng, Yu, Zhang, Yue

Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning with verifiable rewards (RLVR). However, existing RLVR approaches are inherently "on-policy", limiting learning to a model's own outputs and failing to acquire reasoning abilities beyond its initial capabilities. To address this issue, we introduce LUFFY (Learning to reason Under oFF-policY guidance), a framework that augments RLVR with off-policy reasoning traces. LUFFY dynamically balances imitation and exploration by combining off-policy demonstrations with on-policy rollouts during training. Specifically, LUFFY combines the Mixed-Policy GRPO framework, which has a theoretically guaranteed convergence rate, with policy shaping via regularized importance sampling to avoid superficial and rigid imitation during mixed-policy training. Compared with previous RLVR methods, LUFFY achieves an average gain of over +6.4 across six math benchmarks and an advantage of over +6.2 points in out-of-distribution tasks. Most significantly, we show that LUFFY successfully trains weak models in scenarios where on-policy RLVR completely fails. These results provide compelling evidence that LUFFY transcends the fundamental limitations of on-policy RLVR and demonstrate the great potential of utilizing off-policy guidance in RLVR.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2504.14945

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
(3 more...)

Cloete, Jacques, Vertovec, Nikolaus, Abate, Alessandro

SPoRt -- Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL

To apply reinforcement learning to safety-critical applications, we ought to provide safety guarantees during both policy training and deployment. In this work, we present theoretical results that place a bound on the probability of violating a safety property for a new task-specific policy in a model-free, episodic setting. This bound, based on a maximum policy ratio computed with respect to a 'safe' base policy, can also be applied to temporally-extended properties (beyond safety) and to robust control problems. To utilize these results, we introduce SPoRt, which provides a data-driven method for computing this bound for the base policy using the scenario approach, and includes Projected PPO, a new projection-based approach for training the task-specific policy while maintaining a user-specified bound on property violation. SPoRt thus enables users to trade off safety guarantees against task-specific performance. Complementing our theoretical results, we present experimental results demonstrating this trade-off and comparing the theoretical bound to posterior bounds derived from empirical violation rates.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2504.06386

Country:

North America > United States (0.93)
Europe (0.92)

Genre: Research Report > New Finding (0.34)

Industry: Transportation (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Mahlau, Yannik, Schier, Maximilian, Reinders, Christoph, Schubert, Frederik, Bügling, Marco, Rosenhahn, Bodo

Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits

Inverse design of photonic integrated circuits (PICs) has traditionally relied on gradient-based optimization. However, this approach is prone to ending up in local minima, which results in suboptimal design functionality. As interest in PICs increases due to their potential for addressing modern hardware demands through optical computing, more adaptive optimization algorithms are needed. We present a reinforcement learning (RL) environment as well as multi-agent RL algorithms for the design of PICs. By discretizing the design space into a grid, we formulate the design task as an optimization problem with thousands of binary variables. We consider multiple two- and three-dimensional design tasks that represent PIC components for an optical computing system. By decomposing the design space into thousands of individual agents, our algorithms are able to optimize designs with only a few thousand environment samples. They outperform previous state-of-the-art gradient-based optimization in both two- and three-dimensional design tasks. Our work may also serve as a benchmark for further exploration of sample-efficient RL for inverse design in photonics.

machine learning, optimization, reinforcement learning, (16 more...)

2506.18627

Country: Europe > Germany > Lower Saxony (0.28)

Genre: Research Report (0.64)

Industry:

Semiconductors & Electronics (0.71)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Integrating Maneuverable Planning and Adaptive Control for Robot Cart-Pushing under Disturbances

Zhang, Zhe, Xie, Peijia, Sun, Zhirui, Xia, Bingyi, Zhu, Bi-Ke, Wang, Jiankun

Precise and flexible cart-pushing is a challenging task for mobile robots. The motion constraints during cart-pushing and the robot's redundancy lead to complex motion planning problems, while variable payloads and disturbances present complicated dynamics. In this work, we propose a novel planning and control framework for flexible whole-body coordination and robust adaptive control. Our motion planning method employs a local coordinate representation and a novel kinematic model to solve a nonlinear optimization problem, thereby enhancing motion maneuverability by generating feasible and flexible push poses. Furthermore, we present a disturbance rejection control method to resist disturbances and reduce control errors for the complex control problem without requiring an accurate dynamic model. To the best of our knowledge, this is the first work to systematically evaluate the flexibility and robustness of cart-pushing methods in experiments. The video supplement is available at https://sites.google.com/view/mpac-pushing/.

Index Terms: Cart-pushing, mobile manipulation, whole-body control, adaptive control

Manipulation of cart-like objects is common in daily life, such as in cargo transportation, shopping assistance and luggage handling at airports. Recent studies [1]-[6] have explored using mobile robots to replace humans in these tasks. Most of these works simplify local planning and control by employing simple manipulators (e.g., single-link structures) or by limiting the robot's Degrees of Freedom (DoFs).

artificial intelligence, controller, optimization problem, (17 more...)

2506.1841

Country: Asia > China > Guangdong Province (0.15)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Improvement on LiDAR-Camera Calibration Using Square Targets

Li, Zhongyuan, Gou, Honggang, Li, Ping, Guo, Jiaotong, Ye, Mao

Precise sensor calibration is critical for autonomous vehicles as a prerequisite for perception algorithms to function properly. A rotation error of one degree can translate to a position error of meters when detecting target objects at large distances, leading to improper reactions of the system or even safety-related issues. Many methods for multi-sensor calibration have been proposed. However, very few works comprehensively consider the challenges of the calibration procedure when applied to factory manufacturing pipelines or after-sales service scenarios. In this work, we introduce a fully automatic, target-based LiDAR-camera extrinsic calibration algorithm that is fast, easy to deploy and robust to sensor noise such as missing data. The core of the method includes: (1) an automatic multi-stage LiDAR board detection pipeline using only geometry information with no specific material requirement; (2) a fast coarse extrinsic parameter search mechanism that is robust to initial extrinsic errors; (3) a direct optimization algorithm that is robust to sensor noise. We validate the effectiveness of our methods through experiments on data captured in real-world scenarios.
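The claim that a one-degree rotation error becomes a position error of meters at range can be checked with basic trigonometry. The sketch below is only an illustration, not part of the paper: the function name `lateral_error_m` and the sample distances are assumptions, and it models a pure rotation error about the sensor origin with no translation error.

```python
import math


def lateral_error_m(distance_m: float, rotation_error_deg: float) -> float:
    """Lateral offset of a point at `distance_m` caused by a pure rotation error.

    Assumes the rotation is about the sensor origin, so the offset
    perpendicular to the line of sight is distance * tan(angle).
    """
    return distance_m * math.tan(math.radians(rotation_error_deg))


if __name__ == "__main__":
    # Illustrative ranges only; they are not taken from the paper.
    for distance in (10.0, 50.0, 100.0):
        error = lateral_error_m(distance, 1.0)
        print(f"{distance:6.1f} m range -> {error:.2f} m lateral error")
```

Under these assumptions the error grows linearly with range, which is why small extrinsic rotation errors matter most for distant targets.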

artificial intelligence, calibration, machine learning, (15 more...)

2506.18294

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.50)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Como, Giacomo, Fagnani, Fabio, Proskurnikov, Anton

Wisdom of Crowds Through Myopic Self-Confidence Adaptation

The wisdom of crowds is an umbrella term for phenomena suggesting that the collective judgment or decision of a large group can be more accurate than the individual judgments or decisions of the group members. A well-known example illustrating this concept is the competition at a country fair described by Galton, where the median value of the individual guesses about the weight of an ox resulted in an astonishingly accurate estimate of the actual weight. This phenomenon resembles classical results in probability theory and relies on independent decision-making. The accuracy of the group's final decision can be significantly reduced if the agents' final opinions are driven by a few influential agents. In this paper, we consider a group of agents who initially possess uncorrelated and unbiased noisy measurements of a common state of the world. Assume these agents iteratively update their estimates according to a simple non-Bayesian learning rule, commonly known in mathematical sociology as the French-DeGroot dynamics or iterative opinion pooling. As a result of this iterative distributed averaging process, each agent arrives at an asymptotic estimate of the state of the world, with the variance of this estimate determined by the matrix of weights the agents assign to each other. Every agent aims at minimizing the variance of her asymptotic estimate of the state of the world; however, such variance is also influenced by the weights allocated by other agents. To achieve the best possible estimate, the agents must then solve a game-theoretic, multi-objective optimization problem defined by the available sets of influence weights. We characterize both the Pareto frontier and the set of Nash equilibria in the resulting game. Additionally, we examine asynchronous best-response dynamics for the group of agents and prove their convergence to the set of strict Nash equilibria.
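The French-DeGroot averaging described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and both weight matrices are hypothetical, every agent's measurement noise is assumed to have the same variance, and the weight matrix is assumed row-stochastic and primitive so that the estimates converge to a consensus. Under those assumptions the consensus is a weighted average of the initial measurements with weights given by the stationary vector pi of the matrix, so its variance is the noise variance times the sum of pi_i squared.

```python
def degroot_step(weights, estimates):
    """One French-DeGroot update: each agent takes a weighted average of all estimates."""
    return [sum(w * x for w, x in zip(row, estimates)) for row in weights]


def run_degroot(weights, estimates, steps=200):
    """Iterate the averaging rule; returns the (approximately) asymptotic estimates."""
    for _ in range(steps):
        estimates = degroot_step(weights, estimates)
    return estimates


def stationary_weights(weights, steps=200):
    """Approximate the left stationary vector pi (pi = pi W) by power iteration."""
    n = len(weights)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * weights[i][j] for i in range(n)) for j in range(n)]
    return pi


def asymptotic_variance(weights, noise_variance=1.0):
    """Variance of the consensus value for uncorrelated, equal-variance measurements."""
    return noise_variance * sum(p * p for p in stationary_weights(weights))


# Hypothetical weight matrices (rows sum to 1); diagonal entries act as self-confidence.
BALANCED = [[0.50, 0.25, 0.25],
            [0.25, 0.50, 0.25],
            [0.25, 0.25, 0.50]]
DOMINATED = [[0.90, 0.05, 0.05],   # agent 0 is highly self-confident ...
             [0.45, 0.50, 0.05],   # ... and the others lean heavily on agent 0
             [0.45, 0.05, 0.50]]

if __name__ == "__main__":
    print(run_degroot(BALANCED, [1.0, 2.0, 6.0]))
    print(asymptotic_variance(BALANCED), asymptotic_variance(DOMINATED))
```

With the balanced matrix the consensus is the plain average of the measurements, which has the lowest variance. The dominated matrix concentrates influence on one agent and gives a noisier consensus, matching the abstract's remark about influential agents.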

artificial intelligence, bayesian inference, machine learning, (20 more...)

2506.18195

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.45)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Identity Disorder (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Integrating LLMs and Digital Twins for Adaptive Multi-Robot Task Allocation in Construction

Deng, Min, Fu, Bo, Li, Lingyao, Wang, Xi

Multi-robot systems are emerging as a promising solution to the growing demand for productivity, safety, and adaptability across industrial sectors. However, effectively coordinating multiple robots in dynamic and uncertain environments, such as construction sites, remains a challenge, particularly due to unpredictable factors like material delays, unexpected site conditions, and weather-induced disruptions. To address these challenges, this study proposes an adaptive task allocation framework that strategically leverages the synergistic potential of Digital Twins, Integer Programming (IP), and Large Language Models (LLMs). The multi-robot task allocation problem is formally defined and solved using an IP model that accounts for task dependencies, robot heterogeneity, scheduling constraints, and re-planning requirements. A mechanism for narrative-driven schedule adaptation is introduced, in which unstructured natural language inputs are interpreted by an LLM, and optimization constraints are autonomously updated, enabling human-in-the-loop flexibility without manual coding. A digital twin-based system has been developed to enable real-time synchronization between physical operations and their digital representations. This closed-loop feedback framework ensures that the system remains dynamic and responsive to ongoing changes on site. A case study demonstrates both the computational efficiency of the optimization algorithm and the reasoning performance of several LLMs, with top-performing models achieving over 97% accuracy in constraint and parameter extraction. The results confirm the practicality, adaptability, and cross-domain applicability of the proposed methods.

With rising demands for faster project delivery and improved efficiency, automation is becoming an essential solution for the construction industry [1]-[3]. Robotics, particularly the use of coordinated teams of robots, offers a promising approach that could revolutionize traditional construction practices. Robotic systems are being employed on construction sites to assist with tasks such as material delivery [4], assembly [5]-[7], and installation [8], [9], with the potential to significantly improve efficiency [10], [11] and safety [12].

Min Deng is with the Department of Civil, Environmental, and Construction Engineering, Texas Tech University, Lubbock, TX 79409, USA (e-mail: mindeng@ttu.edu). Bo Fu is with Amazon Robotics, North Reading, MA 01864, USA (e-mail: bofu@amazon.com). Lingyao Li is with the School of Information, University of South Florida, Tampa, FL 33620, USA (e-mail: lingyaol@usf.edu). Xi Wang is with the Department of Construction Science, Texas A&M University, College Station, TX 77843, USA (e-mail: xiwang@tamu.edu).

constraint, large language model, machine learning, (22 more...)

2506.18178

Country:

North America > United States > Texas > Brazos County > College Station (0.54)
North America > United States > Florida > Hillsborough County > Tampa (0.54)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.86)

Industry:

Construction & Engineering (1.00)
Energy > Renewable (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Optimization of Flying Ad Hoc Network Topology and Collaborative Path Planning for Multiple UAVs

He, Ming, Wang, Peizhao, Chen, Haihua, Sun, Bin, Wang, Hongpeng

Multiple unmanned aerial vehicles (UAVs) play a vital role in monitoring and data collection in wide area environments with harsh conditions. In most scenarios, issues such as real-time data retrieval and real-time UAV positioning are often disregarded, essentially neglecting the communication constraints. In this paper, we comprehensively address both the coverage of the target area and the data transmission capabilities of the flying ad hoc network (FANET). The data throughput of the network is therefore maximized by optimizing the network topology and the UAV trajectories. The resultant optimization problem is effectively solved by the proposed reinforcement learning-based trajectory planning (RL-TP) algorithm and the convex-based topology optimization (C-TOP) algorithm sequentially. The C-TOP maximizes the data throughput of the network while simultaneously constraining the neighbors and transmit powers of the UAVs, which is shown to be a convex problem that can be efficiently solved in polynomial time. Simulations and field experimental results show that the proposed optimization strategy can effectively plan the UAV trajectories and significantly improve the data throughput of the FANET over the adaptive local minimum spanning tree (A-LMST) and cyclic pruning-assisted power optimization (CPAPO) methods.

Monitoring tasks are generally demanding in forest, desert, alpine tundra and other wide-area environments, where infrastructure and human resources are scarce. However, relying solely on manpower to complete these tasks can be challenging and time-consuming. Unmanned aerial vehicles (UAVs) are therefore introduced as a substitute for humans, and multiple UAVs compose a flying ad hoc network (FANET) to cover a wide area. FANET has attracted significant interest and found many applications in electric power inspection, security, urban mapping, and so on.

artificial intelligence, machine learning, real time system, (19 more...)

2506.17945

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation (0.68)
Information Technology (0.54)
Energy > Power Industry (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
(4 more...)