 kmp


Bridging VLM and KMP: Enabling Fine-grained robotic manipulation via Semantic Keypoints Representation

arXiv.org Artificial Intelligence

From early Movement Primitive (MP) techniques to modern Vision-Language Models (VLMs), autonomous manipulation has remained a pivotal topic in robotics. At two extremes, VLM-based methods emphasize zero-shot and adaptive manipulation but struggle with fine-grained planning, whereas MP-based approaches excel at precise trajectory generalization but lack decision-making ability. To leverage the strengths of both frameworks, we propose VL-MP, which integrates VLMs with Kernelized Movement Primitives (KMP) via a low-distortion decision-information transfer bridge, enabling fine-grained robotic manipulation in ambiguous situations. A key element of VL-MP is the accurate representation of task decision parameters through semantic keypoint constraints, which leads to more precise task parameter generation. Additionally, we introduce a KMP enhanced with local trajectory features to support VL-MP, thereby preserving the shape of complex trajectories. Extensive experiments in complex real-world environments validate the effectiveness of VL-MP for adaptive and fine-grained manipulation.


State- and context-dependent robotic manipulation and grasping via uncertainty-aware imitation learning

arXiv.org Artificial Intelligence

Generating context-adaptive manipulation and grasping actions is a challenging problem in robotics. Classical planning and control algorithms tend to be inflexible with regard to parameterization by external variables such as object shapes. In contrast, Learning from Demonstration (LfD) approaches, due to their nature as function approximators, allow for introducing external variables to modulate policies in response to the environment. In this paper, we utilize this property by introducing an LfD approach to acquire context-dependent grasping and manipulation strategies. We treat the problem as a kernel-based function approximation, where the kernel inputs include generic context variables describing task-dependent parameters such as the object shape. We build on existing work on policy fusion with uncertainty quantification to propose a state-dependent approach that automatically returns to demonstrations, avoiding unpredictable behavior while smoothly adapting to context changes. The approach is evaluated on the LASA handwriting dataset and on a real 7-DoF robot in two scenarios: adaptation to slippage while grasping, and manipulation of a deformable food item.
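To make "kernel inputs include generic context variables" concrete, here is a minimal sketch, assuming a Gaussian-process-style regressor with a squared-exponential kernel over the concatenated input (state, context). The kernel width, noise level, helper names (`rbf`, `solve`, `predict`) and the toy data are illustrative assumptions, not the authors' policy-fusion method. The point it shows is that the predictive variance grows as the context moves away from the demonstrated ones, which is the kind of signal an uncertainty-aware approach can use to fall back toward demonstrated behavior.

```python
import math

# Hypothetical illustration: a GP-style regressor whose kernel input is the
# concatenation (state, context), e.g. context = object size.

def rbf(x, y, length=1.0):
    """Squared-exponential kernel on concatenated (state, context) vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def predict(X, y, x_star, noise=1e-2, length=1.0):
    """Posterior mean and variance at x_star given demonstrations (X, y)."""
    n = len(X)
    K = [[rbf(X[i], X[j], length) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x_star, xi, length) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, solve(K, y)))
    var = rbf(x_star, x_star, length) - sum(
        k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, var

# Toy demonstrations: action = state + 0.5 * context, shown for contexts 0 and 1.
X = [(s / 4, c) for c in (0.0, 1.0) for s in range(5)]
y = [s + 0.5 * c for s, c in X]

mean_near, var_near = predict(X, y, (0.5, 0.5))  # new context between demos
mean_far, var_far = predict(X, y, (0.5, 5.0))    # context far from any demo
print(f"near: mean={mean_near:.3f} var={var_near:.4f}")
print(f"far:  mean={mean_far:.3f} var={var_far:.4f}")
```

The variance at the far context is close to the prior variance of 1, while the in-between context stays well below it. How the cited work turns such a signal into a return-to-demonstration behavior is not reproduced here.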


Imitation Learning for Robotic Assisted Ultrasound Examination of Deep Venous Thrombosis using Kernelized Movement Primitives

arXiv.org Artificial Intelligence

Deep Vein Thrombosis (DVT) is a common yet potentially fatal condition, often leading to critical complications like pulmonary embolism. DVT is commonly diagnosed using Ultrasound (US) imaging, which can be inconsistent due to its high dependence on the operator's skill. Robotic US Systems (RUSs) aim to improve diagnostic test consistency but face challenges with the complex scanning pattern needed for DVT assessment, where precise control over US probe pressure is crucial for indirectly detecting occlusions. This work introduces an imitation learning method, based on Kernelized Movement Primitives (KMP), to standardize DVT US exams by training an autonomous robotic controller using sonographer demonstrations. A new recording device design enhances demonstration ergonomics, integrating with US probes and enabling seamless force and position data recording. KMPs are used to capture scanning skills, linking scan trajectory and force, enabling generalization beyond the demonstrations. Our approach, evaluated on synthetic models and volunteers, shows that the KMP-based RUS can replicate an expert's force control and image quality in DVT US examination. It outperforms previous methods using manually defined force profiles, improving exam standardization and reducing reliance on specialized sonographers.
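A minimal sketch of how a KMP-style mean can couple probe position and contact force through a shared time input, and how inserting a desired point lets it go beyond the demonstrations. It assumes a simplified KMP in which each reference point carries one scalar variance shared by all outputs (real KMP uses full covariance matrices, typically from GMM/GMR), so each output decouples into k*ᵀ(K + λΣ)⁻¹μ. The reference trajectory, the 8 N target and all helper names are hypothetical.

```python
import math

# Simplified KMP sketch (assumption): one scalar variance per reference point.

def kernel(s, t, ell=0.1):
    return math.exp(-((s - t) ** 2) / (2.0 * ell ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def kmp_mean(times, means, variances, s_star, lam=0.1, ell=0.1):
    """Mean k*^T (K + lam * diag(variances))^{-1} mu, one output dim at a time."""
    n = len(times)
    K = [[kernel(times[i], times[j], ell) + (lam * variances[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [kernel(s_star, t, ell) for t in times]
    return [sum(k * w for k, w in zip(k_star, solve(K, [m[d] for m in means])))
            for d in range(len(means[0]))]

def add_desired_point(times, means, variances, t_new, mu_new, var_new, tol=1e-9):
    """Insert a desired point, replacing any reference point at the same time."""
    keep = [i for i, t in enumerate(times) if abs(t - t_new) > tol]
    return ([times[i] for i in keep] + [t_new],
            [means[i] for i in keep] + [mu_new],
            [variances[i] for i in keep] + [var_new])

# Hypothetical reference: probe position sweeps 0 -> 1 while force stays at 5 N.
times = [i / 10 for i in range(11)]
means = [(t, 5.0) for t in times]            # (position, force)
variances = [0.1] * len(times)

pos_before, force_before = kmp_mean(times, means, variances, 0.5)
# Request 8 N at t = 0.5 with a small variance so it dominates the reference.
t2, m2, v2 = add_desired_point(times, means, variances, 0.5, (0.5, 8.0), 1e-4)
pos_after, force_after = kmp_mean(t2, m2, v2, 0.5)
print(f"before: force={force_before:.2f} N, after: force={force_after:.2f} N")
```

Because position and force are predicted from the same kernel weights over time, adapting one point shifts the force profile locally while the position track is preserved.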


Teaching MLP More Graph Information: A Three-stage Multitask Knowledge Distillation Framework

arXiv.org Artificial Intelligence

We study a challenging problem of inference with Graph Neural Networks (GNNs) on large-scale graph datasets, namely their high time and memory consumption, and try to overcome it by reducing reliance on the graph structure. Although distilling graph knowledge into a student MLP is a promising idea, it faces two major problems: loss of positional information and poor generalization. To solve these problems, we propose a new three-stage multitask distillation framework. Specifically, we use Positional Encoding to capture positional information. We also introduce Neural Heat Kernels, which are responsible for graph data processing in the GNN, and match hidden-layer outputs to improve the student MLP's hidden layers. To the best of our knowledge, this is the first work to include hidden-layer distillation for a student MLP on graphs and to combine graph Positional Encoding with an MLP. We test its performance and robustness under several settings and conclude that our approach performs well with good stability.
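The combination of soft-label distillation and hidden-layer matching can be sketched as a single loss. This is a generic knowledge-distillation objective (temperature-softened KL divergence plus an MSE on hidden activations), not the paper's three-stage framework or its Neural Heat Kernels. The weights `alpha` and `beta`, the temperature `T`, and the assumption that teacher and student hidden vectors already have the same width are all hypothetical.

```python
import math

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    m = max(z / T for z in logits)
    exps = [math.exp(z / T - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill_loss(student_logits, teacher_logits, student_hidden, teacher_hidden,
                 label, T=2.0, alpha=0.5, beta=0.1):
    """(1 - alpha) * CE + alpha * T^2 * KL(teacher || student) + beta * hidden MSE."""
    ce = -math.log(softmax(student_logits)[label])
    kd = kl_div(softmax(teacher_logits, T), softmax(student_logits, T)) * T * T
    hid = mse(student_hidden, teacher_hidden)  # assumes equal hidden widths
    return (1 - alpha) * ce + alpha * kd + beta * hid

teacher_logits = [2.0, 0.0, 0.0]
teacher_hidden = [0.1, -0.2, 0.3]
matched = distill_loss([2.0, 0.0, 0.0], teacher_logits,
                       [0.1, -0.2, 0.3], teacher_hidden, label=0)
mismatched = distill_loss([0.0, 2.0, 0.0], teacher_logits,
                          [0.0, 0.0, 0.0], teacher_hidden, label=0)
print(f"matched={matched:.4f} mismatched={mismatched:.4f}")
```

When the student reproduces the teacher's logits and hidden activations, only the cross-entropy term remains; in practice a learned projection would be needed if the hidden widths differ.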


Auto-LfD: Towards Closing the Loop for Learning from Demonstrations

arXiv.org Artificial Intelligence

Over the past few years, there have been numerous works towards advancing the generalization capability of robots, among which learning from demonstrations (LfD) has drawn much attention by virtue of its user-friendly and data-efficient nature. While many LfD solutions have been reported, a key question has not been properly addressed: how can we evaluate the generalization performance of LfD? For instance, when a robot draws a letter that needs to pass through new desired points, how does it ensure the new trajectory maintains a shape similar to the demonstration? This question becomes more relevant when a new task lies significantly far from the demonstrated region. To tackle this issue, a user often resorts to manually tuning the hyperparameters of an LfD approach until a satisfactory trajectory is attained. In this paper, we aim to provide closed-loop evaluative feedback for LfD and to optimize LfD automatically. Specifically, we consider dynamical movement primitives (DMP) and kernelized movement primitives (KMP) as examples and develop a generic optimization framework capable of measuring the generalization performance of DMP and KMP and auto-optimizing their hyperparameters without any human input. Evaluations on a real robot, including a peg-in-hole task and a pushing task, demonstrate the applicability of our framework.
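The closed loop of "generate, measure, adjust hyperparameters" can be sketched with stand-ins. Both the generator and the score below are hypothetical: the generator adds a Gaussian bump of width `w` so the trajectory passes through a new via-point, and the score penalizes shape distortion (velocity-profile mismatch) plus drift of the start point. The cited framework uses DMP/KMP and its own metric and optimizer; a plain grid search over `w` is used here only to show the feedback loop.

```python
import math

def shape_error(demo, traj):
    """Mean squared mismatch between first-difference (velocity) profiles."""
    d_demo = [b - a for a, b in zip(demo, demo[1:])]
    d_traj = [b - a for a, b in zip(traj, traj[1:])]
    return sum((p - q) ** 2 for p, q in zip(d_demo, d_traj)) / len(d_demo)

def generate(demo, ts, t_via, y_via, w):
    """Hypothetical generator: add a Gaussian bump of width w through the via-point."""
    i_via = min(range(len(ts)), key=lambda i: abs(ts[i] - t_via))
    delta = y_via - demo[i_via]
    return [y + delta * math.exp(-((t - t_via) ** 2) / (2 * w ** 2))
            for t, y in zip(ts, demo)]

def evaluate(demo, traj):
    """Closed-loop score: shape distortion plus squared drift of the start point."""
    return shape_error(demo, traj) + (traj[0] - demo[0]) ** 2

ts = [i / 100 for i in range(101)]
demo = [math.sin(2 * math.pi * t) for t in ts]
y_via = demo[50] + 0.5                      # new desired point at t = 0.5
candidates = [0.05, 0.1, 0.2, 0.4, 0.8]
scores = {w: evaluate(demo, generate(demo, ts, 0.5, y_via, w)) for w in candidates}
best_w = min(scores, key=scores.get)
print(best_w, {w: round(s, 6) for w, s in scores.items()})
```

Narrow bumps distort the shape, while wide bumps drag the start point away, so the score settles on an intermediate width without manual tuning.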


A Non-parametric Skill Representation with Soft Null Space Projectors for Fast Generalization

arXiv.org Artificial Intelligence

Over the last two decades, the robotics community has witnessed the emergence of various motion representations that have been used extensively, particularly in behavioral cloning, to compactly encode and generalize skills. Among these, probabilistic approaches have earned a relevant place, owing to their encoding of variations, correlations and adaptability to new task conditions. Modulating such primitives, however, is often cumbersome due to the need for parameter re-optimization, which frequently entails computationally costly operations. In this paper we derive a non-parametric movement primitive formulation that contains a null space projector. We show that this formulation allows for fast and efficient motion generation with computational complexity O(n^2), without involving matrix inversions, whose complexity is O(n^3). This is achieved by using the null space to track secondary targets, with a precision determined by the training dataset. Using a 2D example with time as input, we show that our non-parametric solution compares favourably with a state-of-the-art parametric approach. For demonstrated skills with high-dimensional inputs, we show that it also permits on-the-fly adaptation.
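The idea of tracking a secondary target in the null space of a primary task can be illustrated with the classical (hard) projector for a single linear constraint a · x = b: x = a⁺b + (I − a⁺a)z, where a⁺ = aᵀ/(a · a). For one row this needs only a scalar division. This is a textbook construction chosen for illustration; the paper's soft, kernel-based projector is not reproduced, and the function name is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_with_secondary(a, b, z):
    """Satisfy a . x = b exactly and follow z as closely as the null space allows.

    x = a^+ b + (I - a^+ a) z with a^+ = a^T / (a . a) for a single row a.
    """
    aa = dot(a, a)
    primary = [ai * b / aa for ai in a]                       # a^+ b
    az = dot(a, z)
    secondary = [zi - ai * az / aa for zi, ai in zip(z, a)]   # (I - a^+ a) z
    return [p + s for p, s in zip(primary, secondary)]

x = solve_with_secondary(a=[1.0, 1.0], b=2.0, z=[3.0, 0.0])
print(x)  # [2.5, -0.5]
```

The result lies on the line x1 + x2 = 2 and is the point on it closest to the secondary target (3, 0); the projected term never disturbs the primary constraint.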


Cheetah-Cub Quadruped Robot Learns to Walk, Trot Using Gait Patterns from Real Animal

AITopics Original Links

The rising interest in quadrupeds over the past few years has led to the development of several exciting new projects based on cheetahs. One such robot is Cheetah-Cub, a compliant quadruped developed at the Biorobotics lab at EPFL, the Swiss Federal Institute of Technology in Lausanne. To put Cheetah-Cub in motion, the EPFL group teamed up with researchers from the Italian Institute of Technology (IIT), who have recently managed to transfer horse-like locomotion to the robot. EPFL's Cheetah-Cub quadruped, which weighs just 1.1 kg (2.4 lb) and is about the size of a housecat, is powered by Kondo KRS2350 hobby servos. It's a compliant robot that, like IIT's COMAN humanoid, is part of the AMARSi (Adaptive Modular Architectures for Rich Motor Skills) project, which seeks to "improve biological richness of robotic motor skills."


Theory of matching pursuit

Neural Information Processing Systems

We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that the resulting sample compression bound is tighter than the KPCA bound of Shawe-Taylor et al. [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm called kernel matching pursuit (KMP), which does not correspond to a sample compression scheme. However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme and hence provide a VC bound to upper bound its future loss. Finally, we describe how the same bound can be applied to other matching pursuit related algorithms.
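For readers unfamiliar with kernel matching pursuit, a minimal sketch of the basic greedy regression variant (no back-fitting) follows: at each step it picks the training point whose kernel column best explains the current residual, fits one coefficient, and updates the residual. The RBF kernel, `gamma`, and the sine data are assumptions for illustration; the bounds in the abstract are not computed here. The chosen indices play the role of the small subset of training points on which the final predictor depends, which is the intuition behind viewing the selection as a compression scheme.

```python
import math

def rbf(x, y, gamma=10.0):
    return math.exp(-gamma * (x - y) ** 2)

def kernel_matching_pursuit(xs, ys, n_terms, gamma=10.0):
    """Basic greedy kernel matching pursuit for regression (no back-fitting)."""
    cols = [[rbf(xi, xj, gamma) for xi in xs] for xj in xs]  # cols[j]: k(., x_j)
    residual = list(ys)
    chosen = []
    for _ in range(n_terms):
        best = None
        for j, col in enumerate(cols):
            kr = sum(k * r for k, r in zip(col, residual))
            kk = sum(k * k for k in col)
            gain = kr * kr / kk          # drop in squared residual norm
            if best is None or gain > best[0]:
                best = (gain, j, kr / kk)
        _, j, coef = best
        chosen.append((j, coef))
        residual = [r - coef * k for r, k in zip(residual, cols[j])]
    return chosen, residual

def sq_norm(v):
    return sum(x * x for x in v)

xs = [i / 20 for i in range(21)]
ys = [math.sin(2 * math.pi * x) for x in xs]
norms = [sq_norm(kernel_matching_pursuit(xs, ys, m)[1]) for m in (1, 3, 6)]
chosen, _ = kernel_matching_pursuit(xs, ys, 6)
print([j for j, _ in chosen], [round(n, 4) for n in norms])
```

Each step lowers the squared residual norm by exactly the selected gain, so the fit improves monotonically while the predictor stays expressed through at most `n_terms` training points.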


Eight Maximal Tractable Subclasses of Allen's Algebra with Metric Time

Journal of Artificial Intelligence Research

This paper combines two important directions of research in temporal reasoning: finding maximal tractable subclasses of Allen's interval algebra, and reasoning with metric temporal information. Eight new maximal tractable subclasses of Allen's interval algebra are presented, some of them subsuming previously reported tractable algebras. The algebras allow for metric temporal constraints on interval starting or ending points, using the recent framework of Horn DLRs. Two of the algebras can express the notion of sequentiality between intervals, making them the first such algebras to admit both qualitative and metric time.
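Since the abstract ties qualitative interval relations to metric constraints on start and end points, a small sketch may help. It classifies two concrete intervals into one of Allen's 13 basic relations by comparing endpoints, and expresses "a precedes b within a metric gap" as endpoint inequalities. The function names and the `max_gap` parameter are hypothetical; the sketch does not implement the paper's tractable subclasses or Horn DLR reasoning.

```python
def allen_relation(a, b):
    """Return Allen's basic relation between intervals a = (s1, e1), b = (s2, e2)."""
    (s1, e1), (s2, e2) = a, b
    if not (s1 < e1 and s2 < e2):
        raise ValueError("intervals must satisfy start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining cases have distinct starts and ends with a proper overlap.
    return "overlaps" if s1 < s2 else "overlapped-by"

def sequential_within(a, b, max_gap):
    """Metric sequentiality: a ends no later than b starts, with b starting
    at most max_gap after a ends (two linear endpoint inequalities)."""
    (_, e1), (s2, _) = a, b
    return e1 <= s2 and s2 - e1 <= max_gap

print(allen_relation((0, 4), (3, 5)))            # overlaps
print(sequential_within((0, 2), (3, 5), 2))      # True
```

"Before or meets" corresponds to the single endpoint inequality e1 <= s2, and adding the metric bound gives a second linear inequality over the same endpoints.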