 coordinate space




UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction

arXiv.org Artificial Intelligence

We introduce a unified approach to forecast the dynamics of human keypoints along with the motion trajectory based on a short sequence of input poses. While many studies address either full-body pose prediction or motion trajectory prediction, only a few attempt to merge them. We propose a motion transformation technique to simultaneously predict full-body pose and trajectory keypoints in a global coordinate frame. We utilize an off-the-shelf 3D human pose estimation module, a graph attention network to encode the skeleton structure, and a compact, non-autoregressive transformer suitable for real-time motion prediction for human-robot interaction and human-aware navigation. We introduce a human navigation dataset, "DARKO", with a specific focus on navigational activities that are relevant for human-aware mobile robot navigation. We perform an extensive evaluation on Human3.6M, CMU-Mocap, and our DARKO dataset. In comparison to prior work, we show that our approach is compact, real-time, and accurate in predicting human navigation motion across all datasets. Result animations, our dataset, and code will be available at https://nisarganc.github.io/UPTor-page/


Robot Learning Using Multi-Coordinate Elastic Maps

arXiv.org Artificial Intelligence

To learn manipulation skills, robots need to understand the features of those skills. An easy way for robots to learn is through Learning from Demonstration (LfD), where the robot learns a skill from an expert demonstrator. While the main features of a skill might be captured in one differential coordinate (e.g., Cartesian), they could have meaning in other coordinates. For example, an important feature of a skill may be its shape or velocity profile, which are difficult to discover in the Cartesian differential coordinate. In this work, we present a method that enables robots to learn skills from human demonstrations by encoding these skills into various differential coordinates and then determining the importance of each coordinate for reproducing the skill. We also introduce a modified form of Elastic Maps that includes multiple differential coordinates, combining statistical modeling of skills in these differential coordinate spaces. Elastic Maps, which are flexible and fast to compute, allow for the incorporation of several different types of constraints and the use of any number of demonstrations. Additionally, we propose methods for auto-tuning several parameters associated with the modified Elastic Map formulation. We validate our approach in several simulated experiments and a real-world writing task with a UR5e manipulator arm.


RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis

arXiv.org Artificial Intelligence

We introduce a novel representation, the unified gripper coordinate space, for grasp synthesis of multiple grippers. The space is the 2D surface of a sphere in 3D, using longitude and latitude as its coordinates, and it is shared by all robotic grippers. We propose a new algorithm to map the palm surface of a gripper into the unified gripper coordinate space, and design a conditional variational autoencoder to predict the unified gripper coordinates given an input object. The predicted unified gripper coordinates establish correspondences between the gripper and the object, which can be used in an optimization problem to solve for the grasp pose and the finger joints for grasp synthesis. We demonstrate that using the unified gripper coordinate space improves the success rate and diversity in the grasp synthesis of multiple grippers.
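To make the longitude/latitude parameterization of a sphere surface concrete, here is a minimal Python sketch of the standard spherical coordinate map. It is not the paper's palm-surface mapping algorithm, which the abstract does not describe; the function names `to_lon_lat` and `from_lon_lat` and the axis conventions (z as the polar axis, angles in radians) are assumptions made for illustration only.

```python
import math


def to_lon_lat(x, y, z):
    """Map a nonzero 3D direction to (longitude, latitude) in radians.

    Assumed convention: z is the polar axis, longitude is measured in the
    x-y plane from the x axis, latitude is measured from the equator.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("direction must be nonzero")
    lon = math.atan2(y, x)                      # range [-pi, pi]
    s = max(-1.0, min(1.0, z / r))              # clamp against rounding error
    lat = math.asin(s)                          # range [-pi/2, pi/2]
    return lon, lat


def from_lon_lat(lon, lat):
    """Inverse map: (longitude, latitude) back to a unit vector."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))
```

Any point on the sphere is then described by two numbers, which is what allows a single 2D coordinate system to be shared across different shapes once each shape has been mapped onto the sphere.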


OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation

arXiv.org Artificial Intelligence

This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation. Given human text descriptions of arbitrary novel object categories, the robot agent seeks to predict the position, orientation, and size of the target object in the observed scene image. To enable such generalizability, we first introduce OO3D-9D, a large-scale photorealistic dataset for this task. Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation. It includes additional annotations for the symmetry axis of each category, which help resolve symmetric ambiguity. Apart from the large-scale dataset, we find another key to enabling such generalizability is leveraging the strong prior knowledge in pre-trained visual-language foundation models. We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models to infer the normalized object coordinate space (NOCS) maps of the target instances. This framework fully leverages the visual semantic prior from DinoV2 and the aligned visual and language knowledge within the text-to-image diffusion model, which enables generalization to various text descriptions of novel categories. Comprehensive quantitative and qualitative experiments demonstrate that the proposed open-vocabulary method, trained on our large-scale synthesized data, significantly outperforms the baseline and can effectively generalize to real-world images of unseen categories. The project page is at https://ov9d.github.io.


IMMP++: Isometric Motion Manifold Primitives with Parametric Curve Models

arXiv.org Artificial Intelligence

The Motion Manifold Primitive (MMP) produces, for a given task, a continuous manifold of trajectories, each of which can successfully complete the task, addressing the challenge of high dimensionality in trajectory data. However, the discrete-time trajectory representations used in existing MMP methods lack important functionalities of movement primitives (e.g., temporal modulation, via-points modulation, etc.) found in other conventional methods that employ parametric curve representations. To address these limitations, we introduce Motion Manifold Primitives++ (MMP++), which combines the advantages of the MMP and conventional methods by applying the MMP framework to the parametric curve representations. However, we observe that the performance of MMP++ can sometimes degrade significantly due to geometric distortion in the latent space -- by distortion, we mean that similar motions are not located nearby in the latent space. To mitigate this issue, we propose Isometric Motion Manifold Primitives++ (IMMP++), where the latent coordinate space preserves the geometry of the manifold. Experimental results with 2-DoF planar motions and 7-DoF robot arm tasks demonstrate that MMP++ and IMMP++ outperform existing methods, in some cases by a significant margin, while maintaining the advantages of parametric curve representations.


Solving High-Dimensional PDEs with Latent Spectral Models

arXiv.org Artificial Intelligence

Deep models have achieved impressive progress in solving partial differential equations (PDEs). A burgeoning paradigm is learning neural operators to approximate the input-output mappings of PDEs. While previous deep models have explored the multiscale architectures and various operator designs, they are limited to learning the operators as a whole in the coordinate space. In real physical science problems, PDEs are complex coupled equations with numerical solvers relying on discretization into high-dimensional coordinate space, which cannot be precisely approximated by a single operator nor efficiently learned due to the curse of dimensionality. We present Latent Spectral Models (LSM) toward an efficient and precise solver for high-dimensional PDEs. Going beyond the coordinate space, LSM enables an attention-based hierarchical projection network to reduce the high-dimensional data into a compact latent space in linear time. Inspired by classical spectral methods in numerical analysis, we design a neural spectral block to solve PDEs in the latent space that approximates complex input-output mappings via learning multiple basis operators, enjoying nice theoretical guarantees for convergence and approximation. Experimentally, LSM achieves consistent state-of-the-art and yields a relative gain of 11.5% averaged on seven benchmarks covering both solid and fluid physics. Code is available at https://github.com/thuml/Latent-Spectral-Models.


Charting a Manifold

Neural Information Processing Systems

The field has its roots in mapping algorithms: DeMers and Cottrell [3] proposed using auto-encoding neural networks with a hidden layer "bottleneck," effectively casting dimensionality reduction as a compression problem. Hastie defined principal curves [5] as nonparametric 1D curves that pass through the center of "nearby" data points. A rich literature has grown up around properly regularizing this approach and extending it to surfaces. Smola and colleagues [10] analyzed the NLDR problem in the broader framework of regularized quantization methods. More recent advances aim for embeddings: Gomes and Mojsilovic [4] treat manifold completion as an anisotropic diffusion problem, iteratively expanding points until they connect to their neighbors. The ISOMAP algorithm [12] represents remote distances as sums of a trusted set of distances between immediate neighbors, then uses multidimensional scaling to compute a low-dimensional embedding that minimally distorts all distances. The locally linear embedding algorithm (LLE) [9] represents each point as a weighted combination of a trusted set of nearest neighbors, then computes a minimally distorting low-dimensional barycentric embedding. They have complementary strengths: ISOMAP handles holes well but can fail if the data hull is nonconvex [12]; and vice versa for LLE [9].
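The ISOMAP recipe described above (trusted neighbor distances, summed along graph paths, followed by multidimensional scaling) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation from [12]: the function name `isomap`, the k-nearest-neighbor graph, the Floyd-Warshall shortest-path step, and the default parameter values are all choices made here for brevity.

```python
import numpy as np


def isomap(X, n_neighbors=5, n_components=2):
    """Embed the rows of X using graph geodesics plus classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Keep only the "trusted" distances: each point to its nearest neighbors.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    G[rows, cols] = D[rows, cols]
    G[cols, rows] = D[cols, rows]  # make the graph symmetric

    # Remote distances as sums of trusted distances (Floyd-Warshall).
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if not np.isfinite(G).all():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:n_components]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```

For example, points sampled along a half circle in 2D should unroll to an approximately straight line when embedded with `n_components=1`. The Floyd-Warshall step costs O(n^3), so this sketch is only practical for small point sets.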


When Big Data Goes Local, Small Data Gets Big

#artificialintelligence

In an earlier article, "The Importance of Location in Real Estate, Weather, and Machine Learning," various meanings and applications of location-based discovery in data science and machine learning were discussed. One algorithm described there is a powerful but strangely named machine learning algorithm: the Support Vector Machine (SVM). In the remarks below, we summarize the significance and utility of another powerful but strangely named machine learning algorithm that focuses on location: Locally Linear Embedding (LLE). LLE is a specific example from the general category of Manifold Learning algorithms. The most famous example of manifold learning with LLE is the Swiss jelly roll example (illustrated above).