ieee
HOH: Markerless Multimodal Human-Object-Human Handover Dataset with Large Object Count
We present the HOH (Human-Object-Human) Handover Dataset, a dataset with a large object count (136 objects), to accelerate data-driven research on handover studies, human-robot handover implementation, and artificial intelligence (AI) for handover parameter estimation from 2D and 3D data of two-person interactions. HOH contains multi-view RGB and depth data; skeletons; fused point clouds; grasp type and handedness labels; 2D and 3D segmentations of the object, giver hand, and receiver hand; giver and receiver comfort ratings; and paired object metadata and aligned 3D models for 2,720 handover interactions spanning 136 objects and 20 giver-receiver pairs (40 with role reversal) organized from 40 participants. We also show experimental results of neural networks trained using HOH to perform grasp, orientation, and trajectory prediction. As the only fully markerless handover capture dataset, HOH represents natural human-human handover interactions, overcoming the challenges of markered datasets, which require specific suiting for body tracking and lack high-resolution hand tracking. To date, HOH is the largest handover dataset in terms of object count, participant count, pairs with role reversal accounted for, and total interactions captured.
Learning Conditional Deformable Templates with Convolutional Networks
Adrian Dalca, Marianne Rakic, John Guttag, Mert Sabuncu
In these frameworks, templates are constructed using an iterative process of template estimation and alignment, which is often computationally very expensive. Due in part to this shortcoming, most methods compute a single template for the entire population of images, or a few templates for specific sub-groups of the data.
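As a rough illustration of the iterative estimate-and-align loop mentioned above, the following minimal sketch builds a template from 1D signals. It is a toy under stated assumptions, not any specific published method: the only allowed deformation is a circular integer shift, similarity is a plain dot product, and all function names are hypothetical.

```python
def circular_shift(signal, k):
    """Return a copy of `signal` circularly shifted right by k samples."""
    n = len(signal)
    k %= n
    return signal[-k:] + signal[:-k] if k else list(signal)


def best_shift(signal, template):
    """Find the shift that maximizes the dot product with the template.

    Assumption: alignment is an exhaustive search over integer shifts.
    """
    def score(k):
        shifted = circular_shift(signal, k)
        return sum(a * b for a, b in zip(shifted, template))

    return max(range(len(signal)), key=score)


def build_template(signals, n_iters=5):
    """Alternate between aligning all signals and re-averaging them."""
    # Initialize with the first signal (an arbitrary choice for this sketch).
    template = list(signals[0])
    for _ in range(n_iters):
        # Alignment step: register every signal to the current template.
        aligned = [circular_shift(s, best_shift(s, template)) for s in signals]
        # Estimation step: the new template is the mean of the aligned signals.
        template = [sum(col) / len(aligned) for col in zip(*aligned)]
    return template


if __name__ == "__main__":
    bump = [0, 0, 1, 3, 1, 0, 0, 0]
    shifted_copies = [circular_shift(bump, k) for k in (0, 2, 5)]
    print(build_template(shifted_copies))
```

Every iteration re-aligns every signal to the current template, which hints at why such loops become costly on large image populations, where each alignment is itself an optimization problem.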
A Training Objectives
Our model is trained from scratch with the semantic loss L
The computational overhead of CluB is 1.2 / 1.3 times that of the BEV-only
A detailed comparison is shown in the following table.
GPUs and the batch size per GPU is set as 2.
Table 2: Ablation study on the effect of the two kinds of object queries for the transformer decoder.
Red boxes and green boxes are the predictions and ground-truth, respectively.
Transfusion: Robust lidar-camera fusion for 3d object detection with transformers.
Fully sparse 3d object detection.