
Neural Information Processing Systems

We would like to thank our reviewers for their constructive comments. R1: We are not the first to look at full meshes; accordingly, we will limit our claim to deep learning approaches for human pose reconstruction. We will release those, along with pre-trained models. Why does Fig. 4 show stick figures?


Direct Multi-view Multi-person 3D Pose Estimation

Tao Wang

Neural Information Processing Systems

Multi-view multi-person 3D pose estimation aims to localize 3D skeleton joints for each person instance in a scene from multi-view camera inputs. Notably, our method achieves 92.3% AP. Additionally, we mitigate the commonly faced generalization issue with a simple query adaptation strategy.


Displacement-Actuated Continuum Robots: A Joint Space Abstraction

Grassmann, Reinhard M., Burgner-Kahrs, Jessica

arXiv.org Artificial Intelligence

The displacement-actuated continuum robot has been shown to be a key abstraction that significantly simplifies and improves approaches due to its relation to the Clarke transform. To highlight further potential, we revisit and extend this abstraction to cover the increasingly popular length extension and an underutilized twisting. For each extension, the corresponding mapping from the joint values to the local coordinates of the manifold embedded in the joint space is provided. Each mapping is characterized by its compactness and linearity.


Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler

Neural Information Processing Systems

This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.
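The core idea, combining a ConvNet part detector's heatmaps with pairwise spatial priors between joints, can be sketched as follows. This is a simplified, single-message illustration with a hand-picked prior kernel; in the paper, the spatial model is learned jointly with the network, so all names here are illustrative:

```python
import numpy as np

def conv2d_same(x, k):
    """Plain 'same'-size 2D convolution for odd-sized kernels
    (no external dependencies)."""
    H, W = x.shape
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    kf = k[::-1, ::-1]  # flip kernel for true convolution
    return np.array([[np.sum(xp[i:i + kh, j:j + kw] * kf)
                      for j in range(W)] for i in range(H)])

def fuse_with_prior(unary, neighbor, prior_kernel, eps=1e-6):
    """Refine one joint's detector heatmap using a neighboring joint's
    marginal: convolve the neighbor with a pairwise displacement prior
    and combine the evidence in log space, a rough stand-in for one
    round of MRF message passing."""
    message = conv2d_same(neighbor, prior_kernel)
    log_post = np.log(unary + eps) + np.log(message + eps)
    post = np.exp(log_post - log_post.max())  # stabilize before normalizing
    return post / post.sum()
```

For example, an elbow heatmap can be sharpened by the wrist's evidence convolved with an elbow-to-wrist displacement prior, which suppresses anatomically implausible detections.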


Clarke Transform and Encoder-Decoder Architecture for Arbitrary Joints Locations in Displacement-Actuated Continuum Robots

Grassmann, Reinhard M., Burgner-Kahrs, Jessica

arXiv.org Artificial Intelligence

Abstract-- In this paper, we consider an arbitrary number of joints and their arbitrary locations along the center line of a displacement-actuated continuum robot. To achieve this, we revisit the derivation of the Clarke transform, leading to a formulation capable of considering arbitrary joint locations. The proposed modified Clarke transform opens new opportunities in mechanical design and algorithmic approaches beyond the current limiting dependency on symmetrically arranged joint locations. By presenting an encoder-decoder architecture based on the Clarke transform, joint values can be transformed between different robot designs, enabling the use of an analogous robot design and direct knowledge transfer. To demonstrate its versatility, applications of control and trajectory generation in simulation are presented, which can be easily integrated into an existing framework designed, for instance, for three symmetrically arranged joints.
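Assuming the joint-value manifold is spanned by cos(psi) and sin(psi) evaluated at the joint locations psi (as in the symmetric case), a mapping that handles arbitrary locations can be sketched with a pseudo-inverse; the function name and interface below are illustrative, not from the paper:

```python
import numpy as np

def clarke_pair(psi):
    """Build the forward/inverse maps between n joint displacements and
    the two Clarke coordinates, for joints located at angles psi around
    the cross-section. For symmetrically arranged joints
    (psi_k = 2*pi*k/n) the forward map reduces to the classic
    (2/n)-scaled Clarke transform; the pseudo-inverse covers arbitrary
    joint locations."""
    psi = np.asarray(psi, dtype=float)
    P = np.column_stack([np.cos(psi), np.sin(psi)])  # n x 2 manifold basis
    forward = np.linalg.pinv(P)                      # 2 x n: joints -> Clarke coords
    return forward, P

# example: three symmetrically arranged joints
psi = 2 * np.pi * np.arange(3) / 3
F, P = clarke_pair(psi)
q = P @ np.array([0.7, -0.2])    # a joint-value vector on the 2D manifold
coords = F @ q                   # recover the Clarke coordinates
```

Mapping joint values of one design through `forward` and back through another design's basis `P` is the kind of design-to-design transfer the encoder-decoder architecture formalizes.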


Direct Multi-view Multi-person 3D Pose Estimation

Neural Information Processing Systems

We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images. Instead of estimating 3D joint locations from a costly volumetric representation or reconstructing the per-person 3D pose from multiple detected 2D poses as in previous methods, MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks. Specifically, MvP represents skeleton joints as learnable query embeddings and lets them progressively attend to and reason over the multi-view information from the input images to directly regress the actual 3D joint locations. To improve the accuracy of such a simple pipeline, MvP presents a hierarchical scheme to concisely represent query embeddings of multi-person skeleton joints and introduces an input-dependent query adaptation approach. Further, MvP designs a novel geometrically guided attention mechanism, called projective attention, to more precisely fuse the cross-view information for each joint.
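A rough sketch of the projective-attention idea, projecting the current 3D joint estimate into each view and fusing the sampled features, might look like the following. The nearest-neighbor sampling and hand-rolled softmax weighting are simplifications of MvP's learned, deformable attention, so treat this as an illustration rather than the paper's mechanism:

```python
import numpy as np

def project(point3d, K, R, t):
    """Pinhole projection of one 3D point into a camera view."""
    cam = R @ point3d + t
    uv = K @ cam
    return uv[:2] / uv[2]

def projective_attention(joint3d, feats, cams):
    """Gather each view's feature at the projection of the current 3D
    joint estimate and fuse them with a softmax weighting.
    feats: list of (C, H, W) feature maps; cams: list of (K, R, t)."""
    gathered = []
    for f, (K, R, t) in zip(feats, cams):
        u, v = project(joint3d, K, R, t)
        ui, vi = int(round(u)), int(round(v))
        C, H, W = f.shape
        if 0 <= vi < H and 0 <= ui < W:  # skip views where the joint is off-screen
            gathered.append(f[:, vi, ui])
    g = np.stack(gathered)               # V x C per-view features
    scores = g @ g.mean(axis=0)          # query with the mean feature
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ g                         # fused C-dim feature for this joint
```

The key property this preserves is geometric guidance: each view contributes only the feature at the joint's epipolar-consistent image location, rather than attending over the full feature map.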


Towards a large-scale fused and labeled dataset of human pose while interacting with robots in shared urban areas

Sherafat, E., Farooq, B.

arXiv.org Artificial Intelligence

Over the last decade, Autonomous Delivery Robots (ADRs) have transformed conventional delivery methods, responding to the growing e-commerce demand. However, the readiness of ADRs to navigate safely among pedestrians in shared urban areas remains an open question. We contend that there are crucial research gaps in understanding their interactions with pedestrians in such environments. Human Pose Estimation is a vital stepping stone for various downstream applications, including pose prediction and socially aware robot path-planning. Yet, the absence of an enriched and pose-labeled dataset capturing human-robot interactions in shared urban areas hinders this objective. In this paper, we bridge this gap by repurposing, fusing, and labeling two datasets, MOT17 and NCLT, focused on pedestrian tracking and Simultaneous Localization and Mapping (SLAM), respectively. The resulting unique dataset represents thousands of real-world indoor and outdoor human-robot interaction scenarios. Leveraging YOLOv7, we obtained human pose visual and numeric outputs and provided ground truth poses using manual annotation. To overcome the distance bias present in the traditional MPJPE metric, this study introduces a novel human pose estimation error metric called Mean Scaled Joint Error (MSJE), which incorporates bounding box dimensions into the error computation. Findings demonstrate that YOLOv7 effectively estimates human pose in both datasets. However, it exhibits weaker performance in specific scenarios, such as indoor, crowded scenes with a focused light source, where MPJPE and MSJE are recorded as 10.89 and 25.3, respectively. In contrast, YOLOv7 performs better in single-person estimation (NCLT seq 2) and outdoor scenarios (MOT17 seq 1), achieving MSJE values of 5.29 and 3.38, respectively.
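The abstract does not give the MSJE formula, so the following is only a plausible reading of "incorporating bounding box dimensions": scale the per-joint error by the person's bounding-box diagonal to remove the distance bias, then rescale to a reference size. The `ref_diag` parameter and both function signatures are assumptions, not the paper's definitions:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance
    between predicted and ground-truth joints (pixels here)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def msje(pred, gt, bbox_wh, ref_diag=200.0):
    """Hypothetical Mean Scaled Joint Error: normalize the joint error
    by the bounding-box diagonal so that far-away (small) and nearby
    (large) people are penalized comparably, then rescale to a
    reference diagonal. The paper's exact formula may differ."""
    diag = np.hypot(*bbox_wh)
    return mpjpe(pred, gt) / diag * ref_diag
```

Under this reading, the same absolute pixel error counts for more on a small (distant) person than on a large (nearby) one, which is exactly the bias MPJPE fails to account for.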


yoga pose classification

#artificialintelligence

OpenPose is a multi-person real-time keypoint detection system that brought a revolution to the field of pose estimation. It was developed at Carnegie Mellon University (CMU) by the Perceptual Computing Lab. It uses a CNN-based architecture to identify facial, hand, and foot keypoints of a human body from single images. OpenPose helps identify human body joints using an RGB camera. OpenPose keypoints include the eyes, ears, neck, nose, elbows, shoulders, knees, wrists, ankles, and hips.
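Given keypoints like the ones listed above, a simple feature for downstream tasks such as yoga pose classification is the angle at a joint (e.g. the elbow angle from shoulder, elbow, and wrist). The keypoint list below is illustrative and its ordering is not OpenPose's official index mapping:

```python
import math

# Illustrative 18-keypoint list covering the body parts named above;
# the index order here is an assumption, not OpenPose's output format.
KEYPOINTS = ["nose", "neck", "right_shoulder", "right_elbow", "right_wrist",
             "left_shoulder", "left_elbow", "left_wrist", "right_hip",
             "right_knee", "right_ankle", "left_hip", "left_knee",
             "left_ankle", "right_eye", "left_eye", "right_ear", "left_ear"]

def joint_angle(a, b, c):
    """Angle at b (degrees) formed by 2D points a-b-c, e.g. the elbow
    angle from shoulder, elbow, and wrist keypoints."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))
```

A pose classifier can then threshold or learn over a handful of such angles (elbows, knees, hips) instead of raw pixel coordinates, which makes it robust to the person's position and scale in the frame.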


Human Pose Estimation

#artificialintelligence

Human Pose Estimation is an important task in Computer Vision that has gained a lot of attention in recent years and has a wide range of applications, such as human-computer interaction, gaming, action recognition, computer-assisted living, and special effects. It has rapidly progressed with the advent of neural networks in the deep learning era. The goal of human pose estimation is to estimate the joint locations of one or more human bodies in 2D or 3D space from a single image. Joints are connected to form a skeleton that describes the pose of the person. OpenPose is the most popular open-source tool for body, foot, hand, and facial keypoint detection.
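The skeleton mentioned above is simply a fixed list of joint pairs rendered as line segments over the estimated joint locations. The joint names and edges below are illustrative; real datasets (e.g. COCO or Human3.6M) each define their own canonical set:

```python
# Illustrative joint names and skeleton edges; datasets define their
# own canonical joint sets and orderings.
JOINTS = ["head", "neck", "r_shoulder", "r_elbow", "r_wrist",
          "l_shoulder", "l_elbow", "l_wrist", "pelvis",
          "r_hip", "r_knee", "r_ankle", "l_hip", "l_knee", "l_ankle"]

SKELETON = [("head", "neck"), ("neck", "r_shoulder"), ("r_shoulder", "r_elbow"),
            ("r_elbow", "r_wrist"), ("neck", "l_shoulder"),
            ("l_shoulder", "l_elbow"), ("l_elbow", "l_wrist"),
            ("neck", "pelvis"), ("pelvis", "r_hip"), ("r_hip", "r_knee"),
            ("r_knee", "r_ankle"), ("pelvis", "l_hip"), ("l_hip", "l_knee"),
            ("l_knee", "l_ankle")]

def skeleton_segments(pose):
    """Turn a {joint_name: (x, y)} dict of estimated joint locations
    into line segments describing the skeleton; edges whose endpoints
    were not detected are skipped."""
    return [(pose[a], pose[b]) for a, b in SKELETON if a in pose and b in pose]
```

The same edge list works in 3D by storing `(x, y, z)` tuples, since only the connectivity, not the dimensionality, is fixed by the skeleton.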