AITopics | human pose

Collaborating Authors

human pose

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Synthetic-to-Real Pose Estimation with Geometric Reconstruction Qiuxia Lin 1 Kerui Gu1 Linlin Y ang 2, 3 Angela Y ao 1 1

Neural Information Processing SystemsFeb-16-2026, 09:31:30 GMT

The warping estimation module W is based on an hourglass with five conv3 3 - bn - relu - pool2 2 in the encoders and five upsample2 2 - conv3 3 - bn - relu blocks in the decoders. In G, we use the Johnson architecture [ 3 ] with two down-sampling blocks, six residual-blocks and two up-sampling blocks. The design follows [ 7 ]. The inputs are the base image, displacement field, and inpainting map. It downsampled 4 and upsampled 4 to get the output, i.e. the reconstructed image.

artificial intelligence, geometric reconstruction qiuxia lin 1, video understanding, (13 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.05)

Technology: Information Technology > Artificial Intelligence > Vision > Video Understanding (0.43)

Add feedback

2052b3e0617ecb2ce9474a6feaf422b3-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-8-2026, 19:16:24 GMT

computer vision, dataset, interaction, (11 more...)

Neural Information Processing Systems

Country:

Europe > Germany > North Rhine-Westphalia (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Learning Disentangled Representation for Robust Person Re-identification

Neural Information Processing SystemsDec-26-2025, 00:36:30 GMT

We address the problem of person re-identification (reID), that is, retrieving person images from a large dataset, given a query image of the person of interest. The key challenge is to learn person representations robust to intra-class variations, as different persons can have the same attribute and the same person's appearance looks different with viewpoint changes. Recent reID methods focus on learning discriminative features but robust to only a particular factor of variations (e.g., human pose) and this requires corresponding supervisory signals (e.g., pose annotations). To tackle this problem, we propose to disentangle identity-related and -unrelated features from person images. Identity-related features contain information useful for specifying a particular person (e.g.,clothing), while identity-unrelated ones hold other factors (e.g., human pose, scale changes). To this end, we introduce a new generative adversarial network, dubbed identity shuffle GAN (IS-GAN), that factorizes these features using identification labels without any auxiliary information. We also propose an identity shuffling technique to regularize the disentangled features. Experimental results demonstrate the effectiveness of IS-GAN, largely outperforming the state of the art on standard reID benchmarks including the Market-1501, CUHK03 and DukeMTMC-reID. Our code and models will be available online at the time of the publication.

learning disentangled representation, name change, robust person re-identification, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Add feedback

Humans in Kitchens: A Dataset for Multi-Person Human Motion Forecasting with Scene Context

Neural Information Processing SystemsDec-24-2025, 04:36:53 GMT

Forecasting human motion of multiple persons is very challenging. It requires to model the interactions between humans and the interactions with objects and the environment. For example, a person might want to make a coffee, but if the coffee machine is already occupied the person will haveto wait. These complex relations between scene geometry and persons ariseconstantly in our daily lives, and models that wish to accurately forecasthuman behavior will have to take them into consideration. To facilitate research in this direction, we propose Humans in Kitchens, alarge-scale multi-person human motion dataset with annotated 3D human poses, scene geometry and activities per person and frame.Our dataset consists of over 7.3h recorded data of up to 16 persons at the same time in four kitchen scenes, with more than 4M annotated human poses, represented by a parametric 3D body model. In addition, dynamic scene geometry and objects like chair or cupboard are annotated per frame. As first benchmarks, we propose two protocols for short-term and long-term human motion forecasting.

kitchen, multi-person human motion forecasting, name change, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.40)

Add feedback

Synthetic-to-Real Pose Estimation with Geometric Reconstruction Qiuxia Lin 1 Kerui Gu1 Linlin Y ang 2, 3 Angela Y ao 1 1

Neural Information Processing SystemsOct-9-2025, 04:00:29 GMT

artificial intelligence, geometric reconstruction qiuxia lin 1, video understanding, (13 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.05)

Technology: Information Technology > Artificial Intelligence > Vision > Video Understanding (0.43)

Add feedback

Humans in Kitchens: A Dataset for Multi-Person Human Motion Forecasting with Scene Context Julian Tanke

Neural Information Processing SystemsOct-8-2025, 06:31:16 GMT

artificial intelligence, dataset, machine learning, (13 more...)

Neural Information Processing Systems

Country:

Europe > Germany > North Rhine-Westphalia (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Chirality Nets for Human Pose Regression

Raymond Yeh, Yuan-Ting Hu, Alexander Schwing

Neural Information Processing SystemsOct-2-2025, 08:28:25 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, pose estimation, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

Neural Information Processing SystemsSep-30-2025, 09:27:38 GMT

We present a method for estimating articulated human pose from a single static image based on a graphical model with novel pairwise relations that make adaptive use of local image measurements. More precisely, we specify a graphical model for human pose which exploits the fact the local image measurements can be used both to detect parts (or joints) and also to predict the spatial relationships between them (Image Dependent Pairwise Relations). These spatial relationships are represented by a mixture model. We use Deep Convolutional Neural Networks (DCNNs) to learn conditional probabilities for the presence of parts and their spatial relationships within image patches. Hence our model combines the representational flexibility of graphical models with the efficiency and statistical power of DCNNs. Our method significantly outperforms the state of the art methods on the LSP and FLIC datasets and also performs very well on the Buffy dataset without any training.

articulated pose estimation, graphical model, image dependent pairwise relation, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Add feedback

Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark

Kim, Junsu, Kim, Naeun, Lee, Jaeho, Park, Incheol, Han, Dongyoon, Baek, Seungryul

arXiv.org Artificial IntelligenceJul-18-2025

The reasoning-based pose estimation (RPE) benchmark has emerged as a widely adopted evaluation standard for pose-aware multimodal large language models (MLLMs). Despite its significance, we identified critical reproducibility and benchmark-quality issues that hinder fair and consistent quantitative evaluations. Most notably, the benchmark utilizes different image indices from those of the original 3DPW dataset, forcing researchers into tedious and error-prone manual matching processes to obtain accurate ground-truth (GT) annotations for quantitative metrics ( e.g ., MPJPE, P A-MPJPE). Furthermore, our analysis reveals several inherent benchmark-quality limitations, including significant image redundancy, scenario imbalance, overly simplistic poses, and ambiguous textual descriptions, collectively undermining reliable evaluations across diverse scenarios. T o alleviate manual effort and enhance reproducibility, we carefully refined the GT annotations through meticulous visual matching and publicly release these refined annotations as an open-source resource, thereby promoting consistent quantitative evaluations and facilitating future advancements in human pose-aware mul-timodal reasoning.

benchmark, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.13314

Country: Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (0.73)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Add feedback

SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity

Zou, Shihao, Li, Qingfeng, Ji, Wei, Li, Jingjing, Yang, Yongkui, Li, Guoqi, Dong, Chao

arXiv.org Artificial IntelligenceMay-16-2025

Spiking Neural Networks (SNNs) have shown competitive performance to Artificial Neural Networks (ANNs) in various vision tasks, while offering superior energy efficiency. However, existing SNN-based Transformers primarily focus on single-image tasks, emphasizing spatial features while not effectively leveraging SNNs' efficiency in video-based vision tasks. In this paper, we introduce SpikeVideoFormer, an efficient spike-driven video Transformer, featuring linear temporal complexity $\mathcal{O}(T)$. Specifically, we design a spike-driven Hamming attention (SDHA) which provides a theoretically guided adaptation from traditional real-valued attention to spike-driven attention. Building on SDHA, we further analyze various spike-driven space-time attention designs and identify an optimal scheme that delivers appealing performance for video tasks, while maintaining only linear temporal complexity. The generalization ability and efficiency of our model are demonstrated across diverse downstream video tasks, including classification, human pose tracking, and semantic segmentation. Empirical results show our method achieves state-of-the-art (SOTA) performance compared to existing SNN approaches, with over 15\% improvement on the latter two tasks. Additionally, it matches the performance of recent ANN-based methods while offering significant efficiency gains, achieving $\times 16$, $\times 10$ and $\times 5$ improvements on the three tasks. https://github.com/JimmyZou/SpikeVideoFormer

artificial intelligence, machine learning, spikevideoformer, (18 more...)

arXiv.org Artificial Intelligence

2505.10352

Country: