animal pose estimation
APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking
Animal pose estimation and tracking (APT) is a fundamental task for detecting and tracking animal keypoints from a sequence of video frames. Previous animal-related datasets focus either on animal tracking or single-frame animal pose estimation, and never on both aspects. The lack of APT datasets hinders the development and evaluation of video-based animal pose estimation and tracking methods, limiting the applications in real world, e.g., understanding animal behavior in wildlife conservation. To fill this gap, we make the first step and propose APT-36K, i.e., the first large-scale benchmark for animal pose estimation and tracking. Specifically, APT-36K consists of 2,400 video clips collected and filtered from 30 animal species with 15 frames for each video, resulting in 36,000 frames in total. After manual annotation and careful double-check, high-quality keypoint and tracking annotations are provided for all the animal instances. Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra-and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking. Based on the experimental results, we gain some empirical insights and show that APT-36K provides a useful animal pose estimation and tracking benchmark, offering new challenges and opportunities for future research.
Cross-Species Transfer Learning in Agricultural AI: Evaluating ZebraPose Adaptation for Dairy Cattle Pose Estimation
Tapp, Mackenzie, Parivendan, Sibi Chakravarthy, Sailunaz, Kashfia, Neethirajan, Suresh
Pose estimation serves as a cornerstone of computer vision for understanding animal posture, behavior, and welfare. Yet, agricultural applications remain constrained by the scarcity of large, annotated datasets for livestock, especially dairy cattle. This study evaluates the potential and limitations of cross-species transfer learning by adapting ZebraPose - a vision transformer-based model trained on synthetic zebra imagery - for 27-keypoint detection in dairy cows under real barn conditions. Using three configurations - a custom on-farm dataset (375 images, Sussex, New Brunswick, Canada), a subset of the APT-36K benchmark dataset, and their combination, we systematically assessed model accuracy and generalization across environments. While the combined model achieved promising performance (AP = 0.86, AR = 0.87, PCK 0.5 = 0.869) on in-distribution data, substantial generalization failures occurred when applied to unseen barns and cow populations. These findings expose the synthetic-to-real domain gap as a major obstacle to agricultural AI deployment and emphasize that morphological similarity between species is insufficient for cross-domain transfer. The study provides practical insights into dataset diversity, environmental variability, and computational constraints that influence real-world deployment of livestock monitoring systems. We conclude with a call for agriculture-first AI design, prioritizing farm-level realism, cross-environment robustness, and open benchmark datasets to advance trustworthy and scalable animal-centric technologies.
- North America > Canada > New Brunswick (0.24)
- North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
- North America > Canada > Nova Scotia > Colchester County > Truro (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking
Animal pose estimation and tracking (APT) is a fundamental task for detecting and tracking animal keypoints from a sequence of video frames. Previous animal-related datasets focus either on animal tracking or single-frame animal pose estimation, and never on both aspects. The lack of APT datasets hinders the development and evaluation of video-based animal pose estimation and tracking methods, limiting the applications in real world, e.g., understanding animal behavior in wildlife conservation. To fill this gap, we make the first step and propose APT-36K, i.e., the first large-scale benchmark for animal pose estimation and tracking. Specifically, APT-36K consists of 2,400 video clips collected and filtered from 30 animal species with 15 frames for each video, resulting in 36,000 frames in total.
ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data
Synthetic data is increasingly being used to address the lack of labeled images in uncommon domains for deep learning tasks. A prominent example is 2D pose estimation of animals, particularly wild species like zebras, for which collecting real-world data is complex and impractical. However, many approaches still require real images, consistency and style constraints, sophisticated animal models, and/or powerful pre-trained networks to bridge the syn-to-real gap. Moreover, they often assume that the animal can be reliably detected in images or videos, a hypothesis that often does not hold, e.g. in wildlife scenarios or aerial images. To solve this, we use synthetic data generated with a 3D photorealistic simulator to obtain the first synthetic dataset that can be used for both detection and 2D pose estimation of zebras without applying any of the aforementioned bridging strategies. Unlike previous works, we extensively train and benchmark our detection and 2D pose estimation models on multiple real-world and synthetic datasets using both pre-trained and non-pre-trained backbones. These experiments show how the models trained from scratch and only with synthetic data can consistently generalize to real-world images of zebras in both tasks. Moreover, we show it is possible to easily generalize those same models to 2D pose estimation of horses with a minimal amount of real-world images to account for the domain transfer. Code, results, trained models; and the synthetic, training, and validation data, including 104K manually labeled frames, are provided as open-source at https://zebrapose.is.tue.mpg.de/
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape
Xu, Jiacong, Zhang, Yi, Peng, Jiawei, Ma, Wufei, Jesslen, Artur, Ji, Pengliang, Hu, Qixin, Zhang, Jiehua, Liu, Qihao, Wang, Jiahao, Ji, Wei, Wang, Chen, Yuan, Xiaoding, Kaushik, Prakhar, Zhang, Guofeng, Liu, Jie, Xie, Yushan, Cui, Yawen, Yuille, Alan, Kortylewski, Adam
Accurately estimating the 3D pose and shape is an essential step towards understanding animal behavior, and can potentially benefit many downstream applications, such as wildlife conservation. However, research in this area is held back by the lack of a comprehensive and diverse dataset with high-quality 3D pose and shape annotations. In this paper, we propose Animal3D, the first comprehensive dataset for mammal animal 3D pose and shape estimation. Animal3D consists of 3379 images collected from 40 mammal species, high-quality annotations of 26 keypoints, and importantly the pose and shape parameters of the SMAL model. All annotations were labeled and checked manually in a multi-stage process to ensure highest quality results. Based on the Animal3D dataset, we benchmark representative shape and pose estimation models at: (1) supervised learning from only the Animal3D data, (2) synthetic to real transfer from synthetically generated images, and (3) fine-tuning human pose and shape estimation models. Our experimental results demonstrate that predicting the 3D shape and pose of animals across species remains a very challenging task, despite significant advances in human pose estimation. Our results further demonstrate that synthetic pre-training is a viable strategy to boost the model performance. Overall, Animal3D opens new directions for facilitating future research in animal 3D pose and shape estimation, and is publicly available.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Alberta (0.14)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)
- (6 more...)
A Unified Framework for Domain Adaptive Pose Estimation
Kim, Donghyun, Wang, Kaihong, Saenko, Kate, Betke, Margrit, Sclaroff, Stan
While pose estimation is an important computer vision task, it requires expensive annotation and suffers from domain shift. In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision. While several domain adaptive pose estimation models have been proposed recently, they are not generic but only focus on either human pose or animal pose estimation, and thus their effectiveness is somewhat limited to specific scenarios. In this work, we propose a unified framework that generalizes well on various domain adaptive pose estimation problems. We propose to align representations using both input-level and output-level cues (pixels and pose labels, respectively), which facilitates the knowledge transfer from the source domain to the unlabeled target domain. Our experiments show that our method achieves state-of-the-art performance under various domain shifts. Our method outperforms existing baselines on human pose estimation by up to 4.5 percent points (pp), hand pose estimation by up to 7.4 pp, and animal pose estimation by up to 4.8 pp for dogs and 3.3 pp for sheep. These results suggest that our method is able to mitigate domain shift on diverse tasks and even unseen domains and objects (e.g., trained on horse and tested on dog).
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)