Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder

Iashin, Vladimir, Lee, Horace, Schofield, Dan, Zisserman, Andrew

Jul-15-2025, arXiv.org Artificial Intelligence

Camera traps are revolutionising wildlife monitoring by capturing vast amounts of visual data; however, the manual identification of individual animals remains a significant bottleneck. This study introduces a fully self-supervised approach to learning robust chimpanzee face embeddings from unlabelled camera-trap footage. Leveraging the DINOv2 framework, we train Vision Transformers on automatically mined face crops, eliminating the need for identity labels. Our method demonstrates strong open-set re-identification performance, surpassing supervised baselines on challenging benchmarks such as Bossou, despite using no labelled data during training. This work underscores the potential of self-supervised learning in biodiversity monitoring and paves the way for scalable, non-invasive population studies.
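The abstract does not spell out how embeddings are matched at test time, so the following is only a minimal sketch of one common way open-set re-identification can be done on top of a face embedder: nearest-neighbour search by cosine similarity, with a rejection threshold for unknown individuals. The function names, the toy 3-dimensional vectors, and the threshold of 0.7 are all hypothetical and are not taken from the paper; real embeddings would come from the trained Vision Transformer.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def identify(query, gallery, threshold=0.7):
    """Return (identity, score) for the closest gallery entry.

    identity is None when the best score falls below the threshold,
    i.e. the query is treated as an individual not in the gallery.
    The threshold value is an illustrative assumption.
    """
    best_id, best_score = None, -1.0
    for identity, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score

# Toy gallery of known individuals (hypothetical embeddings).
gallery = {
    "individual_a": [1.0, 0.0, 0.0],
    "individual_b": [0.0, 1.0, 0.0],
}

known = identify([0.9, 0.1, 0.0], gallery)    # close to individual_a
unknown = identify([0.0, 0.0, 1.0], gallery)  # far from every gallery entry
```

In this sketch the rejection threshold is what makes the setting open-set: a query that is not sufficiently similar to any enrolled individual is reported as unknown rather than forced onto the nearest identity.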

artificial intelligence, machine learning, recognition


Country:
- Africa (0.28)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Inductive Learning (0.61)
  - Vision > Face Recognition (0.49)
