burghardt
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
Brookes, Otto, Kukushkin, Maksim, Mirmehdi, Majid, Stephens, Colleen, Dieguez, Paula, Hicks, Thurston C., Jones, Sorrel, Lee, Kevin, McCarthy, Maureen S., Meier, Amelia, Normand, Emmanuelle, Wessling, Erin G., Wittig, Roman M., Langergraber, Kevin, Zuberbühler, Klaus, Boesch, Lukas, Schmid, Thomas, Arandjelovic, Mimi, Kühl, Hjalmar, Burghardt, Tilo
Computer vision analysis of camera trap video footage is essential for wildlife conservation, as captured behaviours offer some of the earliest indicators of changes in population health. Recently, several high-impact animal behaviour datasets and methods have been introduced to encourage their use; however, the role of behaviour-correlated background information and its significant effect on out-of-distribution generalisation remain unexplored. In response, we present the PanAf-FGBG dataset, featuring 20 hours of wild chimpanzee behaviours, recorded at over 350 individual camera locations. Uniquely, it pairs every video with a chimpanzee (referred to as a foreground video) with a corresponding background video (with no chimpanzee) from the same camera location. We present two views of the dataset: one with overlapping camera locations and one with disjoint locations. This setup enables, for the first time, direct evaluation of in-distribution and out-of-distribution conditions, and for the impact of backgrounds on behaviour recognition models to be quantified. All clips come with rich behavioural annotations and metadata including unique camera IDs and detailed textual scene descriptions. Additionally, we establish several baselines and present a highly effective latent-space normalisation technique that boosts out-of-distribution performance by +5.42% mAP for convolutional and +3.75% mAP for transformer-based models. Finally, we provide an in-depth analysis on the role of backgrounds in out-of-distribution behaviour recognition, including the so far unexplored impact of background durations (i.e., the count of background frames within foreground videos).
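The "disjoint locations" view described above can be illustrated with a group-wise split in which no camera contributes clips to both training and test sets. The following is a minimal sketch under stated assumptions, not the dataset's actual split procedure: the function name `split_by_camera`, the `test_fraction` parameter, and the toy clip and camera IDs are all hypothetical.

```python
import random

def split_by_camera(clips, test_fraction=0.25, seed=0):
    """Split (clip_id, camera_id) pairs so train and test share no camera.

    Hypothetical helper for illustration; not the PanAf-FGBG split code.
    """
    # Collect the unique camera IDs and shuffle them reproducibly.
    cameras = sorted({cam for _, cam in clips})
    rng = random.Random(seed)
    rng.shuffle(cameras)

    # Hold out a fraction of cameras (at least one) for testing.
    n_test = max(1, round(len(cameras) * test_fraction))
    test_cams = set(cameras[:n_test])

    # Assign every clip according to the camera that recorded it.
    train = [c for c in clips if c[1] not in test_cams]
    test = [c for c in clips if c[1] in test_cams]
    return train, test

# Toy data: 12 clips spread evenly over 4 hypothetical cameras.
clips = [(f"clip_{i}", f"cam_{i % 4}") for i in range(12)]
train, test = split_by_camera(clips)

train_cams = {cam for _, cam in train}
test_cams = {cam for _, cam in test}
print(len(train), len(test), train_cams & test_cams)
```

Splitting on camera ID rather than on individual clips is what makes the test set out-of-distribution with respect to backgrounds: a model cannot score well by memorising scenes it saw during training.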
ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition
Brookes, Otto, Mirmehdi, Majid, Kühl, Hjalmar, Burghardt, Tilo
We show that chimpanzee behaviour understanding from camera traps can be enhanced by providing visual architectures with access to an embedding of text descriptions that detail species behaviours. In particular, we present a vision-language model which employs multi-modal decoding of visual features extracted directly from camera trap videos to process query tokens representing behaviours and output class predictions. Query tokens are initialised using a standardised ethogram of chimpanzee behaviour, rather than using random or name-based initialisations. In addition, the effect of initialising query tokens using a masked language model fine-tuned on a text corpus of known behavioural patterns is explored. We evaluate our system on the PanAf500 and PanAf20K datasets and demonstrate the performance benefits of our multi-modal decoding approach and query initialisation strategy on multi-class and multi-label recognition tasks, respectively. Results and ablations corroborate performance improvements. We achieve state-of-the-art performance over vision and vision-language models in top-1 accuracy (+6.34%) on PanAf500 and overall (+1.1%) and tail-class (+2.26%) mean average precision on PanAf20K. We share complete source code and network weights for full reproducibility of results and easy utilisation.
Universal Bovine Identification via Depth Data and Deep Metric Learning
Sharma, Asheesh, Randewich, Lucy, Andrew, William, Hannuna, Sion, Campbell, Neill, Mullan, Siobhan, Dowsey, Andrew W., Smith, Melvyn, Hansen, Mark, Burghardt, Tilo
This paper proposes and evaluates, for the first time, a top-down (dorsal view), depth-only deep learning system for accurately identifying individual cattle and provides associated code, datasets, and training weights for immediate reproducibility. An increase in herd size skews the cow-to-human ratio at the farm and makes the manual monitoring of individuals more challenging. Therefore, real-time cattle identification is essential for the farms and a crucial step towards precision livestock farming. Underpinned by our previous work, this paper introduces a deep-metric learning method for cattle identification using depth data from an off-the-shelf 3D camera. The method relies on CNN and MLP backbones that learn well-generalised embedding spaces from the body shape to differentiate individuals -- requiring neither species-specific coat patterns nor close-up muzzle prints for operation. The network embeddings are clustered using a simple algorithm such as $k$-NN for highly accurate identification, thus eliminating the need to retrain the network for enrolling new individuals. We evaluate two backbone architectures, ResNet, as previously used to identify Holstein Friesians using RGB images, and PointNet, which is specialised to operate on 3D point clouds. We also present CowDepth2023, a new dataset containing 21,490 synchronised colour-depth image pairs of 99 cows, to evaluate the backbones. Both ResNet and PointNet architectures, which consume depth maps and point clouds, respectively, led to high accuracy that is on par with the coat pattern-based backbone.
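The point that new individuals can be enrolled without retraining the network can be sketched as follows: once embeddings exist, identification reduces to a nearest-neighbour vote over a gallery, and enrolment is just appending to that gallery. This is a minimal illustration assuming Euclidean distance and majority voting; the function `knn_identify`, the 2-D toy embeddings, and the labels `cow_A`, `cow_B`, `cow_C` are hypothetical and stand in for the output of a trained backbone.

```python
import math
from collections import Counter

def knn_identify(query, gallery, k=3):
    """Return the majority label among the k gallery embeddings nearest to query.

    gallery is a list of (embedding, label) pairs; embeddings are tuples of floats.
    """
    # Rank gallery entries by Euclidean distance to the query embedding.
    ranked = sorted(gallery, key=lambda item: math.dist(query, item[0]))
    # Majority vote over the k closest entries.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D embeddings standing in for the output of a trained network.
gallery = [
    ((0.00, 0.10), "cow_A"), ((0.10, 0.00), "cow_A"), ((0.05, 0.05), "cow_A"),
    ((1.00, 1.10), "cow_B"), ((1.10, 1.00), "cow_B"), ((0.95, 1.05), "cow_B"),
]
first_id = knn_identify((0.02, 0.08), gallery)

# Enrolling a new individual only extends the gallery; no weights change.
gallery += [((2.00, 0.00), "cow_C"), ((2.10, 0.10), "cow_C"), ((1.90, 0.05), "cow_C")]
second_id = knn_identify((2.05, 0.02), gallery)

print(first_id, second_id)
```

The design relies on the embedding space generalising to unseen animals, which is what the metric-learning objective is meant to provide.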
Triple-stream Deep Metric Learning of Great Ape Behavioural Actions
Brookes, Otto, Mirmehdi, Majid, Kühl, Hjalmar, Burghardt, Tilo
We propose the first metric learning system for the recognition of great ape behavioural actions. Our proposed triple stream embedding architecture works on camera trap videos taken directly in the wild and demonstrates that the utilisation of an explicit DensePose-C chimpanzee body part segmentation stream effectively complements traditional RGB appearance and optical flow streams. We evaluate system variants with different feature fusion techniques and long-tail recognition approaches. Results and ablations show performance improvements of ~12% in top-1 accuracy over previous results achieved on the PanAf-500 dataset containing 180,000 manually annotated frames across nine behavioural actions. Furthermore, we provide a qualitative analysis of our findings and augment the metric learning system with long-tail recognition techniques showing that average per class accuracy -- critical in the domain -- can be improved by ~23% compared to the literature on that dataset. Finally, since our embedding spaces are constructed as metric, we provide first data-driven visualisations of the great ape behavioural action spaces revealing emerging geometry and topology. We hope that the work sparks further interest in this vital application area of computer vision for the benefit of endangered great apes.
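The abstract distinguishes top-1 accuracy from average per-class accuracy, which matters for long-tailed data where a few behaviours dominate. The sketch below shows how the two metrics can diverge on a toy example; the function names and the labels `walking` and `climbing` are hypothetical and are not taken from the PanAf-500 annotation scheme.

```python
from collections import defaultdict

def top1_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_per_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Long-tailed toy data: 8 samples of a head class, 2 of a tail class.
y_true = ["walking"] * 8 + ["climbing"] * 2
# A degenerate model that always predicts the head class.
y_pred = ["walking"] * 10

top1 = top1_accuracy(y_true, y_pred)
per_class = mean_per_class_accuracy(y_true, y_pred)
print(top1, per_class)
```

Here the degenerate model reaches 0.8 top-1 accuracy but only 0.5 average per-class accuracy, because it never recognises the tail class.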
Do bees play? A groundbreaking study says yes.
Many animals like to play, often for no other apparent reason than enjoyment. Pet owners know this is true for cats, dogs, even rodents--and scientists have observed the same in some fish, frogs, lizards, and birds. Are their minds and lives rich enough to make room for play? New research published in the journal Animal Behaviour suggests that bumblebees seem to enjoy rolling around wooden balls, without being trained or receiving rewards--presumably just because it's fun. "It shows that bees are not little robots that just respond to stimuli… and they do carry out activities that might be pleasurable," says lead author Samadi Galpayage, a researcher at the Queen Mary University of London.
Burghardt
Crowdsourcing can identify high-quality solutions to problems; however, individual decisions are constrained by cognitive biases. We investigate some of these biases in an experimental model of a question-answering system. We observe a strong position bias in favor of answers appearing earlier in a list of choices. This effect is enhanced by three cognitive factors: the attention an answer receives, its perceived popularity, and cognitive load, measured by the number of choices a user has to process. While separately weak, these effects synergistically amplify position bias and decouple user choices of best answers from their intrinsic quality. We end our paper by discussing the novel ways we can apply these findings to substantially improve how high-quality answers are found in question-answering systems.
Visual Microfossil Identification via Deep Metric Learning
Karaderi, Tayfun, Burghardt, Tilo, Hsiang, Allison Y., Ramaer, Jacob, Schmidt, Daniela N.
We apply deep metric learning for the first time to the problem of classifying planktic foraminifer shells on microscopic images. This species recognition task is an important information source and scientific pillar for reconstructing past climates. All foraminifer CNN recognition pipelines in the literature produce black-box classifiers that lack visualisation options for human experts and cannot be applied to open set problems. Here, we benchmark metric learning against these pipelines, produce the first scientific visualisation of the phenotypic planktic foraminifer morphology space, and demonstrate that metric learning can be used to cluster species unseen during training. We show that metric learning outperforms all published CNN-based state-of-the-art benchmarks in this domain. We evaluate our approach on the 34,640 expert-annotated images of the Endless Forams public library of 35 modern planktic foraminifera species. Our results on this data show leading 92% accuracy (at 0.84 F1-score) in reproducing expert labels on withheld test data, and 66.5% accuracy (at 0.70 F1-score) when clustering species never encountered in training. We conclude that metric learning is highly effective for this domain and serves as an important tool towards expert-in-the-loop automation of microfossil identification. Key code, network weights, and data splits are published with this paper for full reproducibility.
How Can The European Private Equity Industry Best Embrace Artificial Intelligence?
"Artificial intelligence (AI) continues to progress and will eventually impact all industries," Alexandro Pando, the CEO of digital disruptive firm Xyrupt Technologies recently told Forbes. AI, which refers to the use of computer algorithms to replace the human ability to learn and make predictions, is booming. According to a new KPMG report, $12.4 billion has been invested in AI to date, with the figure expected to skyrocket to an incredible $232 billion by 2025. The study also found that 40% of industry leaders said they would be increasing investment in the technology by 20% or more over the next few years. The potential benefits of AI are varied and can positively impact a host of divergent sectors. In retail, for example, Japan's SoftBank telecom operations partnered with French firm Aldebaran to develop "Pepper" – a humanoid robot.