Beyond Discrete Categories: Multi-Task Valence-Arousal Modeling for Pet Vocalization Analysis
–arXiv.org Artificial Intelligence
Traditional pet emotion recognition from vocalizations relies on discrete classification, which struggles with ambiguity and with capturing variations in intensity. We propose a continuous Valence-Arousal (VA) model that represents emotions in a two-dimensional space. Our method uses an automatic VA label generation algorithm, enabling large-scale annotation of 42,553 pet vocalization samples. A multi-task learning framework jointly trains VA regression with auxiliary tasks (emotion, body size, gender), which improves feature learning and, in turn, VA prediction. Our Audio Transformer model achieves validation Pearson correlations of r = 0.9024 for Valence and r = 0.7155 for Arousal, effectively resolving confusion between discrete categories like "territorial" and "happy." This work introduces the first continuous VA framework for pet vocalization analysis, offering a more expressive representation for human-pet interaction, veterinary diagnostics, and behavioral training. The approach shows strong potential for deployment in consumer products such as AI pet emotion translators.
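The abstract reports results as one Pearson correlation per VA dimension. As a rough illustration of how that metric is computed, here is a minimal pure-Python sketch. The function names (`pearson_r`, `va_correlations`) and the toy values are hypothetical and are not taken from the paper; the authors' evaluation code may differ.

```python
import math
from typing import Sequence, Tuple


def pearson_r(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation between predictions and reference labels."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(pred) / len(pred)
    mean_t = sum(target) / len(target)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


def va_correlations(
    pred_va: Sequence[Tuple[float, float]],
    true_va: Sequence[Tuple[float, float]],
) -> Tuple[float, float]:
    """Return (valence_r, arousal_r) for sequences of (valence, arousal) pairs."""
    pred_valence, pred_arousal = zip(*pred_va)
    true_valence, true_arousal = zip(*true_va)
    return (
        pearson_r(pred_valence, true_valence),
        pearson_r(pred_arousal, true_arousal),
    )


# Hypothetical toy values for illustration only, not data from the paper.
true_va = [(0.8, 0.6), (-0.5, 0.9), (0.1, 0.2), (-0.7, 0.4)]
pred_va = [(0.7, 0.5), (-0.4, 0.6), (0.2, 0.4), (-0.6, 0.3)]

valence_r, arousal_r = va_correlations(pred_va, true_va)
print(f"valence r = {valence_r:.4f}, arousal r = {arousal_r:.4f}")
```

Scoring the two dimensions separately, as the abstract does, shows when a model tracks one axis better than the other, which a single pooled score would hide.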
Oct-16-2025
- Country:
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Health & Medicine
- Consumer Health (0.46)
- Therapeutic Area (0.68)
- Technology:
- Information Technology > Artificial Intelligence
- Cognitive Science > Emotion (0.68)
- Machine Learning
- Neural Networks > Deep Learning (1.00)
- Performance Analysis > Accuracy (0.67)
- Statistical Learning (0.68)
- Natural Language > Large Language Model (0.93)