Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware
Mitsis, Stavros; Hadjikyriakos, Ermos; Ibrahim, Humaid; Neofytou, Savvas; Raman, Shashwat; Myles, James; Kanjo, Eiman
arXiv.org Artificial Intelligence
Deploying emotion recognition systems in real-world environments where devices must be small, low-power, and private remains a significant challenge. This is especially relevant for applications such as tension monitoring, conflict de-escalation, and responsive wearables, where cloud-based solutions are impractical. Multimodal emotion recognition has advanced through deep learning, but most systems remain unsuitable for deployment on ultra-constrained edge devices. Prior work typically relies on powerful hardware, lacks real-time performance, or uses unimodal input. This paper addresses that gap by presenting a hardware-aware emotion recognition system that combines acoustic and linguistic features using a late-fusion architecture optimised for the Edge TPU. The design integrates a quantised transformer-based acoustic model with frozen keyword embeddings from a DSResNet-SE network, enabling real-time inference within a 1.8 MB memory budget at 21-23 ms latency. The pipeline uses MicroFrontend and MLTK to keep spectrograms aligned between training and deployment. Evaluation on re-recorded, segmented IEMOCAP samples captured through the Coral Dev Board Micro microphone shows a 6.3% macro F1 improvement over unimodal baselines. This work demonstrates that accurate, real-time multimodal emotion inference is achievable on microcontroller-class edge platforms through task-specific fusion and hardware-guided model design.
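As a rough illustration of the late-fusion idea described in the abstract, the sketch below concatenates an acoustic embedding with a keyword embedding and passes the result through a single linear layer with softmax. It is not the authors' implementation: the label set, the embedding sizes (8 and 4), the random weights, and the helper name `late_fusion_predict` are all hypothetical, and the abstract does not specify how the fused representation is classified.

```python
import math
import random

# Hypothetical label set; the abstract does not list the emotion classes used.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_predict(acoustic_emb, keyword_emb, weights, bias):
    """Concatenate the two modality embeddings and apply one linear layer.

    This assumes fusion by concatenation followed by a linear head; the
    actual fusion head in the paper may differ.
    """
    fused = list(acoustic_emb) + list(keyword_emb)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Placeholder inputs: random vectors stand in for the outputs of the
# acoustic model and the frozen keyword-embedding network.
rng = random.Random(0)
acoustic_emb = [rng.uniform(-1, 1) for _ in range(8)]
keyword_emb = [rng.uniform(-1, 1) for _ in range(4)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(12)] for _ in EMOTIONS]
bias = [0.0] * len(EMOTIONS)

probs = late_fusion_predict(acoustic_emb, keyword_emb, weights, bias)
predicted = EMOTIONS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Because the keyword embeddings are described as frozen, only the fusion head (here `weights` and `bias`) would need training in a setup like this one, which keeps the trainable parameter count small.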
Oct-22-2025
- Country:
  - Europe > United Kingdom
    - England
      - Greater London > London (0.04)
      - Nottinghamshire > Nottingham (0.04)
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - United States > California (0.14)
- Genre:
- Research Report (0.83)
- Industry:
- Information Technology (0.88)
- Technology: