Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks
Simões, Lucca Emmanuel Pineli; Rodrigues, Lucas Brandão; Silva, Rafaela Mota; da Silva, Gustavo Rodrigues
arXiv.org Artificial Intelligence
The integration of automation and voice control in drone systems has received significant attention in recent research, driven by the need for more intuitive and efficient human-machine interaction [4, 1]. This project develops a voice command system for the Tello drone that uses speech recognition and deep learning models to translate spoken commands into specific drone actions. The main challenge is performing this translation accurately and efficiently, which is particularly important where traditional control interfaces are impractical or where operators need hands-free operation [10, 5]. To address this challenge, we developed and evaluated three distinct pipelines. The first uses a traditional Speech-to-Text (STT) model followed by a Large Language Model (LLM) for command interpretation [11]. The second uses a direct mapping model that predicts drone commands from audio input without intermediate text conversion. The third employs a Siamese neural network to generalize to new commands by comparing audio inputs to pre-trained examples [8]. Each pipeline was designed to balance performance, flexibility, and ease of maintenance.
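The comparison step in the third pipeline can be illustrated with a minimal sketch. It assumes that a trained Siamese encoder has already turned each audio clip into a fixed-length embedding vector; the encoder itself is not shown. The function names, the cosine-similarity metric, the 0.8 threshold, and the toy three-dimensional embeddings are all illustrative assumptions, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_command(query_embedding, references, threshold=0.8):
    """Return (label, score) for the most similar reference embedding.

    `references` maps a command label to a list of reference embeddings.
    If the best score is below `threshold` (an assumed value), the label
    is None, meaning the utterance is treated as unrecognized.
    """
    best_label, best_score = None, -1.0
    for label, examples in references.items():
        for embedding in examples:
            score = cosine_similarity(query_embedding, embedding)
            if score > best_score:
                best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy embeddings standing in for the output of a hypothetical Siamese encoder.
references = {
    "takeoff": [[0.9, 0.1, 0.0]],
    "land": [[0.0, 0.2, 0.9]],
}

label, score = match_command([0.8, 0.2, 0.1], references)

# Supporting a new command only requires a new reference embedding;
# the comparison logic is unchanged and nothing is retrained in this sketch.
references["flip"] = [[0.1, 0.9, 0.1]]
new_label, new_score = match_command([0.0, 1.0, 0.0], references)
```

In this sketch, extending the command set amounts to adding an entry to the reference dictionary, which is one way to read the abstract's claim that the Siamese approach generalizes to new commands by comparison with stored examples.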
Jul-10-2024