EmojiVoice: Towards long-term controllable expressivity in robot speech

Tuttösí, Paige, Mehta, Shivam, Syvenky, Zachary, Burkanova, Bermet, Henter, Gustav Eje, Lim, Angelica

Jul-31-2025–arXiv.org Artificial Intelligence

Humans vary their expressivity when speaking for extended periods to maintain engagement with their listener. Although social robots tend to be deployed with "expressive" joyful voices, they lack this long-term variation found in human speech. Foundation model text-to-speech systems are beginning to mimic the expressivity in human speech, but they are difficult to deploy offline on robots. We present EmojiVoice, a free, customizable text-to-speech (TTS) toolkit that allows social roboticists to build temporally variable, expressive speech on social robots. We introduce emoji-prompting to allow fine-grained control of expressivity on a phrase level and use the lightweight Matcha-TTS backbone to generate speech in real time. We explore three case studies: (1) a scripted conversation with a robot assistant, (2) a storytelling robot, and (3) an autonomous speech-to-speech interactive agent. We found that using varied emoji prompting improved the perception and expressivity of speech over a long period in a storytelling task, but expressive voice was not preferred in the assistant use case.

I. INTRODUCTION

Imagine a robot telling a 10-minute story to children. How would you like the robot to speak? The expression of paralinguistics such as emotions is an integral part of human speech [1], and humans convey expressivity by changing their expression over time [2], [3].
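As a rough illustration of the phrase-level emoji-prompting mentioned in the abstract, the sketch below splits a script into (phrase, emoji) pairs that a TTS backend could then voice one at a time. It is not the toolkit's actual API: the emoji set, the style IDs, the `split_emoji_prompt` helper, and the convention that an emoji follows the phrase it styles are all assumptions made for this example.

```python
import re

# Hypothetical emoji-to-style table; the real toolkit's emoji set and IDs may differ.
EMOJI_STYLES = {"😍": 0, "😡": 1, "😎": 2, "🙂": 3}
DEFAULT_EMOJI = "🙂"  # assumed fallback for text with no emoji marker

# One capture group, so re.split keeps the matched emoji in its output.
_EMOJI_RE = re.compile("(" + "|".join(map(re.escape, EMOJI_STYLES)) + ")")


def split_emoji_prompt(text):
    """Split text into (phrase, emoji) pairs.

    Assumes each emoji comes directly after the phrase it styles.
    Trailing text without an emoji gets DEFAULT_EMOJI.
    """
    # Result alternates: phrase, emoji, phrase, emoji, ..., tail
    parts = _EMOJI_RE.split(text)
    segments = []
    for i in range(0, len(parts) - 1, 2):
        phrase = parts[i].strip()
        if phrase:  # skip empty phrases, e.g. two emoji in a row
            segments.append((phrase, parts[i + 1]))
    tail = parts[-1].strip()
    if tail:
        segments.append((tail, DEFAULT_EMOJI))
    return segments


if __name__ == "__main__":
    story = "Once upon a time 😍 there was a dragon! 😡 The end."
    for phrase, emoji in split_emoji_prompt(story):
        print(EMOJI_STYLES[emoji], emoji, phrase)
```

Each pair could then be passed to a synthesizer together with the style ID for its emoji, so expressivity changes from phrase to phrase rather than staying fixed for the whole utterance.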

large language model, machine learning, natural language

Genre:
- Research Report > Experimental Study (0.46)

Industry:
- Information Technology (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Representation & Reasoning > Personal Assistant Systems (1.00)
  - Speech > Speech Synthesis (0.70)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)
