Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis

Salehi, Pegah, Sheshkal, Sajad Amouei, Thambawita, Vajira, Gautam, Sushant, Sabet, Saeed S., Johansen, Dag, Riegler, Michael A., Halvorsen, Pål

arXiv.org Artificial Intelligence 

The application of AI in education has gained widespread attention for its potential to enhance learning experiences across disciplines, including psychology [1, 2]. In the context of investigative interviewing, especially when questioning suspected child victims, AI offers a promising alternative to traditional training approaches. These conventional methods, often delivered through short workshops, fail to provide the hands-on practice, feedback, and continuous engagement that interviewers need to master best practices in questioning child victims [3, 4]. Research has shown that although best practices recommend open-ended questions and discourage leading or suggestive queries [5, 6], many interviewers still struggle to apply these techniques effectively during real-world investigations [7]. AI-powered child avatars offer a valuable solution: they enable Child Protective Services (CPS) workers to engage in realistic practice sessions without the ethical dilemmas associated with using real children, while also receiving personalized feedback on their performance [8]. Our current system leverages advanced AI techniques within a structured virtual environment to train professionals in investigative interviewing. Specifically, this system integrates the Unity Engine to generate virtual avatars. However, the effectiveness of our AI-based training system depends largely on the perceived realism and fidelity of the virtual avatars used in these simulations [9]. In our findings, avatars generated using Generative Adversarial Networks (GANs) demonstrated higher levels of realism than those created with the Unity Engine in several key aspects [10].