Voice clips


On the Promise for Assurance of Differentiable Neurosymbolic Reasoning Paradigms

arXiv.org Artificial Intelligence

To create usable and deployable Artificial Intelligence (AI) systems, a level of assurance in performance under many different conditions is required. Deployed machine learning systems often require classic logic and reasoning, performed through neurosymbolic programs, jointly with artificial neural network sensing. While many prior works have examined the assurance of either a single component of the system (the neural network alone) or entire enterprise systems, very few have examined the assurance of integrated neurosymbolic systems. In this work, we assess the assurance of end-to-end fully differentiable neurosymbolic systems, an emerging method for creating data-efficient and more interpretable models. We perform this investigation using Scallop, an end-to-end neurosymbolic library, across classification and reasoning tasks in both the image and audio domains. We assess assurance across adversarial robustness, calibration, user performance parity, and the interpretability of solutions for catching misaligned solutions. Our empirical results show that end-to-end neurosymbolic methods present unique opportunities for assurance beyond their data efficiency, but not across the board. We find that this class of neurosymbolic models has higher assurance in cases where arithmetic operations are defined and where the input space is high-dimensional, settings in which fully neural counterparts struggle to learn robust reasoning operations. We identify the relationship between neurosymbolic models' interpretability and the ability to catch shortcuts that later result in increased adversarial vulnerability despite performance parity. Finally, we find that the promise of data efficiency typically holds only for class-imbalanced reasoning problems.


Is the Lecture Engaging for Learning? Lecture Voice Sentiment Analysis for Knowledge Graph-Supported Intelligent Lecturing Assistant (ILA) System

arXiv.org Artificial Intelligence

This paper introduces an intelligent lecturing assistant (ILA) system that utilizes a knowledge graph to represent course content and optimal pedagogical strategies. The system is designed to support instructors in enhancing student learning through real-time analysis of voice, content, and teaching methods. As an initial investigation, we present a case study on lecture voice sentiment analysis, in which we developed a training set comprising over 3,000 one-minute lecture voice clips. Each clip was manually labeled as either engaging or non-engaging. Utilizing this dataset, we constructed and evaluated several classification models based on a variety of features extracted from the voice clips. The results demonstrate promising performance, achieving an F1-score of 90% for boring lectures on an independent set of over 800 test voice clips. This case study lays the groundwork for the development of a more sophisticated model that will integrate content analysis and pedagogical practices. Our ultimate goal is to aid instructors in teaching more engagingly and effectively by leveraging modern artificial intelligence techniques.
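An F1-score "for boring lectures" is a one-vs-rest metric: the non-engaging class is treated as positive, and F1 is the harmonic mean of precision and recall for that class. The following is a minimal pure-Python sketch under that assumption. The function name `f1_for_class` and the toy labels are hypothetical illustrations, not the paper's evaluation code or data, which the abstract does not describe.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """One-vs-rest F1-score, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # True positives: clip is `positive` and was predicted `positive`.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted `positive` but actually another class.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: actually `positive` but predicted another class.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, NOT data from the paper.
y_true = ["boring", "boring", "engaging", "engaging", "boring"]
y_pred = ["boring", "engaging", "engaging", "boring", "boring"]
score = f1_for_class(y_true, y_pred, positive="boring")
print(round(score, 3))  # prints 0.667
```

With these toy labels, two of the three "boring" predictions are correct (precision 2/3) and two of the three truly boring clips are found (recall 2/3), so F1 is 2/3. A library routine such as scikit-learn's `f1_score` with `pos_label="boring"` should give the same number; the hand-written version simply makes the arithmetic visible.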


Text to Speech System for Multi-Speaker Setting

#artificialintelligence

What would you want to do if you could generate the voice of your favorite celebrity? Before I get ahead of myself, let me clearly define the objective of this blog. Given text and some voice clips of the desired speaker (say, Beyonce), I want my AI to output an audio clip in which Beyonce speaks the text that I input to this code. So essentially, this is the same Text-to-Speech (TTS) problem we saw earlier, but with an added constraint: the speech must be in a particular speaker's voice. In this blog, I share two methods that can complete our task, and I compare these two methods at the end.


Mozilla Common Voice: The Largest Dataset

#artificialintelligence

Mozilla Common Voice is the largest public dataset of human voices, consisting of thousands of hours of voice clips in fifty different languages. Mozilla is planning to transform the voice technology ecosystem by releasing its own voice assistant. "The Common Voice dataset is set to contribute to the birth of 'Firefox voice', and with the data gathered we cannot help but think the huge surprise we're in for soon." Last year, Mozilla released the largest public dataset of human voices available for use. Mozilla Firefox is a popular, open-source web browser used by millions today.


Smart speaker recordings reviewed by humans

BBC News

Amazon, Apple and Google all employ staff who listen to customer voice recordings from their smart speakers and voice assistant apps. News site Bloomberg highlighted the topic after speaking to Amazon staff who "reviewed" Alexa recordings. All three companies say voice recordings are occasionally reviewed to improve speech recognition. But the reaction to the Bloomberg article suggests many customers are unaware that humans may be listening. The news site said it had spoken to seven people who reviewed audio from Amazon Echo smart speakers and the Alexa service.


Mozilla's open voice-recognition library now includes 18 languages

Engadget

Over the past year, Mozilla worked on expanding its Common Voice initiative to include open source voice recognition datasets in more languages. Now, the organization has released the largest collection of human voices available for use in 18 different languages, including Dutch, Hakha Chin, Esperanto, Farsi, Basque, Spanish, French, German, Mandarin Chinese (Traditional), Welsh and Kabyle. The collection is composed of 1,400 hours of recorded voice clips from 42,000 contributors. Some of them are volunteers who just wanted to help out, while others are linguists and professionals working in voice technologies. Mozilla's Common Voice project aims to make it easier for developers who lack the resources of a bigger company (such as Apple or Google) to create voice-enabled products.