AITopics | voice clips

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On the Promise for Assurance of Differentiable Neurosymbolic Reasoning Paradigms

Richards, Luke E., Yaros, Jessie, Babcock, Jasen, Ly, Coung, Cosbey, Robin, Doster, Timothy, Matuszek, Cynthia

arXiv.org Artificial Intelligence, Feb-12-2025

To create usable and deployable Artificial Intelligence (AI) systems, a level of assurance in performance is required under many different conditions. Deployed machine learning systems often require more classic logic and reasoning, performed through neurosymbolic programs jointly with artificial neural network sensing. While many prior works have examined the assurance of a single component, either the neural network alone or the entire enterprise system, very few have examined the assurance of integrated neurosymbolic systems. In this work, we assess the assurance of end-to-end fully differentiable neurosymbolic systems, an emerging method for creating data-efficient and more interpretable models. We perform this investigation using Scallop, an end-to-end neurosymbolic library, across classification and reasoning tasks in both the image and audio domains. We assess assurance across adversarial robustness, calibration, user performance parity, and interpretability of solutions for catching misaligned solutions. Our empirical results show that end-to-end neurosymbolic methods present unique opportunities for assurance beyond their data efficiency, but not across the board. We find that this class of neurosymbolic models has higher assurance in cases where arithmetic operations are defined and where the input space is high-dimensional, where fully neural counterparts struggle to learn robust reasoning operations. We identify a relationship between neurosymbolic models' interpretability and catching shortcuts that later result in increased adversarial vulnerability despite performance parity. Finally, we find that the promise of data efficiency typically holds only for class-imbalanced reasoning problems.
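Calibration, one of the assurance axes named in the abstract, is often summarized with the expected calibration error (ECE). The sketch below is a generic illustration of that metric and is not taken from the paper: the function name `expected_calibration_error`, the equal-width binning, and the toy inputs are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy over equal-width bins.

    Hypothetical helper for illustration; the paper's own metric may differ.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Each bin contributes in proportion to the share of samples it holds.
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece


# Toy example (made-up values): four predictions at 0.9 confidence, three correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
print(toy_ece)
```

A value near zero means that stated confidence tracks observed accuracy; larger values indicate over- or under-confidence.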

artificial intelligence, machine learning, robustness, (18 more...)

arXiv.org Artificial Intelligence

2502.08932

Country:

North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Central Europe (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.68)
Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Is the Lecture Engaging for Learning? Lecture Voice Sentiment Analysis for Knowledge Graph-Supported Intelligent Lecturing Assistant (ILA) System

An, Yuan, Kolanupaka, Samarth, An, Jacob, Ma, Matthew, Chhatwal, Unnat, Kalinowski, Alex, Rogers, Michelle, Smith, Brian

arXiv.org Artificial Intelligence, Aug-19-2024

This paper introduces an intelligent lecturing assistant (ILA) system that utilizes a knowledge graph to represent course content and optimal pedagogical strategies. The system is designed to support instructors in enhancing student learning through real-time analysis of voice, content, and teaching methods. As an initial investigation, we present a case study on lecture voice sentiment analysis, in which we developed a training set comprising over 3,000 one-minute lecture voice clips. Each clip was manually labeled as either engaging or non-engaging. Utilizing this dataset, we constructed and evaluated several classification models based on a variety of features extracted from the voice clips. The results demonstrate promising performance, achieving an F1-score of 90% for boring lectures on an independent set of over 800 test voice clips. This case study lays the groundwork for the development of a more sophisticated model that will integrate content analysis and pedagogical practices. Our ultimate goal is to aid instructors in teaching more engagingly and effectively by leveraging modern artificial intelligence techniques.
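The per-class F1-score reported in the abstract combines precision and recall for one label. The sketch below shows how such a score could be computed for a two-label task; the helper `f1_for_class` and the toy labels are assumptions for illustration and do not reproduce the authors' data or models.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score for one label, treating `positive` as the class of interest.

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up): five clips, scored for the "non-engaging" class.
y_true = ["non-engaging", "non-engaging", "engaging", "engaging", "non-engaging"]
y_pred = ["non-engaging", "engaging", "engaging", "engaging", "non-engaging"]
score = f1_for_class(y_true, y_pred, "non-engaging")
print(score)
```

Scoring each class separately matters when the two labels are not equally frequent, since overall accuracy can hide poor recall on the rarer class.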

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2408.10492

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(5 more...)

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.88)

Industry:

Education > Educational Setting (1.00)
Health & Medicine (0.93)
Education > Educational Technology > Educational Software > Computer Based Training (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.66)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.64)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.64)

Text to Speech System for Multi-Speaker Setting

#artificialintelligence, May-26-2021, 07:20:25 GMT

What would you want to do if you could generate the voice of your favorite celebrity? Before I get ahead of myself, let me clearly define the objective of this blog. Given text and some voice clips of the desired speaker (say, Beyonce), I want my AI to output an audio clip where Beyonce is speaking the text that I input to this code. So essentially, this is the same Text To Speech (TTS) problem we saw earlier but with an added constraint to output the speech in a particular speaker's voice. In this blog, I share two methods that can complete our task, and I will be comparing these two methods at the end.

beyonce, dataset, speaker encoder, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.65)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.65)
Information Technology > Artificial Intelligence > Assistive Technologies (0.65)

Mozilla Common Voice - The Largest Dataset

#artificialintelligence, Jul-23-2020, 13:40:13 GMT

Mozilla Common Voice is the largest public dataset of human voices, consisting of thousands of hours of voice clips in fifty different languages. Mozilla is planning to transform the voice technology ecosystem by releasing its own voice assistant. "The Common Voice dataset is set to contribute to the birth of 'Firefox voice', and with the data gathered we cannot help but think the huge surprise we're in for soon." Mozilla released the largest public dataset of human voices available for use last year. Mozilla Firefox is a popular, open-source web browser, used by millions today.

mozilla, voice clips, voice technology ecosystem, (8 more...)

#artificialintelligence

Industry: Information Technology (0.39)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.58)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.41)

Smart speaker recordings reviewed by humans

BBC News, Apr-11-2019, 16:39:35 GMT

Amazon, Apple and Google all employ staff who listen to customer voice recordings from their smart speakers and voice assistant apps. News site Bloomberg highlighted the topic after speaking to Amazon staff who "reviewed" Alexa recordings. All three companies say voice recordings are occasionally reviewed to improve speech recognition. But the reaction to the Bloomberg article suggests many customers are unaware that humans may be listening. The news site said it had spoken to seven people who reviewed audio from Amazon Echo smart speakers and the Alexa service.

artificial intelligence, chatbot, natural language, (17 more...)

BBC News

AI-Alerts: 2019 > 2019-04 > AAAI AI-Alert for Apr 16, 2019 (1.00)

Genre: Research Report (0.36)

Industry:

Information Technology > Security & Privacy (1.00)
Appliances & Durable Goods (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.95)

Mozilla's open voice-recognition library now includes 18 languages

Engadget, Feb-28-2019, 09:27:45 GMT

Over the past year, Mozilla worked on expanding its Common Voice initiative to include open source voice recognition datasets in more languages. Now, the organization has released the largest collection of human voices available for use in 18 different languages, including Dutch, Hakha-Chin, Esperanto, Farsi, Basque, Spanish, French, German, Mandarin Chinese (Traditional), Welsh and Kabyle. The collection is composed of 1,400 hours of recorded voice clips from 42,000 contributors. Some of them are volunteers who just wanted to help out, while others are linguists and professionals working in voice technologies. Mozilla's Common Voice project aims to make it easier for developers who don't have the resources a bigger company (such as Apple or Google) does to create voice-enabled products.

acoustic processing, mozilla, speech recognition, (2 more...)

Engadget

Country: Asia > Myanmar > Chin State > Hakha (0.29)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.80)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.65)
