openSMILE
Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features
Abdelwahab, Abdelrahman, Vishnubhatla, Akshaj, Vaswani, Ayaan, Bharathulwar, Advait, Kommaraju, Arnav
Inaccuracies in polygraph tests often lead to wrongful convictions, false information, and bias, all of which have significant consequences for both legal and political systems. Analyzing facial micro-expressions has recently emerged as a method for detecting deception; however, current models have not reached high accuracy and generalizability. This study aims to help remedy these problems. The multimodal transformer architecture used in this study improves upon previous approaches by combining auditory inputs, visual facial micro-expressions, and manually transcribed gesture annotations, moving closer to a reliable non-invasive lie detection model. Visual and auditory features were extracted using the Vision Transformer and openSMILE models respectively, and were then concatenated with the transcriptions of participants' micro-expressions and gestures. Various models were trained to classify lies and truths from these processed and concatenated features. The CNN Conv1D multimodal model achieved an average accuracy of 95.4%. However, further research is still required to create higher-quality datasets and even more generalized models for more diverse applications.
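As a concrete illustration of the classification stage described above, here is a minimal PyTorch sketch of a Conv1D classifier over a concatenated multimodal feature vector. The feature dimensions (a 768-dim ViT embedding, 88-dim eGeMAPS functionals from openSMILE, a 50-dim gesture encoding) and the layer sizes are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MultimodalConv1D(nn.Module):
    """Conv1D classifier over a fused multimodal feature vector.

    Layer sizes are assumed; AdaptiveAvgPool1d keeps the head
    agnostic to the exact fused feature dimension.
    """
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, feat_dim) -- ViT visual features, openSMILE acoustic
        # features, and encoded gesture transcriptions, pre-concatenated.
        return self.net(x.unsqueeze(1))  # add a channel axis for Conv1d

# Hypothetical dimensions: 768 (ViT) + 88 (eGeMAPS) + 50 (gesture encoding)
fused = torch.randn(8, 768 + 88 + 50)
logits = MultimodalConv1D()(fused)
print(logits.shape)  # torch.Size([8, 2]) -- truth vs. lie
```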
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Pascu, Octavian, Oneata, Dan, Cucu, Horia, Müller, Nicolas M.
In this paper, we demonstrate that attacks in the latest ASVspoof5 dataset -- a de facto standard in the field of voice authenticity and deepfake detection -- can be identified with surprising accuracy using a small subset of very simplistic features. These are derived from the openSMILE library, and are scalar-valued, easy to compute, and human-interpretable. For example, attack A10's unvoiced segments have a mean length of 0.09 ± 0.02, while bona fide instances have a mean length of 0.18 ± 0.07. Using this feature alone, a threshold classifier achieves an Equal Error Rate (EER) of 10.3% for attack A10. Across individual attacks, we achieve EERs as low as 0.8%, with an overall EER of 15.7 ± 6.0%. We explore the generalization capabilities of these features and find that some of them transfer effectively between attacks, primarily when the attacks originate from similar Text-to-Speech (TTS) architectures. This finding may indicate that voice anti-spoofing is, in part, a problem of identifying and remembering the signatures or fingerprints of individual TTS systems, and it helps in understanding anti-spoofing models and the challenges they face in real-world applications.
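The single-feature threshold classifier described here is straightforward to reproduce. Below is a hedged sketch using the opensmile Python package and scikit-learn. The feature name MeanUnvoicedSegmentLength follows the eGeMAPS functionals naming and should be verified against your openSMILE version; wav_paths and is_spoof are assumed inputs, not part of the paper's released code.

```python
import numpy as np
import opensmile
from sklearn.metrics import roc_curve

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

def unvoiced_mean_length(path: str) -> float:
    # One scalar per file: mean length of unvoiced segments.
    return smile.process_file(path)["MeanUnvoicedSegmentLength"].iloc[0]

def eer(labels: np.ndarray, scores: np.ndarray) -> float:
    # Equal Error Rate: where false-accept and false-reject rates
    # cross on the ROC curve.
    fpr, tpr, _ = roc_curve(labels, scores)
    return float(fpr[np.nanargmin(np.abs(fpr - (1.0 - tpr)))])

# wav_paths, is_spoof are assumed inputs (file paths and 0/1 labels).
# Spoofed A10 audio has *shorter* unvoiced segments, so negate the
# feature to make higher scores mean "more likely spoof".
# scores = np.array([-unvoiced_mean_length(p) for p in wav_paths])
# print(f"EER: {eer(np.array(is_spoof), scores):.1%}")
```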
Multimodal Belief Prediction
Murzaku, John, Soubki, Adil, Rambow, Owen
Recognizing a speaker's level of commitment to a belief is a difficult task; humans do not only interpret the meaning of the words in context, but also understand cues from intonation and other aspects of the audio signal. Many papers and corpora in the NLP community have approached the belief prediction task using text-only approaches. We are the first to frame and present results on the multimodal belief prediction task. We use the CB-Prosody corpus (CBP), containing aligned text and audio with speaker belief annotations. We first report baselines and significant features using acoustic-prosodic features and traditional machine learning methods. We then present text and audio baselines for the CBP corpus fine-tuning on BERT and Whisper respectively. Finally, we present our multimodal architecture which fine-tunes on BERT and Whisper and uses multiple fusion methods, improving on both modalities alone.
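Of the fusion methods the abstract mentions, the simplest to sketch is late fusion: concatenating pooled BERT and Whisper-encoder representations before a classification head. The following is an illustrative PyTorch/Transformers sketch, not the authors' exact architecture; the checkpoint names and mean-pooling of the Whisper encoder output are assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel, WhisperModel

class LateFusionBeliefModel(nn.Module):
    """Concatenation (late) fusion of BERT text and Whisper audio encodings."""
    def __init__(self, n_labels: int = 1):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.whisper = WhisperModel.from_pretrained("openai/whisper-base").encoder
        hidden = self.bert.config.hidden_size + self.whisper.config.d_model
        self.head = nn.Linear(hidden, n_labels)

    def forward(self, input_ids, attention_mask, input_features):
        # Text branch: pooled [CLS] representation from BERT.
        text = self.bert(input_ids=input_ids,
                         attention_mask=attention_mask).pooler_output
        # Audio branch: mean-pooled Whisper encoder states (an assumption;
        # the paper may pool differently).
        audio = self.whisper(input_features).last_hidden_state.mean(dim=1)
        return self.head(torch.cat([text, audio], dim=-1))

# Usage (shapes only): input_ids / attention_mask from BertTokenizer;
# input_features from WhisperFeatureExtractor, shape (batch, 80, 3000).
```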