AITopics | speaker tag

Collaborating Authors

speaker tag

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Speaker Tagging Correction With Non-Autoregressive Language Models

Kirakosyan, Grigor, Karamyan, Davit

arXiv.org Artificial IntelligenceAug-30-2024

Speech applications dealing with conversations require not only recognizing the spoken words but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate systems, namely, an automatic speech recognition (ASR) system and a speaker diarization (SD) system. In practical settings, speaker diarization systems can experience significant degradation in performance due to a variety of factors, including uniform segmentation with a high temporal resolution, inaccurate word timestamps, incorrect clustering and estimation of speaker numbers, as well as background noise. Therefore, it is important to automatically detect errors and make corrections if possible. We used a second-pass speaker tagging correction system based on a non-autoregressive language model to correct mistakes in words placed at the borders of sentences spoken by different speakers. We first show that the employed error correction approach leads to reductions in word diarization error rate (WDER) on two datasets: TAL and test set of Fisher. Additionally, we evaluated our system in the Post-ASR Speaker Tagging Correction challenge and observed significant improvements in cpWER compared to baseline methods.

dataset, sec model, speaker tag, (13 more...)

arXiv.org Artificial Intelligence

2409.00151

Country:

North America > United States (0.16)
Asia > Armenia > Yerevan > Yerevan (0.05)
Europe > United Kingdom > England (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)

Add feedback

Dialog Acts for Task-Driven Embodied Agents

Gella, Spandana, Padmakumar, Aishwarya, Lange, Patrick, Hakkani-Tur, Dilek

arXiv.org Artificial IntelligenceSep-26-2022

Embodied agents need to be able to interact in natural language understanding task descriptions and asking appropriate follow up questions to obtain necessary information to be effective at successfully accomplishing tasks for a wide range of users. In this work, we propose a set of dialog acts for modelling such dialogs and annotate the TEACh dataset that includes over 3,000 situated, task oriented conversations (consisting of 39.5k utterances in total) with dialog acts. TEACh-DA is one of the first large scale dataset of dialog act annotations for embodied task completion. Furthermore, we demonstrate the use of this annotated dataset in training models for tagging the dialog acts of a given utterance, predicting the dialog act of the next response given a dialog history, and use the dialog acts to guide agent's non-dialog behaviour. In particular, our experiments on the TEACh Execution from Dialog History task where the model predicts the sequence of low level actions to be executed in the environment for embodied task completion, demonstrate that dialog acts can improve end task success rate by up to 2 points compared to the system without dialog acts.

artificial intelligence, dialog act, natural language, (16 more...)

arXiv.org Artificial Intelligence

2209.12953

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.61)

Add feedback