AITopics | utterance pair

Collaborating Authors

utterance pair

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Introducing voice timbre attribute detection

He, Jinghao, Sheng, Zhengyan, Chen, Liping, Lee, Kong Aik, Ling, Zhen-Hua

arXiv.org Artificial IntelligenceJun-24-2025

This paper focuses on explaining the timbre conveyed by speech signals and introduces a task termed voice timbre attribute detection (vTAD). In this task, voice timbre is explained with a set of sensory attributes describing its human perception. A pair of speech utterances is processed, and their intensity is compared in a designated timbre descriptor. Moreover, a framework is proposed, which is built upon the speaker embeddings extracted from the speech utterances. The investigation is conducted on the VCTK-RVA dataset. Experimental examinations on the ECAPA-TDNN and FACodec speaker encoders demonstrated that: 1) the ECAPA-TDNN speaker encoder was more capable in the seen scenario, where the testing speakers were included in the training set; 2) the FACodec speaker encoder was superior in the unseen scenario, where the testing speakers were not part of the training, indicating enhanced generalization capability. The VCTK-RVA dataset and open-source code are available on the website https://github.com/vTAD2025-Challenge/vTAD.

artificial intelligence, descriptor, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.09661

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

LANID: LLM-assisted New Intent Discovery

Fan, Lu, Pu, Jiashu, Zhang, Rongsheng, Wu, Xiao-Ming

arXiv.org Artificial IntelligenceMar-31-2025

Task-oriented Dialogue Systems (TODS) often face the challenge of encountering new intents. New Intent Discovery (NID) is a crucial task that aims to identify these novel intents while maintaining the capability to recognize existing ones. Previous efforts to adapt TODS to new intents have struggled with inadequate semantic representation or have depended on external knowledge, which is often not scalable or flexible. Recently, Large Language Models (LLMs) have demonstrated strong zero-shot capabilities; however, their scale can be impractical for real-world applications that involve extensive queries. To address the limitations of existing NID methods by leveraging LLMs, we propose LANID, a framework that enhances the semantic representation of lightweight NID encoders with the guidance of LLMs. Specifically, LANID employs the $K$-nearest neighbors and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithms to sample selective utterance pairs from the training set. It then queries an LLM to ascertain the relationships between these pairs. The data produced from this process is utilized to design a contrastive fine-tuning task, which is then used to train a small encoder with a contrastive triplet loss. Our experimental results demonstrate the efficacy of the proposed method across three distinct NID datasets, surpassing strong baselines in both unsupervised and semi-supervised settings. Our code is available at https://github.com/floatSDSDS/LANID.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.2374

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > China > Hong Kong (0.04)
North America > United States > District of Columbia > Washington (0.04)
(5 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study

Fan, Yaxin, Jiang, Feng

arXiv.org Artificial IntelligenceMay-15-2023

Large Language Models (LLMs) like ChatGPT have proven a great shallow understanding of many traditional NLP tasks, such as translation, summarization, etc. However, its performance on high-level understanding, such as dialogue discourse analysis task that requires a higher level of understanding and reasoning, remains less explored. This study investigates ChatGPT's capabilities in three dialogue discourse tasks: topic segmentation, discourse relation recognition, and discourse parsing, of varying difficulty levels. To adapt ChatGPT to these tasks, we propose discriminative and generative paradigms and introduce the Chain of Thought (COT) approach to improve ChatGPT's performance in more difficult tasks. The results show that our generative paradigm allows ChatGPT to achieve comparative performance in the topic segmentation task comparable to state-of-the-art methods but reveals room for improvement in the more complex tasks of discourse relation recognition and discourse parsing. Notably, the COT can significantly enhance ChatGPT's performance with the help of understanding complex structures in more challenging tasks. Through a series of case studies, our in-depth analysis suggests that ChatGPT can be a good annotator in topic segmentation but has difficulties understanding complex rhetorical structures. We hope these findings provide a foundation for future research to refine dialogue discourse analysis approaches in the era of LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.08391

Country:

North America > United States > Ohio (0.04)
North America > United States > Missouri (0.04)
North America > Dominican Republic (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Design Guidelines for Inclusive Speaker Verification Evaluation Datasets

Hutiri, Wiebke Toussaint, Gorce, Lauriane, Ding, Aaron Yi

arXiv.org Artificial IntelligenceSep-13-2022

Speaker verification (SV) provides billions of voice-enabled devices with access control, and ensures the security of voice-driven technologies. As a type of biometrics, it is necessary that SV is unbiased, with consistent and reliable performance across speakers irrespective of their demographic, social and economic attributes. Current SV evaluation practices are insufficient for evaluating bias: they are over-simplified and aggregate users, not representative of real-life usage scenarios, and consequences of errors are not accounted for. This paper proposes design guidelines for constructing SV evaluation datasets that address these short-comings. We propose a schema for grading the difficulty of utterance pairs, and present an algorithm for generating inclusive SV datasets. We empirically validate our proposed method in a set of experiments on the VoxCeleb1 dataset. Our results confirm that the count of utterance pairs/speaker, and the difficulty grading of utterance pairs have a significant effect on evaluation performance and variability. Our work contributes to the development of SV evaluation practices that are inclusive and fair.

dataset, evaluation, utterance pair, (14 more...)

arXiv.org Artificial Intelligence

2204.02281

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada (0.05)
Asia > India (0.05)
(9 more...)

Genre: Research Report > New Finding (0.69)

Industry:

Law (0.46)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.87)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.63)

Add feedback

DSBERT:Unsupervised Dialogue Structure learning with BERT

Chen, Bingkun, Dai, Shaobing, Zheng, Shenghua, Liao, Lei, Li, Yang

arXiv.org Artificial IntelligenceNov-8-2021

Unsupervised dialogue structure learning is an important and meaningful task in natural language processing. The extracted dialogue structure and process can help analyze human dialogue, and play a vital role in the design and evaluation of dialogue systems. The traditional dialogue system requires experts to manually design the dialogue structure, which is very costly. But through unsupervised dialogue structure learning, dialogue structure can be automatically obtained, reducing the cost of developers constructing dialogue process. The learned dialogue structure can be used to promote the dialogue generation of the downstream task system, and improve the logic and consistency of the dialogue robot's reply.In this paper, we propose a Bert-based unsupervised dialogue structure learning algorithm DSBERT (Dialogue Structure BERT). Different from the previous SOTA models VRNN and SVRNN, we combine BERT and AutoEncoder, which can effectively combine context information. In order to better prevent the model from falling into the local optimal solution and make the dialogue state distribution more uniform and reasonable, we also propose three balanced loss functions that can be used for dialogue structure learning. Experimental results show that DSBERT can generate a dialogue structure closer to the real structure, can distinguish sentences with different semantics and map them to different hidden states.

dialogue, dialogue structure, latent state, (12 more...)

arXiv.org Artificial Intelligence

2111.04933

Country: Asia > China (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback