Hu, Ting-Yao
Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization
Lu, Yen-Ju, Hu, Ting-Yao, Koppula, Hema Swetha, Pouransari, Hadi, Chang, Jen-Hao Rick, Xia, Yin, Kong, Xiang, Zhu, Qi, Wang, Simon, Tuzel, Oncel, Vemulapalli, Raviteja
In this work, we propose Mutual Reinforcing Data Synthesis (MRDS) within LLMs to improve the few-shot dialogue summarization task. Unlike prior methods that require external knowledge, we mutually reinforce the LLM's dialogue synthesis and summarization capabilities, allowing them to complement each other during training and enhance overall performance. The dialogue synthesis capability is enhanced by direct preference optimization with preference scoring from the summarization capability. The summarization capability is enhanced by the additional high-quality dialogue-summary paired data produced by the dialogue synthesis capability. By leveraging the proposed MRDS mechanism, we elicit the internal knowledge of the LLM in the form of synthetic data and use it to augment the few-shot real training dataset. Empirical results demonstrate that our method improves dialogue summarization, achieving a 1.5% increase in ROUGE scores and a 0.3% improvement in BERT scores in few-shot settings. Furthermore, our method attains the highest average scores in human evaluations, surpassing both the pre-trained models and the baselines fine-tuned solely for summarization tasks.
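The preference-optimization step above can be illustrated with a minimal sketch: candidate synthetic dialogues are ranked by a score from the summarization capability, and the best and worst candidates form a chosen/rejected pair for preference training. This is an assumption-laden illustration, not the authors' implementation; `score_fn` is a hypothetical stand-in for the summarization-based preference score.

```python
def build_preference_pairs(candidates, score_fn):
    """Build DPO-style preference pairs from synthetic dialogues.

    candidates: dict mapping a generation prompt to a list of candidate
        synthetic dialogues for that prompt.
    score_fn: hypothetical scoring function from the summarization
        capability (e.g. likelihood of the reference summary given the
        dialogue); higher means a better dialogue.
    """
    pairs = []
    for prompt, dialogues in candidates.items():
        # Rank candidates from best to worst under the summarizer's score.
        ranked = sorted(dialogues, key=score_fn, reverse=True)
        pairs.append({
            "prompt": prompt,
            "chosen": ranked[0],    # highest-scoring dialogue
            "rejected": ranked[-1], # lowest-scoring dialogue
        })
    return pairs
```

A real pipeline would feed these pairs to a preference-optimization trainer; here the pair format simply mirrors the common prompt/chosen/rejected convention.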
MUSCLE: A Model Update Strategy for Compatible LLM Evolution
Echterhoff, Jessica, Faghri, Fartash, Vemulapalli, Raviteja, Hu, Ting-Yao, Li, Chun-Liang, Tuzel, Oncel, Pouransari, Hadi
Large Language Models (LLMs) are frequently updated due to data or architecture changes to improve their performance. When updating models, developers often focus on increasing overall performance metrics with less emphasis on being compatible with previous model versions. However, users often build a mental model of the functionality and capabilities of a particular machine learning model they are interacting with. They have to adapt their mental model with every update -- a draining task that can lead to user dissatisfaction. In practice, fine-tuned downstream task adapters rely on pretrained LLM base models. When these base models are updated, these user-facing downstream task models experience instance regression or negative flips -- previously correct instances are now predicted incorrectly. This happens even when the downstream task training procedures remain identical. Our work aims to provide seamless model updates to a user in two ways. First, we provide evaluation metrics for a notion of compatibility to prior model versions, specifically for generative tasks but also applicable to discriminative tasks. We observe regression and inconsistencies between different model versions on a diverse set of tasks and model updates. Second, we propose a training strategy to minimize the number of inconsistencies in model updates, involving training of a compatibility model that can enhance task fine-tuned language models. We reduce negative flips -- instances where a prior model version was correct, but a new model is incorrect -- by up to 40% from Llama 1 to Llama 2.
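The negative-flip notion above has a simple operational form: count the instances the prior model got right that the updated model now gets wrong. A minimal sketch of that metric, under the assumption of per-instance predictions and gold labels:

```python
def negative_flip_rate(old_preds, new_preds, labels):
    """Fraction of instances where the prior model was correct
    but the updated model is incorrect (a "negative flip")."""
    assert len(old_preds) == len(new_preds) == len(labels)
    flips = sum(
        1 for old, new, gold in zip(old_preds, new_preds, labels)
        if old == gold and new != gold
    )
    return flips / len(labels)
```

A compatible update drives this rate toward zero even when aggregate accuracy improves, which is exactly the gap between overall metrics and per-instance consistency that the abstract highlights.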
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models
Su, Hsuan, Hu, Ting-Yao, Koppula, Hema Swetha, Vemulapalli, Raviteja, Chang, Jen-Hao Rick, Yang, Karren, Mantena, Gautam Varma, Tuzel, Oncel
While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains. To accomplish this, we propose a novel data synthesis pipeline that uses a Large Language Model (LLM) to generate a target domain text corpus, and a state-of-the-art controllable speech synthesis model to generate the corresponding speech. We propose a simple yet effective in-context instruction finetuning strategy to increase the effectiveness of LLM in generating text corpora for new domains. Experiments on the SLURP dataset show that the proposed method achieves an average relative word error rate improvement of 28% on unseen target domains without any performance drop in source domains.
Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
Yang, Karren, Hu, Ting-Yao, Chang, Jen-Hao Rick, Koppula, Hema Swetha, Tuzel, Oncel
Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To address the first question, we adapt a state-of-the-art automatic speech recognition (ASR) model to target speakers from four benchmark datasets representative of different speaker types. We show that ASR personalization with synthetic data is effective in all cases, but particularly when (i) the target speaker is underrepresented in the global data, and (ii) the capacity of the global model is limited. To address the second question of why personalized synthetic data is effective, we use controllable speech synthesis to generate speech with varied styles and content. Surprisingly, we find that the text content of the synthetic data, rather than style, is important for speaker adaptation. These results lead us to propose a data selection strategy for ASR personalization based on speech content.
SapAugment: Learning A Sample Adaptive Policy for Data Augmentation
Hu, Ting-Yao, Shrivastava, Ashish, Chang, Rick, Koppula, Hema, Braun, Stefan, Hwang, Kyuyeon, Kalinli, Ozlem, Tuzel, Oncel
Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation, for all samples. We hypothesize that a hard sample with high training loss already provides a strong training signal to update the model parameters and should be perturbed with mild or no augmentation. Perturbing a hard sample with a strong augmentation may also make it too hard to learn from. Furthermore, a sample with low training loss should be perturbed by a stronger augmentation to provide more robustness to a variety of conditions. To formalize these intuitions, we propose a novel method to learn a Sample-Adaptive Policy for Augmentation -- SapAugment. Our policy adapts the augmentation parameters based on the training loss of the data samples. In the example of Gaussian noise, a hard sample will be perturbed with a low variance noise and an easy sample with a high variance noise. Furthermore, the proposed method combines multiple augmentation methods into a methodical policy learning framework and obviates hand-crafting augmentation parameters by trial-and-error. We apply our method on an automatic speech recognition (ASR) task, and combine existing and novel augmentations using the proposed framework. We show substantial improvement, up to 21% relative reduction in word error rate on the LibriSpeech dataset, over the state-of-the-art speech augmentation method.
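The Gaussian-noise intuition above can be sketched directly: map each sample's training loss to a noise standard deviation so that easy (low-loss) samples receive strong noise and hard (high-loss) samples receive mild noise. This is a simplified illustration of the idea, not the learned policy from the paper; the rank-based mapping and the `std_min`/`std_max` bounds are assumptions for the sketch.

```python
import numpy as np

def adaptive_noise_std(losses, std_min=0.0, std_max=0.3):
    """Map per-sample training losses to noise standard deviations.

    Low-loss (easy) samples get stronger noise (near std_max);
    high-loss (hard) samples get milder noise (near std_min).
    Ranks are used so the mapping is robust to the loss scale.
    """
    losses = np.asarray(losses, dtype=float)
    # rank 0 = smallest loss (easiest sample)
    ranks = losses.argsort().argsort()
    frac = ranks / max(len(losses) - 1, 1)  # 0.0 (easiest) .. 1.0 (hardest)
    return std_max - frac * (std_max - std_min)

def perturb(batch, losses):
    """Add per-sample adaptive Gaussian noise to a (N, D) batch."""
    stds = adaptive_noise_std(losses)
    return batch + np.random.randn(*batch.shape) * stds[:, None]
```

The paper learns this loss-to-strength mapping jointly over several augmentations; the fixed monotone mapping here only captures the direction of the adaptation.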
Integrating Verbal and Nonverbal Input into a Dynamic Response Spoken Dialogue System
Hu, Ting-Yao (Carnegie Mellon University) | Raman, Chirag (Carnegie Mellon University) | Maza, Salvador Medina (Carnegie Mellon University) | Gui, Liangke (Carnegie Mellon University) | Baltrusaitis, Tadas (Carnegie Mellon University) | Frederking, Robert (Carnegie Mellon University) | Morency, Louis-Philippe (Carnegie Mellon University) | Black, Alan W. (Carnegie Mellon University) | Eskenazi, Maxine (Carnegie Mellon University)
In this work, we present a dynamic response spoken dialogue system (DRSDS). It is capable of understanding the verbal and nonverbal language of users and making instant, situation-aware responses. Incorporating two external systems, MultiSense and email summarization, we built an email reading agent on a mobile device to show the functionality of DRSDS.