Udagawa, Takuma
Robust ASR Error Correction with Conservative Data Filtering
Udagawa, Takuma, Suzuki, Masayuki, Muraoka, Masayasu, Kurata, Gakuto
Error correction (EC) based on large language models is an emerging technology to enhance the performance of automatic speech recognition (ASR) systems. Generally, training data for EC are collected by automatically pairing a large set of ASR hypotheses (as sources) and their gold references (as targets). However, the quality of such pairs is not guaranteed, and we observed various types of noise that can make EC models brittle, e.g. by inducing overcorrection in out-of-domain (OOD) settings. In this work, we propose two fundamental criteria that EC training data should satisfy: namely, EC targets should (1) improve linguistic acceptability over sources and (2) be inferable from the available context (e.g. source phonemes). Through these criteria, we identify low-quality EC pairs and train the models not to make any correction in such cases, a process we refer to as conservative data filtering. In our experiments, we focus on Japanese ASR, using a strong Conformer-CTC as the baseline and finetuning Japanese LLMs for EC. Through our evaluation on a suite of 21 internal benchmarks, we demonstrate that our approach can significantly reduce overcorrection and improve both the accuracy and quality of ASR results in challenging OOD settings.
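To make the two filtering criteria concrete, here is a minimal Python sketch of conservative data filtering. The acceptability scorer and phoneme-inferability check are hypothetical placeholders, not the paper's actual scoring functions or thresholds.

```python
def acceptability(text: str) -> float:
    """Placeholder: e.g. a language-model-based score of linguistic acceptability."""
    raise NotImplementedError

def inferable_from_phonemes(source_phonemes: list[str], target: str) -> bool:
    """Placeholder: check that the target can be recovered from the source phonemes."""
    raise NotImplementedError

def filter_ec_pair(source: str, target: str, source_phonemes: list[str]) -> tuple[str, str]:
    """Keep the correction only if both criteria hold; otherwise train the model
    to copy the source, i.e. to make no correction."""
    improves = acceptability(target) > acceptability(source)        # criterion (1)
    inferable = inferable_from_phonemes(source_phonemes, target)    # criterion (2)
    if improves and inferable:
        return source, target    # high-quality pair: learn the correction
    return source, source        # low-quality pair: learn to leave the hypothesis as-is
```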
INDUS: Effective and Efficient Language Models for Scientific Applications
Bhattacharjee, Bishwaranjan, Trivedi, Aashka, Muraoka, Masayasu, Ramasubramanian, Muthukumaran, Udagawa, Takuma, Gurung, Iksha, Zhang, Rong, Dandala, Bharath, Ramachandran, Rahul, Maskey, Manil, Bugbee, Kaylin, Little, Mike, Fancher, Elizabeth, Sanders, Lauren, Costes, Sylvain, Blanco-Cuaresma, Sergi, Lockhart, Kelly, Allen, Thomas, Grezes, Felix, Ansdell, Megan, Accomazzi, Alberto, El-Kurdi, Yousef, Wertheimer, Davis, Pfitzmann, Birgit, Ramis, Cesar Berrospi, Dolfi, Michele, de Lima, Rafael Teixeira, Vagenas, Panagiotis, Mukkavilli, S. Karthik, Staar, Peter, Vahidinia, Sanaz, McGranaghan, Ryan, Mehrabian, Armin, Lee, Tsendgar
Large language models (LLMs) trained on general-domain corpora have shown remarkable results on natural language processing (NLP) tasks. However, previous research has demonstrated that LLMs trained on domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences, and astrophysics domains and trained on curated scientific corpora drawn from diverse data sources. The suite of models includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks, and (3) smaller versions of these models created using knowledge distillation techniques to address applications with latency or resource constraints. We also created three new scientific benchmark datasets, namely CLIMATE-CHANGE-NER (entity recognition), NASA-QA (extractive QA), and NASA-IR (information retrieval), to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as on existing benchmark tasks in the domains of interest.
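For reference, contrastive-learning-based text embedding models of this kind are typically trained with an in-batch-negative objective. The sketch below is a generic InfoNCE loss in PyTorch; it is an assumption about the general setup, not the paper's exact training objective.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """Generic in-batch-negative contrastive loss for a text embedding model.
    query_emb, passage_emb: (batch, dim) embeddings of paired texts."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                       # similarity of every query to every passage
    labels = torch.arange(q.size(0), device=q.device)    # the matching passage sits on the diagonal
    return F.cross_entropy(logits, labels)
```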
Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems
Udagawa, Takuma, Suzuki, Masayuki, Kurata, Gakuto, Muraoka, Masayasu, Saon, George
Transferring the knowledge of large language models (LLMs) is a promising technique for incorporating linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works transfer only a single representation of the LLM (e.g. the last layer of a pretrained BERT), while the representation of a text is inherently non-unique and can be obtained in various ways from different layers, contexts, and models. In this work, we explore a wide range of techniques to obtain and transfer multiple representations of LLMs into a transducer-based ASR system. While conceptually simple, we show that transferring multiple representations of LLMs can be an effective alternative to transferring only a single representation.
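As an illustration of what "multiple representations" of a text can mean, the sketch below extracts every hidden layer of a pretrained BERT with Hugging Face Transformers and pools them by simple averaging. How the representations are actually selected, combined, and transferred into the transducer follows the paper, not this toy pooling.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("an asr hypothesis to re-represent", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of (num_layers + 1) tensors of shape
# (batch, seq_len, hidden_dim): the embedding layer plus every Transformer layer.
all_layers = torch.stack(outputs.hidden_states)   # (num_layers + 1, batch, seq_len, dim)
multi_repr = all_layers.mean(dim=0)               # one simple way to pool multiple layers
```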
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Trivedi, Aashka, Udagawa, Takuma, Merler, Michele, Panda, Rameswar, El-Kurdi, Yousef, Bhattacharjee, Bishwaranjan
Large pretrained language models have achieved state-of-the-art results on a variety of downstream tasks. Knowledge Distillation (KD) into a smaller student model addresses their inefficiency, allowing for deployment in resource-constrained environments. However, KD can be ineffective when the student is manually selected from a set of existing options, since it can be a sub-optimal choice within the space of all possible student architectures. We develop multilingual KD-NAS, the use of Neural Architecture Search (NAS) guided by KD to find the optimal student architecture for task-agnostic distillation from a multilingual teacher. In each episode of the search process, a NAS controller predicts a reward based on the distillation loss and the latency of inference. The top candidate architectures are then distilled from the teacher on a small proxy set. Finally, the architecture(s) with the highest reward are selected and distilled on the full training corpus. KD-NAS can automatically trade off efficiency and effectiveness, and recommends architectures suitable for various latency budgets. Using our multi-layer hidden state distillation process, our KD-NAS student model achieves a 7x speedup on CPU inference (2x on GPU) compared to an XLM-RoBERTa Base teacher, while maintaining 90% of its performance, and has been deployed in 3 software offerings requiring high throughput, low latency, and CPU deployment.
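The abstract does not specify the controller's reward function; the sketch below is a hypothetical trade-off between distillation loss and inference latency, with made-up weights, included only to make the idea of rewarding both effectiveness and efficiency concrete.

```python
def kd_nas_reward(distill_loss: float, latency_ms: float,
                  loss_weight: float = 1.0, latency_weight: float = 0.01) -> float:
    """Hypothetical reward: lower distillation loss and lower inference latency
    both increase the reward. The functional form and weights are illustrative,
    not those used by the actual KD-NAS controller."""
    return -(loss_weight * distill_loss + latency_weight * latency_ms)
```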
A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Udagawa, Takuma, Trivedi, Aashka, Merler, Michele, Bhattacharjee, Bishwaranjan
Large language models have become a vital component of modern NLP, achieving state-of-the-art performance on a variety of tasks. However, they are often inefficient for real-world deployment due to their expensive inference costs. Knowledge distillation is a promising technique to improve their efficiency while retaining most of their effectiveness. In this paper, we reproduce, compare, and analyze several representative methods for task-agnostic (general-purpose) distillation of Transformer language models. The methods we study include Output Distribution (OD) transfer, Hidden State (HS) transfer with various layer mapping strategies, and Multi-Head Attention (MHA) transfer based on MiniLMv2. Through our extensive experiments, we study the effectiveness of each method for various student architectures in both monolingual (English) and multilingual settings. Overall, we show that MHA transfer based on MiniLMv2 is generally the best option for distillation and explain the potential reasons behind its success. Moreover, we show that HS transfer remains a competitive baseline, especially under a sophisticated layer mapping strategy, while OD transfer consistently lags behind the other approaches. Findings from this study helped us deploy efficient yet effective student models for latency-critical applications.
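Of the three families compared, Output Distribution (OD) transfer is the simplest to write down. The sketch below is the standard temperature-scaled KL objective between teacher and student output distributions; it should be read as a generic reference implementation of OD transfer, not the exact training code used in the paper.

```python
import torch.nn.functional as F

def od_transfer_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Output-distribution transfer: KL divergence between the temperature-softened
    teacher and student distributions. Shapes: (batch, vocab_size).
    The temperature value here is illustrative."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```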
Sentence Identification with BOS and EOS Label Combinations
Udagawa, Takuma, Kanayama, Hiroshi, Yoshida, Issei
The sentence is a fundamental unit in many NLP applications. Sentence segmentation is widely used as the first preprocessing task, where an input text is split into consecutive sentences, with the end of the sentence (EOS) as their boundaries. This task formulation relies on the strong assumption that the input text consists only of sentences, or what we call sentential units (SUs). However, real-world texts often contain non-sentential units (NSUs) such as metadata, sentence fragments, and nonlinguistic markers, which it is unreasonable or undesirable to treat as part of an SU. To tackle this issue, we formulate a novel task of sentence identification, where the goal is to identify SUs while excluding NSUs in a given text. To conduct sentence identification, we propose a simple yet effective method which combines the beginning of the sentence (BOS) and EOS labels to determine the most probable SUs and NSUs based on dynamic programming. To evaluate this task, we design an automatic, language-independent procedure to convert the Universal Dependencies corpora into sentence identification benchmarks. Finally, our experiments on the sentence identification task demonstrate that our proposed method generally outperforms sentence segmentation baselines which utilize only EOS labels.
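To illustrate how BOS and EOS labels can be combined by dynamic programming, here is a toy Viterbi-style sketch over per-token BOS/EOS probabilities. The scoring and transition scheme below are simplified assumptions, not the paper's exact formulation.

```python
import math

def identify_sentences(p_bos, p_eos):
    """Toy DP. p_bos[i], p_eos[i]: probability that token i begins / ends an SU.
    Returns one 0/1 label per token (1 = part of an SU, 0 = part of an NSU)."""
    n = len(p_bos)
    eps, NEG_INF = 1e-9, float("-inf")
    # score[i][s]: best log-score of a labeling of tokens 0..i that ends in state s
    score = [[NEG_INF, NEG_INF] for _ in range(n)]
    back = [[0, 0] for _ in range(n)]
    score[0][0] = math.log(1 - p_bos[0] + eps)
    score[0][1] = math.log(p_bos[0] + eps)
    for i in range(1, n):
        for s in (0, 1):
            for prev in (0, 1):
                trans = 0.0
                if prev == 0 and s == 1:       # entering an SU: reward BOS at token i
                    trans = math.log(p_bos[i] + eps)
                elif prev == 1 and s == 0:     # leaving an SU: reward EOS at token i - 1
                    trans = math.log(p_eos[i - 1] + eps)
                cand = score[i - 1][prev] + trans
                if cand > score[i][s]:
                    score[i][s], back[i][s] = cand, prev
    # backtrace the most probable label sequence
    s = 0 if score[n - 1][0] >= score[n - 1][1] else 1
    labels = [s]
    for i in range(n - 1, 0, -1):
        s = back[i][s]
        labels.append(s)
    return labels[::-1]
```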
Policy-Adaptive Estimator Selection for Off-Policy Evaluation
Udagawa, Takuma, Kiyohara, Haruka, Narita, Yusuke, Saito, Yuta, Tateno, Kei
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. Although many estimators have been developed, no single estimator dominates the others, because an estimator's accuracy can vary greatly depending on the given OPE task, which is characterized by factors such as the evaluation policy, the number of actions, and the noise level. Thus, the data-driven estimator selection problem is becoming increasingly important and can have a significant impact on the accuracy of OPE. However, identifying the most accurate estimator using only the logged data is quite challenging because the ground-truth estimation accuracy of estimators is generally unavailable. This paper studies this challenging problem of estimator selection for OPE for the first time. In particular, we enable estimator selection that is adaptive to a given OPE task by appropriately subsampling the available logged data and constructing pseudo policies useful for the underlying estimator selection task. Comprehensive experiments on both synthetic and real-world company data demonstrate that the proposed procedure substantially improves estimator selection compared to a non-adaptive heuristic.
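The following is a high-level Python sketch of the idea of selecting an estimator via subsampled data and pseudo policies. The helpers `subsample` and `make_pseudo_task`, the estimator interface, and the squared-error criterion are all hypothetical simplifications used only to make the idea concrete; they do not reproduce the paper's exact procedure.

```python
import numpy as np

def select_estimator(estimators, logged_data, subsample, make_pseudo_task, n_trials=100):
    """`estimators` maps names to callables estimator(data, policy) -> value estimate.
    `subsample` splits the logged data; `make_pseudo_task` constructs a pseudo
    evaluation policy together with a ground-truth value computable from the split."""
    errors = {name: [] for name in estimators}
    for _ in range(n_trials):
        sub, held_out = subsample(logged_data)
        pseudo_policy, true_value = make_pseudo_task(held_out)
        for name, estimator in estimators.items():
            estimate = estimator(sub, pseudo_policy)
            errors[name].append((estimate - true_value) ** 2)
    # choose the estimator with the smallest mean squared error on the pseudo tasks
    return min(errors, key=lambda name: float(np.mean(errors[name])))
```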
Evaluating the Robustness of Off-Policy Evaluation
Saito, Yuta, Udagawa, Takuma, Kiyohara, Haruka, Mogi, Kazuki, Narita, Yusuke, Tateno, Kei
Off-policy Evaluation (OPE), or offline evaluation in general, evaluates the performance of hypothetical policies leveraging only offline log data. It is particularly useful in applications where online interaction is high-stakes and expensive, such as precision medicine and recommender systems. Since many OPE estimators have been proposed and some of them have hyperparameters to be tuned, there is an emerging challenge for practitioners to select and tune OPE estimators for their specific application. Unfortunately, identifying a reliable estimator from results reported in research papers is often difficult because the current experimental procedure evaluates and compares the estimators' performance on a narrow set of hyperparameters and evaluation policies. Therefore, it is difficult to know which estimator is safe and reliable to use. In this work, we develop Interpretable Evaluation for Offline Evaluation (IEOE), an experimental procedure to evaluate OPE estimators' robustness to changes in hyperparameters and/or evaluation policies in an interpretable manner. Then, using the IEOE procedure, we perform an extensive evaluation of a wide variety of existing estimators on the Open Bandit Dataset, a large-scale public real-world dataset for OPE. We demonstrate that our procedure can evaluate the estimators' robustness to the hyperparameter choice, helping us avoid using unsafe estimators. Finally, we apply IEOE to real-world e-commerce platform data and demonstrate how to use our protocol in practice.
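As an illustration of what an IEOE-style robustness check involves, the sketch below sweeps an estimator over a grid of hyperparameters and evaluation policies and summarizes the resulting error distribution. The interfaces (`hyperparam_grid`, `eval_tasks`, the estimator callable) are hypothetical, and the summary statistics are illustrative rather than the protocol's exact outputs.

```python
import numpy as np
from itertools import product

def robustness_profile(estimator, hyperparam_grid, eval_tasks, logged_data):
    """`hyperparam_grid`: list of keyword-argument dicts; `eval_tasks`: list of
    (evaluation_policy, true_value) pairs; `estimator`: callable returning a value estimate."""
    errors = []
    for params, (policy, true_value) in product(hyperparam_grid, eval_tasks):
        estimate = estimator(logged_data, policy, **params)
        errors.append((estimate - true_value) ** 2)
    errors = np.array(errors)
    # An estimator whose error stays small across the whole grid is safer to rely on.
    return {"mean": errors.mean(), "p90": np.quantile(errors, 0.9), "max": errors.max()}
```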
Maintaining Common Ground in Dynamic Environments
Udagawa, Takuma, Aizawa, Akiko
Common grounding is the process of creating and maintaining mutual understandings, which is a critical aspect of sophisticated human communication. While various task settings have been proposed in the existing literature, they mostly focus on creating common ground under static context and ignore the aspect of maintaining it over time under dynamic context. In this work, we propose a novel task setting to study the ability of both creating and maintaining common ground in dynamic environments. Based on our minimal task formulation, we collected a large-scale dataset of 5,617 dialogues to enable fine-grained evaluation and analysis of various dialogue systems. Through our dataset analyses, we highlight novel challenges introduced in our setting, such as the usage of complex spatio-temporal expressions to create and maintain common ground. Finally, we conduct extensive experiments to assess the capabilities of our baseline dialogue system and discuss future prospects of our research.
A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions
Udagawa, Takuma, Yamazaki, Takato, Aizawa, Akiko
Recent models achieve promising results in visually grounded dialogues. However, existing datasets often contain undesirable biases and lack sophisticated linguistic analyses, which make it difficult to understand how well current models recognize their precise linguistic structures. To address this problem, we make two design choices: first, we focus on the OneCommon Corpus (Udagawa and Aizawa, 2019, 2020), a simple yet challenging common grounding dataset which contains minimal bias by design. Second, we analyze the dialogues' linguistic structures based on spatial expressions and provide comprehensive and reliable annotation for 600 dialogues. We show that our annotation captures important linguistic structures including predicate-argument structure, modification, and ellipsis. In our experiments, we assess the models' understanding of these structures through reference resolution. We demonstrate that our annotation can reveal both the strengths and weaknesses of baseline models in essential levels of detail. Overall, we propose a novel framework and resource for investigating fine-grained language understanding in visually grounded dialogues.