AITopics | Munkhdalai, Tsendsuren

Collaborating Authors

Munkhdalai, Tsendsuren

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

What Matters for Model Merging at Scale?

Yadav, Prateek, Vu, Tu, Lai, Jonathan, Chronopoulou, Alexandra, Faruqui, Manaal, Bansal, Mohit, Munkhdalai, Tsendsuren

arXiv.org Artificial Intelligence, Oct-4-2024

Model merging aims to combine multiple expert models into a single, more capable model, offering benefits such as reduced storage and serving costs, improved generalization, and support for decentralized model development. Despite its promise, previous studies have primarily focused on merging a few small models. This leaves many open questions about the effect of scaling model size and how it interacts with other key factors, such as base model quality and the number of expert models, to affect the merged model's performance. This work systematically evaluates the utility of model merging at scale, examining the impact of these different factors. We experiment with merging fully fine-tuned models using 4 popular merging methods (Averaging, Task Arithmetic, DARE, and TIES) across model sizes ranging from 1B to 64B parameters, merging up to 8 different expert models. We evaluate the merged models both on held-in tasks, i.e., the experts' training tasks, and on zero-shot generalization to unseen held-out tasks. Our experiments provide several new insights about model merging at scale and the interplay between different factors. First, we find that merging is more effective when experts are created from strong base models, i.e., models with good zero-shot performance. Second, larger models are easier to merge. Third, merging consistently improves generalization capabilities. Notably, when merging 8 large expert models, the merged models often generalize better than the multitask-trained models. Fourth, we can merge more expert models effectively when working with larger models. Fifth, different merging methods behave very similarly at larger scales. Overall, our findings shed light on some interesting properties of model merging while also highlighting some limitations. We hope that this study will serve as a reference point on large-scale merging for upcoming research.
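The four merging methods named in this abstract can be sketched over a single flattened parameter tensor. This is a minimal NumPy illustration under stated assumptions, not the authors' code: task vectors are taken as expert-minus-base deltas, TIES is written as top-magnitude trimming, sign election and a disjoint mean, and DARE as random dropping of delta entries with rescaling of the survivors. The function names and the lam, keep_frac and drop_p parameters are hypothetical.

```python
import numpy as np


def task_vectors(base, experts):
    # Task vector = expert parameters minus base parameters.
    return [e - base for e in experts]


def average_merge(experts):
    # Plain parameter averaging across experts.
    return np.mean(np.stack(experts), axis=0)


def task_arithmetic_merge(base, experts, lam=1.0):
    # Base plus a scaled sum of task vectors.
    return base + lam * np.sum(np.stack(task_vectors(base, experts)), axis=0)


def dare(tv, drop_p, rng):
    # Drop each delta entry with probability drop_p, rescale survivors by 1 / (1 - drop_p).
    # Typically used as a preprocessing step before Task Arithmetic or TIES.
    mask = rng.random(tv.shape) >= drop_p
    return tv * mask / (1.0 - drop_p)


def ties_merge(base, experts, keep_frac=0.2, lam=1.0):
    tvs = np.stack(task_vectors(base, experts))  # shape: (n_experts, ...)
    # 1) Trim: keep only the top keep_frac largest-magnitude entries of each task vector.
    trimmed = np.zeros_like(tvs)
    for i, tv in enumerate(tvs):
        k = max(1, int(round(keep_frac * tv.size)))
        thresh = np.sort(np.abs(tv).ravel())[-k]
        trimmed[i] = np.where(np.abs(tv) >= thresh, tv, 0.0)
    # 2) Elect sign: per entry, the sign with the larger total magnitude.
    elected = np.sign(trimmed.sum(axis=0))
    # 3) Disjoint mean: average only the nonzero entries that agree with the elected sign.
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    counts = agree.sum(axis=0)
    summed = np.where(agree, trimmed, 0.0).sum(axis=0)
    merged_tv = np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)
    return base + lam * merged_tv
```

For a real checkpoint these functions would be applied tensor by tensor; the similarity the paper reports between methods at large scale is an empirical finding that this toy code does not reproduce.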

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.03617

Country:

North America (0.46)
Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

Wu, Zelin, Song, Gan, Li, Christopher, Rondon, Pat, Meng, Zhong, Velez, Xavier, Wang, Weiran, Caseiro, Diamantino, Pundak, Golan, Munkhdalai, Tsendsuren, Chandorkar, Angad, Prabhavalkar, Rohit

arXiv.org Artificial Intelligence, Apr-23-2024

Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach that allows full end-to-end co-training of the recognizer and biasing system and requires no separate inference-time components. Such biasers typically consist of a context encoder, followed by a context filter that narrows down the context to apply (improving per-step inference time), and finally context application via cross-attention. Though much work has gone into optimizing per-frame performance, the context encoder is at least as important: recognition cannot begin before context encoding ends. Here, we show that the lightweight phrase selection pass can be moved before context encoding, resulting in a speedup of up to 16.1 times and enabling biasing to scale to 20K phrases with a maximum pre-decoding delay under 33 ms. With the addition of phrase- and wordpiece-level cross-entropy losses, our technique also achieves up to a 37.5% relative WER reduction over the baseline without the losses and the lightweight phrase selection pass.
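The reordering described in this abstract (filter first, then encode) can be illustrated with a minimal sketch in which a cheap scoring function picks the top K phrases and only those reach the expensive context encoder. The cheap_score and expensive_encode callables, and the example scores, are hypothetical placeholders; the paper's actual selection pass and encoder are not reproduced here.

```python
import heapq
from typing import Callable, Dict, Sequence, TypeVar

T = TypeVar("T")


def select_then_encode(
    phrases: Sequence[str],
    cheap_score: Callable[[str], float],
    expensive_encode: Callable[[str], T],
    k: int,
) -> Dict[str, T]:
    """Deferred encoding: run the cheap top-K filter first, then encode only the survivors."""
    # Lightweight pass over all N phrases.
    top_k = heapq.nlargest(k, phrases, key=cheap_score)
    # The costly context encoder now runs K times instead of N times.
    return {p: expensive_encode(p) for p in top_k}


# Usage with hypothetical scores: only 2 of the 4 phrases reach the encoder.
encoded_calls = []


def fake_encoder(phrase: str) -> str:
    encoded_calls.append(phrase)
    return phrase.upper()


scores = {"alice": 0.9, "bob": 0.1, "carol": 0.7, "dave": 0.3}
result = select_then_encode(list(scores), scores.__getitem__, fake_encoder, k=2)
print(result)  # {'alice': 'ALICE', 'carol': 'CAROL'}
```

The point of the design is that encoder cost scales with K rather than with the full phrase list, which is what lets the list grow large without delaying the start of recognition.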

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2404.1018

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.91)

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Munkhdalai, Tsendsuren, Faruqui, Manaal, Gopal, Siddharth

arXiv.org Artificial Intelligence, Apr-10-2024

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component of our proposed approach is a new attention technique dubbed Infini-attention. Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds both masked local attention and long-term linear attention mechanisms into a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, passkey context block retrieval with 1M sequence length, and 500K-length book summarization, using 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
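A rough single-head sketch of the mechanism summarized above may help: each segment runs causal local attention, retrieves from a fixed-size compressive memory via linear attention, mixes the two with a learned scalar gate, and then folds its own keys and values into the memory. This is a NumPy illustration under assumptions (ELU+1 feature map, simple additive memory update, a small eps in the normalizer), not the authors' implementation; all names are hypothetical.

```python
import numpy as np


def elu_plus_one(x: np.ndarray) -> np.ndarray:
    # ELU(x) + 1: x + 1 for x > 0, exp(x) otherwise; keeps memory features positive.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def infini_attention_segment(q, k, v, memory, z, beta, eps=1e-6):
    """One head, one segment. q, k: (n, d_k); v: (n, d_v); memory: (d_k, d_v); z: (d_k,)."""
    n, d_k = q.shape
    sq, sk = elu_plus_one(q), elu_plus_one(k)
    # Long-term retrieval from the compressive memory built from previous segments.
    a_mem = (sq @ memory) / (sq @ z + eps)[:, None]
    # Masked (causal) local dot-product attention within the current segment.
    scores = (q @ k.T) / np.sqrt(d_k)
    causal = np.tril(np.ones((n, n), dtype=bool))
    a_dot = softmax(np.where(causal, scores, -np.inf)) @ v
    # Scalar gate (sigmoid of a learned beta) mixes long-term and local context.
    g = 1.0 / (1.0 + np.exp(-beta))
    out = g * a_mem + (1.0 - g) * a_dot
    # Additive memory update; memory and z keep fixed shapes however long the stream is.
    memory = memory + sk.T @ v
    z = z + sk.sum(axis=0)
    return out, memory, z


rng = np.random.default_rng(0)
d_k, d_v, n = 4, 3, 5
memory, z = np.zeros((d_k, d_v)), np.zeros(d_k)
for _ in range(3):  # stream three segments through a fixed-size memory
    q, k, v = (rng.standard_normal((n, d)) for d in (d_k, d_k, d_v))
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta=0.0)
print(out.shape, memory.shape)  # (5, 3) (4, 3)
```

The memory stays (d_k, d_v) no matter how many segments are streamed, which is the bounded-memory property the abstract describes.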

arxiv preprint arxiv, large language model, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2404.07143

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

Munkhdalai, Tsendsuren, Chen, Youzheng, Sim, Khe Chai, Biadsy, Fadi, Sainath, Tara, Mengibar, Pedro Moreno

arXiv.org Artificial Intelligence, Mar-25-2024

Parameter-efficient adaptation methods have become a key mechanism for training large pre-trained models for downstream tasks. However, their per-task parameter overhead is still considered high when the number of downstream tasks is large. We introduce an adapter module that is more efficient in large-scale multi-task adaptation scenarios. Our adapter is hierarchical in terms of how the adapter parameters are allocated: it consists of a single shared controller network and multiple task-level adapter heads, which reduces the per-task parameter overhead without performance regression on downstream tasks. The adapter is also recurrent, so all of its parameters are reused across the different layers of the pre-trained model. Our Hierarchical Recurrent Adapter (HRA) outperforms previous adapter-based approaches as well as the full-model fine-tuning baseline in both single- and multi-task adaptation settings when evaluated on automatic speech recognition tasks.
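One way to see why a shared recurrent controller with small task heads lowers per-task overhead is to count parameters. The sketch below compares conventional per-layer bottleneck adapters with a hierarchical, recurrent layout; modelling the controller as a simple RNN cell, the head as one linear layer, and every size used here are hypothetical assumptions, not the HRA configuration from the paper.

```python
def bottleneck_adapter_params(d_model: int, bottleneck: int) -> int:
    # Down-projection (d_model*bottleneck + bottleneck) plus up-projection (bottleneck*d_model + d_model).
    return 2 * d_model * bottleneck + bottleneck + d_model


def per_layer_adapters_total(num_tasks: int, num_layers: int, d_model: int, bottleneck: int) -> int:
    # Conventional setup: every task gets its own adapter in every layer.
    return num_tasks * num_layers * bottleneck_adapter_params(d_model, bottleneck)


def hierarchical_recurrent_total(num_tasks: int, d_model: int, d_ctrl: int) -> int:
    # Hypothetical shared controller: a simple RNN cell (input d_model, hidden d_ctrl),
    # reused at every layer, so its size depends on neither depth nor task count.
    controller = d_model * d_ctrl + d_ctrl * d_ctrl + d_ctrl
    # Hypothetical task head: one linear map from controller state back to d_model.
    head = d_ctrl * d_model + d_model
    return controller + num_tasks * head


cfg = dict(num_tasks=10, d_model=1024)
print(per_layer_adapters_total(num_layers=24, bottleneck=64, **cfg))  # 31718400
print(hierarchical_recurrent_total(d_ctrl=256, **cfg))  # 2959616
```

Under these assumed sizes, each extra task costs a full stack of 24 adapters in the first layout but only one head in the second; the real savings depend on the paper's actual architecture.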

adapter head, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.19709

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech (0.70)

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

Wang, Weiran, Wu, Zelin, Caseiro, Diamantino, Munkhdalai, Tsendsuren, Sim, Khe Chai, Rondon, Pat, Pundak, Golan, Song, Gan, Prabhavalkar, Rohit, Meng, Zhong, Zhao, Ding, Sainath, Tara, Mengibar, Pedro Moreno

arXiv.org Artificial Intelligence, Sep-29-2023

Contextual biasing refers to the problem of biasing automatic speech recognition (ASR) systems towards rare entities that are relevant to a specific user or application scenario. We propose algorithms for contextual biasing based on the Knuth-Morris-Pratt algorithm for pattern matching. During beam search, we boost the score of a token extension if it extends a match with one of a set of biasing phrases. Our method simulates the classical approaches often implemented in the weighted finite state transducer (WFST) framework, but avoids the FST language altogether, with careful consideration of memory footprint and of efficiency on tensor processing units (TPUs) through vectorization. Without introducing additional model parameters, our method achieves significant word error rate (WER) reductions on biasing test sets by itself, and yields further performance gains when combined with a model-based biasing method.
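As a hedged illustration of the Knuth-Morris-Pratt idea described above, the sketch tracks how much of a single biasing phrase a hypothesis has matched and turns changes in that match length into a score bonus. It is a scalar, single-phrase Python sketch, not the paper's vectorized TPU implementation; the bonus rule (a potential equal to the matched-prefix length, reset to 0 once a phrase completes) and all names are assumptions.

```python
from typing import List, Sequence, Tuple


def failure_function(pattern: Sequence) -> List[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i + 1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def advance(state: int, token, pattern: Sequence, fail: List[int]) -> int:
    """KMP transition: new matched-prefix length after appending `token`."""
    while state > 0 and (state == len(pattern) or token != pattern[state]):
        state = fail[state - 1]
    if token == pattern[state]:
        state += 1
    return state


def biasing_bonus(state: int, token, pattern: Sequence, fail: List[int], weight: float = 1.0) -> Tuple[int, float]:
    """Hypothetical score adjustment for one beam extension.

    The bonus is the change in a potential equal to the matched-prefix length, so a broken
    partial match retracts its bonus. After a phrase completes, the potential is reset to 0,
    so a completed phrase's bonus is never retracted.
    """
    start = 0 if state == len(pattern) else state
    new_state = advance(state, token, pattern, fail)
    return new_state, weight * (new_state - start)


pattern = [1, 2, 1, 3]  # hypothetical biasing phrase as token ids
fail = failure_function(pattern)  # [0, 0, 1, 0]
state, total, states = 0, 0.0, []
for tok in [1, 2, 1, 2, 1, 3]:  # one hypothesis, extended token by token
    state, bonus = biasing_bonus(state, tok, pattern, fail)
    total += bonus
    states.append(state)
print(states, total)  # [1, 2, 3, 2, 3, 4] 4.0
```

The fourth token breaks the match at length 3, but the failure function lets the hypothesis resume from length 1 instead of restarting, which is the KMP property that avoids rescanning.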

knuth-morris-pratt matching algorithm, machine learning, speech recognition, (2 more...)

arXiv.org Artificial Intelligence

2310.00178

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.87)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.53)

Improving Speech Recognition for African American English With Audio Classification

Garg, Shefali, Huo, Zhouyuan, Sim, Khe Chai, Schwartz, Suzan, Chua, Mason, Aksënova, Alëna, Munkhdalai, Tsendsuren, King, Levi, Wright, Darryl, Mengesha, Zion, Hwang, Dongseong, Sainath, Tara, Beaufays, Françoise, Mengibar, Pedro Moreno

arXiv.org Artificial Intelligence, Sep-16-2023

Automatic speech recognition (ASR) systems have been shown to have large quality disparities between the language varieties they are intended or expected to recognize. One way to mitigate this is to train or fine-tune models with more representative datasets, but this approach can be hindered by limited in-domain data for training and evaluation. We propose a new way to improve the robustness of a US English short-form speech recognizer using a small amount of out-of-domain (long-form) African American English (AAE) data. We use CORAAL, YouTube and Mozilla Common Voice to train an audio classifier that approximately predicts whether an utterance is AAE or some other variety, including Mainstream American English (MAE). By combining the classifier output with coarse geographic information, we can select a subset of utterances from a large corpus of untranscribed short-form queries for semi-supervised learning at scale. Fine-tuning on this data results in a 38.5% relative reduction in the word error rate disparity between AAE and MAE without reducing MAE quality.

artificial intelligence, machine learning, utterance, (15 more...)

arXiv.org Artificial Intelligence

2309.09996

Country:

North America > United States (0.15)
Europe (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition

Munkhdalai, Tsendsuren, Sim, Khe Chai, Chandorkar, Angad, Gao, Fan, Chua, Mason, Strohman, Trevor, Beaufays, Françoise

arXiv.org Artificial Intelligence, Oct-6-2021

Fast contextual adaptation has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words, and when combined with on-device personalized training, it can yield even better recognition results. However, traditional re-scoring approaches based on an external language model are prone to diverging during personalized training. In this work, we introduce a model-based, end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization. Our on-device simulation experiments demonstrate that the proposed approach outperforms the traditional re-scoring technique by 12% relative WER and 15.7% entity-mention-specific F1-score in a continuous personalization scenario.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2110.0222

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.65)

Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling

Munkhdalai, Tsendsuren

arXiv.org Artificial Intelligence, Sep-3-2020

Training a deep neural network requires a large amount of single-task data and involves a long, time-consuming optimization phase. This is not scalable to complex, realistic environments with new, unexpected changes. Humans can perform fast incremental learning on the fly, and memory systems in the brain play a critical role in this. We introduce Sparse Meta Networks, a meta-learning approach that uses deep neural networks to learn online sequential adaptation algorithms for deep neural networks. We augment a deep neural network with a layer-specific fast-weight memory. The fast weights are generated sparsely at each time step and accumulated incrementally through time, providing a useful inductive bias for online continual adaptation. We demonstrate strong performance on a variety of sequential adaptation scenarios, from simple online reinforcement learning to large-scale adaptive language modelling.

arxiv preprint arxiv, deep learning, neural network, (16 more...)

arXiv.org Artificial Intelligence

2009.01803

Country:

Europe (0.93)
North America > Canada > Quebec (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Locally Adaptive Interpretable Regression

Munkhdalai, Lkhagvadorj, Munkhdalai, Tsendsuren, Ryu, Keun Ho

arXiv.org Artificial Intelligence, May-13-2020

Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models; however, the linearity of a simple linear regression limits its predictive power. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts percentiles of a Gaussian distribution for the regression coefficients, enabling rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves comparable or better predictive performance than other state-of-the-art baselines but also discovers interesting relationships between input and target variables, such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). LoAIR is therefore a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without sacrificing its interpretability.

energy conservation, neural network, regression, (19 more...)

arXiv.org Artificial Intelligence

2005.0335

Country:

Europe (1.00)
North America > United States > California > San Francisco County > San Francisco (0.28)
North America > Canada > Ontario (0.28)

Genre: Research Report (1.00)

Industry: Banking & Finance > Economy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Metalearned Neural Memory

Munkhdalai, Tsendsuren, Sordoni, Alessandro, Wang, Tong, Trischler, Adam

arXiv.org Machine Learning, Jul-23-2019

We augment recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning. We conceptualize this memory as a rapidly adaptable function that we parameterize as a deep neural network. Reading from the neural memory function amounts to pushing an input (the key vector) through the function to produce an output (the value vector). Writing to memory means changing the function; specifically, updating the parameters of the neural network to encode desired information. We leverage training and algorithmic techniques from metalearning to update the neural memory function in one shot. The proposed memory-augmented model achieves strong performance on a variety of learning problems, from supervised question answering to reinforcement learning.
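To make the read/write view above concrete, the sketch uses a linear function W @ key as the memory (the paper parameterizes it as a deep neural network) and writes with a single gradient step on the squared error between the read-out and the target value. The fixed learning rate 1/||key||^2 is an assumption chosen so that one step stores the value exactly for that key, whereas the paper uses metalearning techniques to perform the one-shot update. The class and method names are hypothetical.

```python
from typing import Optional

import numpy as np


class LinearNeuralMemory:
    """Hypothetical linear stand-in for a neural memory function f(key) = W @ key."""

    def __init__(self, d_key: int, d_value: int):
        self.W = np.zeros((d_value, d_key))

    def read(self, key: np.ndarray) -> np.ndarray:
        # Reading = pushing the key through the function.
        return self.W @ key

    def write(self, key: np.ndarray, value: np.ndarray, lr: Optional[float] = None) -> None:
        # Writing = one gradient step on 0.5 * ||W @ key - value||^2 with respect to W.
        if lr is None:
            lr = 1.0 / float(key @ key)  # this step size stores `value` exactly for `key`
        error = self.read(key) - value
        self.W -= lr * np.outer(error, key)


mem = LinearNeuralMemory(d_key=4, d_value=2)
k1, v1 = np.array([1.0, 0.0, 2.0, 0.0]), np.array([3.0, -1.0])
k2, v2 = np.array([0.0, 1.0, 0.0, 0.0]), np.array([5.0, 5.0])
mem.write(k1, v1)
mem.write(k2, v2)  # k2 is orthogonal to k1, so the first write is left undisturbed
print(mem.read(k1), mem.read(k2))
```

With non-orthogonal keys a linear memory of this kind starts to interfere with earlier writes, which is one motivation for a more expressive, learned memory function.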

deep learning, information, neural network, (18 more...)

arXiv.org Machine Learning

1907.0972

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
