AITopics

2410.22499

Country: Asia (0.94)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence Sep-20-2024

EMMeTT: Efficient Multimodal Machine Translation Training

Żelasko, Piotr, Chen, Zhehuai, Wang, Mengru, Galvez, Daniel, Hrinchuk, Oleksii, Ding, Shuoyang, Hu, Ke, Balam, Jagadeesh, Lavrukhin, Vitaly, Ginsburg, Boris

A rising interest in the modality extension of foundation language models warrants discussion of the most effective, and efficient, multimodal training approach. This work focuses on neural machine translation (NMT) and proposes a joint multimodal training regime for Speech-LLM that includes automatic speech translation (AST). We investigate two different foundation model architectures, decoder-only GPT and encoder-decoder T5, extended with Canary-1B's speech encoder. To handle joint multimodal training, we propose a novel training framework called EMMeTT. EMMeTT improves training efficiency with the following: balanced sampling across languages, datasets, and modalities; efficient sequential data iteration; and a novel 2D bucketing scheme for multimodal data, complemented by a batch size optimizer (OOMptimizer). We show that multimodal training consistently helps with both architectures. Moreover, SALM-T5 trained with EMMeTT retains the original NMT capability while outperforming AST baselines on four-language subsets of FLORES and FLEURS. The resultant Multimodal Translation Model produces strong text and speech translation results at the same time.
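As a rough illustration of what bucketing on two length dimensions could look like, the sketch below groups examples by both input length and output length. It is not taken from the paper: the function name `bucket_2d`, the bin edges, and the choice of inclusive upper bounds are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def bucket_2d(examples, input_edges, output_edges):
    """Group (input_len, output_len) pairs into 2D buckets.

    Each edge list holds ascending, inclusive upper bounds. A length above
    the last bound lands in an overflow bucket with index len(edges).
    The names and bin layout are illustrative assumptions.
    """
    buckets = defaultdict(list)
    for in_len, out_len in examples:
        key = (
            bisect_left(input_edges, in_len),
            bisect_left(output_edges, out_len),
        )
        buckets[key].append((in_len, out_len))
    return dict(buckets)


# Hypothetical lengths: (input frames or tokens, output tokens).
examples = [(3, 2), (4, 9), (8, 2), (12, 3)]
buckets = bucket_2d(examples, input_edges=[5, 10], output_edges=[4, 16])
```

Examples that share a bucket have similar lengths on both axes, so a batch drawn from one bucket needs little padding, and each bucket could in principle be assigned its own batch size.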

artificial intelligence, natural language, translation, (17 more...)

2409.13523

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial Intelligence Jun-28-2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

Puvvada, Krishna C., Żelasko, Piotr, Huang, He, Hrinchuk, Oleksii, Koluguri, Nithin Rao, Dhawan, Kunal, Majumdar, Somshubra, Rastorgueva, Elena, Chen, Zhehuai, Lavrukhin, Vitaly, Balam, Jagadeesh, Ginsburg, Boris

Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the-art accuracy can be reached without relying on web-scale data. Canary, a multilingual ASR and speech translation model, outperforms current state-of-the-art models (Whisper, OWSM, and Seamless-M4T) on English, French, Spanish, and German, while being trained on an order of magnitude less data than these models. Three key factors enable such data-efficient ...

It was observed in [6] that such long utterances harm the model convergence. We also note that this approach may lead to significant padding in mini-batches, resulting in wasted computation on non-informative frames. We present an alternative approach to sampling and batching that allows us to iterate through data twice as fast, while balancing different languages and data sources better. We further accelerate the training and inference by adopting a FastConformer [7] architecture and initializing the encoder from an ASR-only pretrained checkpoint.

canary, machine learning, natural language, (16 more...)

2406.19674

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

arXiv.org Artificial Intelligence Oct-13-2023

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

Chen, Zhehuai, Huang, He, Andrusenko, Andrei, Hrinchuk, Oleksii, Puvvada, Krishna C., Li, Jason, Ghosh, Subhankar, Balam, Jagadeesh, Ginsburg, Boris

We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a modality adapter module, and LoRA layers to accommodate speech input and associated task instructions. The unified SALM not only achieves performance on par with task-specific Conformer baselines for Automatic Speech Recognition (ASR) and Speech Translation (AST), but also exhibits zero-shot in-context learning capabilities, demonstrated through a keyword-boosting task for ASR and AST. Moreover, speech supervised in-context training is proposed to bridge the gap between LLM training and downstream speech tasks, which further boosts the in-context learning ability of speech-to-text models. The proposed model is open-sourced via the NeMo toolkit.

large language model, machine learning, natural language, (15 more...)

2310.09424

Genre: Research Report (0.50)

Industry: Information Technology (0.30)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence May-7-2023

Leveraging Synthetic Targets for Machine Translation

Mittal, Sarthak, Hrinchuk, Oleksii, Kuchaiev, Oleksii

In this work, we provide a recipe for training machine translation models in a limited resource setting by leveraging synthetic target data generated using a large pre-trained model. We show that consistently across different benchmarks in bilingual, multilingual, and speech translation setups, training models on synthetic targets outperforms training on the actual ground-truth data. This performance gap grows bigger with increasing limits on the amount of available resources in the form of the size of the dataset and the number of parameters in the model. We also provide preliminary analysis into whether this boost in performance is linked to ease of optimization or more deterministic nature of the predictions, and whether this paradigm leads to better out-of-distribution performance across different testing domains.

machine learning, natural language, translation, (16 more...)

2305.06155

Country: Europe > Belgium (0.14)

Genre: Research Report (1.00)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning May-27-2019

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

Ginsburg, Boris, Castonguay, Patrice, Hrinchuk, Oleksii, Kuchaiev, Oleksii, Lavrukhin, Vitaly, Leary, Ryan, Li, Jason, Nguyen, Huyen, Cohen, Jonathan M.

We propose NovoGrad, a first-order stochastic gradient method with layer-wise gradient normalization via second moment estimators and with decoupled weight decay for a better regularization. The method requires half as much memory as Adam/AdamW. We evaluated NovoGrad on a diverse set of problems, including image classification, speech recognition, neural machine translation and language modeling. On these problems, NovoGrad performed equal to or better than SGD and Adam/AdamW. Empirically we show that NovoGrad (1) is very robust during the initial training phase and does not require learning rate warm-up, (2) works well with the same learning rate policy for different problems, and (3) generally performs better than other optimizers for very large batch sizes.
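The update described above can be sketched in a few lines. This is a minimal pure-Python illustration of a layer-wise normalized step with decoupled weight decay, not the authors' implementation: the function name `novograd_step`, the list-of-lists layer layout, and the default hyperparameter values are assumptions made for the example.

```python
import math


def novograd_step(weights, grads, state, lr=0.01, beta1=0.95, beta2=0.98,
                  weight_decay=0.0, eps=1e-8):
    """Apply one NovoGrad-style update to a list of layers.

    Each layer is a flat list of floats. `state` maps a layer index to
    (v, m): v is a single scalar second moment per layer, m is the
    per-weight first moment. Defaults are illustrative assumptions.
    """
    new_weights = []
    for i, (w, g) in enumerate(zip(weights, grads)):
        g_norm_sq = sum(x * x for x in g)
        if i not in state:
            # First step: seed v with the squared gradient norm.
            v, m = g_norm_sq, [0.0] * len(w)
        else:
            v_prev, m = state[i]
            v = beta2 * v_prev + (1.0 - beta2) * g_norm_sq
        denom = math.sqrt(v) + eps
        # Normalize the gradient per layer, then add decoupled weight decay.
        m = [beta1 * mj + (gj / denom + weight_decay * wj)
             for mj, gj, wj in zip(m, g, w)]
        state[i] = (v, m)
        new_weights.append([wj - lr * mj for wj, mj in zip(w, m)])
    return new_weights


state = {}
weights = novograd_step([[3.0, 4.0]], [[3.0, 4.0]], state)
```

Because the second moment is one scalar per layer rather than one value per weight, the optimizer state is roughly the size of the first moment alone, which is consistent with the memory saving relative to Adam/AdamW claimed in the abstract.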

deep learning, neural network, novograd, (20 more...)

1905.11286

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)

arXiv.org Machine Learning Feb-28-2019

Catalyst.RL: A Distributed Framework for Reproducible RL Research

Kolesnikov, Sergey, Hrinchuk, Oleksii

Despite the recent progress in the field of deep reinforcement learning (RL), and arguably because of it, a large body of work remains to be done in reproducing and carefully comparing different RL algorithms. We present catalyst.RL, an open source framework for RL research with a focus on reproducibility and flexibility. The main features of our library include large-scale asynchronous distributed training, easy-to-use configuration files with the complete list of hyperparameters for particular experiments, and efficient implementations of various RL algorithms and auxiliary tricks, such as frame stacking, n-step returns, value distributions, etc. To demonstrate the usefulness of our framework, we evaluate it on a range of benchmarks in continuous control, as well as on the task of developing a controller to enable a physiologically-based human model with a prosthetic leg to walk and run. The latter task was introduced at the NeurIPS 2018 AI for Prosthetics Challenge, where our team took 3rd place, capitalizing on the ability of catalyst.RL to train high-quality and sample-efficient RL agents.
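Of the auxiliary tricks listed, n-step returns are the easiest to show in isolation. The sketch below computes the standard n-step return, the discounted sum of n rewards plus a discounted bootstrap value. It is a generic illustration, not code from catalyst.RL, and the function name and default discount are assumptions.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * V.

    `rewards` holds the n observed rewards and `bootstrap_value` is the
    value estimate V for the state reached after those n steps.
    """
    ret = bootstrap_value
    # Fold from the last reward backwards so each step adds one discount.
    for reward in reversed(rewards):
        ret = reward + gamma * ret
    return ret


# Three rewards of 1.0, bootstrap value 10.0, discount 0.5.
value = n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

With these inputs the result is 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0.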

algorithm, artificial intelligence, reinforcement learning, (17 more...)

1903.00027

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.94)
Materials > Chemicals > Specialty Chemicals (0.85)
Health & Medicine (0.75)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

arXiv.org Machine Learning Feb-6-2019

Artificial Intelligence for Prosthetics - challenge solutions

Kidziński, Łukasz, Ong, Carmichael, Mohanty, Sharada Prasanna, Hicks, Jennifer, Carroll, Sean F., Zhou, Bo, Zeng, Hongsheng, Wang, Fan, Lian, Rongzhong, Tian, Hao, Jaśkowski, Wojciech, Andersen, Garrett, Lykkebø, Odd Rune, Toklu, Nihat Engin, Shyam, Pranav, Srivastava, Rupesh Kumar, Kolesnikov, Sergey, Hrinchuk, Oleksii, Pechenko, Anton, Ljungström, Mattias, Wang, Zhen, Hu, Xu, Hu, Zehong, Qiu, Minghui, Huang, Jun, Shpilman, Aleksei, Sosin, Ivan, Svidchenko, Oleg, Malysheva, Aleksandra, Kudenko, Daniel, Rane, Lance, Bhatt, Aditya, Wang, Zhengfei, Qi, Penghui, Yu, Zeyang, Peng, Peng, Yuan, Quan, Li, Wenxin, Tian, Yunsheng, Yang, Ruihan, Ma, Pingchuan, Khadka, Shauharda, Majumdar, Somdeb, Dwiel, Zach, Liu, Yinyin, Tumer, Evren, Watson, Jeremy, Salathé, Marcel, Levine, Sergey, Delp, Scott

In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms by, for example, dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning.

agent, computer game, neural network, (19 more...)

1902.02441

Country:

Europe (1.00)
Asia > China (0.46)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Health Care Technology (0.94)
Education (0.92)
Leisure & Entertainment > Games > Computer Games (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning Jan-30-2019

Generalized Tensor Models for Recurrent Neural Networks

Khrulkov, Valentin, Hrinchuk, Oleksii, Oseledets, Ivan

Recurrent Neural Networks (RNNs) are very successful at solving challenging problems with sequential data. However, this observed efficiency is not yet entirely explained by theory. It is known that a certain class of multiplicative RNNs enjoys the property of depth efficiency: a shallow network of exponentially large width is necessary to realize the same score function as computed by such an RNN. Such networks, however, are not often applied to real-life tasks. In this work, we attempt to reduce the gap between theory and practice by extending the theoretical analysis to RNNs which employ various nonlinearities, such as Rectified Linear Unit (ReLU), and show that they also benefit from properties of universality and depth efficiency. Our theoretical results are verified by a series of extensive computational experiments.

deep learning, generalized rnn, neural network, (19 more...)

1901.10801

Country:

Europe > Russia (0.28)
North America > United States > Oregon (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)