
Collaborating Authors

 Donato, Domenic


Anatomy of Industrial Scale Multilingual ASR

arXiv.org Artificial Intelligence

This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs. Our system leverages a diverse training dataset comprising unsupervised (12.5M hours), supervised (188k hours), and pseudo-labeled (1.6M hours) data across four languages. We provide a detailed description of our model architecture, consisting of a full-context 600M-parameter Conformer encoder pre-trained with BEST-RQ and an RNN-T decoder fine-tuned jointly with the encoder. Our extensive evaluation demonstrates competitive word error rates (WERs) against larger and more computationally expensive models, such as Whisper large and Canary-1B. Furthermore, our architectural choices yield several key advantages, including an improved code-switching capability, a 5x inference speedup compared to an optimized Whisper baseline, a 30% reduction in hallucination rate on speech data, and a 90% reduction in hallucination rate on ambient noise compared to Whisper, along with significantly improved time-stamp accuracy. Throughout this work, we adopt a system-centric approach to analyzing various aspects of fully-fledged ASR models to gain practically relevant insights useful for real-world services operating at scale.
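As context for the evaluation numbers above: word error rate is the word-level edit distance between a hypothesis and a reference transcript, normalized by the reference length. Below is a minimal, illustrative Python sketch of that standard definition; it is not the evaluation tooling used in the paper.

# Minimal word error rate (WER) sketch: Levenshtein distance over words,
# counting substitutions, insertions, and deletions against a reference.
# Illustrative only; not the paper's evaluation code.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,   # substitution / match
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167

On a full test corpus, WER is typically computed from pooled error counts divided by the total number of reference words, rather than averaged per utterance.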


Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

arXiv.org Artificial Intelligence

This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources. To achieve this, we perform Noisy Student Training [1] after generating pseudo-labels for the unlabeled public data using a strong Conformer RNN-T baseline model. The addition of these pseudo-labeled data results in remarkable improvements in relative Word Error Rate (WER) by 11.5% and 24.3% for our asynchronous and realtime models, respectively. Additionally, the model is more robust to background noise owing to the addition of these data.

These labels are then used in traditional supervised training schemas. This line of work in turn bifurcates into two main approaches. The first approach relies on generating pseudo-labels using a pre-existing baseline model [1, 6, 7], while the second approach attempts to source massive amounts of data of ambiguous quality from the public sources and then filter it down to a subset that is both human labeled and high quality [8]. Our work attempts to address the data scarcity issue head-on and leverages both data filtering and pseudo-labeling to procure high-quality audio and labels at scale. Following the example provided by Whisper [8], we sourced audio speech data from open and fair use sources available
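To make the pseudo-labeling workflow above concrete, here is a minimal sketch of one Noisy Student-style round: a teacher transcribes unlabeled public audio, low-confidence outputs are filtered out, and a student is then trained on the supervised data plus the retained pseudo-labels. The Utterance class, teacher_transcribe stub, and 0.9 confidence threshold are hypothetical placeholders for illustration, not the paper's actual pipeline.

# Sketch of a Noisy Student-style pseudo-labeling round.
# The Utterance/teacher stubs and the confidence threshold are
# hypothetical; the paper's real pipeline differs in detail.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Utterance:
    audio_id: str
    text: str = ""        # empty for unlabeled audio

def teacher_transcribe(utt: Utterance) -> Tuple[str, float]:
    # Placeholder for the baseline Conformer RNN-T teacher: returns
    # (transcript, confidence). Faked here for illustration.
    return f"transcript for {utt.audio_id}", 0.95

def pseudo_label(unlabeled: List[Utterance], min_conf: float = 0.9) -> List[Utterance]:
    kept = []
    for utt in unlabeled:
        text, conf = teacher_transcribe(utt)
        if conf >= min_conf:                  # simple confidence filter
            kept.append(Utterance(utt.audio_id, text))
    return kept

supervised = [Utterance("sup_001", "hello world")]
unlabeled = [Utterance("pub_001"), Utterance("pub_002")]
training_set = supervised + pseudo_label(unlabeled)
print([(u.audio_id, u.text) for u in training_set])
# A student model would then be trained on training_set, typically with
# data augmentation (the "noisy" part of Noisy Student Training).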


MAD for Robust Reinforcement Learning in Machine Translation

arXiv.org Artificial Intelligence

We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviation in the importance weighting calculation), has distributed data generators sampling multiple candidates per source sentence on worker nodes, while a central learner updates the policy. MAD depends crucially on two variance reduction strategies: (1) a conditional reward normalization method that ensures each source sentence has both positive and negative reward translation examples and (2) a new robust importance weighting scheme that acts as a conditional entropy regularizer. Experiments on a variety of translation tasks show that policies learned using the MAD algorithm perform very well when using both greedy decoding and beam search, and that the learned policies are sensitive to the specific reward used during training.
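The two variance-reduction strategies named above can be sketched roughly as follows. The functional forms used here (per-source standardization for the conditional reward normalization, and downweighting of candidates whose importance ratio sits far from the per-source median, measured in units of mean absolute deviation) are illustrative assumptions for exposition; the paper defines the exact formulas.

# Illustrative-only sketch of (1) conditional reward normalization and
# (2) a MAD-based robust importance weighting, using NumPy.
# The specific formulas below are assumptions, not the paper's exact ones.
import numpy as np

def normalize_rewards(rewards: np.ndarray) -> np.ndarray:
    # Rewards for several sampled translations of ONE source sentence.
    # Centering per source yields both positive and negative values
    # (unless all rewards are identical).
    centered = rewards - rewards.mean()
    return centered / (rewards.std() + 1e-8)

def mad_importance_weights(log_p_learner: np.ndarray,
                           log_p_behavior: np.ndarray) -> np.ndarray:
    # Importance ratios between the central learner and the behavior
    # policy that generated the candidates on the worker nodes.
    ratios = np.exp(log_p_learner - log_p_behavior)
    med = np.median(ratios)
    mad = np.mean(np.abs(ratios - med)) + 1e-8   # mean absolute deviation
    # Downweight candidates whose ratio is far from the median (assumed form).
    return ratios * np.exp(-np.abs(ratios - med) / mad)

rewards = np.array([0.31, 0.42, 0.28, 0.55])          # e.g. sentence-level BLEU
adv = normalize_rewards(rewards)
w = mad_importance_weights(np.log([0.2, 0.1, 0.05, 0.3]),
                           np.log([0.25, 0.1, 0.1, 0.2]))
# Per-candidate policy-gradient contribution (sketch): w * adv * grad log p
print(adv, w)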


Scaling Language Models: Methods, Analysis & Insights from Training Gopher

arXiv.org Artificial Intelligence

Natural language communication is core to intelligence, as it allows ideas to be efficiently shared between humans or artificially intelligent systems. The generality of language allows us to express many intelligence tasks as taking in natural language input and producing natural language output. Autoregressive language modelling -- predicting the future of a text sequence from its past -- provides a simple yet powerful objective that admits formulation of numerous cognitive tasks. At the same time, it opens the door to plentiful training data: the internet, books, articles, code, and other writing. However, this training objective is only an approximation to any specific goal or application, since we predict everything in the sequence rather than only the aspects we care about. Yet if we treat the resulting models with appropriate caution, we believe they will be a powerful tool to capture some of the richness of human intelligence. Using language models as an ingredient towards intelligence contrasts with their original application: transferring text over a limited-bandwidth communication channel. Shannon's Mathematical Theory of Communication (Shannon, 1948) linked the statistical modelling of natural language with compression, showing that measuring the cross entropy of a language model is equivalent to measuring its compression rate.
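The cross-entropy/compression equivalence attributed to Shannon at the end is the standard source-coding identity. An optimal code derived from a model q assigns a token sequence x a code length of roughly $-\log_2 q(x)$ bits, so the expected bits per symbol under the data distribution p is

$$\mathbb{E}_{x \sim p}\!\left[-\log_2 q(x)\right] = H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q),$$

which is why lowering a language model's cross entropy (log-loss) is the same as improving the compression rate it can achieve, with the irreducible floor given by the entropy $H(p)$ of the data itself.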