McGraw, Ian
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Hernandez, Steven M., Zhao, Ding, Ding, Shaojin, Bruguier, Antoine, Prabhavalkar, Rohit, Sainath, Tara N., He, Yanzhang, McGraw, Ian
Continued improvements in machine learning techniques offer exciting new opportunities through the use of larger models and larger training datasets. However, there is a growing need to offer these new capabilities on-board low-powered devices such as smartphones, wearables, and other embedded environments where only limited memory is available. Toward this, we consider methods to reduce the size of Conformer-based speech recognition models, which typically require more than 100M parameters, down to just 5M parameters while minimizing the impact on model quality. Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors. We propose model weight reuse at different levels within our model architecture: (i) repeating full conformer block layers, (ii) sharing specific conformer modules across layers, (iii) sharing sub-components per conformer module, and (iv) sharing decomposed sub-component weights after low-rank decomposition. By sharing weights at different levels of our model, we can retain the full model in memory while increasing the number of virtual transformations applied to the input. Through a series of ablation studies and evaluations, we find that with weight sharing and a low-rank architecture, we can achieve a WER of 2.84 and 2.94 on LibriSpeech dev-clean and test-clean, respectively, with a 5M-parameter model.
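A minimal NumPy sketch of the weight-reuse idea described in (i)–(iv); this is not the paper's implementation, and all sizes and names here are illustrative assumptions. One shared pair of low-rank factors stands in for a separate full-rank projection per layer, so repeating the block applies several "virtual" transformations while keeping only one set of weights in memory.

```python
import numpy as np

d_model, rank, num_layers = 256, 32, 6  # illustrative sizes, not the paper's

# (iv) One low-rank factorization W ~= A @ B, shared by every layer:
# 2 * d_model * rank parameters instead of num_layers * d_model**2.
rng = np.random.default_rng(0)
A = rng.normal(scale=d_model ** -0.5, size=(d_model, rank))
B = rng.normal(scale=rank ** -0.5, size=(rank, d_model))

def shared_low_rank_block(x):
    """One 'virtual' layer: a shared low-rank projection plus a residual connection."""
    return x + np.maximum(x @ A @ B, 0.0)  # ReLU stands in for a conformer module

x = rng.normal(size=(10, d_model))   # a dummy sequence of 10 frames
for _ in range(num_layers):          # (i) repeat the same block num_layers times
    x = shared_low_rank_block(x)

full_rank_params = num_layers * d_model ** 2
shared_params = 2 * d_model * rank
print(f"parameter reduction: {full_rank_params / shared_params:.1f}x")
```

The sketch only illustrates the memory trade-off: depth (number of applied transformations) grows without growing the parameter count.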
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
Shen, Jonathan, Nguyen, Patrick, Wu, Yonghui, Chen, Zhifeng, Chen, Mia X., Jia, Ye, Kannan, Anjuli, Sainath, Tara, Cao, Yuan, Chiu, Chung-Cheng, He, Yanzhang, Chorowski, Jan, Hinsu, Smit, Laurenzo, Stella, Qin, James, Firat, Orhan, Macherey, Wolfgang, Gupta, Suyog, Bapna, Ankur, Zhang, Shuyuan, Pang, Ruoming, Weiss, Ron J., Prabhavalkar, Rohit, Liang, Qiao, Jacob, Benoit, Liang, Bowen, Lee, HyoukJoong, Chelba, Ciprian, Jean, Sébastien, Li, Bo, Johnson, Melvin, Anil, Rohan, Tibrewal, Rajat, Liu, Xiaobing, Eriguchi, Akiko, Jaitly, Navdeep, Ari, Naveen, Cherry, Colin, Haghani, Parisa, Good, Otavio, Cheng, Youlong, Alvarez, Raziel, Caswell, Isaac, Hsu, Wei-Ning, Yang, Zongheng, Wang, Kuan-Chieh, Gonina, Ekaterina, Tomanek, Katrin, Vanik, Ben, Wu, Zelin, Jones, Llion, Schuster, Mike, Huang, Yanping, Chen, Dehao, Irie, Kazuki, Foster, George, Richardson, John, Macherey, Klaus, Bruguier, Antoine, Zen, Heiga, Raffel, Colin, Kumar, Shankar, Rao, Kanishka, Rybach, David, Murray, Matthew, Peddinti, Vijayaditya, Krikun, Maxim, Bacchiani, Michiel A. U., Jablin, Thomas B., Suderman, Rob, Williams, Ian, Lee, Benjamin, Bhatia, Deepti, Carlson, Justin, Yavuz, Semih, Zhang, Yu, McGraw, Ian, Galkin, Max, Ge, Qi, Pundak, Golan, Whipkey, Chad, Wang, Todd, Alon, Uri, Lepikhin, Dmitry, Tian, Ye, Sabour, Sara, Chan, William, Toshniwal, Shubham, Liao, Baohua, Nirschl, Michael, Rondon, Pat
Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework.
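The pattern of centralized, self-documenting experiment configuration that the abstract describes can be sketched roughly as follows. This is plain Python written for illustration only; it is not Lingvo's actual API, and every class and method name below is an invented placeholder.

```python
class Params:
    """Toy version of a centralized, self-documenting hyperparameter container
    (illustrative only; not Lingvo's real configuration class)."""

    def __init__(self):
        self._defs = {}

    def define(self, name, default, description):
        # Every knob is declared once, with a default and a docstring.
        self._defs[name] = {"value": default, "doc": description}
        return self

    def set(self, **kwargs):
        # Experiments may only override knobs that were declared.
        for name, value in kwargs.items():
            if name not in self._defs:
                raise KeyError(f"unknown hyperparameter: {name}")
            self._defs[name]["value"] = value
        return self

    def get(self, name):
        return self._defs[name]["value"]


def base_encoder_params():
    """A module publishes all of its configurable knobs in one place."""
    p = Params()
    p.define("num_layers", 4, "Number of encoder layers.")
    p.define("hidden_dim", 512, "Hidden dimension of each layer.")
    return p


# An experiment configuration overrides only what it changes.
exp = base_encoder_params().set(num_layers=16)
print(exp.get("num_layers"), exp.get("hidden_dim"))  # 16 512
```

The design choice being illustrated is that configuration lives with the module definitions rather than being scattered across scripts, which is what makes experiments reproducible and easy to diff.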
Residual Belief Propagation: Informed Scheduling for Asynchronous Message Passing
Elidan, Gal, McGraw, Ian, Koller, Daphne
Inference for probabilistic graphical models is still very much a practical challenge in large domains. The commonly used and effective belief propagation (BP) algorithm and its generalizations often do not converge when applied to hard, real-life inference tasks. While it is widely recognized that the scheduling of messages in these algorithms may have significant consequences, this issue remains largely unexplored. In this work, we address the question of how to schedule messages for asynchronous propagation so that a fixed point is reached faster and more often. We first show that any reasonable asynchronous BP converges to a unique fixed point under conditions similar to those that guarantee convergence of synchronous BP. In addition, we show that the convergence rate of a simple round-robin schedule is at least as good as that of synchronous propagation. We then propose residual belief propagation (RBP), a novel, easy-to-implement, asynchronous propagation algorithm that schedules messages in an informed way, pushing down a bound on the distance from the fixed point. Finally, we demonstrate the superiority of RBP over state-of-the-art methods on a variety of challenging synthetic and real-life problems: RBP converges significantly more often than other methods, and it significantly reduces running time until convergence, even when other methods converge.
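A compact Python sketch of the residual-based scheduling idea, not the paper's code: the graph structure, message representation, and function names below are placeholder assumptions. Each pending message is keyed by its residual (the norm of the change it would make), and the message with the largest residual is committed first.

```python
import heapq
import numpy as np

def residual_schedule(messages, compute_update, dependents, max_iters=10_000, tol=1e-6):
    """Asynchronous message passing ordered by residual size.

    messages       : dict edge -> current message vector
    compute_update : (edge, messages) -> new message vector given current messages
    dependents     : edge -> iterable of edges whose updates read this message
    """
    # Max-heap (negated priorities) of candidate updates keyed by residual.
    heap = []
    for edge in messages:
        new = compute_update(edge, messages)
        heapq.heappush(heap, (-np.linalg.norm(new - messages[edge]), edge))

    for _ in range(max_iters):
        neg_res, edge = heapq.heappop(heap)
        if -neg_res < tol:
            return messages                       # all residuals below tol: near a fixed point
        # Recompute from the current messages; a stale heap entry just costs one extra update.
        messages[edge] = compute_update(edge, messages)
        for dep in dependents(edge):              # re-prioritize messages this update affects
            cand = compute_update(dep, messages)
            heapq.heappush(heap, (-np.linalg.norm(cand - messages[dep]), dep))
    return messages
```

The sketch keeps stale heap entries rather than deleting them; since every change to a message's inputs pushes a fresh entry with the up-to-date residual, popping a stale entry is harmless and the termination test remains sound.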