Wang, Yongqiang
Undergraduate Research of Decentralized Localization of Roombas Through Usage of Wall-Finding Software
Corvin, Madeline, McDowell, Johnathan, Anglea, Timothy, Wang, Yongqiang
This paper introduces the research effort of an undergraduate research team in realizing robot localization. More specifically, the team developed and tested wall-following software that allows ground robots (Roombas) to independently find their positions within a defined space. The software also allows each Roomba to send its localized position to the other Roombas, so that every Roomba knows its location relative to its peers, enabling robot cooperation.
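As a rough illustration of the two ingredients the abstract describes, the sketch below pairs a proportional wall-following step with a simple pose broadcast; all names, gains, and the dead-reckoning model are assumptions for illustration, not the team's actual software.

```python
# Minimal sketch of a wall-following step and relative-position sharing.
# WALL_DISTANCE, KP, and the Roomba fields are illustrative assumptions.
import math
from dataclasses import dataclass, field

WALL_DISTANCE = 0.15   # desired wall clearance in meters (assumed)
KP = 2.0               # proportional steering gain (assumed)

@dataclass
class Roomba:
    robot_id: int
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0                       # radians
    peers: dict = field(default_factory=dict)  # robot_id -> (x, y)

    def wall_follow_step(self, side_range: float, speed: float, dt: float):
        """Steer to keep the side range sensor at WALL_DISTANCE,
        then dead-reckon the pose update."""
        error = side_range - WALL_DISTANCE
        omega = -KP * error                    # turn toward/away from the wall
        self.heading += omega * dt
        self.x += speed * math.cos(self.heading) * dt
        self.y += speed * math.sin(self.heading) * dt

    def broadcast_pose(self, others):
        """Share the localized pose so every Roomba can compute
        positions relative to its peers."""
        for other in others:
            other.peers[self.robot_id] = (self.x, self.y)

    def relative_to(self, robot_id: int):
        px, py = self.peers[robot_id]
        return px - self.x, py - self.y
```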
Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP)
Sultan, Youssef, Wang, Yongqiang, Scanlon, James, D'lima, Lisa
Image segmentation serves as a critical tool across a range of applications, from pedestrian detection in autonomous driving to pre-operative tumor delineation in medicine. Among these applications, we focus on the National Institutes of Health's (NIH) Human BioMolecular Atlas Program (HuBMAP), a significant initiative aimed at creating detailed cellular maps of the human body. In this study, we concentrate on segmenting various microvascular structures in human kidneys, utilizing 2D Periodic Acid-Schiff (PAS)-stained histology images. Our methodology begins with a foundational FastAI U-Net model, upon which we investigate alternative backbone architectures, delve into deeper models, and experiment with Feature Pyramid Networks. We rigorously evaluate these varied approaches by benchmarking their performance against our baseline U-Net model. This study thus offers a comprehensive exploration of cutting-edge segmentation techniques, providing valuable insights for future research in the field.
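As a rough sketch of the kind of FastAI U-Net baseline described above (the paths, mask layout, batch size, and epoch count are assumptions; the paper's actual data pipeline and hyperparameters are not given in the abstract):

```python
# Hypothetical FastAI U-Net baseline for PAS-stained kidney tile segmentation.
from fastai.vision.all import (SegmentationDataLoaders, get_image_files,
                               unet_learner, resnet34, Dice)

path = "hubmap_pas_tiles"  # hypothetical folder of PAS-stained tiles
dls = SegmentationDataLoaders.from_label_func(
    path,
    fnames=get_image_files(path + "/images"),
    label_func=lambda f: f"{path}/masks/{f.name}",  # assumed mask layout
    codes=["background", "vessel"],
    bs=8,
)

# Baseline U-Net with a resnet34 encoder; swapping in a deeper encoder
# (e.g. resnet50) is one way to explore alternative backbone architectures.
learn = unet_learner(dls, resnet34, metrics=Dice())
learn.fine_tune(5)
```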
A Robust Dynamic Average Consensus Algorithm that Ensures both Differential Privacy and Accurate Convergence
Wang, Yongqiang
Dynamic average consensus is widely used in cooperative control and distributed tracking, and it is also a fundamental building block in numerous distributed computation algorithms such as multi-agent optimization and distributed Nash equilibrium seeking. We propose a new dynamic average consensus algorithm that is robust to persistent and independent information-sharing noise added for the purpose of differential-privacy protection. The algorithm ensures both provable convergence to the exact average reference signal and rigorous epsilon-differential privacy (even when the number of iterations tends to infinity), which, to our knowledge, has not been achieved before in average consensus algorithms. Given that channel noise in communication can be viewed as a special case of differential-privacy noise, the algorithm can also be used to counteract communication imperfections. Numerical simulation results confirm the effectiveness of the proposed approach.
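A generic, noise-perturbed dynamic average consensus iteration of the following form illustrates the setting; the gain schedule, noise distribution, and scales here are placeholder assumptions and do not reproduce the paper's noise-robust algorithm.

```python
# Generic dynamic average consensus with Laplace noise injected into the
# states that agents share (the states an eavesdropper would observe).
import numpy as np

def noisy_dynamic_consensus(refs, neighbors, T=200, eps=1.0, step0=0.5):
    """refs: function t -> array of per-agent reference signals r_i(t).
    neighbors: adjacency list of the communication graph."""
    n = len(neighbors)
    x = refs(0).astype(float).copy()       # each agent starts at its own reference
    r_prev = refs(0)
    for t in range(1, T + 1):
        gamma = step0 / t                   # decaying consensus gain (assumed schedule)
        shared = x + np.random.laplace(scale=1.0 / eps, size=n)  # DP-perturbed states
        r = refs(t)
        x_new = x.copy()
        for i in range(n):
            x_new[i] += gamma * sum(shared[j] - x[i] for j in neighbors[i])
            x_new[i] += r[i] - r_prev[i]    # track the time-varying reference
        x, r_prev = x_new, r
    return x                                # ideally close to the average of r_i(T)
```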
Locally Differentially Private Distributed Online Learning with Guaranteed Optimality
Chen, Ziqin, Wang, Yongqiang
Distributed online learning is gaining increased traction due to its unique ability to process large-scale datasets and streaming data. To address growing public awareness of and concern about privacy protection, numerous private distributed online learning algorithms have been proposed, mostly based on differential privacy, which has emerged as the "gold standard" for privacy protection. However, these algorithms often face the dilemma of trading learning accuracy for privacy. By exploiting the unique characteristics of online learning, this paper proposes an approach that resolves the dilemma and ensures both differential privacy and learning accuracy in distributed online learning. More specifically, while ensuring a diminishing expected instantaneous regret, the approach can simultaneously ensure a finite cumulative privacy budget, even on the infinite time horizon. To cater to the fully distributed setting, we adopt the local differential-privacy framework, which avoids reliance on a trusted data curator and hence provides stronger protection than the classic "centralized" (global) differential privacy. To the best of our knowledge, this is the first algorithm that successfully ensures both rigorous local differential privacy and learning accuracy. The effectiveness of the proposed algorithm is evaluated using machine learning tasks, including logistic regression on the "Mushrooms" and "Covtype" datasets and CNN-based image classification on the "MNIST" and "CIFAR-10" datasets.
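To make the setting concrete, the sketch below shows one round of a locally differentially private online logistic-regression learner that perturbs the model it shares with its neighbors, so no trusted curator is needed. The step size and noise scale are placeholder assumptions, not the paper's accuracy-preserving schedules.

```python
# One round for a single learner in LDP distributed online logistic regression.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ldp_online_step(w, stream_batch, neighbor_models, t, eps=1.0):
    """w: current local model; stream_batch: (X, y) just observed from the stream;
    neighbor_models: (already perturbed) models received from neighbors this round."""
    X, y = stream_batch
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)   # gradient of the instantaneous loss
    lr = 1.0 / np.sqrt(t + 1)                    # assumed decaying step size
    # Mix with neighbors' noisy models, then take a local gradient step.
    w_mixed = np.mean(np.vstack([w] + neighbor_models), axis=0)
    w_new = w_mixed - lr * grad
    # Local DP: perturb before sharing; in practice the scale would be calibrated
    # to the update's sensitivity and the per-round privacy budget.
    shared = w_new + np.random.laplace(scale=1.0 / eps, size=w_new.shape)
    return w_new, shared
```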
AudioPaLM: A Large Language Model That Can Speak and Listen
Rubenstein, Paul K., Asawaroengchai, Chulayuth, Nguyen, Duc Dung, Bapna, Ankur, Borsos, Zalán, Quitry, Félix de Chaumont, Chen, Peter, Badawy, Dalia El, Han, Wei, Kharitonov, Eugene, Muckenhirn, Hannah, Padfield, Dirk, Qin, James, Rozenberg, Danny, Sainath, Tara, Schalkwyk, Johan, Sharifi, Matt, Ramanovich, Michelle Tadmor, Tagliasacchi, Marco, Tudor, Alexandru, Velimirović, Mihajlo, Vincent, Damien, Yu, Jiahui, Wang, Yongqiang, Zayats, Vicky, Zeghidour, Neil, Zhang, Yu, Zhang, Zhishuai, Zilka, Lukas, Frank, Christian
We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech, with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits from AudioLM the capability to preserve paralinguistic information such as speaker identity and intonation, and from PaLM-2 the linguistic knowledge present only in text-based large language models. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems for speech translation tasks and has the ability to perform zero-shot speech-to-text translation for many languages for which input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt.
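The vocabulary-expansion idea behind this kind of fusion can be sketched as follows: a text-only decoder's embedding table is enlarged with new audio-token rows while the pretrained text rows are kept. The dimensions, token counts, and initialization below are illustrative assumptions, not AudioPaLM's actual implementation.

```python
# Conceptual sketch: extend a text-only embedding table with audio tokens,
# keeping the pretrained text-token weights so the model starts from the
# text LLM's linguistic knowledge.
import torch
import torch.nn as nn

def expand_embeddings(text_embed: nn.Embedding, num_audio_tokens: int) -> nn.Embedding:
    d_model = text_embed.embedding_dim
    new_vocab = text_embed.num_embeddings + num_audio_tokens
    combined = nn.Embedding(new_vocab, d_model)
    with torch.no_grad():
        # Copy pretrained text-token embeddings; audio rows start randomly.
        combined.weight[: text_embed.num_embeddings] = text_embed.weight
    return combined

# Example with toy sizes: a 32k-token text model gains 1024 discrete audio tokens.
text_embed = nn.Embedding(32_000, 512)
multimodal_embed = expand_embeddings(text_embed, num_audio_tokens=1024)
```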
Decentralized Nonconvex Optimization with Guaranteed Privacy and Accuracy
Wang, Yongqiang, Basar, Tamer
Privacy protection and nonconvexity are two challenging problems in decentralized optimization and learning involving sensitive data. Despite some recent advances addressing each of the two problems separately, no results have been reported that have theoretical guarantees on both privacy protection and saddle/maximum avoidance in decentralized nonconvex optimization. We propose a new algorithm for decentralized nonconvex optimization that can enable both rigorous differential privacy and saddle/maximum avoiding performance. The new algorithm allows the incorporation of persistent additive noise to enable rigorous differential privacy for data samples, gradients, and intermediate optimization variables without losing provable convergence, and thus circumventing the dilemma of trading accuracy for privacy in differential privacy design. More interestingly, the algorithm is theoretically proven to be able to efficiently guarantee accuracy by avoiding convergence to local maxima and saddle points, which has not been reported before in the literature on decentralized nonconvex optimization. The algorithm is efficient in both communication (it only shares one variable in each iteration) and computation (it is encryption-free), and hence is promising for large-scale nonconvex optimization and learning involving high-dimensional optimization parameters. Numerical experiments for both a decentralized estimation problem and an Independent Component Analysis (ICA) problem confirm the effectiveness of the proposed approach.
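Schematically, one iteration of such a noise-injected decentralized update might look like the sketch below, where each agent shares a single perturbed variable per iteration and the same persistent noise that provides privacy acts like the random perturbation commonly used to escape saddle points. The mixing weights, step size, and noise scale are assumptions rather than the paper's exact design.

```python
# One schematic iteration of noise-injected decentralized gradient descent.
import numpy as np

def decentralized_dp_step(xs, grads, W, step, noise_scale):
    """xs: list of per-agent iterates; grads: local gradients evaluated at xs;
    W: doubly stochastic mixing matrix of the communication graph."""
    # Each agent shares exactly one perturbed variable per iteration.
    shared = [x + np.random.laplace(scale=noise_scale, size=x.shape) for x in xs]
    new_xs = []
    for i in range(len(xs)):
        mixed = sum(W[i, j] * shared[j] for j in range(len(xs)))  # consensus on noisy copies
        new_xs.append(mixed - step * grads[i])                    # local gradient step
    return new_xs
```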
Accelerating RNN-T Training and Inference Using CTC guidance
Wang, Yongqiang, Chen, Zhehuai, Zheng, Chengjian, Zhang, Yu, Han, Wei, Haghani, Parisa
We propose a novel method to accelerate the training and inference of a recurrent neural network transducer (RNN-T), based on guidance from a co-trained connectionist temporal classification (CTC) model. We make a key assumption that if an encoder embedding frame is classified as a blank frame by the CTC model, it is likely that this frame will be aligned to blank for all the partial alignments or hypotheses in RNN-T, and it can therefore be discarded from the decoder input. We also show that this frame reduction operation can be applied in the middle of the encoder, which results in a significant speed-up for both training and inference in RNN-T. We further show that the CTC alignment, a by-product of the CTC decoder, can also be used to perform lattice reduction for RNN-T during training. Our method is evaluated on the Librispeech and SpeechStew tasks. We demonstrate that the proposed method accelerates RNN-T inference by a factor of 2.2 with similar or slightly better word error rates (WER).
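The frame-reduction idea can be sketched as follows: encoder frames whose co-trained CTC head confidently predicts blank are dropped before the RNN-T decoder sees them. Tensor shapes, the blank index, and the threshold below are assumptions for illustration.

```python
# Drop encoder frames the CTC model confidently classifies as blank.
import torch

def drop_blank_frames(encoder_out, ctc_log_probs, blank_id=0, threshold=0.95):
    """encoder_out: (T, D) encoder embeddings for one utterance;
    ctc_log_probs: (T, V) log-posteriors from the co-trained CTC head.
    Returns only the frames the CTC model does not confidently call blank."""
    blank_prob = ctc_log_probs[:, blank_id].exp()  # per-frame blank posterior
    keep = blank_prob < threshold                   # frames likely to emit a label
    return encoder_out[keep], keep

# Example with toy sizes: 100 frames, 512-dim embeddings, 30-symbol vocabulary.
enc = torch.randn(100, 512)
logp = torch.log_softmax(torch.randn(100, 30), dim=-1)
reduced, mask = drop_blank_frames(enc, logp)
```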
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
Zhang, Yu, Park, Daniel S., Han, Wei, Qin, James, Gulati, Anmol, Shor, Joel, Jansen, Aren, Xu, Yuanzhong, Huang, Yanping, Wang, Shibo, Zhou, Zongwei, Li, Bo, Ma, Min, Chan, William, Yu, Jiahui, Wang, Yongqiang, Cao, Liangliang, Sim, Khe Chai, Ramabhadran, Bhuvana, Sainath, Tara N., Beaufays, Françoise, Chen, Zhifeng, Le, Quoc V., Chiu, Chung-Cheng, Pang, Ruoming, Wu, Yonghui
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8 billion parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve SoTA with the full training set. We also report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitude of dataset size, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representation of pre-trained networks to achieve SoTA results on non-ASR tasks.