Wang, Yongqiang
Communication Efficient Federated Learning with Linear Convergence on Heterogeneous Data
Liu, Jie, Wang, Yongqiang
By letting local clients perform multiple local updates before communicating with a parameter server, modern federated learning algorithms such as FedAvg tackle the communication bottleneck problem in distributed learning and have found many successful applications. However, this asynchrony between local updates and communication also leads to a "client-drift" problem when the data is heterogeneous (not independent and identically distributed), resulting in errors in the final learning result. In this paper, we propose a federated learning algorithm, called FedCET, that ensures accurate convergence even under heterogeneous distributions of data across clients. Inspired by the distributed optimization algorithm NIDS, we use learning rates to weight information received from local clients to eliminate the "client-drift". We prove that under appropriate learning rates, FedCET can ensure linear convergence to the exact solution. Unlike existing algorithms, which have to share both gradients and a drift-correction term to ensure accurate convergence under heterogeneous data distributions, FedCET only shares one variable, which significantly reduces communication overhead. Numerical comparison with existing counterpart algorithms confirms the effectiveness of FedCET.
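Below is a minimal Python sketch of the FedAvg-style local-update/aggregation pattern the abstract refers to and of how client-drift arises under heterogeneous data; it does not reproduce FedCET's NIDS-inspired learning-rate weighting, and the quadratic client losses and function names are illustrative assumptions.

import numpy as np

def local_sgd(x, grad_fn, lr, num_local_steps):
    # Each client runs several gradient steps before communicating;
    # under heterogeneous data these steps pull clients toward their own
    # minimizers, which is the source of "client-drift".
    for _ in range(num_local_steps):
        x = x - lr * grad_fn(x)
    return x

def fedavg_round(x_server, client_grad_fns, lr=0.1, num_local_steps=5):
    # Clients start from the current server model, and the server simply
    # averages the single variable each client sends back.
    client_models = [local_sgd(x_server.copy(), g, lr, num_local_steps)
                     for g in client_grad_fns]
    return np.mean(client_models, axis=0)

# Two clients with heterogeneous quadratic losses: minimizers at 1 and -3
# with different curvatures. Plain averaging settles near -1.77 instead of
# the true global minimizer -2.2, which is the drift error FedCET removes.
grads = [lambda x: x - 1.0, lambda x: 4.0 * (x + 3.0)]
x = np.zeros(1)
for _ in range(50):
    x = fedavg_round(x, grads)
print(x)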
Ensuring Truthfulness in Distributed Aggregative Optimization
Chen, Ziqin, Egerstedt, Magnus, Wang, Yongqiang
Distributed aggregative optimization methods are gaining increased traction due to their ability to address cooperative control and optimization problems, where the objective function of each agent depends not only on its own decision variable but also on the aggregation of other agents' decision variables. Nevertheless, existing distributed aggregative optimization methods implicitly assume all agents to be truthful in information sharing, which can be unrealistic in real-world scenarios, where agents may act selfishly or strategically. In fact, an opportunistic agent may deceptively share false information in its own favor to minimize its own loss, which, however, will compromise the network-level global performance. To solve this issue, we propose a new distributed aggregative optimization algorithm that can ensure both truthfulness of agents and convergence performance. To the best of our knowledge, this is the first algorithm that ensures truthfulness in a fully distributed setting, where no "centralized" aggregator exists to collect private information/decision variables from participating agents. We systematically characterize the convergence rate of our algorithm under nonconvex/convex/strongly convex objective functions, which generalizes existing distributed aggregative optimization results that only focus on convex objective functions. We also rigorously quantify the tradeoff between convergence performance and the level of enabled truthfulness under different convexity conditions. Numerical simulations using distributed charging of electric vehicles confirm the efficacy of our algorithm. Index Terms: distributed aggregative optimization, joint differential privacy, truthfulness.
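For context, a common formulation of the distributed aggregative optimization problem in this literature is sketched below in LaTeX; it is an assumed generic form, not necessarily the exact problem studied in the paper.

\min_{x_1,\dots,x_m} \; \sum_{i=1}^{m} f_i\bigl(x_i, \sigma(x)\bigr),
\qquad
\sigma(x) = \frac{1}{m} \sum_{j=1}^{m} \phi_j(x_j),

where x_i is agent i's local decision variable, \phi_j maps agent j's decision into the network-wide aggregate \sigma(x), and each agent exchanges information only with its neighbors over the communication graph.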
Experimental Study of Decentralized Robot Network Coordination
Lemon, Martyn, Wang, Yongqiang
Synchronization and desynchronization in networks are highly studied topics in many electrical systems, but there is a distinct lack of research on these topics with respect to robotics. Creating an effective decentralized synchronization algorithm for a robotic network would allow multiple robots to work together to achieve a task and to adapt to the addition or loss of robots in real time. The purpose of this study is to improve algorithms previously developed by the authors for this purpose and to evaluate these methods experimentally. The most effective synchronization and desynchronization algorithms identified in a former study were modified to improve testing and to vary their methods of calculation. A multi-robot platform composed of multiple Roomba robots was used in the experimental study. The collected data show how adjusting the algorithms' parameters affected both the time to reach a desired state of synchronization or desynchronization and how well the network maintained that state. Testing three different methods for each algorithm produced differing results. Future work in cooperative robotics will likely see success using these algorithms to accomplish a variety of tasks.
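A minimal Python sketch of the kind of decentralized phase-coupling rule used for synchronization and desynchronization is given below; the sinusoidal coupling, the fully connected network, and the gain are illustrative assumptions rather than the algorithms evaluated in the study.

import numpy as np

def phase_step(phases, gain=0.2, desync=False):
    # Pairwise sinusoidal coupling on a fully connected network; flipping the
    # sign of the gain pushes phases apart (toward a spread-out, desynchronized
    # pattern) instead of pulling them together.
    diffs = phases[None, :] - phases[:, None]
    coupling = np.mean(np.sin(diffs), axis=1)
    return phases + (-gain if desync else gain) * coupling

phases = np.random.default_rng(1).uniform(0, 2 * np.pi, size=4)
for _ in range(500):
    phases = phase_step(phases, desync=False)   # set desync=True to spread out
print(np.round(phases, 2))                      # nearly equal phases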
Quantization Avoids Saddle Points in Distributed Optimization
Bo, Yanan, Wang, Yongqiang
Distributed nonconvex optimization underpins key functionalities of numerous distributed systems, ranging from power systems, smart buildings, cooperative robots, and vehicle networks to sensor networks. Recently, it has also emerged as a promising solution to handle the enormous growth in data and model sizes in deep learning. A fundamental problem in distributed nonconvex optimization is avoiding convergence to saddle points, which significantly degrade optimization accuracy. We discover that the process of quantization, which is necessary for all digital communications, can be exploited to enable saddle-point avoidance. More specifically, we propose a stochastic quantization scheme and prove that it can effectively escape saddle points and ensure convergence to a second-order stationary point in distributed nonconvex optimization. With an easily adjustable quantization granularity, the approach allows a user to control the number of bits sent per iteration and, hence, to aggressively reduce the communication overhead. Numerical experimental results using distributed optimization and learning problems on benchmark datasets confirm the effectiveness of the approach.
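A minimal Python sketch of an unbiased stochastic quantizer of the kind alluded to is given below; the grid-based scheme and the fixed granularity delta are illustrative assumptions, not the paper's exact quantization rule or granularity schedule.

import numpy as np

def stochastic_quantize(x, delta=0.1, rng=np.random.default_rng()):
    # Grid spacing delta controls the bits per entry; each value is rounded up
    # or down at random so the quantized message equals the true value in
    # expectation, and the injected randomness is what can perturb iterates
    # away from saddle points.
    low = np.floor(x / delta) * delta
    p_up = (x - low) / delta          # probability of rounding up
    return low + delta * (rng.random(x.shape) < p_up)

x = np.array([0.37, -1.22, 0.05])
print(stochastic_quantize(x))         # each entry rounds to a neighboring grid point
print(np.mean([stochastic_quantize(x) for _ in range(10000)], axis=0))  # ~x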
Privacy-Preserving Distributed Optimization and Learning
Chen, Ziqin, Wang, Yongqiang
Distributed optimization and learning has recently garnered great attention due to its wide applications in sensor networks, smart grids, machine learning, and so forth. Despite rapid development, existing distributed optimization and learning algorithms require each agent to exchange messages with its neighbors, which may expose sensitive information and raise significant privacy concerns. In this survey paper, we overview privacy-preserving distributed optimization and learning methods. We first discuss cryptography, differential privacy, and other techniques that can be used for privacy preservation, and indicate their pros and cons for privacy protection in distributed optimization and learning. We believe that among these approaches, differential privacy is the most promising due to its low computational and communication complexities, which are extremely appealing for modern learning-based applications with high-dimensional optimization variables. We then introduce several differentially private algorithms that can simultaneously ensure privacy and optimization accuracy. Moreover, we provide example applications in several machine learning problems to confirm the real-world effectiveness of these algorithms. Finally, we highlight some challenges in this research domain and discuss future directions.
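As an illustration of the differential-privacy machinery discussed in the survey, the following Python sketch shows the standard Laplace mechanism applied to a message before it is shared with neighbors; the clipping threshold and sensitivity bound are illustrative assumptions.

import numpy as np

def privatize(message, sensitivity, epsilon, rng=np.random.default_rng()):
    # Laplace noise with scale sensitivity/epsilon yields epsilon-differential
    # privacy for a query whose L1 sensitivity is bounded by `sensitivity`.
    scale = sensitivity / epsilon
    return message + rng.laplace(0.0, scale, size=message.shape)

grad = np.array([0.8, -0.3, 1.5])
clipped = grad / max(1.0, np.linalg.norm(grad, ord=1))  # bound the L1 norm by 1
print(privatize(clipped, sensitivity=2.0, epsilon=0.5))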
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
Huang, W. Ronny, Allauzen, Cyril, Chen, Tongzhou, Gupta, Kilol, Hu, Ke, Qin, James, Zhang, Yu, Wang, Yongqiang, Chang, Shuo-Yiin, Sainath, Tara N.
In the era of large models, the autoregressive nature of decoding often results in latency serving as a significant bottleneck. We propose a non-autoregressive LM-fused ASR system that effectively leverages the parallelization capabilities of accelerator hardware. Our approach combines the Universal Speech Model (USM) and the PaLM 2 language model in per-segment scoring mode, achieving an average relative WER improvement across all languages of 10.8% on FLEURS and 3.6% on YouTube captioning. Furthermore, our comprehensive ablation study analyzes key parameters such as LLM size, context length, vocabulary size, and fusion methodology. For instance, we explore the impact of LLM size, ranging from 128M to 340B parameters, on ASR performance. This study provides valuable insights into the factors influencing the effectiveness of practical large-scale LM-fused speech recognition systems.
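The per-segment scoring idea can be illustrated with the following Python sketch of N-best rescoring, in which the LM scores every hypothesis of a segment independently (so the calls can be batched rather than decoded autoregressively) and the two scores are interpolated; the interpolation weight and scoring functions are illustrative assumptions, not the paper's fusion recipe.

def rescore_segment(nbest, lm_log_prob, lam=0.3):
    # nbest: list of (hypothesis_text, asr_log_prob) pairs for one segment.
    # lm_log_prob: callable returning the LM log-probability of a hypothesis;
    # the calls are independent, so they parallelize on accelerator hardware.
    scored = [(text, (1 - lam) * asr_lp + lam * lm_log_prob(text))
              for text, asr_lp in nbest]
    return max(scored, key=lambda pair: pair[1])[0]

nbest = [("recognize speech", -1.2), ("wreck a nice beach", -1.4)]
print(rescore_segment(nbest, lm_log_prob=lambda t: -0.1 * len(t.split())))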
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
Zhao, Guanlong, Wang, Yongqiang, Pelecanos, Jason, Zhang, Yu, Liao, Hank, Huang, Yiling, Lu, Han, Wang, Quan
We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model is adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, demonstrating the utility of fine-tuning from a large generic foundation model for a downstream task. We analyze the performance of this multilingual speaker change detection model through a series of ablation studies. We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages. On American English, the USM-SCD model can achieve an 85.8% speaker change detection F1 score across various public and internal test sets, beating the previous monolingual baseline model by 21% relative. We also show that we only need to fine-tune one-quarter of the trainable model parameters to achieve the best model performance. The USM-SCD model exhibits state-of-the-art ASR quality compared with a strong public ASR baseline, making it suitable to handle both tasks with negligible additional computational cost.
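The partial fine-tuning idea mentioned above can be sketched as follows in Python (using PyTorch as a stand-in); the module layout and the choice of which quarter of the parameters to unfreeze are illustrative assumptions, not the USM-SCD training recipe.

import torch
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=8)                      # stand-in for a pretrained speech encoder
scd_head = nn.Linear(256, 2)           # frame-wise "speaker change / no change"

for p in encoder.parameters():
    p.requires_grad = False            # freeze everything by default
for layer in encoder.layers[-2:]:      # unfreeze only the top layers (~1/4)
    for p in layer.parameters():
        p.requires_grad = True

trainable = [p for p in list(encoder.parameters()) + list(scd_head.parameters())
             if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)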
Locally Differentially Private Gradient Tracking for Distributed Online Learning over Directed Graphs
Chen, Ziqin, Wang, Yongqiang
Distributed online learning has been proven extremely effective in solving large-scale machine learning problems over streaming data. However, information sharing between learners in distributed learning also raises concerns about the potential leakage of individual learners' sensitive data. To mitigate this risk, differential privacy, which is widely regarded as the "gold standard" for privacy protection, has been widely employed in many existing results on distributed online learning. However, these results often face a fundamental tradeoff between learning accuracy and privacy. In this paper, we propose a locally differentially private gradient-tracking-based distributed online learning algorithm that successfully circumvents this tradeoff. We prove that the proposed algorithm converges in mean square to the exact optimal solution while ensuring rigorous local differential privacy, with the cumulative privacy budget guaranteed to be finite even when the number of iterations tends to infinity. The algorithm is applicable even when the communication graph among learners is directed. To the best of our knowledge, this is the first result that simultaneously ensures learning accuracy and rigorous local differential privacy in distributed online learning over directed graphs. We evaluate our algorithm's performance using multiple benchmark machine-learning applications, including logistic regression on the "Mushrooms" dataset and CNN-based image classification on the "MNIST" and "CIFAR-10" datasets. The experimental results confirm that the proposed algorithm outperforms existing counterparts in both training and testing accuracies.
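For readers unfamiliar with gradient tracking, the following Python sketch shows the generic template this family of algorithms builds on; the doubly stochastic mixing matrix, static quadratic losses, and absence of noise injection are simplifying assumptions, since the paper additionally handles directed graphs, streaming losses, and local differential privacy.

import numpy as np

W = np.array([[0.6, 0.2, 0.2],         # doubly stochastic mixing matrix
              [0.2, 0.6, 0.2],         # (assumed undirected graph)
              [0.2, 0.2, 0.6]])
targets = np.array([1.0, 2.0, 6.0])    # heterogeneous quadratic losses
grad = lambda x: x - targets           # grad_i(x_i) = x_i - target_i

x = np.zeros(3)
y = grad(x)                            # y_i initialized to the local gradient
lr = 0.3
for _ in range(200):
    x_new = W @ x - lr * y             # consensus step plus tracked gradient
    y = W @ y + grad(x_new) - grad(x)  # y tracks the network-average gradient
    x = x_new
print(x)                               # all entries approach the minimizer, 3.0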
SLM: Bridge the thin gap between speech and text foundation models
Wang, Mingqiu, Han, Wei, Shafran, Izhak, Wu, Zelin, Chiu, Chung-Cheng, Cao, Yuan, Wang, Yongqiang, Chen, Nanxin, Zhang, Yu, Soltau, Hagen, Rubenstein, Paul, Zilka, Lukas, Yu, Dian, Meng, Zhong, Pundak, Golan, Siddhartha, Nikhil, Schalkwyk, Johan, Wu, Yonghui
We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserve their capabilities, and trains only a simple adapter comprising just 1% (156M) of the foundation models' parameters. This adaptation not only leads SLM to achieve strong performance on conventional tasks such as speech recognition (ASR) and speech translation (AST), but also introduces the novel capability of zero-shot instruction-following for more diverse tasks: given a speech input and a text instruction, SLM is able to perform unseen generation tasks, including contextual biasing ASR using real-time context, dialog generation, speech continuation, and question answering. Our approach demonstrates that the representational gap between pretrained speech and language models might be narrower than one would expect, and can be bridged by a simple adaptation mechanism. As a result, SLM is not only efficient to train, but also inherits strong capabilities already acquired in foundation models of different modalities.
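The frozen-backbones-plus-adapter pattern can be sketched as follows in Python (PyTorch), with a trainable projection mapping frozen speech-encoder outputs into the frozen LM's embedding space; the dimensions and adapter design are illustrative assumptions, not SLM's actual architecture.

import torch
import torch.nn as nn

speech_dim, lm_dim = 1024, 2048        # assumed embedding sizes

class Adapter(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(speech_dim, lm_dim),
                                  nn.ReLU(),
                                  nn.Linear(lm_dim, lm_dim))

    def forward(self, speech_embeddings, text_embeddings):
        # Project speech frames into the LM embedding space and prepend them
        # to the embedded text instruction, forming one input sequence.
        return torch.cat([self.proj(speech_embeddings), text_embeddings], dim=1)

adapter = Adapter()                        # the only trainable component
speech = torch.randn(1, 300, speech_dim)   # frozen speech-encoder outputs
text = torch.randn(1, 16, lm_dim)          # frozen LM token embeddings
print(adapter(speech, text).shape)         # torch.Size([1, 316, 2048])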
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Zhang, Yu, Han, Wei, Qin, James, Wang, Yongqiang, Bapna, Ankur, Chen, Zhehuai, Chen, Nanxin, Li, Bo, Axelrod, Vera, Wang, Gary, Meng, Zhong, Hu, Ke, Rosenberg, Andrew, Prabhavalkar, Rohit, Park, Daniel S., Haghani, Parisa, Riesa, Jason, Perng, Ginger, Soltau, Hagen, Strohman, Trevor, Ramabhadran, Bhuvana, Sainath, Tara, Moreno, Pedro, Chiu, Chung-Cheng, Schalkwyk, Johan, Beaufays, Françoise, Wu, Yonghui
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model [1], our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
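Random-projection quantization for self-supervised speech pre-training can be sketched as follows in Python: a frozen random matrix projects each feature frame, and the index of the nearest vector in a frozen random codebook becomes the discrete target for masked prediction. The feature dimension, projection size, and codebook size are illustrative assumptions, not the exact USM configuration.

import numpy as np

rng = np.random.default_rng(0)
feat_dim, proj_dim, codebook_size = 80, 16, 4096

projection = rng.normal(size=(feat_dim, proj_dim))     # frozen, never trained
codebook = rng.normal(size=(codebook_size, proj_dim))  # frozen, never trained
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def quantize(frames):
    # frames: (num_frames, feat_dim) acoustic features; returns one discrete
    # label per frame, used as the masked-prediction target during pre-training.
    z = frames @ projection
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    return np.argmax(z @ codebook.T, axis=1)           # nearest by cosine

print(quantize(rng.normal(size=(5, feat_dim))))        # e.g. 5 integer labels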