Tian, Zhi
Towards Trustworthy Federated Learning
Basharat, Alina, Bian, Yijun, Xu, Ping, Tian, Zhi
This paper develops a comprehensive framework to address three critical trustworthiness challenges in federated learning (FL): robustness against Byzantine attacks, fairness, and privacy preservation. To defend against Byzantine attacks, in which malicious participants send corrupted information to bias the system's performance, we develop a Two-sided Norm Based Screening (TNBS) mechanism that allows the central server to crop the gradients with the l lowest norms and the h highest norms. TNBS functions as a screening tool to filter out potentially malicious participants whose gradients are far from the honest ones. To promote egalitarian fairness, we adopt q-fair federated learning (q-FFL). Furthermore, we adopt a differential-privacy-based scheme to prevent raw data at local clients from being inferred by curious parties. Convergence guarantees are provided for the proposed framework under different scenarios. Experimental results on real datasets demonstrate that the proposed framework effectively improves robustness and fairness while managing the trade-off between privacy and accuracy. This work appears to be the first study that addresses fairness, privacy, and robustness in trustworthy FL both experimentally and theoretically.
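As a rough illustration of the screening step, the sketch below drops the l lowest-norm and h highest-norm client gradients and averages the rest; the function name, the plain-sort tie-breaking, and the toy Byzantine clients are assumptions made for illustration, not the paper's implementation.

```python
# Minimal sketch of two-sided norm-based screening (TNBS-style aggregation),
# assuming the server simply discards the l lowest-norm and h highest-norm
# client gradients and averages the remainder.
import numpy as np

def tnbs_aggregate(gradients, l, h):
    """gradients: list of 1-D numpy arrays, one per client."""
    norms = np.array([np.linalg.norm(g) for g in gradients])
    order = np.argsort(norms)               # ascending by gradient norm
    kept = order[l:len(gradients) - h]      # drop l smallest and h largest
    if len(kept) == 0:
        raise ValueError("l + h must be smaller than the number of clients")
    return np.mean([gradients[i] for i in kept], axis=0)

# Toy usage: 8 honest clients plus 2 Byzantine ones sending inflated gradients.
rng = np.random.default_rng(0)
honest = [rng.normal(0.0, 1.0, size=10) for _ in range(8)]
byzantine = [100.0 * rng.normal(size=10) for _ in range(2)]
aggregated = tnbs_aggregate(honest + byzantine, l=1, h=2)
```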
Unraveling the Mystery of Scaling Laws: Part I
Su, Hui, Tian, Zhi, Shen, Xiaoyu, Cai, Xunliang
Scaling law principles indicate a power-law correlation between loss and variables such as model size, dataset size, and computational resources utilized during training. These principles play a vital role in optimizing various aspects of model pre-training, ultimately contributing to the success of large language models such as GPT-4, Llama and Gemini. However, the original scaling law paper by OpenAI did not disclose the complete details necessary to derive the precise scaling law formulas, and their conclusions are only based on models containing up to 1.5 billion parameters. Though some subsequent works attempt to unveil these details and scale to larger models, they often neglect the training dependency of important factors such as the learning rate, context length and batch size, leading to their failure to establish a reliable formula for predicting the test loss trajectory. In this technical report, we confirm that the scaling law formulations proposed in the original OpenAI paper remain valid when scaling the model size up to 33 billion, but the constant coefficients in these formulas vary significantly with the experiment setup. We meticulously identify influential factors and provide transparent, step-by-step instructions to estimate all constant terms in scaling-law formulas by training on models with only 1M~60M parameters. Using these estimated formulas, we showcase the capability to accurately predict various attributes for models with up to 33B parameters before their training, including (1) the minimum possible test loss; (2) the minimum required training steps and processed tokens to achieve a specific loss; (3) the critical batch size with an optimal time/computation trade-off at any loss value; and (4) the complete test loss trajectory with arbitrary batch size.
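To make the "fit on small models, extrapolate to large ones" idea concrete, here is a minimal sketch that fits a Kaplan-style power law L(N) = (Nc / N)^alpha in log space and extrapolates it to a larger model size; the functional form, the helper names, and the (model size, loss) pairs are hypothetical stand-ins, since the report's full formulas and fitted coefficients are not reproduced here.

```python
# Hedged sketch: estimate power-law constants from small runs, then extrapolate.
import numpy as np

# Hypothetical (model size, final test loss) pairs from small runs (1M-60M params).
sizes = np.array([1e6, 3e6, 10e6, 30e6, 60e6])
losses = np.array([4.10, 3.72, 3.35, 3.05, 2.90])

# In log space the power law L(N) = (Nc / N) ** alpha becomes linear:
# log L = alpha * (log Nc - log N), so a degree-1 fit recovers alpha and Nc.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
alpha = -slope
Nc = np.exp(intercept / alpha)

def predicted_loss(n_params):
    return (Nc / n_params) ** alpha

print(f"alpha={alpha:.3f}, Nc={Nc:.3e}")
print("extrapolated loss at 33B params:", predicted_loss(33e9))
```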
Distributed Swarm Learning for Edge Internet of Things
Wang, Yue, Tian, Zhi, Fang, Xin, Cai, Zhipeng, Nowzari, Cameron, Zeng, Kai
The rapid growth of the Internet of Things (IoT) has led to the widespread deployment of smart IoT devices at the wireless edge for collaborative machine learning tasks, ushering in a new paradigm of edge learning. With a huge number of hardware-constrained IoT devices operating in resource-limited wireless networks, edge learning faces substantial challenges, including communication and computation bottlenecks, data heterogeneity, security risks, privacy leakages, non-convex optimization, and heterogeneity in IoT hardware capability and link quality, which degrade edge learning performance significantly. One such challenge is non-convex optimization: gradient-based algorithms get trapped in local optima when tackling non-convex problems, e.g., training neural networks with nonlinear activations. This problem worsens in distributed learning, particularly in IoT scenarios where edge devices access limited data and each device faces statistical heterogeneity in local training data across workers, also known as the non-i.i.d. issue. To address these issues, this article explores a novel distributed swarm learning framework.
FastPillars: A Deployment-friendly Pillar-based 3D Detector
Zhou, Sifan, Tian, Zhi, Chu, Xiangxiang, Zhang, Xinyu, Zhang, Bo, Lu, Xiaobo, Feng, Chengjian, Jie, Zequn, Chiang, Patrick Yin, Ma, Lin
The deployment of 3D detectors poses one of the major challenges in real-world self-driving scenarios. Existing BEV-based (i.e., Bird's-Eye View) detectors favor sparse convolutions (known as SPConv) to speed up training and inference, which puts a hard barrier in the way of deployment, especially for on-device applications. In this paper, to tackle the challenge of efficient 3D object detection from an industry perspective, we devise a deployment-friendly pillar-based 3D detector, termed FastPillars. First, we introduce a novel lightweight Max-and-Attention Pillar Encoding (MAPE) module designed specifically to enhance small 3D objects. Second, we propose a simple yet effective principle for designing a backbone in pillar-based 3D detection. We construct FastPillars based on these designs, achieving high performance and low latency without SPConv. Extensive experiments on two large-scale datasets demonstrate the effectiveness and efficiency of FastPillars for on-device 3D detection regarding both performance and speed. Specifically, FastPillars delivers state-of-the-art accuracy on the Waymo Open Dataset with a 1.8x speedup and a 3.8 mAPH/L2 improvement over CenterPoint (SPConv-based). Our code is publicly available at: https://github.com/StiphyJay/FastPillars.
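One plausible reading of the MAPE idea is sketched below: each pillar aggregates its point features with both a max pool and an attention-weighted sum before the two are fused; the layer sizes, masking scheme, and additive fusion are illustrative assumptions rather than FastPillars' exact design.

```python
# Sketch of a max-and-attention style pillar encoder (illustrative, not the
# released FastPillars module). Assumes every pillar has at least one valid point.
import torch
import torch.nn as nn

class MaxAndAttentionPillarEncoder(nn.Module):
    def __init__(self, in_dim=9, out_dim=64):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.attn_score = nn.Linear(out_dim, 1)   # per-point attention logits

    def forward(self, points, mask):
        # points: (P pillars, N points per pillar, in_dim); mask: (P, N) bool
        feats = self.point_mlp(points)                              # (P, N, C)
        valid = mask.unsqueeze(-1)                                  # (P, N, 1)
        neg_inf = torch.finfo(feats.dtype).min
        max_feat = feats.masked_fill(~valid, neg_inf).max(dim=1).values   # (P, C)
        scores = self.attn_score(feats).masked_fill(~valid, neg_inf)
        weights = torch.softmax(scores, dim=1)                      # (P, N, 1)
        attn_feat = (weights * feats * valid.float()).sum(dim=1)    # (P, C)
        return max_feat + attn_feat                                 # fused pillar feature
```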
Byzantine-Robust Distributed Online Learning: Taming Adversarial Participants in An Adversarial Environment
Dong, Xingrong, Wu, Zhaoxian, Ling, Qing, Tian, Zhi
This paper studies distributed online learning under Byzantine attacks. The performance of an online learning algorithm is often characterized by (adversarial) regret, which evaluates the quality of one-step-ahead decision-making when an environment provides adversarial losses, and a sublinear bound is preferred. But we prove that, even with a class of state-of-the-art robust aggregation rules, in an adversarial environment and in the presence of Byzantine participants, distributed online gradient descent can only achieve a linear adversarial regret bound, which is tight. This is the inevitable consequence of Byzantine attacks, even though we can control the constant of the linear adversarial regret to a reasonable level. Interestingly, when the environment is not fully adversarial so that the losses of the honest participants are i.i.d. (independent and identically distributed), we show that sublinear stochastic regret, in contrast to the aforementioned adversarial regret, is possible. We develop a Byzantine-robust distributed online momentum algorithm to attain such a sublinear stochastic regret bound. Extensive numerical experiments corroborate our theoretical analysis.
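The flavor of the momentum-based algorithm can be sketched as follows, using a coordinate-wise trimmed mean as a stand-in for the robust aggregation rule and applying it to local momentum vectors rather than raw gradients; the function names, constants, and the specific aggregator are assumptions, not the paper's exact method.

```python
# Hedged sketch of Byzantine-robust distributed online learning with momentum.
import numpy as np

def trimmed_mean(vectors, b):
    """Coordinate-wise trimmed mean, discarding b extremes on each side."""
    stacked = np.sort(np.stack(vectors), axis=0)
    return stacked[b:len(vectors) - b].mean(axis=0)

def robust_online_momentum(grad_fns, x0, steps, lr=0.1, beta=0.9, b=1):
    """grad_fns: one callable per participant returning a (possibly malicious) gradient."""
    x = np.array(x0, dtype=float)
    momenta = [np.zeros_like(x) for _ in grad_fns]
    for t in range(steps):
        for i, g in enumerate(grad_fns):
            momenta[i] = beta * momenta[i] + (1 - beta) * g(x, t)
        x = x - lr * trimmed_mean(momenta, b)   # aggregate momenta, not raw gradients
    return x
```

Aggregating momentum vectors instead of instantaneous gradients is what damps the variance an attacker can exploit; the trimmed mean here is merely one representative of the robust rules such an algorithm could plug in.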
Fed2: Feature-Aligned Federated Learning
Yu, Fuxun, Zhang, Weishan, Qin, Zhuwei, Xu, Zirui, Wang, Di, Liu, Chenchen, Tian, Zhi, Chen, Xiang
Federated learning learns from scattered data by fusing collaborative models from local nodes. However, the conventional coordinate-based model averaging of FedAvg ignores the random information encoded per parameter and may suffer from structural feature misalignment. In this work, we propose Fed2, a feature-aligned federated learning framework that resolves this issue by establishing a firm structure-feature alignment across the collaborative models. Fed2 is composed of two major designs: First, we design a feature-oriented model structure adaptation method to ensure explicit feature allocation in different neural network structures. By applying the structure adaptation to collaborative models, matchable structures with similar feature information can be initialized at the very early training stage. During the federated learning process, we then propose a feature-paired averaging scheme to guarantee aligned feature distribution and avoid feature fusion conflicts under either IID or non-IID scenarios. Eventually, Fed2 effectively enhances federated learning convergence performance under extensive homogeneous and heterogeneous settings, providing excellent convergence speed, accuracy, and computation/communication efficiency.
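A toy sketch of what feature-paired averaging could look like is given below, assuming each client reports a per-neuron assignment to feature groups so that neurons encoding the same feature are averaged together instead of by raw coordinate position; the data layout and helper are hypothetical stand-ins for Fed2's actual alignment procedure.

```python
# Illustrative feature-paired averaging: average parameters within matched
# feature groups rather than by coordinate position (contrast with FedAvg).
import numpy as np

def feature_paired_average(client_weights, client_groups, num_groups):
    """client_weights: list of (neurons, dim) arrays, one per client.
    client_groups: list of (neurons,) int arrays mapping neurons to feature groups."""
    dim = client_weights[0].shape[1]
    fused = np.zeros((num_groups, dim))
    counts = np.zeros(num_groups)
    for W, groups in zip(client_weights, client_groups):
        for neuron, g in enumerate(groups):
            fused[g] += W[neuron]
            counts[g] += 1
    return fused / np.maximum(counts[:, None], 1)   # per-group mean of matched neurons
```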
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Chu, Xiangxiang, Tian, Zhi, Wang, Yuqing, Zhang, Bo, Ren, Haibing, Wei, Xiaolin, Xia, Huaxia, Shen, Chunhua
Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed, and they show that the design of spatial attention is critical to their success in these tasks. In this work, we revisit the design of the spatial attention and demonstrate that a carefully devised yet simple spatial attention mechanism performs favourably against the state-of-the-art schemes. As a result, we propose two vision transformer architectures, namely Twins-PCPVT and Twins-SVT. Our proposed architectures are highly efficient and easy to implement, only involving matrix multiplications that are highly optimized in modern deep learning frameworks. More importantly, the proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification as well as dense detection and segmentation. The simplicity and strong performance suggest that our proposed architectures may serve as stronger backbones for many vision tasks. Our code will be released soon at https://github.com/Meituan-AutoML/Twins.
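For intuition, the sketch below pairs the two attention pieces commonly described for Twins-SVT: locally-grouped self-attention within non-overlapping windows and globally sub-sampled attention over pooled tokens. The window size, sub-sampling ratio, and the use of nn.MultiheadAttention are illustrative choices, not the released implementation.

```python
# Hedged sketch of spatially separable attention: local window attention
# followed by global attention against sub-sampled keys/values.
import torch
import torch.nn as nn

class LocalGlobalAttention(nn.Module):
    def __init__(self, dim=64, heads=4, window=7, sr_ratio=2):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.sr = nn.AvgPool2d(sr_ratio)   # sub-sample keys/values for global attention

    def forward(self, x):
        # x: (B, C, H, W); H and W assumed divisible by window and sr_ratio
        B, C, H, W = x.shape
        w = self.window
        # Locally-grouped attention: attend within non-overlapping w x w windows.
        windows = (x.reshape(B, C, H // w, w, W // w, w)
                     .permute(0, 2, 4, 3, 5, 1)
                     .reshape(B * (H // w) * (W // w), w * w, C))
        local, _ = self.local_attn(windows, windows, windows)
        local = (local.reshape(B, H // w, W // w, w, w, C)
                      .permute(0, 5, 1, 3, 2, 4)
                      .reshape(B, C, H, W))
        # Globally sub-sampled attention: every token attends to pooled tokens.
        q = local.flatten(2).transpose(1, 2)               # (B, H*W, C)
        kv = self.sr(local).flatten(2).transpose(1, 2)     # (B, H*W / sr^2, C)
        out, _ = self.global_attn(q, kv, kv)
        return out.transpose(1, 2).reshape(B, C, H, W)
```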
Conditional Positional Encodings for Vision Transformers
Chu, Xiangxiang, Tian, Zhi, Zhang, Bo, Wang, Xinlong, Wei, Xiaolin, Xia, Huaxia, Shen, Chunhua
We propose a conditional positional encoding (CPE) scheme for vision Transformers. Unlike previous fixed or learnable positional encodings, which are pre-defined and independent of input tokens, CPE is dynamically generated and conditioned on the local neighborhood of the input tokens. As a result, CPE can easily generalize to input sequences that are longer than what the model has ever seen during training. Besides, CPE can keep the desired translation invariance in the image classification task, resulting in improved classification accuracy. CPE can be effortlessly implemented with a simple Position Encoding Generator (PEG) and seamlessly incorporated into the current Transformer framework. Built on PEG, we present the Conditional Position encoding Vision Transformer (CPVT). We demonstrate that CPVT has visually similar attention maps to those of models with learned positional encodings. Benefiting from the conditional positional encoding scheme, we obtain state-of-the-art results on the ImageNet classification task compared with vision Transformers to date. Our code will be made available at https://github.com/Meituan-AutoML/CPVT.
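A minimal sketch of the PEG idea is shown below, assuming the commonly described form of a zero-padded depthwise convolution applied to the tokens reshaped to their 2-D layout and added back as a residual; the kernel size and placement are assumptions.

```python
# Minimal sketch of a Positional Encoding Generator (PEG) for vision Transformers.
import torch
import torch.nn as nn

class PEG(nn.Module):
    def __init__(self, dim=256, k=3):
        super().__init__()
        # Depthwise conv with zero padding; the padding lets the output depend
        # on position near the image borders, providing a positional signal.
        self.proj = nn.Conv2d(dim, dim, k, stride=1, padding=k // 2, groups=dim)

    def forward(self, tokens, H, W):
        # tokens: (B, N, C) patch tokens with N == H * W (class token excluded)
        B, N, C = tokens.shape
        feat = tokens.transpose(1, 2).reshape(B, C, H, W)
        return tokens + self.proj(feat).flatten(2).transpose(1, 2)  # conditional positional encoding
```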
Heterogeneous Federated Learning
Yu, Fuxun, Zhang, Weishan, Qin, Zhuwei, Xu, Zirui, Wang, Di, Liu, Chenchen, Tian, Zhi, Chen, Xiang
Federated learning learns from scattered data by fusing collaborative models from local nodes. However, due to chaotic information distribution, the model fusion may suffer from structural misalignment with regard to unmatched parameters. In this work, we propose a novel federated learning framework that resolves this issue by establishing a firm structure-information alignment across collaborative models. Specifically, we design a feature-oriented regulation method (Ψ-Net) to ensure explicit feature information allocation in different neural network structures. By applying this regulating method to collaborative models, matchable structures with similar feature information can be initialized at the very early training stage. During the federated learning process, under either IID or non-IID scenarios, dedicated collaboration schemes further guarantee ordered information distribution with definite structure matching, and thus comprehensive model alignment. Eventually, this framework effectively extends the applicability of federated learning to extensive heterogeneous settings, while providing excellent convergence speed, accuracy, and computation/communication efficiency.
A Class of Distributed Event-Triggered Average Consensus Algorithms for Multi-Agent Systems
Xu, Ping, Nowzari, Cameron, Tian, Zhi
This paper proposes a class of distributed event-triggered algorithms that solve the average consensus problem in multi-agent systems. By designing events such that a specifically chosen Lyapunov function is monotonically decreasing, event-triggered algorithms succeed in reducing communications among agents while still ensuring that the entire system converges to the desired state. However, depending on the chosen Lyapunov function, the transient behaviors can be very different. Moreover, performance requirements vary from application to application. Consequently, we instead consider a class of Lyapunov functions such that each Lyapunov function produces a different event-triggered coordination algorithm to solve the multi-agent average consensus problem. The proposed class of algorithms all guarantee exponential convergence of the resulting system and exclusion of Zeno behaviors. This allows us to easily implement different algorithms that all guarantee correctness while meeting varying performance needs. We show that our findings can be applied to the practical clock synchronization problem in wireless sensor networks (WSNs) and further corroborate their effectiveness with simulation results.
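The following toy simulation illustrates one member of such a class, using a commonly studied relative trigger that broadcasts an agent's state only when its local error grows too large compared with the disagreement among last-broadcast values; the particular trigger rule, constants, and graph are assumptions chosen for illustration, not the paper's specific design.

```python
# Hedged sketch of distributed event-triggered average consensus (Euler simulation).
import numpy as np

def simulate(adjacency, x0, steps=5000, dt=1e-3, sigma=0.5):
    A = np.asarray(adjacency, float)   # symmetric adjacency matrix
    x = np.array(x0, float)            # true agent states
    xhat = x.copy()                    # last broadcast states used by neighbors
    broadcasts = 0
    for _ in range(steps):
        disagreement = A @ xhat - A.sum(axis=1) * xhat   # sum_j a_ij (xhat_j - xhat_i)
        # Event check: broadcast only when the local error exceeds the trigger threshold.
        trigger = np.abs(x - xhat) >= sigma * np.abs(disagreement)
        xhat = np.where(trigger, x, xhat)
        broadcasts += int(trigger.sum())
        x = x + dt * (A @ xhat - A.sum(axis=1) * xhat)   # consensus dynamics on broadcast values
    return x, broadcasts

# Ring of 4 agents; all states should approach the average of x0 with far fewer
# broadcasts than a continuously communicating scheme.
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
final_states, num_broadcasts = simulate(ring, [0.0, 1.0, 2.0, 7.0])
```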