Yang, Tao
Invariant Federated Learning for Edge Intelligence: Mitigating Heterogeneity and Asynchrony via Exit Strategy and Invariant Penalty
Hao, Ziruo, Cui, Zhenhua, Yang, Tao, Hu, Bo, Wu, Xiaofeng, Feng, Hui
This paper provides an invariant federated learning system for resource-constrained edge intelligence. This framework can avoid the impact of heterogeneity and asynchrony by exit strategy and invariant penalty. We decompose local information into two orthogonal components to measure the contribution or impact of heterogeneous and asynchronous clients. We propose that the exit of abnormal clients can guarantee the effect of the model on most clients. Meanwhile, to ensure the models' performance on exited abnormal clients and those who lack training resources, we propose Federated Learning with Invariant Penalty for Generalization (FedIPG) based on the invariant orthogonal decomposition of parameters. Theoretical proof shows that FedIPG reduces the Out-Of-Distribution prediction loss without increasing the communication burden. The performance of FedIPG combined with an exit strategy is tested empirically in multiple scales using four datasets. It shows our system can enhance In-Distribution performance and outperform the state-of-the-art algorithm in Out-Of-Distribution generalization while maintaining model convergence. Additionally, the results of the visual experiment prove that FedIPG contains preliminary causality in terms of ignoring confounding features.
Stabilization Analysis and Mode Recognition of Kerosene Supersonic Combustion: A Deep Learning Approach Based on Res-CNN-beta-VAE
Xu, Weiming, Yang, Tao, Liu, Chang, Wu, Kun, Zhang, Peng
The scramjet engine is a key propulsion system for hypersonic vehicles, leveraging supersonic airflow to achieve high specific impulse, making it a promising technology for aerospace applications. Understanding and controlling the complex interactions between fuel injection, turbulent combustion, and aerodynamic effects of compressible flows are crucial for ensuring stable combustion in scramjet engines. However, identifying stable modes in scramjet combustors is often challenging due to limited experimental measurement means and extremely complex spatiotemporal evolution of supersonic turbulent combustion. This work introduces an innovative deep learning framework that combines dimensionality reduction via the Residual Convolutional Neural Network-beta-Variational Autoencoder (Res-CNN-beta-VAE) model with unsupervised clustering (K-means) to identify and analyze dynamical combustion modes in a supersonic combustor. By mapping high-dimensional data of combustion snapshots to a reduced three-dimensional latent space, the Res-CNN-beta-VAE model captures the essential temporal and spatial features of flame behaviors and enables the observation of transitions between combustion states. By analyzing the standard deviation of latent variable trajectories, we introduce a novel method for objectively distinguishing between dynamic transitions, which provides a scalable and expert-independent alternative to traditional classification methods. Besides, the unsupervised K-means clustering approach effectively identifies the complex interplay between the cavity and the jet-wake stabilization mechanisms, offering new insights into the system's behavior across different gas-to-liquid mass flow ratios (GLRs).
Dynamical Mode Recognition of Turbulent Flames in a Swirl-stabilized Annular Combustor by a Time-series Learning Approach
Yang, Tao, Xu, Weiming, Xu, Liangliang, Zhang, Peng
Thermoacoustic instability in annular combustors, essential to aero engines and modern gas turbines, can severely impair operational stability and efficiency, accurately recognizing and understanding various combustion modes is the prerequisite for understanding and controlling combustion instabilities. However, the high-dimensional spatial-temporal dynamics of turbulent flames typically pose considerable challenges to mode recognition. Based on the bidirectional temporal and nonlinear dimensionality reduction models, this study introduces a two-layer bidirectional long short-term memory variational autoencoder, Bi-LSTM-VAE model, to effectively recognize dynamical modes in annular combustion systems. Specifically, leveraging 16 pressure signals from a swirl-stabilized annular combustor, the model maps complex dynamics into a low-dimensional latent space while preserving temporal dependency and nonlinear behavior features through the recurrent neural network structure. The results show that the novel Bi-LSTM-VAE method enables a clear representation of combustion states in two-dimensional state space. Analysis of latent variable distributions reveals distinct patterns corresponding to a wide range of equivalence ratios and premixed fuel and air mass flow rates, offering novel insights into mode classification and transitions, highlighting this model's potential for deciphering complex thermoacoustic phenomena.
Inductive Spatio-Temporal Kriging with Physics-Guided Increment Training Strategy for Air Quality Inference
Yang, Songlin, Yang, Tao, Hu, Bo
The deployment of sensors for air quality monitoring is constrained by high costs, leading to inadequate network coverage and data deficits in some areas. Utilizing existing observations, spatio-temporal kriging is a method for estimating air quality at unobserved locations during a specific period. Inductive spatio-temporal kriging with increment training strategy has demonstrated its effectiveness using virtual nodes to simulate unobserved nodes. However, a disparity between virtual and real nodes persists, complicating the application of learning patterns derived from virtual nodes to actual unobserved ones. To address these limitations, this paper presents a Physics-Guided Increment Training Strategy (PGITS). Specifically, we design a dynamic graph generation module to incorporate the advection and diffusion processes of airborne particles as physical knowledge into the graph structure, dynamically adjusting the adjacency matrix to reflect physical interactions between nodes. By using physics principles as a bridge between virtual and real nodes, this strategy ensures the features of virtual nodes and their pseudo labels are closer to actual nodes. Consequently, the learned patterns of virtual nodes can be applied to actual unobserved nodes for effective kriging.
Efficient UAV Swarm-Based Multi-Task Federated Learning with Dynamic Task Knowledge Sharing
Yang, Yubo, Yang, Tao, Wu, Xiaofeng, Guo, Ziyu, Hu, Bo
UAV swarms are widely used in emergency communications, area monitoring, and disaster relief. Coordinated by control centers, they are ideal for federated learning (FL) frameworks. However, current UAV-assisted FL methods primarily focus on single tasks, overlooking the need for multi-task training. In disaster relief scenarios, UAVs perform tasks such as crowd detection, road feasibility analysis, and disaster assessment, which exhibit time-varying demands and potential correlations. In order to meet the time-varying requirements of tasks and complete multiple tasks efficiently under resource constraints, in this paper, we propose a UAV swarm based multi-task FL framework, where ground emergency vehicles (EVs) collaborate with UAVs to accomplish multiple tasks efficiently under constrained energy and bandwidth resources. Through theoretical analysis, we identify key factors affecting task performance and introduce a task attention mechanism to dynamically evaluate task importance, thereby achieving efficient resource allocation. Additionally, we propose a task affinity (TA) metric to capture the dynamic correlation among tasks, thereby promoting task knowledge sharing to accelerate training and improve the generalization ability of the model in different scenarios. To optimize resource allocation, we formulate a two-layer optimization problem to jointly optimize UAV transmission power, computation frequency, bandwidth allocation, and UAV-EV associations. For the inner problem, we derive closed-form solutions for transmission power, computation frequency, and bandwidth allocation and apply a block coordinate descent method for optimization. For the outer problem, a two-stage algorithm is designed to determine optimal UAV-EV associations. Furthermore, theoretical analysis reveals a trade-off between UAV energy consumption and multi-task performance.
Drift-Aware Federated Learning: A Causal Perspective
Fang, Yunjie, Wu, Sheng, Yang, Tao, Wu, Xiaofeng, Hu, Bo
Federated learning (FL) facilitates collaborative model training among multiple clients while preserving data privacy, often resulting in enhanced performance compared to models trained by individual clients. However, factors such as communication frequency and data distribution can contribute to feature drift, hindering the attainment of optimal training performance. This paper examine the relationship between model update drift and global as well as local optimizer from causal perspective. The influence of the global optimizer on feature drift primarily arises from the participation frequency of certain clients in server updates, whereas the effect of the local optimizer is typically associated with imbalanced data distributions.To mitigate this drift, we propose a novel framework termed Causal drift-Aware Federated lEarning (CAFE). CAFE exploits the causal relationship between feature-invariant components and classification outcomes to independently calibrate local client sample features and classifiers during the training phase. In the inference phase, it eliminated the drifts in the global model that favor frequently communicating clients.Experimental results demonstrate that CAFE's integration of feature calibration, parameter calibration, and historical information effectively reduces both drift towards majority classes and tendencies toward frequently communicating nodes.
The Impact Analysis of Delays in Asynchronous Federated Learning with Data Heterogeneity for Edge Intelligence
Hao, Ziruo, Cui, Zhenhua, Yang, Tao, Hu, Bo, Wu, Xiaofeng, Feng, Hui
Federated learning (FL) has provided a new methodology for coordinating a group of clients to train a machine learning model collaboratively, bringing an efficient paradigm in edge intelligence. Despite its promise, FL faces several critical challenges in practical applications involving edge devices, such as data heterogeneity and delays stemming from communication and computation constraints. This paper examines the impact of unknown causes of delay on training performance in an Asynchronous Federated Learning (AFL) system with data heterogeneity. Initially, an asynchronous error definition is proposed, based on which the solely adverse impact of data heterogeneity is theoretically analyzed within the traditional Synchronous Federated Learning (SFL) framework. Furthermore, Asynchronous Updates with Delayed Gradients (AUDG), a conventional AFL scheme, is discussed. Investigation into AUDG reveals that the negative influence of data heterogeneity is correlated with delays, while a shorter average delay from a specific client does not consistently enhance training performance. In order to compensate for the scenarios where AUDG are not adapted, Pseudo-synchronous Updates by Reusing Delayed Gradients (PSURDG) is proposed, and its theoretical convergence is analyzed. In both AUDG and PSURDG, only a random set of clients successfully transmits their updated results to the central server in each iteration. The critical difference between them lies in whether the delayed information is reused. Finally, both schemes are validated and compared through theoretical analysis and simulations, demonstrating more intuitively that discarding outdated information due to time delays is not always the best approach.
MAB-Based Channel Scheduling for Asynchronous Federated Learning in Non-Stationary Environments
Li, Zhiyin, Yang, Yubo, Yang, Tao, Wu, Xiaofeng, Guo, Ziyu, Hu, Bo
--Federated learning enables distributed model training across clients under central coordination without raw data exchange. However, in wireless implementations, frequent parameter updates between the server and clients create significant communication overhead. While existing research assumes either known channel state information (CSI) or that the channel follows a stationary distribution, practical wireless channels exhibit non-stationary characteristics due to channel fading, user mobility, and hostile attacks in telecommunication networks. The unavailability of both CSI and time-varying channel distribution can lead to unpredictable failures in parameter transmission, exacerbating clients staleness thus affecting model convergence. T o address these challenges, we propose an asynchronous federated learning scheduling framework for non-stationary channel environments, designed to reduce clients staleness while promoting both fair and efficient communication and aggregation. This framework considers two channel scenarios: extremely non-stationary and piecewise-stationary channels. Age of Information (AoI) serves as a metric to quantify client staleness under non-stationary conditions. Firstly, we perform a rigorous convergence analysis to explore the impact of AoI and per-round client participation on learning performance. The channel scheduling problem in the non-stationary scenario is addressed and formulated within the multi-armed bandit (MAB) framework and we derive the achievable theoretical lower bounds on the AoI regret. Based on this framework, we propose corresponding scheduling strategies for the two non-stationary channel scenarios that leverage the foundations of the GLR-CUCB and M-exp3 algorithms, along with derivations of their respective upper bounds on AoI regret. Additionally, to address the issue of imbalanced client updates in non-stationary channels, we introduce an adaptive matching strategy that incorporates considerations of marginal utility and fairness of clients. Simulation results demonstrate that the proposed algorithm achieves sub-linear growth in AoI regret, accelerates federated learning convergence, and promotes fairer aggregation. HE proliferation of Internet of Things (IoT) devices and the rise of edge computing have resulted in an increasingly decentralized distribution of data across end devices, such as smartphones and sensors. In traditional centralized machine learning approaches, data consolidation at a single location is required, which raises privacy concerns and incurs significant communication overhead.
PsyPlay: Personality-Infused Role-Playing Conversational Agents
Yang, Tao, Zhu, Yuhua, Quan, Xiaojun, Liu, Cong, Wang, Qifan
The current research on Role-Playing Conversational Agents (RPCAs) with Large Language Models (LLMs) primarily focuses on imitating specific speaking styles and utilizing character backgrounds, neglecting the depiction of deeper personality traits.~In this study, we introduce personality-infused role-playing for LLM agents, which encourages agents to accurately portray their designated personality traits during dialogues. We then propose PsyPlay, a dialogue generation framework that facilitates the expression of rich personalities among multiple LLM agents. Specifically, PsyPlay enables agents to assume roles with distinct personality traits and engage in discussions centered around specific topics, consistently exhibiting their designated personality traits throughout the interactions. Validation on generated dialogue data demonstrates that PsyPlay can accurately portray the intended personality traits, achieving an overall success rate of 80.31% on GPT-3.5. Notably, we observe that LLMs aligned with positive values are more successful in portraying positive personality roles compared to negative ones. Moreover, we construct a dialogue corpus for personality-infused role-playing, called PsyPlay-Bench. The corpus, which consists of 4745 instances of correctly portrayed dialogues using PsyPlay, aims to further facilitate research in personalized role-playing and dialogue personality detection.
Multimodal Magic Elevating Depression Detection with a Fusion of Text and Audio Intelligence
Gan, Lindy, Huang, Yifan, Gao, Xiaoyang, Tan, Jiaming, Zhao, Fujun, Yang, Tao
ABSTRACT This study proposes an innovative multimodal fusion model based on a teacherstudent architecture to enhance the accuracy of depression classification. Our designed model addresses the limitations of traditional methods in feature fusion and modality weight allocation by introducing multi-head attention mechanisms and weighted multimodal transfer learning. Leveraging the DAIC-WOZ dataset, the student fusion model, guided by textual and auditory teacher models, achieves significant improvements in classification accuracy. Ablation experiments demonstrate that the proposed model attains an F1 score of 99. 1% on the test set, significantly outperforming unimodal and conventional approaches. Our method effectively captures the complementarity between textual and audio features while dynamically adjusting the contributions of the teacher models to enhance generalization capabilities. The experimental results highlight the robustness and adaptability of the proposed framework in handling complex multimodal data. This research provides a novel technical framework for multimodal large model learning in depression analysis, offering new insights into addressing the limitations of existing methods in modality fusion and feature extraction. INTRODUCTION Depression is a significant global health concern that affects millions of individuals across various demographics, leading to considerable social, economic, and health-related impacts. According to the World Health Organization (WHO), depression is one of the leading causes of disability worldwide, with over 264 million people affected.