Zhang, Ping
Towards Agentic AI Networking in 6G: A Generative Foundation Model-as-Agent Approach
Xiao, Yong, Shi, Guangming, Zhang, Ping
The potential of AI and network convergence to improve networking performance and enable new service capabilities has recently attracted significant interest. Existing network AI solutions, while powerful, are mainly built on a closed-loop, passive learning framework, which severely limits autonomous solution finding and dynamic environmental adaptation. Agentic AI has recently been introduced as a promising solution to address these limitations and pave the way for truly generally intelligent and beneficial AI systems. The key idea is to create a networking ecosystem that supports a diverse range of autonomous and embodied AI agents in fulfilling their goals. In this paper, we focus on the novel challenges and requirements of agentic AI networking. We propose AgentNet, a novel framework for supporting interaction, collaborative learning, and knowledge transfer among AI agents. We introduce a general architectural framework of AgentNet and then propose a generative foundation model (GFM)-based implementation in which multiple GFM-as-agents are created as an interactive knowledge base to bootstrap the development of embodied AI agents according to different task requirements and environmental features. We consider two application scenarios, digital-twin-based industrial automation and metaverse-based infotainment systems, to describe how AgentNet can support efficient task-driven collaboration and interaction among AI agents.
Revisiting semi-supervised learning in the era of foundation models
Zhang, Ping, Mai, Zheda, Nguyen, Quang-Huy, Chao, Wei-Lun
Semi-supervised learning (SSL) leverages abundant unlabeled data alongside limited labeled data to enhance learning. As vision foundation models (VFMs) increasingly serve as the backbone of vision applications, it remains unclear how SSL interacts with these pre-trained models. To address this gap, we develop new SSL benchmark datasets where frozen VFMs underperform and systematically evaluate representative SSL methods. We make a surprising observation: parameter-efficient fine-tuning (PEFT) using only labeled data often matches SSL performance, even without leveraging unlabeled data. This motivates us to revisit self-training, a conceptually simple SSL baseline, where we use the supervised PEFT model to pseudo-label unlabeled data for further training. To overcome the notorious issue of noisy pseudo-labels, we propose ensembling multiple PEFT approaches and VFM backbones to produce more robust pseudo-labels. Empirical results validate the effectiveness of this simple yet powerful approach, providing actionable insights into SSL with VFMs and paving the way for more scalable and practical semi-supervised learning in the era of foundation models.
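The ensembling step described above can be illustrated with a minimal sketch. The abstract does not state how the predictions are combined, so the probability averaging, the confidence threshold, and the function name `ensemble_pseudo_labels` are all assumptions made for illustration, not the authors' procedure.

```python
def ensemble_pseudo_labels(prob_sets, threshold=0.7):
    """Average class probabilities over several models and pseudo-label
    only the samples whose averaged confidence reaches `threshold`.

    prob_sets[m][i] is model m's class-probability list for unlabeled
    sample i. Averaging and thresholding are illustrative assumptions.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    labels = []
    for i in range(n_samples):
        n_classes = len(prob_sets[0][i])
        # Mean probability of each class across all models.
        mean = [
            sum(prob_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        best = max(range(n_classes), key=lambda c: mean[c])
        # None marks a sample that is left unlabeled (too uncertain).
        labels.append(best if mean[best] >= threshold else None)
    return labels
```

Samples on which the models disagree end up with a low averaged confidence and are skipped, which is one simple way an ensemble can reduce pseudo-label noise.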
Federated Inverse Probability Treatment Weighting for Individual Treatment Effect Estimation
Yin, Changchang, Chen, Hong-You, Chao, Wei-Lun, Zhang, Ping
Individual treatment effect (ITE) estimation evaluates the causal effects of treatment strategies on important outcomes and is a crucial problem in healthcare. Most existing ITE estimation methods are designed for centralized settings. However, in real-world clinical scenarios, the raw data are usually not shareable among hospitals due to potential privacy and security risks, which makes these methods inapplicable. In this work, we study the ITE estimation task in a federated setting, which allows us to harness the decentralized data from multiple hospitals. Due to the unavoidable confounding bias in the collected data, a model learned directly from it would be inaccurate. One well-known solution is Inverse Probability Treatment Weighting (IPTW), which uses the conditional probability of treatment given the covariates to re-weight each training example. Applying IPTW in a federated setting, however, is non-trivial. We found that even with a well-estimated conditional probability, the local model training step using each hospital's data alone would still suffer from confounding bias. To address this, we propose FED-IPTW, a novel algorithm that extends IPTW to a federated setting and enforces both global (over all the data) and local (within each hospital) decorrelation between covariates and treatments. We validated our approach on the task of comparing the treatment effects of mechanical ventilation on improving survival probability for patients with breathing difficulties in the intensive care unit (ICU). We conducted experiments on both synthetic and real-world eICU datasets, and the results show that FED-IPTW outperforms state-of-the-art methods on all metrics for factual prediction and ITE estimation tasks, paving the way for personalized treatment strategy design in mechanical ventilation usage.
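For readers unfamiliar with the re-weighting step, the following is a minimal sketch of standard centralized IPTW, not of FED-IPTW itself. It assumes binary treatments and propensity scores that have already been estimated; the function names and the clipping constant `eps` are hypothetical.

```python
def iptw_weights(treatments, propensities, eps=1e-6):
    """Standard IPTW: weight 1/e(x) for treated units and 1/(1 - e(x))
    for controls, where e(x) is the estimated probability of treatment
    given the covariates. Clipping by `eps` (an assumption) avoids
    division by zero for extreme propensities."""
    weights = []
    for t, e in zip(treatments, propensities):
        e = min(max(e, eps), 1.0 - eps)
        weights.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return weights


def weighted_effect(treatments, outcomes, weights):
    """Difference of weight-normalized outcome means (treated - control)."""
    def weighted_mean(group):
        num = sum(w * y for t, y, w in zip(treatments, outcomes, weights) if t == group)
        den = sum(w for t, w in zip(treatments, weights) if t == group)
        return num / den
    return weighted_mean(1) - weighted_mean(0)
```

Units that received an unlikely treatment for their covariates get large weights, so the re-weighted sample behaves more like one in which treatment is independent of the covariates.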
Biomedical Foundation Model: A Survey
Liu, Xiangrui, Zhang, Yuanyuan, Lu, Yingzhou, Yin, Changchang, Hu, Xiaoling, Liu, Xiaoou, Chen, Lulu, Wang, Sheng, Rodriguez, Alexander, Yao, Huaxiu, Yang, Yezhou, Zhang, Ping, Chen, Jintai, Fu, Tianfan, Wang, Xiao
Foundation models, first introduced in 2021, are large-scale pre-trained models (e.g., large language models (LLMs) and vision-language models (VLMs)) that learn from extensive unlabeled datasets through unsupervised methods, enabling them to excel in diverse downstream tasks. These models, like GPT, can be adapted to various applications such as question answering and visual understanding, outperforming task-specific AI models and earning their name due to broad applicability across fields. The development of biomedical foundation models marks a significant milestone in leveraging artificial intelligence (AI) to understand complex biological phenomena and advance medical research and practice. This survey explores the potential of foundation models across diverse domains within biomedical fields, including computational biology, drug discovery and development, clinical informatics, medical imaging, and public health. The purpose of this survey is to inspire ongoing research in the application of foundation models to health science.
Continual Learning-Aided Super-Resolution Scheme for Channel Reconstruction and Generalization in OFDM Systems
Chen, Jianqiao, Ma, Nan, Liu, Wenkai, Xu, Xiaodong, Zhang, Ping
Channel reconstruction and generalization capability are of equal importance for developing channel estimation schemes within a deep learning (DL) framework. In this paper, we explore a novel DL-based scheme for efficient OFDM channel estimation in which the neural networks for channel reconstruction and for generalization are designed separately. For the former, we propose a dual-attention-aided super-resolution neural network (DA-SRNN) to map the channels at pilot positions to the whole time-frequency channels. Specifically, a channel-spatial attention mechanism is first introduced to sequentially infer attention maps along two separate dimensions corresponding to two types of underlying channel correlations, and then a lightweight SR module is developed for efficient channel reconstruction. For the latter, we introduce continual learning (CL)-aided training strategies so that the neural network adapts to different channel distributions. Specifically, elastic weight consolidation (EWC) is introduced as a regularization term in the channel reconstruction loss function, constraining the direction and space in which the important weights of the neural network are updated across different channel distributions. The corresponding training process is also provided in detail. Evaluated under 3rd Generation Partnership Project (3GPP) channel models, numerical results verify the superiority of the proposed channel estimation scheme, with significantly improved channel reconstruction and generalization performance over its counterparts.
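The EWC regularization mentioned above has a well-known generic form: the task loss plus a quadratic penalty that anchors each weight to its previously learned value, scaled by an importance estimate (typically the diagonal Fisher information). The sketch below shows that generic form only. The abstract does not give the paper's exact loss, so the names `ewc_loss`, `anchor_params`, `fisher`, and `lam` are illustrative assumptions.

```python
def ewc_loss(task_loss, params, anchor_params, fisher, lam=1.0):
    """Generic EWC objective:
        task_loss + (lam / 2) * sum_i fisher[i] * (params[i] - anchor_params[i])**2

    params        : current weights (flat list of floats)
    anchor_params : weights learned on the previous channel distribution
    fisher        : per-weight importance (e.g. diagonal Fisher estimate)
    lam           : regularization strength (hypothetical default)
    """
    penalty = sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )
    return task_loss + 0.5 * lam * penalty
```

Weights with a large importance value are expensive to move, so training on a new channel distribution mostly changes the weights that mattered least for the earlier ones.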
NeRFCom: Feature Transform Coding Meets Neural Radiance Field for Free-View 3D Scene Semantic Transmission
Yue, Weijie, Si, Zhongwei, Wu, Bolin, Wang, Sixian, Qin, Xiaoqi, Niu, Kai, Dai, Jincheng, Zhang, Ping
We introduce NeRFCom, a novel communication system designed for end-to-end 3D scene transmission. Compared to traditional systems relying on handcrafted NeRF semantic feature decomposition for compression and well-adaptive channel coding for transmission error correction, our NeRFCom employs a nonlinear transform and learned probabilistic models, enabling flexible variable-rate joint source-channel coding and efficient bandwidth allocation aligned with the NeRF semantic feature's different contribution to the 3D scene synthesis fidelity. Experimental results demonstrate that NeRFCom achieves free-view 3D scene efficient transmission while maintaining robustness under adverse channel conditions. Index Terms: Neural radiance field (NeRF), 3D scene transmission, semantic features, nonlinear transform coding. Virtual reality (VR) and augmented reality (AR) construct 3D scenes to provide users with immersive experiences [1]. However, traditional 3D scene synthesis techniques often rely on manual scene modeling, and the complex workflow increases the cost of deploying 3D technologies.
CLLoRA: An Approach to Measure the Effects of the Context Length for LLM Fine-Tuning
Zhang, Ping, Zhang, Zhaorui, Di, Sheng, Xin, Yao, Liu, Benben
Fine-tuning has been identified as an efficient approach to applying pre-trained large language models (LLMs) to other domains. To guarantee data privacy for different data owners, models are often fine-tuned in federated learning environments across those owners, which often involves data heterogeneity and degrades fine-tuning performance. In addition, the context length of the training data has been identified as a major factor affecting LLM performance. To efficiently measure how context length affects LLM performance in heterogeneous federated learning environments, we propose CLLoRA. CLLoRA applies the parameter-efficient fine-tuning approach LoRA to different kinds of LLMs of varying sizes to investigate whether the quality and length of contexts can serve as standards for measuring non-IID context. The findings indicate that an imbalance in context quality affects not only local training on clients but also the global model's performance. Context length, by contrast, has a minimal effect on local training but a more significant influence on the global model. These results provide insights into how context quality and length affect model performance for LLM fine-tuning in federated learning environments.
Learnable Residual-based Latent Denoising in Semantic Communication
Xu, Mingkai, Wu, Yongpeng, Shi, Yuxuan, Xia, Xiang-Gen, Zhang, Wenjun, Zhang, Ping
A latent denoising semantic communication (SemCom) framework is proposed for robust image transmission over noisy channels. By incorporating a learnable latent denoiser into the receiver, the received signals are preprocessed to effectively remove the channel noise and recover the semantic information, thereby enhancing the quality of the decoded images. Specifically, a latent denoising mapping is established by an iterative residual learning approach to improve the denoising efficiency while ensuring stable performance. Moreover, the channel signal-to-noise ratio (SNR) is utilized to estimate and predict the latent similarity score (SS) for conditional denoising, where the number of denoising steps is adapted based on the predicted SS sequence, further reducing the communication latency. Finally, simulations demonstrate that the proposed framework can effectively and efficiently remove channel noise at various levels and reconstruct visually appealing images.
Memory Analysis on the Training Course of DeepSeek Models
Zhang, Ping, Su, Lei
We present a theoretical analysis of GPU memory consumption during the training of DeepSeek models such as DeepSeek-v2 and DeepSeek-v3. Our primary objective is to clarify the device-level memory requirements associated with various distributed training configurations. Specifically, we examine critical factors influencing memory usage, including micro-batch size, activation recomputation policies, 3D parallelism, and ZeRO optimizations. It is important to emphasize that the training policies discussed in this report are not representative of DeepSeek's official configurations. Instead, they are explored to provide a deeper understanding of memory dynamics in the training of large-scale mixture-of-experts models.
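To make the role of ZeRO in such an analysis concrete, here is a rough sketch of the commonly cited per-device model-state accounting for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states (master weights, momentum, variance). These byte counts are the textbook ZeRO figures, not formulas taken from the report. The function name is hypothetical, and activations, buffers, and the tensor, pipeline, and expert parallelism that the report also examines are deliberately ignored.

```python
def zero_model_state_bytes(num_params, dp_degree, stage):
    """Per-device bytes for model states under ZeRO stage 0-3.

    Assumes mixed-precision Adam: 2 B/param (fp16 weights),
    2 B/param (fp16 gradients), 12 B/param (fp32 optimizer states).
    Stage 1 shards optimizer states, stage 2 also shards gradients,
    stage 3 also shards the weights, each across `dp_degree` devices.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    weights = 2.0 * num_params
    grads = 2.0 * num_params
    optim = 12.0 * num_params
    if stage >= 1:
        optim /= dp_degree
    if stage >= 2:
        grads /= dp_degree
    if stage >= 3:
        weights /= dp_degree
    return weights + grads + optim
```

For example, `zero_model_state_bytes(7_500_000_000, 64, 1)` evaluates the stage-1 case for a 7.5B-parameter dense model on 64 data-parallel devices; a mixture-of-experts model would need the expert parameters accounted for separately.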
WatchGuardian: Enabling User-Defined Personalized Just-in-Time Intervention on Smartwatch
Lei, Ying, Cao, Yancheng, Wang, Will, Dong, Yuanzhe, Yin, Changchang, Cao, Weidan, Zhang, Ping, Yang, Jingzhen, Yao, Bingsheng, Peng, Yifan, Weng, Chunhua, Auerbach, Randy, Mamykina, Lena, Wang, Dakuo, Wang, Yuntao, Xu, Xuhai
While just-in-time interventions (JITIs) have effectively targeted common health behaviors, individuals often have unique needs to intervene in personal undesirable actions that can negatively affect physical, mental, and social well-being. We present WatchGuardian, a smartwatch-based JITI system that empowers users to define custom interventions for these personal actions with a small number of samples. For the model to detect new actions based on limited new data samples, we developed a few-shot learning pipeline that fine-tuned a pre-trained inertial measurement unit (IMU) model on public hand-gesture datasets. We then designed a data augmentation and synthesis process to train additional classification layers for customization. Our offline evaluation with 26 participants showed that with three, five, and ten examples, our approach achieved an average accuracy of 76.8%, 84.7%, and 87.7%, and an F1 score of 74.8%, 84.2%, and 87.2%, respectively. We then conducted a four-hour intervention study to compare WatchGuardian against a rule-based intervention. Our results demonstrated that our system led to a significant reduction of 64.0 ± 22.6% in undesirable actions, substantially outperforming the baseline by 29.0%. Our findings underscore the effectiveness of a customizable, AI-driven JITI system for individuals in need of behavioral intervention in personal undesirable actions. We envision that our work can inspire broader applications of user-defined personalized intervention with advanced AI solutions.