Goto

Collaborating Authors

 Zhou, Jian


Robust Federated Learning in Unreliable Wireless Networks: A Client Selection Approach

arXiv.org Artificial Intelligence

Federated learning (FL) has emerged as a promising distributed learning paradigm for training deep neural networks (DNNs) at the wireless edge, but its performance can be severely hindered by unreliable wireless transmission and inherent data heterogeneity among clients. Existing solutions primarily address these challenges by incorporating wireless resource optimization strategies, often focusing on uplink resource allocation across clients under the assumption of homogeneous client-server network standards. However, these approaches overlooked the fact that mobile clients may connect to the server via diverse network standards (e.g., 4G, 5G, Wi-Fi) with customized configurations, limiting the flexibility of server-side modifications and restricting applicability in real-world commercial networks. This paper presents a novel theoretical analysis about how transmission failures in unreliable networks distort the effective label distributions of local samples, causing deviations from the global data distribution and introducing convergence bias in FL. Our analysis reveals that a carefully designed client selection strategy can mitigate biases induced by network unreliability and data heterogeneity . Motivated by this insight, we propose FedCote, a client selection approach that optimizes client selection probabilities without relying on wireless resource scheduling. Experimental results demonstrate the robustness of FedCote in DNN-based classification tasks under unreliable networks with frequent transmission failures. With rapid advancements in mobile communications and artificial intelligence (AI), edge AI, which leverages locally generated data to train deep neural networks (DNNs) at the wireless edge, has gained significant attention from both academia and industry [1], [2], [3], [4]. A prominent approach in this domain is federated learning (FL), where an edge server coordinates mobile clients in collaboratively training a shared DNN model while ensuring client privacy [5], [6], [7]. However, FL faces a critical challenge due to ubiquitous data heterogeneity across clients, where training data are distributed in a non-i.i.d. and unbalanced manner. If not addressed, data heterogeneity can severely degrade FL performance [8], [9], [10], [11], [12]. Numerous FL algorithms have been proposed to mitigate this issue. For example, FedProx [13] introduced a regularization term in the local objective function to control model divergence, while SCAFFOLD [14] employed control variates to correct local model drift. HFMDS [15] learned essential class-relevant features of real samples to generate an auxiliary synthetic dataset, which was shared among clients for local training, helping to alleviate data heterogeneity .


Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

arXiv.org Artificial Intelligence

We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoencoder, Video-VAE, is designed for video generation tasks, achieving 16x16 spatial and 8x temporal compression ratios, while maintaining exceptional video reconstruction quality. User prompts are encoded using two bilingual text encoders to handle both English and Chinese. A DiT with 3D full attention is trained using Flow Matching and is employed to denoise input noise into latent frames. A video-based DPO approach, Video-DPO, is applied to reduce artifacts and improve the visual quality of the generated videos. We also detail our training strategies and share key observations and insights. Step-Video-T2V's performance is evaluated on a novel video generation benchmark, Step-Video-T2V-Eval, demonstrating its state-of-the-art text-to-video quality when compared with both open-source and commercial engines. Additionally, we discuss the limitations of current diffusion-based model paradigm and outline future directions for video foundation models. We make both Step-Video-T2V and Step-Video-T2V-Eval available at https://github.com/stepfun-ai/Step-Video-T2V. The online version can be accessed from https://yuewen.cn/videos as well. Our goal is to accelerate the innovation of video foundation models and empower video content creators.


Inverse Flow and Consistency Models

arXiv.org Artificial Intelligence

Inverse generation problems, such as denoising without ground truth observations, is a critical challenge in many scientific inquiries and real-world applications. While recent advances in generative models like diffusion models, conditional flow matching, and consistency models achieved impressive results by casting generation as denoising problems, they cannot be directly used for inverse generation without access to clean data. Here we introduce Inverse Flow (IF), a novel framework that enables using these generative models for inverse generation problems including denoising without ground truth. Inverse Flow can be flexibly applied to nearly any continuous noise distribution and allows complex dependencies. We propose two algorithms for learning Inverse Flows, Inverse Flow Matching (IFM) and Inverse Consistency Model (ICM). Notably, to derive the computationally efficient, simulation-free inverse consistency model objective, we generalized consistency training to any forward diffusion processes or conditional flows, which have applications beyond denoising. We demonstrate the effectiveness of IF on synthetic and real datasets, outperforming prior approaches while enabling noise distributions that previous methods cannot support. Finally, we showcase applications of our techniques to fluorescence microscopy and single-cell genomics data, highlighting IF's utility in scientific problems. Overall, this work expands the applications of powerful generative models to inversion generation problems.


Uncertainty-Aware Decision-Making and Planning for Autonomous Forced Merging

arXiv.org Artificial Intelligence

Abstract-- In this paper, we develop an uncertainty-aware decision-making and motion-planning method for an autonomous ego vehicle in forced merging scenarios, considering the motion uncertainty of surrounding vehicles. The forced merging scenario on the highway. Following the decision and the SVs' occupancy, the reference trajectory for the EV is calculated I. However, the motion uncertainties of dynamic obstacles The method is aware of the environmental uncertainties using introduce challenges to the safety in motion-planning online estimation of the acceleration bounds of the SVs, such problems. Among various uncertain scenarios, forced merging that it dynamically captures the uncertainties of SVs without on highways presents notable difficulties, as it demands inferring their intentions.


Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports

arXiv.org Artificial Intelligence

The absence of adequately sufficient expert-level tumor annotations hinders the effectiveness of supervised learning based opportunistic cancer screening on medical imaging. Clinical reports (that are rich in descriptive textual details) can offer a "free lunch'' supervision information and provide tumor location as a type of weak label to cope with screening tasks, thus saving human labeling workloads, if properly leveraged. However, predicting cancer only using such weak labels can be very changeling since tumors are usually presented in small anatomical regions compared to the whole 3D medical scans. Weakly semi-supervised learning (WSSL) utilizes a limited set of voxel-level tumor annotations and incorporates alongside a substantial number of medical images that have only off-the-shelf clinical reports, which may strike a good balance between minimizing expert annotation workload and optimizing screening efficacy. In this paper, we propose a novel text-guided learning method to achieve highly accurate cancer detection results. Through integrating diagnostic and tumor location text prompts into the text encoder of a vision-language model (VLM), optimization of weakly supervised learning can be effectively performed in the latent space of VLM, thereby enhancing the stability of training. Our approach can leverage clinical knowledge by large-scale pre-trained VLM to enhance generalization ability, and produce reliable pseudo tumor masks to improve cancer detection. Our extensive quantitative experimental results on a large-scale cancer dataset, including 1,651 unique patients, validate that our approach can reduce human annotation efforts by at least 70% while maintaining comparable cancer detection accuracy to competing fully supervised methods (AUC value 0.961 versus 0.966).


Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation

arXiv.org Artificial Intelligence

Accurate segmentation of lesion regions is crucial for clinical diagnosis and treatment across various diseases. While deep convolutional networks have achieved satisfactory results in medical image segmentation, they face challenges such as loss of lesion shape information due to continuous convolution and downsampling, as well as the high cost of manually labeling lesions with varying shapes and sizes. To address these issues, we propose a novel medical visual prompting (MVP) framework that leverages pre-training and prompting concepts from natural language processing (NLP). The framework utilizes three key components: Super-Pixel Guided Prompting (SPGP) for superpixelating the input image, Image Embedding Guided Prompting (IEGP) for freezing patch embedding and merging with superpixels to provide visual prompts, and Adaptive Attention Mechanism Guided Prompting (AAGP) for pinpointing prompt content and efficiently adapting all layers. By integrating SPGP, IEGP, and AAGP, the MVP enables the segmentation network to better learn shape prompting information and facilitates mutual learning across different tasks. Extensive experiments conducted on five datasets demonstrate superior performance of this method in various challenging medical image tasks, while simplifying single-task medical segmentation models. This novel framework offers improved performance with fewer parameters and holds significant potential for accurate segmentation of lesion regions in various medical tasks, making it clinically valuable.


Robust Predictive Motion Planning by Learning Obstacle Uncertainty

arXiv.org Artificial Intelligence

Safe motion planning for robotic systems in dynamic environments is nontrivial in the presence of uncertain obstacles, where estimation of obstacle uncertainties is crucial in predicting future motions of dynamic obstacles. The worst-case characterization gives a conservative uncertainty prediction and may result in infeasible motion planning for the ego robotic system. In this paper, an efficient, robust, and safe motion-planing algorithm is developed by learning the obstacle uncertainties online. More specifically, the unknown yet intended control set of obstacles is efficiently computed by solving a linear programming problem. The learned control set is used to compute forward reachable sets of obstacles that are less conservative than the worst-case prediction. Based on the forward prediction, a robust model predictive controller is designed to compute a safe reference trajectory for the ego robotic system that remains outside the reachable sets of obstacles over the prediction horizon. The method is applied to a car-like mobile robot in both simulations and hardware experiments to demonstrate its effectiveness.


Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Objective: The artificial pancreas (AP) has shown promising potential in achieving closed-loop glucose control for individuals with type 1 diabetes mellitus (T1DM). However, designing an effective control policy for the AP remains challenging due to the complex physiological processes, delayed insulin response, and inaccurate glucose measurements. While model predictive control (MPC) offers safety and stability through the dynamic model and safety constraints, it lacks individualization and is adversely affected by unannounced meals. Conversely, deep reinforcement learning (DRL) provides personalized and adaptive strategies but faces challenges with distribution shifts and substantial data requirements. Methods: We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address the above challenges. HyCPAP combines an MPC policy with an ensemble DRL policy, leveraging the strengths of both policies while compensating for their respective limitations. To facilitate faster deployment of AP systems in real-world settings, we further incorporate meta-learning techniques into HyCPAP, leveraging previous experience and patient-shared knowledge to enable fast adaptation to new patients with limited available data. Results: We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator across three scenarios. Our approaches achieve the highest percentage of time spent in the desired euglycemic range and the lowest occurrences of hypoglycemia. Conclusion: The results clearly demonstrate the superiority of our methods for closed-loop glucose management in individuals with T1DM. Significance: The study presents novel control policies for AP systems, affirming the great potential of proposed methods for efficient closed-loop glucose control.


Dirichlet Diffusion Score Model for Biological Sequence Generation

arXiv.org Artificial Intelligence

Designing biological sequences is an important challenge that requires satisfying complex constraints and thus is a natural problem to address with deep generative modeling. Diffusion generative models have achieved considerable success in many applications. Score-based generative stochastic differential equations (SDE) model is a continuous-time diffusion model framework that enjoys many benefits, but the originally proposed SDEs are not naturally designed for modeling discrete data. To develop generative SDE models for discrete data such as biological sequences, here we introduce a diffusion process defined in the probability simplex space with stationary distribution being the Dirichlet distribution. This makes diffusion in continuous space natural for modeling discrete data. We refer to this approach as Dirchlet diffusion score model. We demonstrate that this technique can generate samples that satisfy hard constraints using a Sudoku generation task. This generative model can also solve Sudoku, including hard puzzles, without additional training. Finally, we applied this approach to develop the first human promoter DNA sequence design model and showed that designed sequences share similar properties with natural promoter sequences.


Controlled physics-informed data generation for deep learning-based remaining useful life prediction under unseen operation conditions

arXiv.org Artificial Intelligence

Limited availability of representative time-to-failure (TTF) trajectories either limits the performance of deep learning (DL)-based approaches on remaining useful life (RUL) prediction in practice or even precludes their application. Generating synthetic data that is physically plausible is a promising way to tackle this challenge. In this study, a novel hybrid framework combining the controlled physics-informed data generation approach with a deep learning-based prediction model for prognostics is proposed. In the proposed framework, a new controlled physics-informed generative adversarial network (CPI-GAN) is developed to generate synthetic degradation trajectories that are physically interpretable and diverse. Five basic physics constraints are proposed as the controllable settings in the generator. A physics-informed loss function with penalty is designed as the regularization term, which ensures that the changing trend of system health state recorded in the synthetic data is consistent with the underlying physical laws. Then, the generated synthetic data is used as input of the DL-based prediction model to obtain the RUL estimations. The proposed framework is evaluated based on new Commercial Modular Aero-Propulsion System Simulation (N-CMAPSS), a turbofan engine prognostics dataset where a limited avail-ability of TTF trajectories is assumed. The experimental results demonstrate that the proposed framework is able to generate synthetic TTF trajectories that are consistent with underlying degradation trends. The generated trajectories enable to significantly improve the accuracy of RUL predictions.