AITopics

2601.07247

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Modeling & Simulation (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing Systems, Oct-10-2025, 14:59:08 GMT

Convergence Analysis of Split Federated Learning on Heterogeneous Data

However, FL is usually computationally intensive.

participation, sfl-v1, sfl-v2, (13 more...)

Country:

Asia > China (0.04)
North America > United States > Virginia (0.04)
North America > United States > California > Yolo County > Davis (0.04)
Asia > Nepal (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.67)
Social Sector (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Faradonbeh, Mohamad Kazem Shirani, Shirani, Sadegh, Bayati, Mohsen

Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes

arXiv.org Artificial Intelligence, Jun-10-2025

Linear diffusion processes serve as canonical continuous-time models for dynamic decision-making under uncertainty. These systems evolve according to drift matrices that specify the instantaneous rates of change in the expected system state, while also experiencing continuous random disturbances modeled by Brownian noise. For instance, in medical applications such as artificial pancreas systems, the drift matrices represent the internal dynamics of glucose concentrations. Classical results in stochastic control provide optimal policies under perfect knowledge of the drift matrices. However, practical decision-making scenarios typically feature uncertainty about the drift; in medical contexts, such parameters are patient-specific and unknown, requiring adaptive policies for efficiently learning the drift matrices while ensuring system stability and optimal performance. We study the Thompson sampling (TS) algorithm for decision-making in linear diffusion processes with unknown drift matrices. For this algorithm that designs control policies as if samples from a posterior belief about the parameters fully coincide with the unknown truth, we establish efficiency. That is, Thompson sampling learns optimal control actions fast, incurring only a square-root of time regret, and also learns to stabilize the system in a short time period. To our knowledge, this is the first such result for TS in a diffusion process control problem. Moreover, our empirical simulations in three settings that involve blood-glucose and flight control demonstrate that TS significantly improves regret, compared to the state-of-the-art algorithms, suggesting it explores in a more guarded fashion. Our theoretical analysis includes characterization of a certain optimality manifold that relates the geometry of the drift matrices to the optimal control of the diffusion process, among others.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2206.09977

Country: North America > United States (0.27)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Transportation > Air (0.93)
Health & Medicine > Health Care Technology (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)

Hafshejani, Reza Sadeghi, Faradonbeh, Mohamad Kazem Shirani

Learning Unstable Continuous-Time Stochastic Linear Control Systems

arXiv.org Machine Learning, Sep-17-2024

We study the problem of system identification for stochastic continuous-time dynamics, based on a single finite-length state trajectory. We present a method for estimating the possibly unstable open-loop matrix by employing properly randomized control inputs. Then, we establish theoretical performance guarantees showing that the estimation error decays with trajectory length, a measure of excitability, and the signal-to-noise ratio, while it grows with dimension. Numerical illustrations that showcase the rates of learning the dynamics will be provided as well. To perform the theoretical analysis, we develop new technical tools that are of independent interest. These include non-asymptotic stochastic bounds for highly non-stationary martingales and generalized laws of iterated logarithms, among others.

matrix, probability, (15 more...)

2409.11327

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Texas > Dallas County > Dallas (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Machine Learning, May-24-2024

Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity

Chen, Haoxuan, Ren, Yinuo, Ying, Lexing, Rotskoff, Grant M.

Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique [1], we propose to divide the sampling process into $\mathcal{O}(1)$ blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves $\widetilde{\mathcal{O}}(\mathrm{poly} \log d)$ overall time complexity, marking the first implementation with provable sub-linear complexity w.r.t. the data dimension $d$. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on fast-evolving modern large-memory GPU clusters.
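
To illustrate what a parallelizable Picard iteration within a block looks like, here is a minimal Python sketch on a toy scalar ODE. It is not the paper's sampler: the drift `f`, the function names, and the use of a plain Euler grid are assumptions chosen for illustration. Each sweep evaluates the drift at every grid point of the previous iterate at once, so the evaluations are independent of each other, and after at most `n` sweeps the result coincides with the sequential Euler path on the same grid.

```python
import numpy as np

def euler_sequential(f, x0, t0, t1, n):
    """Reference: n sequential Euler steps (each step waits for the previous one)."""
    h = (t1 - t0) / n
    xs = [x0]
    for i in range(n):
        xs.append(xs[-1] + h * f(t0 + i * h, xs[-1]))
    return np.array(xs)

def picard_block(f, x0, t0, t1, n, iters):
    """Picard sweeps on one block: every sweep updates the whole trajectory at once."""
    h = (t1 - t0) / n
    ts = t0 + h * np.arange(n)            # left endpoints of the n sub-intervals
    xs = np.full(n + 1, x0, dtype=float)  # initial guess: constant path
    for _ in range(iters):
        drift = f(ts, xs[:-1])            # n independent evaluations (parallelizable)
        xs = np.concatenate(([x0], x0 + h * np.cumsum(drift)))
    return xs

if __name__ == "__main__":
    f = lambda t, x: -x                   # toy drift, hypothetical stand-in for a score-based drift
    ref = euler_sequential(f, 1.0, 0.0, 1.0, 8)
    for k in (1, 4, 8):
        gap = np.max(np.abs(picard_block(f, 1.0, 0.0, 1.0, 8, iters=k) - ref))
        print(k, gap)
```

In this toy setting, sweep `k` reproduces the first `k + 1` grid values of the sequential path exactly, which is why `n` sweeps suffice. In practice far fewer sweeps may already give a close approximation.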

arxiv preprint arxiv, complexity, diffusion model, (13 more...)

2405.15986

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Kapllani, Lorenc, Teng, Long

A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations

arXiv.org Artificial Intelligence, Apr-12-2024

In this work, we propose a novel backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs), where the deep neural network (DNN) models are trained not only on the inputs and labels but also on the differentials of the corresponding labels. This is motivated by the fact that differential deep learning can provide an efficient approximation of the labels and their derivatives with respect to inputs. The BSDEs are reformulated as differential deep learning problems by using Malliavin calculus. The Malliavin derivatives of the solution to a BSDE themselves satisfy another BSDE, thus resulting in a system of BSDEs. Such a formulation requires the estimation of the solution, its gradient, and the Hessian matrix, represented by the triple of processes $\left(Y, Z, \Gamma\right)$. All the integrals within this system are discretized by using the Euler-Maruyama method. Subsequently, DNNs are employed to approximate the triple of these unknown processes. The DNN parameters are backwardly optimized at each time step by minimizing a differential learning type loss function, which is defined as a weighted sum of the dynamics of the discretized BSDE system, with the first term providing the dynamics of the process $Y$ and the other those of the process $Z$. An error analysis is carried out to show the convergence of the proposed algorithm. Various numerical experiments up to $50$ dimensions are provided to demonstrate the high efficiency. Both theoretically and numerically, it is demonstrated that our proposed scheme is more efficient compared to other contemporary deep learning-based methodologies, especially in the computation of the process $\Gamma$.
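
As a point of reference for the Euler-Maruyama discretization mentioned above, here is a minimal Python sketch for a scalar forward SDE $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$. It only shows the time-stepping rule, not the BSDE system or the DNN training. The function name `euler_maruyama` and the coefficient functions `mu` and `sigma` are assumptions introduced for illustration.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = mu(X) dt + sigma(X) dW on [0, T] with n_steps steps."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over one step
        x[i + 1] = x[i] + mu(x[i]) * dt + sigma(x[i]) * dw
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical coefficients: mean-reverting drift with constant noise.
    path = euler_maruyama(lambda x: -x, lambda x: 0.3, 1.0, 1.0, 100, rng)
    print(path[-1])
```

With `sigma` set to zero the scheme reduces to the deterministic Euler method, which gives a quick sanity check of the implementation.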

algorithm, approximation, mse value, (11 more...)

2404.08456

Country:

North America > United States (0.16)
Europe > Germany (0.04)
Asia (0.04)

Genre: Research Report (0.50)

Industry: Banking & Finance (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Feb-23-2024

Convergence Analysis of Split Federated Learning on Heterogeneous Data

Han, Pengchao, Huang, Chao, Tian, Geng, Tang, Ming, Liu, Xin

Split federated learning (SFL) is a recent distributed approach for collaborative model training among multiple clients. In SFL, a global model is typically split into two parts, where clients train one part in a parallel federated manner, and a main server trains the other. Despite the recent research on SFL algorithm development, the convergence analysis of SFL is missing in the literature, and this paper aims to fill this gap. The analysis of SFL can be more challenging than that of federated learning (FL), due to the potential dual-paced updates at the clients and the main server. We provide convergence analysis of SFL for strongly convex and general convex objectives on heterogeneous data. The convergence rates are $O(1/T)$ and $O(1/\sqrt[3]{T})$, respectively, where $T$ denotes the total number of rounds for SFL training. We further extend the analysis to non-convex objectives and to settings where some clients may be unavailable during training. Numerical experiments validate our theoretical results and show that SFL outperforms FL and split learning (SL) when data is highly heterogeneous across a large number of clients.
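
To make the gap between the two stated rates concrete, the following Python sketch computes how many rounds $T$ each bound needs to reach a target error $\varepsilon$. It assumes the hidden constants in $O(1/T)$ and $O(1/\sqrt[3]{T})$ are equal to 1, which the abstract does not state, so the numbers indicate scaling only. The function name `rounds_for_target` is hypothetical.

```python
import math

def rounds_for_target(eps, rate):
    """Smallest T whose bound is <= eps, with all constants set to 1 (assumption)."""
    if not 0 < eps <= 1:
        raise ValueError("eps must be in (0, 1]")
    if rate == "strongly_convex":      # bound 1/T <= eps
        return math.ceil(1.0 / eps)
    if rate == "general_convex":       # bound 1/T^(1/3) <= eps
        return math.ceil(1.0 / eps ** 3)
    raise ValueError(f"unknown rate: {rate}")

if __name__ == "__main__":
    for eps in (0.5, 0.25, 0.125):
        print(eps,
              rounds_for_target(eps, "strongly_convex"),
              rounds_for_target(eps, "general_convex"))
```

Under this simplification, halving the target error doubles the required rounds in the strongly convex case but multiplies them by eight in the general convex case.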

convergence analysis, participation, split federated learning, (10 more...)

2402.15166

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
Asia > Nepal (0.04)

Genre: Research Report > New Finding (0.65)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Pervej, Md Ferdous, Molisch, Andreas F.

Resource-Aware Hierarchical Federated Learning in Wireless Video Caching Networks

arXiv.org Artificial Intelligence, Feb-6-2024

Backhaul traffic congestion caused by the video traffic of a few popular files can be alleviated by storing the to-be-requested content at various levels in wireless video caching networks. Typically, content service providers (CSPs) own the content, and the users request their preferred content from the CSPs using their (wireless) internet service providers (ISPs). As these parties do not reveal their private information and business secrets, traditional techniques may not be readily used to predict the dynamic changes in users' future demands. Motivated by this, we propose a novel resource-aware hierarchical federated learning (RawHFL) solution for predicting users' future content requests. A practical data acquisition technique is used that allows the user to update its local training dataset based on its requested content. Besides, since networking and other computational resources are limited, considering that only a subset of the users participate in the model training, we derive the convergence bound of the proposed algorithm. Based on this bound, we minimize a weighted utility function for jointly configuring the controllable parameters to train the RawHFL energy efficiently under practical resource constraints. Our extensive simulation results validate the proposed algorithm's superiority, in terms of test accuracy and energy cost, over existing baselines.

edge round, gradient, (15 more...)

2402.04216

Country: North America > United States > California > Los Angeles County > Los Angeles (0.27)

Genre: Research Report (0.63)

Industry:

Information Technology > Security & Privacy (0.93)
Education (0.87)
Telecommunications (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Pal, Soumyabrata, Varshney, Prateek, Jain, Prateek, Thakurta, Abhradeep Guha, Madan, Gagan, Aggarwal, Gaurav, Shenoy, Pradeep, Srivastava, Gaurav

Sample-Efficient Personalization: Modeling User Parameters as Low Rank Plus Sparse Components

arXiv.org Machine Learning, Sep-5-2023

Personalization of machine learning (ML) predictions for individual users/domains/enterprises is critical for practical recommendation systems. Standard personalization approaches involve learning a user/domain specific embedding that is fed into a fixed global model, which can be limiting. On the other hand, personalizing/fine-tuning the model itself for each user/domain -- a.k.a. meta-learning -- has high storage/infrastructure cost. Moreover, rigorous theoretical studies of scalable personalization approaches have been very limited. To address the above issues, we propose a novel meta-learning style approach that models network weights as a sum of low-rank and sparse components. This captures common information from multiple individuals/users together in the low-rank part while the sparse part captures user-specific idiosyncrasies. We then study the framework in the linear setting, where the problem reduces to that of estimating the sum of a rank-$r$ and a $k$-column sparse matrix using a small number of linear measurements. We propose a computationally efficient alternating minimization method with iterative hard thresholding -- AMHT-LRS -- to learn the low-rank and sparse parts. Theoretically, for the realizable Gaussian data setting, we show that AMHT-LRS solves the problem efficiently with nearly optimal sample complexity. Finally, a significant challenge in personalization is ensuring privacy of each user's sensitive data. We alleviate this problem by proposing a differentially private variant of our method that is also equipped with strong generalization guarantees.
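
The low-rank plus sparse structure and the hard-thresholding step can be sketched in a few lines of Python. This is not AMHT-LRS itself: the parameterization `theta = U @ w + b`, the names `U`, `w`, `b`, and the helper `hard_threshold` are assumptions made for illustration, showing only how a shared low-rank part combines with a user-specific sparse part and how hard thresholding keeps the `k` largest-magnitude entries.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k entries of v with largest magnitude and zero out the rest."""
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    if k > 0:
        keep = np.argsort(np.abs(v))[-k:]  # indices of the k largest |v_j|
        out[keep] = v[keep]
    return out

def user_parameters(U, w, b):
    """Per-user weights: shared low-rank part U @ w plus sparse correction b."""
    return U @ w + b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, k = 6, 2, 2                      # hypothetical dimension, rank, sparsity
    U = rng.normal(size=(d, r))            # shared across all users
    w = rng.normal(size=r)                 # user-specific low-dimensional coordinates
    b = hard_threshold(rng.normal(size=d), k)
    theta = user_parameters(U, w, b)
    print(np.count_nonzero(b), theta.shape)
```

Storing `w` and the `k` nonzeros of `b` per user costs `r + k` numbers instead of `d`, which is the storage saving the low-rank plus sparse model is meant to provide.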

artificial intelligence, machine learning, rd log, (18 more...)

2210.03505

Country: Asia > India (0.04)

Genre: Research Report (0.81)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Rout, Litu, Raoof, Negin, Daras, Giannis, Caramanis, Constantine, Dimakis, Alexandros G., Shakkottai, Sanjay

Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models

arXiv.org Artificial Intelligence, Jul-2-2023

We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often considered in practice. Experimentally, we outperform previously proposed posterior sampling algorithms in a wide variety of problems including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution.

artificial intelligence, diffusion model, machine learning, (15 more...)

2307.00619

Country:

North America > United States > Virginia (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
Asia > Singapore (0.04)

Genre:

Workflow (0.67)
Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)