nullnull null null 2
Multi-environment Invariance Learning with Missing Data
Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across environments, improves model generalization by capturing stable relationships, which may represent causal effects when the data distribution is encoded within a structural equation model (SEM) and satisfies modularity conditions. This has led to a growing body of work that builds on invariance learning, leveraging the inherent heterogeneity across environments to develop methods that provide causal explanations while enhancing robust prediction. However, in many practical scenarios, obtaining complete outcome data from each environment is challenging due to the high cost or complexity of data collection. This limitation in available data hinders the development of models that fully leverage environmental heterogeneity, making it crucial to address missing outcomes to improve both causal insights and robust prediction. In this work, we derive an estimator from the invariance objective under missing outcomes. We establish non-asymptotic guarantees on variable selection property and $\ell_2$ error convergence rates, which are influenced by the proportion of missing data and the quality of imputation models across environments. We evaluate the performance of the new estimator through extensive simulations and demonstrate its application using the UCI Bike Sharing dataset to predict the count of bike rentals. The results show that despite relying on a biased imputation model, the estimator is efficient and achieves lower prediction error, provided the bias is within a reasonable range.
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- Asia > Nepal (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (0.67)
- Social Sector (0.46)
Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes
Faradonbeh, Mohamad Kazem Shirani, Shirani, Sadegh, Bayati, Mohsen
Linear diffusion processes serve as canonical continuous-time models for dynamic decision-making under uncertainty. These systems evolve according to drift matrices that specify the instantaneous rates of change in the expected system state, while also experiencing continuous random disturbances modeled by Brownian noise. For instance, in medical applications such as artificial pancreas systems, the drift matrices represent the internal dynamics of glucose concentrations. Classical results in stochastic control provide optimal policies under perfect knowledge of the drift matrices. However, practical decision-making scenarios typically feature uncertainty about the drift; in medical contexts, such parameters are patient-specific and unknown, requiring adaptive policies for efficiently learning the drift matrices while ensuring system stability and optimal performance. We study the Thompson sampling (TS) algorithm for decision-making in linear diffusion processes with unknown drift matrices. For this algorithm that designs control policies as if samples from a posterior belief about the parameters fully coincide with the unknown truth, we establish efficiency. That is, Thompson sampling learns optimal control actions fast, incurring only a square-root of time regret, and also learns to stabilize the system in a short time period. To our knowledge, this is the first such result for TS in a diffusion process control problem. Moreover, our empirical simulations in three settings that involve blood-glucose and flight control demonstrate that TS significantly improves regret, compared to the state-of-the-art algorithms, suggesting it explores in a more guarded fashion. Our theoretical analysis includes characterization of a certain optimality manifold that relates the geometry of the drift matrices to the optimal control of the diffusion process, among others.
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Transportation > Air (0.93)
- Health & Medicine > Health Care Technology (0.87)
Learning Unstable Continuous-Time Stochastic Linear Control Systems
Hafshejani, Reza Sadeghi, Fradonbeh, Mohamad Kazem Shirani
We study the problem of system identification for stochastic continuous-time dynamics, based on a single finite-length state trajectory. We present a method for estimating the possibly unstable open-loop matrix by employing properly randomized control inputs. Then, we establish theoretical performance guarantees showing that the estimation error decays with trajectory length, a measure of excitability, and the signal-to-noise ratio, while it grows with dimension. Numerical illustrations that showcase the rates of learning the dynamics, will be provided as well. To perform the theoretical analysis, we develop new technical tools that are of independent interest. That includes non-asymptotic stochastic bounds for highly non-stationary martingales and generalized laws of iterated logarithms, among others.
- Asia > Middle East > Jordan (0.05)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Hungary > Budapest > Budapest (0.04)
Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity
Chen, Haoxuan, Ren, Yinuo, Ying, Lexing, Rotskoff, Grant M.
Diffusion models have become a leading method for generativ e modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a maj or goal. Inspired by the recent empirical success in accelerating diffusion mod els via the parallel sampling technique [1], we propose to divide the sampling proce ss into O (1) blocks with parallelizable Picard iterations within each block. R igorous theoretical analysis reveals that our algorithm achieves null O (poly log d) overall time complexity, marking the first implementation with provable sub-linear complexi ty w.r .t. the data dimension d. Our analysis is based on a generalized version of Girsanov' s theorem and is compatible with both the SDE and probability fl ow ODE implementations. Our results shed light on the potential of fast a nd efficient sampling of high-dimensional data on fast-evolving modern large-me mory GPU clusters.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Japan (0.04)
A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations
In this work, we propose a novel backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs), where the deep neural network (DNN) models are trained not only on the inputs and labels but also the differentials of the corresponding labels. This is motivated by the fact that differential deep learning can provide an efficient approximation of the labels and their derivatives with respect to inputs. The BSDEs are reformulated as differential deep learning problems by using Malliavin calculus. The Malliavin derivatives of solution to a BSDE satisfy themselves another BSDE, resulting thus in a system of BSDEs. Such formulation requires the estimation of the solution, its gradient, and the Hessian matrix, represented by the triple of processes $\left(Y, Z, \Gamma\right).$ All the integrals within this system are discretized by using the Euler-Maruyama method. Subsequently, DNNs are employed to approximate the triple of these unknown processes. The DNN parameters are backwardly optimized at each time step by minimizing a differential learning type loss function, which is defined as a weighted sum of the dynamics of the discretized BSDE system, with the first term providing the dynamics of the process $Y$ and the other the process $Z$. An error analysis is carried out to show the convergence of the proposed algorithm. Various numerical experiments up to $50$ dimensions are provided to demonstrate the high efficiency. Both theoretically and numerically, it is demonstrated that our proposed scheme is more efficient compared to other contemporary deep learning-based methodologies, especially in the computation of the process $\Gamma$.
- North America > United States (0.16)
- Europe > Germany (0.04)
- Asia (0.04)
Convergence Analysis of Split Federated Learning on Heterogeneous Data
Han, Pengchao, Huang, Chao, Tian, Geng, Tang, Ming, Liu, Xin
Split federated learning (SFL) is a recent distributed approach for collaborative model training among multiple clients. In SFL, a global model is typically split into two parts, where clients train one part in a parallel federated manner, and a main server trains the other. Despite the recent research on SFL algorithm development, the convergence analysis of SFL is missing in the literature, and this paper aims to fill this gap. The analysis of SFL can be more challenging than that of federated learning (FL), due to the potential dual-paced updates at the clients and the main server. We provide convergence analysis of SFL for strongly convex and general convex objectives on heterogeneous data. The convergence rates are $O(1/T)$ and $O(1/\sqrt[3]{T})$, respectively, where $T$ denotes the total number of rounds for SFL training. We further extend the analysis to non-convex objectives and where some clients may be unavailable during training. Numerical experiments validate our theoretical results and show that SFL outperforms FL and split learning (SL) when data is highly heterogeneous across a large number of clients.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Asia > Nepal (0.04)
Resource-Aware Hierarchical Federated Learning in Wireless Video Caching Networks
Pervej, Md Ferdous, Molisch, Andreas F.
Backhaul traffic congestion caused by the video traffic of a few popular files can be alleviated by storing the to-be-requested content at various levels in wireless video caching networks. Typically, content service providers (CSPs) own the content, and the users request their preferred content from the CSPs using their (wireless) internet service providers (ISPs). As these parties do not reveal their private information and business secrets, traditional techniques may not be readily used to predict the dynamic changes in users' future demands. Motivated by this, we propose a novel resource-aware hierarchical federated learning (RawHFL) solution for predicting user's future content requests. A practical data acquisition technique is used that allows the user to update its local training dataset based on its requested content. Besides, since networking and other computational resources are limited, considering that only a subset of the users participate in the model training, we derive the convergence bound of the proposed algorithm. Based on this bound, we minimize a weighted utility function for jointly configuring the controllable parameters to train the RawHFL energy efficiently under practical resource constraints. Our extensive simulation results validate the proposed algorithm's superiority, in terms of test accuracy and energy cost, over existing baselines.
- Information Technology > Security & Privacy (0.93)
- Education (0.87)
- Telecommunications (0.67)
Sample-Efficient Personalization: Modeling User Parameters as Low Rank Plus Sparse Components
Pal, Soumyabrata, Varshney, Prateek, Jain, Prateek, Thakurta, Abhradeep Guha, Madan, Gagan, Aggarwal, Gaurav, Shenoy, Pradeep, Srivastava, Gaurav
Personalization of machine learning (ML) predictions for individual users/domains/enterprises is critical for practical recommendation systems. Standard personalization approaches involve learning a user/domain specific embedding that is fed into a fixed global model which can be limiting. On the other hand, personalizing/fine-tuning model itself for each user/domain -- a.k.a meta-learning -- has high storage/infrastructure cost. Moreover, rigorous theoretical studies of scalable personalization approaches have been very limited. To address the above issues, we propose a novel meta-learning style approach that models network weights as a sum of low-rank and sparse components. This captures common information from multiple individuals/users together in the low-rank part while sparse part captures user-specific idiosyncrasies. We then study the framework in the linear setting, where the problem reduces to that of estimating the sum of a rank-$r$ and a $k$-column sparse matrix using a small number of linear measurements. We propose a computationally efficient alternating minimization method with iterative hard thresholding -- AMHT-LRS -- to learn the low-rank and sparse part. Theoretically, for the realizable Gaussian data setting, we show that AMHT-LRS solves the problem efficiently with nearly optimal sample complexity. Finally, a significant challenge in personalization is ensuring privacy of each user's sensitive data. We alleviate this problem by proposing a differentially private variant of our method that also is equipped with strong generalization guarantees.
Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models
Rout, Litu, Raoof, Negin, Daras, Giannis, Caramanis, Constantine, Dimakis, Alexandros G., Shakkottai, Sanjay
We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often considered in practice. Experimentally, we outperform previously proposed posterior sampling algorithms in a wide variety of problems including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution.
- South America > Peru > Loreto Department (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Asia > Singapore (0.04)
- Workflow (0.67)
- Research Report (0.64)