this completes the proof


FL-Sailer: Efficient and Privacy-Preserving Federated Learning for Scalable Single-Cell Epigenetic Data Analysis via Adaptive Sampling

arXiv.org Machine Learning

Single-cell ATAC-seq (scATAC-seq) enables high-resolution mapping of chromatin accessibility, yet privacy regulations and data size constraints hinder multi-institutional sharing. Federated learning (FL) offers a privacy-preserving alternative, but faces three fundamental barriers in scATAC-seq analysis: ultra-high dimensionality, extreme sparsity, and severe cross-institutional heterogeneity. We propose FL-Sailer, the first FL framework designed for scATAC-seq data. FL-Sailer integrates two key innovations: (i) adaptive leverage score sampling, which selects biologically interpretable features while reducing dimensionality by 80%, and (ii) an invariant VAE architecture, which disentangles biological signals from technical confounders via mutual information minimization. We provide a convergence guarantee, showing that FL-Sailer converges to an approximate solution of the original high-dimensional problem with bounded error. Extensive experiments on synthetic and real epigenomic datasets demonstrate that FL-Sailer not only enables previously infeasible multi-institutional collaborations but also surpasses centralized methods by leveraging adaptive sampling as an implicit regularizer to suppress technical noise. Our work establishes that federated learning, when tailored to domain-specific challenges, can become a superior paradigm for collaborative epigenomic research.
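As a rough illustration of leverage score feature sampling, the sketch below computes plain rank-k column leverage scores on a sparse binary matrix and keeps 20% of the columns, matching the 80% dimensionality reduction quoted above. It is not FL-Sailer's adaptive scheme, which the abstract does not specify. The function names, the rank of 5, the toy matrix, and the sampling-without-replacement rule are all assumptions made for illustration.

```python
import numpy as np

def leverage_scores(X, rank):
    """Rank-`rank` column leverage scores of X (rows = cells, columns = peaks)."""
    # The top right singular vectors span the dominant feature subspace.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v_k = vt[:rank]                     # shape (rank, n_features)
    return np.sum(v_k ** 2, axis=0)     # one score per feature; scores sum to `rank`

def sample_features(X, rank, keep_fraction, rng):
    """Sample a fraction of the columns with probability proportional to leverage."""
    scores = leverage_scores(X, rank)
    n_keep = max(1, int(round(keep_fraction * X.shape[1])))
    probs = scores / scores.sum()
    idx = rng.choice(X.shape[1], size=n_keep, replace=False, p=probs)
    return np.sort(idx), scores

# Hypothetical toy data: 200 cells, 50 peaks, about 5% nonzero entries.
rng = np.random.default_rng(0)
X = (rng.random((200, 50)) < 0.05).astype(float)

kept, scores = sample_features(X, rank=5, keep_fraction=0.2, rng=rng)
X_reduced = X[:, kept]                  # 20% of the original features remain
```

Because the kept indices refer to original columns, each retained feature still corresponds to a specific peak, which is what makes this kind of selection interpretable compared with a dense projection.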


Estimating heterogeneous treatment effects with survival outcomes via a deep survival learner

arXiv.org Machine Learning

Estimating heterogeneous treatment effects in survival settings is complicated by right censoring as well as the time-varying nature of the estimand. While the conditional average treatment effect (CATE) provides a natural target, most existing approaches focus on a single prespecified time point and do not account for the temporal trajectory, leading to instability in estimation. We propose a deep survival learner (DSL) for estimating heterogeneous treatment effects with right-censored outcomes. The method is based on a doubly robust pseudo-outcome whose conditional expectation identifies time-specific CATEs under standard assumptions. This construction remains unbiased if either the outcome model or the treatment assignment model is correctly specified, when properly accounting for censoring. To estimate CATEs over a clinically relevant time spectrum, DSL employs a multi-output deep neural network with shared representations, enabling joint estimation of treatment effect trajectories. From a theoretical perspective, we derive error bounds for both pointwise and joint estimation over time. We show that joint estimation can leverage temporal structure to control estimation error without incurring much additional approximation cost under smoothness conditions, leading to improved stability relative to separate estimation. Cross-fitting is incorporated to reduce overfitting and mitigate bias arising from flexible nuisance estimation. Simulation studies demonstrate favorable finite-sample performance, particularly under nuisance model misspecification. Applied to the Boston Lung Cancer Study, DSL reveals heterogeneity in the effects of perioperative chemotherapy across patient characteristics and over time.
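To make the double-robustness property concrete, the sketch below uses the standard uncensored AIPW pseudo-outcome as a simplified stand-in. It is not the DSL construction, which additionally handles right censoring and time-specific estimands. The simulated data, the function name, and the deliberately misspecified nuisance models are assumptions chosen for illustration.

```python
import numpy as np

def aipw_pseudo_outcome(y, a, mu1, mu0, e):
    """Uncensored AIPW pseudo-outcome; its conditional mean given x is the CATE
    when either the outcome models (mu1, mu0) or the propensity e is correct."""
    return (mu1 - mu0
            + a * (y - mu1) / e
            - (1 - a) * (y - mu0) / (1 - e))

# Hypothetical simulation with a known treatment effect.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(-1, 1, n)
e_true = 0.5 + 0.2 * x                  # true propensity, within [0.3, 0.7]
a = rng.binomial(1, e_true)
tau = 1.0 + x                           # true CATE; its average over x is 1
y = x + a * tau + rng.normal(0, 1, n)

# Case 1: correct outcome models, wrong (constant) propensity.
phi_outcome_ok = aipw_pseudo_outcome(y, a, x + tau, x, np.full(n, 0.5))

# Case 2: wrong outcome models (all zeros), correct propensity.
phi_propensity_ok = aipw_pseudo_outcome(y, a, np.zeros(n), np.zeros(n), e_true)

ate_case1 = phi_outcome_ok.mean()
ate_case2 = phi_propensity_ok.mean()
```

In both cases the average pseudo-outcome should land close to the true average effect of 1, even though one nuisance model is wrong each time. Regressing the pseudo-outcome on `x` would give a CATE estimate in this simplified setting.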


MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

arXiv.org Machine Learning

Orthogonalized-update optimizers such as Muon improve training of matrix-valued parameters, but existing extensions mostly act either after orthogonalization by rescaling updates or before it with heavier whitening-based preconditioners. We introduce MuonEq, a lightweight family of pre-orthogonalization equilibration schemes for Muon in three forms: two-sided row/column normalization (RC), row normalization (R), and column normalization (C). These variants rebalance the momentum matrix before finite-step Newton-Schulz using row/column squared-norm statistics and only $\mathcal{O}(m+n)$ auxiliary state. We show that finite-step orthogonalization is governed by input spectral properties, especially stable rank and condition number, and that row/column normalization is a zeroth-order whitening surrogate that removes marginal scale mismatch. For the hidden matrix weights targeted by MuonEq, the row-normalized variant R is the natural default and preserves the $\widetilde{\mathcal{O}}(T^{-1/4})$ stationarity guarantee of Muon-type methods. In LLaMA2 pretraining on C4, the default R variant consistently outperforms Muon on 130M and 350M models, yielding faster convergence and lower validation perplexity.
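The effect of balancing before a finite number of Newton-Schulz steps can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the paper's implementation: the row normalization here is an instantaneous rescaling by row norms, the orthogonalization uses the textbook cubic Newton-Schulz iteration (Muon itself may use a different polynomial), and the badly scaled toy momentum matrix is invented for the example.

```python
import numpy as np

def row_normalize(m, eps=1e-8):
    """Rescale each row to roughly unit norm using row squared-norm statistics."""
    row_sq = np.sum(m ** 2, axis=1, keepdims=True)   # one statistic per row
    return m / np.sqrt(row_sq + eps)

def newton_schulz(m, steps):
    """Finite-step cubic Newton-Schulz iteration toward the orthogonal polar factor."""
    x = m / (np.linalg.norm(m) + 1e-12)   # Frobenius scaling keeps singular values <= 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

def orth_error(x):
    """Frobenius distance of x x^T from the identity."""
    return np.linalg.norm(x @ x.T - np.eye(x.shape[0]))

# Hypothetical momentum matrix whose rows differ in scale by orders of magnitude.
rng = np.random.default_rng(0)
row_scales = np.array([[100.0], [1.0], [0.01], [10.0]])
momentum = rng.normal(size=(4, 8)) * row_scales

balanced = row_normalize(momentum)
update_raw = newton_schulz(momentum, steps=5)   # orthogonalize without balancing
update_eq = newton_schulz(balanced, steps=5)    # balance first, then orthogonalize

err_raw = orth_error(update_raw)
err_eq = orth_error(update_eq)
```

With the same five steps, the balanced input ends up much closer to an orthogonal matrix than the raw input, because the raw matrix has a large condition number and its small singular values grow by only about a factor of 1.5 per step.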


Alleviate Anchor-Shift: Explore Blind Spots with Cross-View Reconstruction for Incomplete Multi-View Clustering

Neural Information Processing Systems

Despite efficiency improvements, existing methods overlook the misguidance in anchor learning induced by partially missing samples, i.e., the absence of samples shifts the learned anchors, which in turn leads to sub-optimal clustering performance.


NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction - Supplementary Material - A Derivation for Computing Opacity $\alpha_i$

Neural Information Processing Systems

Next consider the case where $[t_i, t_{i+1}]$ lies in a range $[t_\ell, t_r]$ over which the camera ray is exiting the surface, i.e. the signed distance function is increasing on $p(t)$ over $[t_\ell, t_r]$. Then we have $-(\nabla f(p(t)) \cdot \mathbf{v}) < 0$ in $[t_i, t_{i+1}]$. Then, according to Eqn. 1, we have $\rho(t) = 0$. Therefore, by Eqn. 12 of the paper, we have $\alpha_i = 1 - \exp\left(-\int_{t_i}^{t_{i+1}} \rho(t)\,\mathrm{d}t\right) = 0$. Recall that our S-density field $\phi_s(f(x))$ is defined using the logistic density function $\phi_s(x) = s e^{-sx}/(1+e^{-sx})^2$, which is the derivative of the Sigmoid function $\Phi_s(x) = (1+e^{-sx})^{-1}$, i.e. $\phi_s(x) = \Phi_s'(x)$. As a first-order approximation of the signed distance function $f$, suppose that locally the surface is tangentially approximated by a sufficiently small planar patch with its outward unit normal vector denoted as $\mathbf{n}$. Now suppose $p(t^*)$ is a point on the surface $S$, that is, $f(p(t^*)) = 0$. Next we will examine the value of $\frac{\mathrm{d}w}{\mathrm{d}t}(t)$ at $t = t^*$. The signed distance function $f$ is modeled by an MLP that consists of 8 hidden layers with a hidden size of 256.
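The exiting-ray argument can be checked numerically with a small sketch. It assumes the opaque density from the NeuS main paper, $\rho(t) = \max\left(-\frac{\mathrm{d}\Phi_s}{\mathrm{d}t}(f(p(t))) / \Phi_s(f(p(t))),\, 0\right)$, which is not reproduced in this excerpt, and a locally planar signed distance $f(p(t)) = c\,(t - t^*)$ along the ray. The slope `c`, the sharpness `s = 10`, the interval endpoints, and all function names are illustrative assumptions.

```python
import numpy as np

def sigmoid_s(x, s):
    """Phi_s(x) = (1 + exp(-s x))^(-1)."""
    return 1.0 / (1.0 + np.exp(-s * x))

def logistic_density(x, s):
    """phi_s(x) = s exp(-s x) / (1 + exp(-s x))^2, the derivative of Phi_s."""
    e = np.exp(-s * x)
    return s * e / (1.0 + e) ** 2

def opaque_density(f, dfdt, s):
    """rho = max(-(dPhi_s/dt) / Phi_s, 0), with dPhi_s/dt = phi_s(f) * df/dt."""
    return np.maximum(-logistic_density(f, s) * dfdt / sigmoid_s(f, s), 0.0)

def alpha(t_i, t_next, slope, t_star, s, n=20001):
    """alpha_i = 1 - exp(-integral of rho over [t_i, t_next]) for f = slope * (t - t_star)."""
    t = np.linspace(t_i, t_next, n)
    f = slope * (t - t_star)
    rho = opaque_density(f, slope, s)
    integral = np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(t))   # trapezoid rule
    return 1.0 - np.exp(-integral)

s = 10.0
# Exiting the surface: the signed distance increases along the ray (slope > 0).
alpha_exit = alpha(0.9, 1.1, slope=1.0, t_star=1.0, s=s)
# Entering the surface: the signed distance decreases along the ray (slope < 0).
alpha_enter = alpha(0.9, 1.1, slope=-1.0, t_star=1.0, s=s)
# Closed form for the entering case, obtained from rho = -d/dt log Phi_s(f(p(t))).
alpha_closed = 1.0 - sigmoid_s(-0.1, s) / sigmoid_s(0.1, s)
```

For the exiting interval the density is clamped to zero everywhere, so the opacity vanishes, as in the derivation. For the entering interval the numerical integral agrees with the closed form $1 - \Phi_s(f(p(t_{i+1}))) / \Phi_s(f(p(t_i)))$.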



7274ed909a312d4d869cc328ad1c5f04-Supplemental-Conference.pdf

Neural Information Processing Systems

Machine learned models are increasingly entering wider ranges of domains in our lives, driving a constantly increasing number of important systems. Large scale systems can be trained in highly parallel and distributed training environments, with a large amount of randomness in training the models.




2 Framework and assumptions 2.1 Stochastic optimization under time drift Throughout Sections 2-4, we consider the sequence of stochastic optimization problems min

Neural Information Processing Systems

Our results concisely explain the interplay between the learning rate, the noise variance in the gradient oracle, and the strength of the time drift. The high-probability results merely assume that the gradient noise and time drift have light tails. Moreover, none of the results require the objectives to have bounded domains.