Goto

Collaborating Authors

 Genre


GravityGraphSAGE: Link Prediction in Directed Attributed Graphs

arXiv.org Machine Learning

Link prediction (inferring missing or future connections between nodes in a graph) is a fundamental problem in network science with widespread applications in, e.g., biological systems, recommender systems, finance and cybersecurity. The ability to accurately predict links has significant real-world applications, such as detecting fraudulent financial transactions or identifying drug-target interactions in biomedicine. Despite a rich literature, link prediction is still challenging, especially for graphs enriched with information on edges (direction) and nodes (attributes). In fact, research on link prediction, especially the one based on Graph Deep Learning (GDL), has mostly focused on undirected graphs, without fully leveraging node attributes. Here, we fill this gap by proposing Gravity-GraphSAGE (GG-SAGE), a modified version of GraphSAGE, a GDL model for node embeddings, composed of a gravity-inspired decoder. This implementation is the first example in the literature of a GraphSAGE backbone adopted for directed link prediction. Using the benchmark datasets Cora, Citeseer, PubMed and 16 real-world graphs from the online Netzschleuder repository, we show that our proposed model outperforms state-of-the-art GDL link prediction techniques. Using further experimental evidence, we relate the quality of the output of our model with various characteristics of the graph, suggesting that our framework scales well when applied to data of increasing complexity.


Inverse Design for Conditional Distribution Matching

arXiv.org Machine Learning

Generative models are powerful tools for sampling from a learned distribution $\mathcal{P}(Y \mid X)$, and inverse-design methods invert this map to find an input $x$ that produces a desired point output $y^*$. However, many design goals are naturally distributional rather than pointwise, incorporating the inherent uncertainty of $Y$ and targeting a specific form for it, a task not addressed by standard inverse design. To address this issue we introduce Conditional Distribution Matching (CDM), a new inverse-design problem class in generative modeling: given a joint distribution $\mathcal{P}(X, Y)$ and a target distribution $\mathcal{G}(Y)$, find an input $x^*$ whose induced conditional distribution $\mathcal{P}(Y \mid X = x^*)$ matches $\mathcal{G}$. We formally define two variants: Conditional Distribution Matching Sampling (CDMS) and Conditional Distribution Matching Optimization (CDMO). To solve these problems, we propose MLGD-F (Matching-Loss Guided Diffusion with a Fast inner sampler), a plug-and-play inference-time algorithm that combines a pretrained score-based diffusion model with a pretrained fast conditional sampler, requiring no additional training or fine-tuning. By leveraging single-step conditional sampling, MLGD-F enables tractable gradient computation, making the estimation of $\mathcal{P}(Y \mid X)$ both memory-efficient and computationally lightweight. We validate MLGD-F on synthetic benchmarks, structured image transformations, and generative editing optimization, demonstrating reliable recovery of inputs whose conditional distributions match diverse user-specified targets, including discrete mixtures and continuous low-rank supports.


Optimal Regret for Single Index Bandits

arXiv.org Machine Learning

We study the $\textit{single-index bandit}$ problem, where rewards depend on an unknown one-dimensional projection of high-dimensional contexts through an unknown reward function. This model extends linear and generalized linear bandits to a nonparametric setting, and is particularly relevant when the reward function is not known in advance. While optimal regret guarantees are known for monotone reward functions, the general non-monotone case remains poorly understood, with the best known bound being $\tilde{\mathcal{O}}(T^{3/4})$ (under standard boundedness and Lipschitz assumptions on the reward function [Kang et al., 2025]). We close this gap by establishing the optimal regret for general single-index bandits. We propose a simple two-phase algorithm, namely, Zoomed Single Index Bandit with Upper Confidence Bound ($\texttt{ZoomSIB-UCB}$), that first estimates the projection direction via a normalized Stein estimator, and then reduces the problem to a one-dimensional bandit using discretization and finally use UCB. This approach achieves a regret of $\tilde{\mathcal{O}}(T^{2/3})$, and improves significantly upon prior work without any additional assumptions. We also prove a matching minimax lower bound of $\tildeฮฉ(T^{2/3})$, showing that the upper bound is essentially tight. Our upper and lower bounds together provide a sharp characterization of the regret in single-index bandits. Moreover, the empirical results further demonstrate the effectiveness and robustness of our approach.


Quantitative Local Convergence of Mean-Field Stein Variational Gradient Flow

arXiv.org Machine Learning

Stein Variational Gradient Descent (SVGD), introduced in [LW16], is a deterministic interactingparticle method for sampling from a target probability measure ฯ€ e V, only requiring access to V. In the mean-field and continuous-time limit, the distribution of particles converges to a flow (ฯt) in the space of probability measures that solves a variant of the Fokker-Planck equation with a velocity field smoothed by weighted convolution with a positive definite kernel [LLN19]. This flow can be interpreted as the gradient flow of the relative entropy H( |ฯ€) with respect to a "kernelized" Wasserstein metric [Liu17]. The goal of this paper is to investigate the convergence of (ฯt) towards ฯ€. To this end, we focus on the model case of Riesz kernels of order s on the d-dimensional torus Td. This is a family of translation-invariant kernels whose Fourier coefficients decay as |ฮพ| 2s. The parameter s hence directly controls the "smoothing strength" of the interaction; in particular, continuous kernels correspond to s > d/2, C1 kernels to s > (d+1)/2, and C2 kernels to s > (d+2)/2. What is known: qualitative weak convergence The starting point of convergence analyses is the energy dissipation formula [Liu17] d dt H(ฯt|ฯ€) = Is(ฯt|ฯ€), (1.1) Authors are listed in alphabetical order.


Proximal Path-Specific Inference

arXiv.org Machine Learning

Mediation analysis (Robins & Greenland 1992, Pearl 2001, Imai, Keele & Tingley 2010, Tchetgen Tchetgen & Shpitser 2012) provides a principled framework for investigating causal mechanisms by decomposing the effect of a treatment A on an outcome Y into pathways operating through a mediator of interest M. Classical mediation analysis focuses on the natural indirect effect, corresponding to the pathway from Ato Y through M, and the natural direct effect, corresponding to pathways not through M. These estimands are well understood when a single mediator is present and strong identification assumptions hold. However, in many applications, there exist multiple intermediate variables between treatment and outcome. In such settings, conventional mediation analysis typically requires the absence of treatment-induced mediator-outcome confounders--often referred to as recanting witnesses--as well as the absence of unmeasured confounding. Under these circumstances, commonly used identification assumptions such as sequential ignorability (Imai, Keele & Yamamoto 2010) or nonparametric structural equation models with independent errors (NPSEM-IE) (Pearl 2009) no longer suffice to identify natural indirect effects (Avin et al. 2005, Tchetgen Tchetgen & VanderWeele 2014). Figure 1 illustrates this issue: the recanting witness D is directly affected by A and simultaneously confounds the relationship between M and Y. Such treatment-induced confounding is common in epidemiologic studies, particularly when the mediator of interest occurs long after the treatment initiation (Robins 1999). A motivating example arises in studies of preterm birth. Mediation analysis has been widely used to explore whether adequate prenatal care (A) reduces the risk of preterm birth (Y) through preeclampsia (M) (Vansteelandt & VanderWeele 2012, VanderWeele et al. 2014, Xia & Chan 2023).


Empirical Bayes 1-bit matrix completion

arXiv.org Machine Learning

Matrix completion is a fundamental problem in machine learning, where the objective is to recover missing entries of a partially observed matrix. A prominent example is the Netflix Prize (Bennett and Lanning, 2007), which involved predicting a matrix of movie ratings by users for recommendation purposes. Beyond recommendation, matrix completion has recently found applications in causal inference for panel data (Athey et al., 2021). A standard assumption in matrix completion is that the underlying matrix is approximately low-rank, reflecting a few latent factors that govern interactions between rows and columns. A substantial body of work has established theoretical guarantees and developed efficient algorithms for matrix completion (Cai, Cand`es and Shen, 2010; Cand`es and Recht, 2008; Keshavan, Montanari, and Oh, 2010; Mazumder, Hastie and Tibshirani, 2010; Recht, 2011), predominantly focusing on cases where the observed entries are continuous-valued. In many applications, however, observations are not continuous-valued but binary.


Learning stochastic multiscale models through normalizing flows

arXiv.org Machine Learning

Many systems in physics, engineering, and biology exhibit multiscale stochastic dynamics, where low-dimensional slow variables evolve under the influence of high-dimensional fast processes. In practice, observations are often limited to a single trajectory of the slow component, while the fast dynamics remain unobserved, making statistical learning challenging. Approaches based on partial differential equations (PDE), such as Fokker-Planck formulations, aim to characterize the evolution of probability densities, typically requiring dense space-time data or grid-based solvers. In contrast, we adopt a trajectory-based perspective and develop a data-driven framework for learning effective stochastic dynamics from a single observed path. We model the dynamics by coupled multiscale stochastic differential equations (SDEs) and first obtain a principled model reduction through stochastic averaging. Unlike generic model reduction techniques such as PCA, this respects the dynamical structure of the original system and explicitly incorporates the interaction between slow and fast scales. A central challenge, however, is that the reduced model depends on the invariant distribution of the fast process, which is a solution to an intractable and often unknown PDE. We introduce a novel learning framework that parameterizes the invariant distribution using normalizing flows, enabling expressive density modeling in the latent fast-variable space. The flow is trained end-to-end by optimizing a penalized likelihood objective induced by the reduced stochastic dynamics. Furthermore, we develop a Bayesian variational inference procedure for uncertainty quantification, employing a second normalizing flow to approximate the posterior distribution over model parameters. This yields a scalable approach to capturing epistemic uncertainty in multiscale systems.


Accelerating Power Method with Fast Sketching for Stronger Low-Rank Approximation

arXiv.org Machine Learning

The power method is one of the most fundamental tools for extracting top principal components from data through low-rank matrix approximation. Yet, when the target rank is large, the cost of matrix multiplication associated with this procedure becomes a major bottleneck. We develop an algorithmic and theoretical framework for accelerating the power method using fast sketching, which is a popular paradigm in randomized linear algebra. Our framework leads to simple and provably efficient methods for singular value decomposition, low-rank factorization, and Nystrรถm approximation, which attain strong numerical performance on benchmark problems. The key novelty in our analysis is the use of regularized spectral approximation, a property of fast sketching methods which proves more flexible in generalizing power method guarantees than traditional arguments.


On Uniform Error Bounds for Kernel Regression under Non-Gaussian Noise

arXiv.org Machine Learning

Providing non-conservative uncertainty quantification for function estimates derived from noisy observations remains a fundamental challenge in statistical machine learning, particularly for applications in safety-critical domains. In this work, we propose novel non-asymptotic probabilistic uniform error bounds for kernel-based regression. Compared to related bounds in the literature that are restricted to (conditionally) independent sub-Gaussian noise, our bounds allow to consider a broad class of non-Gaussian distributions, such as sub-Gaussian, bounded, sub-exponential, and variance/moment-bounded noise. Moreover, our results apply to correlated and uncorrelated noise. We compare our proposed error bounds with existing results in terms of the induced uncertainty region and their performance in safe control, demonstrating the tightness of the proposed bounds.


Unified Approach for Weakly Supervised Multicalibration

arXiv.org Machine Learning

Multicalibration requires predicted scores to agree with label probabilities across rich families of subgroups and score-dependent tests, but existing methods require clean input-label pairs for evaluation and post-processing. This assumption fails in weakly supervised learning (WSL) regimes -- including positive-unlabeled, unlabeled-unlabeled, and positive-confidence learning -- where clean labels are costly or unavailable even though reliable uncertainty estimates may be crucial. We address this gap by developing estimators of multicalibration error and post-hoc correction methods for WSL settings in which clean input-label pairs are unavailable. We propose a unified framework for estimating and correcting multicalibration under weak supervision by combining contamination-matrix risk rewrites with witness-based calibration constraints, yielding corrected multicalibration moments with finite-sample guarantees. We further propose weak-label multicalibration boost (WLMC), a generic post-hoc recalibration algorithm under weak supervision. Finally, we conduct experiments across multiple weak-supervision settings to evaluate multicalibration behavior and offer empirical insight into uncertainty estimation under weak supervision.