Goto

Collaborating Authors

Bayesian Semiparametric Causal Inference: Targeted Doubly Robust Estimation of Treatment Effects

Sert, Gözde, Chakrabortty, Abhishek, Bhattacharya, Anirban

arXiv.org Machine Learning

We propose a semiparametric Bayesian methodology for estimating the average treatment effect (ATE) within the potential outcomes framework using observational data with high-dimensional nuisance parameters. Our method introduces a Bayesian debiasing procedure that corrects for bias arising from nuisance estimation and employs a targeted modeling strategy based on summary statistics rather than the full data. These summary statistics are identified in a debiased manner, enabling the estimation of nuisance bias via weighted observables and facilitating hierarchical learning of the ATE. By combining debiasing with sample splitting, our approach separates nuisance estimation from inference on the target parameter, reducing sensitivity to nuisance model specification. We establish that, under mild conditions, the marginal posterior for the ATE satisfies a Bernstein-von Mises theorem when both nuisance models are correctly specified and remains consistent and robust when only one is correct, achieving Bayesian double robustness. This ensures asymptotic efficiency and frequentist validity. Extensive simulations confirm the theoretical results, demonstrating accurate point estimation and credible intervals with nominal coverage, even in high-dimensional settings. The proposed framework can also be extended to other causal estimands, and its key principles offer a general foundation for advancing Bayesian semiparametric inference more broadly.


Angular Graph Fractional Fourier Transform: Theory and Application

Zhao, Feiyue, He, Yangfan, Zhang, Zhichao

arXiv.org Machine Learning

Graph spectral representations are fundamental in graph signal processing, offering a rigorous framework for analyzing and processing graph-structured data. The graph fractional Fourier transform (GFRFT) extends the classical graph Fourier transform (GFT) with a fractional-order parameter, enabling flexible spectral analysis while preserving mathematical consistency. The angular graph Fourier transform (AGFT) introduces angular control via GFT eigenvector rotation; however, existing constructions fail to degenerate to the GFT at zero angle, which is a critical flaw that undermines theoretical consistency and interpretability. To resolve these complementary limitations - GFRFT's lack of angular regulation and AGFT's defective degeneracy - this study proposes an angular GFRFT (AGFRFT), a unified framework that integrates fractional-order and angular spectral analyses with theoretical rigor. A degeneracy-friendly rotation matrix family ensures exact GFT degeneration at zero angle, with two AGFRFT variants (I-AGFRFT and II-AGFRFT) defined accordingly. Rigorous theoretical analyses confirm their unitarity, invertibility, and smooth parameter dependence. Both support learnable joint parameterization of the angle and fractional order, enabling adaptive spectral processing for diverse graph signals. Extensive experiments on real-world data denoising, image denoising, and point cloud denoising demonstrate that AGFRFT outperforms GFRFT and AGFT in terms of spectral concentration, reconstruction quality, and controllable spectral manipulation, establishing a robust and flexible tool for integrated angular fractional spectral analysis in graph signal processing.


Optimal Fairness under Local Differential Privacy

Ghoukasian, Hrad, Asoodeh, Shahab

arXiv.org Machine Learning

We investigate how to optimally design local differential privacy (LDP) mechanisms that reduce data unfairness and thereby improve fairness in downstream classification. We first derive a closed-form optimal mechanism for binary sensitive attributes and then develop a tractable optimization framework that yields the corresponding optimal mechanism for multi-valued attributes. As a theoretical contribution, we establish that for discrimination-accuracy optimal classifiers, reducing data unfairness necessarily leads to lower classification unfairness, thus providing a direct link between privacy-aware pre-processing and classification fairness. Empirically, we demonstrate that our approach consistently outperforms existing LDP mechanisms in reducing data unfairness across diverse datasets and fairness metrics, while maintaining accuracy close to that of non-private models. Moreover, compared with leading pre-processing and post-processing fairness methods, our mechanism achieves a more favorable accuracy-fairness trade-off while simultaneously preserving the privacy of sensitive attributes. Taken together, these results highlight LDP as a principled and effective pre-processing fairness intervention technique.


Approximation rates of quantum neural networks for periodic functions via Jackson's inequality

Neufeld, Ariel, Schmocker, Philipp, Tran, Viet Khoa

arXiv.org Machine Learning

Quantum neural networks (QNNs) are an analog of classical neural networks in the world of quantum computing, which are represented by a unitary matrix with trainable parameters. Inspired by the universal approximation property of classical neural networks, ensuring that every continuous function can be arbitrarily well approximated uniformly on a compact set of a Euclidean space, some recent works have established analogous results for QNNs, ranging from single-qubit to multi-qubit QNNs, and even hybrid classical-quantum models. In this paper, we study the approximation capabilities of QNNs for periodic functions with respect to the supremum norm. We use the Jackson inequality to approximate a given function by implementing its approximating trigonometric polynomial via a suitable QNN. In particular, we see that by restricting to the class of periodic functions, one can achieve a quadratic reduction of the number of parameters, producing better approximation results than in the literature. Moreover, the smoother the function, the fewer parameters are needed to construct a QNN to approximate the function.


Failure of uniform laws of large numbers for subdifferentials and beyond

Tian, Lai, Royset, Johannes O.

arXiv.org Machine Learning

We provide counterexamples showing that uniform laws of large numbers do not hold for subdifferentials under natural assumptions. Our results apply to random Lipschitz functions and random convex functions with a finite number of smooth pieces. Consequently, they resolve the questions posed by Shapiro and Xu [J. Math. Anal. Appl., 325(2), 2007] in the negative and highlight the obstacles nonsmoothness poses to uniform results.


Spectral Identifiability for Interpretable Probe Geometry

Huang, William Hao-Cheng

arXiv.org Machine Learning

Linear probes are widely used to interpret and evaluate neural representations, yet their reliability remains unclear, as probes may appear accurate in some regimes but collapse unpredictably in others. We uncover a spectral mechanism behind this phenomenon and formalize it as the Spectral Identifiability Principle (SIP), a verifiable Fisher-inspired condition for probe stability. When the eigengap separating task-relevant directions is larger than the Fisher estimation error, the estimated subspace concentrates and accuracy remains consistent, whereas closing this gap induces instability in a phase-transition manner. Our analysis connects eigengap geometry, sample size, and misclassification risk through finite-sample reasoning, providing an interpretable diagnostic rather than a loose generalization bound. Controlled synthetic studies, where Fisher quantities are computed exactly, confirm these predictions and show how spectral inspection can anticipate unreliable probes before they distort downstream evaluation.


Improving Iterative Gaussian Processes via Warm Starting Sequential Posteriors

Dong, Alan Yufei, Lin, Jihao Andreas, Hernández-Lobato, José Miguel

arXiv.org Machine Learning

Scalable Gaussian process (GP) inference is essential for sequential decision-making tasks, yet improving GP scalability remains a challenging problem with many open avenues of research. This paper focuses on iterative GPs, where iterative linear solvers, such as conjugate gradients, stochastic gradient descent or alternative projections, are used to approximate the GP posterior. We propose a new method which improves solver convergence of a large linear system by leveraging the known solution to a smaller system contained within. This is significant for tasks with incremental data additions, and we show that our technique achieves speed-ups when solving to tolerance, as well as improved Bayesian optimisation performance under a fixed compute budget.


ECPv2: Fast, Efficient, and Scalable Global Optimization of Lipschitz Functions

Fourati, Fares, Alouini, Mohamed-Slim, Aggarwal, Vaneet

arXiv.org Machine Learning

We propose ECPv2, a scalable and theoretically grounded algorithm for global optimization of Lipschitz-continuous functions with unknown Lipschitz constants. Building on the Every Call is Precious (ECP) framework, which ensures that each accepted function evaluation is potentially informative, ECPv2 addresses key limitations of ECP, including high computational cost and overly conservative early behavior. ECPv2 introduces three innovations: (i) an adaptive lower bound to avoid vacuous acceptance regions, (ii) a Worst-m memory mechanism that restricts comparisons to a fixed-size subset of past evaluations, and (iii) a fixed random projection to accelerate distance computations in high dimensions. We theoretically show that ECPv2 retains ECP's no-regret guarantees with optimal finite-time bounds and expands the acceptance region with high probability. We further empirically validate these findings through extensive experiments and ablation studies. Using principled hyperparameter settings, we evaluate ECPv2 across a wide range of high-dimensional, non-convex optimization problems. Across benchmarks, ECPv2 consistently matches or outperforms state-of-the-art optimizers, while significantly reducing wall-clock time.


Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in the Blanks

Neural Information Processing Systems

Hybrid methods that utilize both content and rating information are commonly used in many recommender systems. However, most of them use either handcrafted features or the bag-of-words representation as a surrogate for the content information but they are neither effective nor natural enough. To address this problem, we develop a collaborative recurrent autoencoder (CRAE) which is a denoising recurrent autoencoder (DRAE) that models the generation of content sequences in the collaborative filtering (CF) setting.


Conditional Generative Moment-Matching Networks

Neural Information Processing Systems

Maximum mean discrepancy (MMD) has been successfully applied to learn deep generative models for characterizing a joint distribution of variables via kernel mean embedding. In this paper, we present conditional generative moment-matching networks (CGMMN), which learn a conditional distribution given some input variables based on a conditional maximum mean discrepancy (CMMD) criterion. The learning is performed by stochastic gradient descent with the gradient calculated by back-propagation. We evaluate CGMMN on a wide range of tasks, including predictive modeling, contextual generation, and Bayesian dark knowledge, which distills knowledge from a Bayesian model by learning a relatively small CGMMN student network. Our results demonstrate competitive performance in all the tasks.