
A Novel Algorithm for Personalized Federated Learning: Knowledge Distillation with Weighted Combination Loss

arXiv.org Machine Learning

Federated learning (FL) offers a privacy-preserving framework for distributed machine learning, enabling collaborative model training across diverse clients without centralizing sensitive data. However, statistical heterogeneity, characterized by non-independent and identically distributed (non-IID) client data, poses significant challenges, leading to model drift and poor generalization. This paper proposes a novel algorithm, pFedKD-WCL (Personalized Federated Knowledge Distillation with Weighted Combination Loss), which integrates knowledge distillation with bi-level optimization to address non-IID challenges. pFedKD-WCL leverages the current global model as a teacher to guide local models, optimizing both global convergence and local personalization efficiently. We evaluate pFedKD-WCL on the MNIST dataset and a synthetic dataset with non-IID partitioning, using multinomial logistic regression and multilayer perceptron models. Experimental results demonstrate that pFedKD-WCL outperforms state-of-the-art algorithms, including FedAvg, FedProx, Per-FedAvg, and pFedMe, in terms of accuracy and convergence speed.
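The weighted combination loss described above can be sketched as a convex mix of the client's supervised cross-entropy and a temperature-scaled distillation term against the global teacher. The weight `alpha` and temperature `T` below are illustrative placeholders, not the paper's tuned values:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def combined_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Weighted combination of supervised CE and distillation KL.

    alpha balances local personalization (CE on the client's labels)
    against global guidance (KL to the global teacher's soft targets).
    """
    p = softmax(student_logits)
    n = len(labels)
    ce = -np.log(p[np.arange(n), labels]).mean()
    # Distillation term: KL(teacher || student) at temperature T,
    # scaled by T^2 as is conventional in knowledge distillation.
    pt = softmax(teacher_logits, T)
    ps = softmax(student_logits, T)
    kd = (pt * (np.log(pt) - np.log(ps))).sum(axis=1).mean() * T * T
    return alpha * ce + (1 - alpha) * kd
```

When the student matches the teacher exactly, the distillation term vanishes and only the supervised term remains, which is the sanity check one would expect of such a loss.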


A Consequentialist Critique of Binary Classification Evaluation Practices

arXiv.org Machine Learning

ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., Accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-ROC. We highlight that a consequentialist perspective, long advocated by decision theorists, should naturally favor evaluations that support independent decisions using a mixture of thresholds given their prevalence, such as Brier scores and Log loss. However, our empirical analysis reveals a strong preference for top-K metrics or fixed thresholds in evaluations at major conferences like ICML, FAccT, and CHIL. To address this gap, we use this decision-theoretic framework to map evaluation metrics to their optimal use cases, along with a Python package, briertools, to promote the broader adoption of Brier scores. In doing so, we also uncover new theoretical connections, including a reconciliation between the Brier Score and Decision Curve Analysis, which clarifies and responds to a longstanding critique by Assel et al. (2017) regarding the clinical utility of proper scoring rules.
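The two proper scoring rules the abstract advocates are easy to state directly. Below is a minimal sketch of the Brier score (mean squared error between forecast probability and binary outcome) and log loss, written in plain NumPy rather than via the briertools package, whose API is not shown here:

```python
import numpy as np

def brier_score(y_true, p):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    y_true, p = np.asarray(y_true, float), np.asarray(p, float)
    return np.mean((p - y_true) ** 2)

def log_loss(y_true, p, eps=1e-12):
    """Negative mean log-likelihood of the binary outcomes; probabilities
    are clipped away from 0 and 1 to avoid infinite penalties."""
    y_true = np.asarray(y_true, float)
    p = np.clip(np.asarray(p, float), eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```

Both are threshold-agnostic: they score the forecast probability itself, which is what makes them suitable for the "mixture of thresholds" evaluations the paper argues for.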


Cramer-Rao Bounds for Laplacian Matrix Estimation

arXiv.org Machine Learning

In this paper, we analyze the performance of the estimation of Laplacian matrices under general observation models. Laplacian matrix estimation involves structural constraints, including symmetry and null-space properties, along with matrix sparsity. By exploiting a linear reparametrization that enforces the structural constraints, we derive closed-form matrix expressions for the Cramer-Rao Bound (CRB) specifically tailored to Laplacian matrix estimation. We further extend the derivation to the sparsity-constrained case, introducing two oracle CRBs that incorporate prior information on the support set, i.e., the locations of the nonzero entries in the Laplacian matrix. We examine the properties and order relations between the bounds, and provide the associated Slepian-Bangs formula for the Gaussian case. We demonstrate the use of the new CRBs in three representative applications: (i) topology identification in power systems, (ii) graph filter identification in diffused models, and (iii) precision matrix estimation in Gaussian Markov random fields under Laplacian constraints. The CRBs are evaluated and compared with the mean-squared errors (MSEs) of the constrained maximum likelihood estimator (CMLE), which integrates both equality and inequality constraints along with sparsity constraints, and of the oracle CMLE, which knows the locations of the nonzero entries of the Laplacian matrix. We perform this analysis for the applications of power system topology identification and graphical LASSO, and demonstrate that the MSEs of the estimators converge to the CRB and oracle CRB, given a sufficient number of measurements. Graph-structured data and signals arise in numerous applications, including power systems, communications, finance, social networks, and biological networks, for analysis and inference of networks [2], [3]. In this context, the Laplacian matrix, which captures node connectivity and edge weights, serves as a fundamental tool for clustering [4], modeling graph diffusion processes [5], [6], topology inference [6]-[12], anomaly detection [13], graph-based filtering [14]-[18], and analyzing smoothness on graphs [19].
M. Halihal and T. Routtenberg are with the School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel, e-mail: moradha@post.bgu.ac.il, tirzar@bgu.ac.il.
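The structural constraints the abstract exploits (symmetry and the null-space property) follow directly from the definition of the combinatorial Laplacian L = D - W. A short sketch constructing a Laplacian from a symmetric nonnegative weight matrix makes both constraints concrete:

```python
import numpy as np

def laplacian_from_weights(W):
    """Combinatorial Laplacian L = D - W from a symmetric nonnegative
    weight matrix W. By construction L is symmetric and satisfies
    L @ 1 = 0 (the all-ones vector lies in its null space), with
    off-diagonal sparsity mirroring the graph's edge set."""
    W = np.asarray(W, float)
    D = np.diag(W.sum(axis=1))  # degree matrix
    return D - W
```

These are exactly the equality constraints (symmetry, zero row sums) that the paper's linear reparametrization bakes into the CRB derivation.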


Randomised Postiterations for Calibrated BayesCG

arXiv.org Machine Learning

The Bayesian conjugate gradient method offers probabilistic solutions to linear systems but suffers from poor calibration, limiting its utility in uncertainty quantification tasks. Recent approaches leveraging postiterations to construct priors have improved computational properties but failed to correct calibration issues. In this work, we propose a novel randomised postiteration strategy that enhances the calibration of the BayesCG posterior while preserving its favourable convergence characteristics. We present theoretical guarantees for the improved calibration, supported by results on the distribution of posterior errors. Numerical experiments demonstrate the efficacy of the method in both synthetic and inverse problem settings, showing enhanced uncertainty quantification and better propagation of uncertainties through computational pipelines.


Causal Inference Isn't Special: Why It's Just Another Prediction Problem

arXiv.org Machine Learning

Causal inference is often portrayed as fundamentally distinct from predictive modeling, with its own terminology, goals, and intellectual challenges. But at its core, causal inference is simply a structured instance of prediction under distribution shift. In both cases, we begin with labeled data from a source domain and seek to generalize to a target domain where outcomes are not observed. The key difference is that in causal inference, the labels -- potential outcomes -- are selectively observed based on treatment assignment, introducing bias that must be addressed through assumptions. This perspective reframes causal estimation as a familiar generalization problem and highlights how techniques from predictive modeling, such as reweighting and domain adaptation, apply directly to causal tasks. It also clarifies that causal assumptions are not uniquely strong -- they are simply more explicit. By viewing causal inference through the lens of prediction, we demystify its logic, connect it to familiar tools, and make it more accessible to practitioners and educators alike.
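The reweighting technique the abstract points to has a standard instance: inverse-propensity weighting, which corrects for the selective observation of potential outcomes exactly the way importance weighting corrects for covariate shift in prediction. A minimal sketch (not the paper's own code, and assuming the propensities are already estimated):

```python
import numpy as np

def ipw_ate(y, t, e):
    """Inverse-propensity-weighted estimate of the average treatment effect.

    y: observed outcomes
    t: binary treatment indicators (which potential outcome was observed)
    e: estimated propensity scores P(T=1 | X)

    Reweighting by 1/e and 1/(1-e) recovers the two counterfactual means
    from the selectively observed labels -- the same move as importance
    weighting under distribution shift.
    """
    y, t, e = (np.asarray(a, float) for a in (y, t, e))
    return np.mean(t * y / e - (1 - t) * y / (1 - e))
```

In a randomized experiment (constant propensity 0.5) this reduces to a simple difference of reweighted means, which illustrates the "causal inference as prediction under shift" framing with no extra machinery.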


Randomised Splitting Methods and Stochastic Gradient Descent

arXiv.org Machine Learning

We explore an explicit link between stochastic gradient descent using common batching strategies and splitting methods for ordinary differential equations. From this perspective, we introduce a new minibatching strategy (called Symmetric Minibatching Strategy) for stochastic gradient optimisation which shows greatly reduced stochastic gradient bias (from $\mathcal{O}(h^2)$ to $\mathcal{O}(h^4)$ in the optimiser stepsize $h$), when combined with momentum-based optimisers. We justify why momentum is needed to obtain the improved performance using the theory of backward analysis for splitting integrators and provide a detailed analytic computation of the stochastic gradient bias on a simple example. Further, we provide improved convergence guarantees for this new minibatching strategy using Lyapunov techniques that show reduced stochastic gradient bias for a fixed stepsize (or learning rate) over the class of strongly-convex and smooth objective functions. Via the same techniques we also improve the known results for the Random Reshuffling strategy for stochastic gradient descent methods with momentum. We argue that this also leads to a faster convergence rate when considering a decreasing stepsize schedule. Both the reduced bias and efficacy of decreasing stepsizes are demonstrated numerically on several motivating examples.
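One plausible reading of a symmetric minibatching strategy, by analogy with symmetric (palindromic) compositions of splitting integrators, is to traverse the epoch's batches forward and then in reverse. The sketch below is an illustrative interpretation, not the paper's implementation:

```python
import random

def symmetric_minibatches(indices, batch_size, seed=0):
    """Shuffle once per epoch, partition into minibatches, then replay
    the same batches in reverse order (B1, ..., Bk, Bk, ..., B1).
    Palindromic compositions cancel leading-order error terms for
    splitting integrators, which motivates the analogous ordering here."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    batches = [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]
    return batches + batches[::-1]
```

Note that each pass visits every sample twice, so a fair comparison against Random Reshuffling would count gradient evaluations rather than epochs.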


CATS: Mitigating Correlation Shift for Multivariate Time Series Classification

arXiv.org Machine Learning

Unsupervised Domain Adaptation (UDA) leverages labeled source data to train models for unlabeled target data. Given the prevalence of multivariate time series (MTS) data across various domains, the UDA task for MTS classification has emerged as a critical challenge. However, for MTS data, correlations between variables often vary across domains, whereas most existing UDA works for MTS classification have overlooked this essential characteristic. To bridge this gap, we introduce a novel domain shift, {\em correlation shift}, measuring domain differences in multivariate correlation. To mitigate correlation shift, we propose a scalable and parameter-efficient \underline{C}orrelation \underline{A}dapter for M\underline{TS} (CATS). Designed as a plug-and-play technique compatible with various Transformer variants, CATS employs temporal convolution to capture local temporal patterns and a graph attention module to model the changing multivariate correlation. The adapter reweights the target correlations to align the source correlations with a theoretically guaranteed precision. A correlation alignment loss is further proposed to mitigate correlation shift, bypassing the alignment challenge from the non-i.i.d. nature of MTS data. Extensive experiments on four real-world datasets demonstrate that (1) compared with vanilla Transformer-based models, CATS increases over $10\%$ average accuracy while only adding around $1\%$ parameters, and (2) all Transformer variants equipped with CATS either reach or surpass state-of-the-art baselines.
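The correlation shift the abstract introduces can be approximated with a simple proxy: compare the variable-wise correlation matrices of the source and target series. The exact measure used by CATS is not given here; the sketch below uses a Frobenius distance as one reasonable stand-in:

```python
import numpy as np

def correlation_shift(X_src, X_tgt):
    """Frobenius distance between the variable-wise correlation matrices
    of source and target multivariate time series.

    X_src, X_tgt: arrays of shape (n_samples, seq_len, n_vars); timesteps
    are pooled across samples before computing correlations. This is an
    illustrative proxy, not the paper's exact definition.
    """
    Cs = np.corrcoef(X_src.reshape(-1, X_src.shape[-1]).T)
    Ct = np.corrcoef(X_tgt.reshape(-1, X_tgt.shape[-1]).T)
    return np.linalg.norm(Cs - Ct, ord="fro")
```

A value of zero means the two domains share identical inter-variable correlation structure; large values indicate exactly the kind of shift the adapter is designed to reweight away.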


Batch Bayesian Optimization for High-Dimensional Experimental Design: Simulation and Visualization

arXiv.org Machine Learning

Bayesian Optimization (BO) is increasingly used to guide experimental optimization tasks. To elucidate BO behavior in noisy and high-dimensional settings typical for materials science applications, we perform batch BO of two six-dimensional test functions: an Ackley function representing a needle-in-a-haystack problem and a Hartmann function representing a problem with a false maximum with a value close to the global maximum. We show learning curves, performance metrics, and visualizations to effectively track the evolution of optimization in high dimensions and evaluate how they are affected by noise, batch-picking method, choice of acquisition function, and its exploration hyperparameter values. We find that the effects of noise depend on the problem landscape; therefore, prior knowledge of the domain structure and noise level is needed when designing BO. The Ackley function optimization is significantly degraded by noise, with a complete loss of ground truth resemblance when noise equals 10% of the maximum objective value. For the Hartmann function, even in the absence of noise, a significant fraction of the initial samplings identify the false maximum instead of the ground truth maximum as the optimum of the function; with increasing noise, BO remains effective, albeit with increasing probability of landing on the false maximum. This study systematically highlights the critical issues when setting up BO and choosing synthetic data to test experimental design. The results and methodology will facilitate wider utilization of BO in guiding experiments, specifically in high-dimensional settings.
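The Ackley test function used as the needle-in-a-haystack benchmark has a standard closed form. A minimal implementation (with the conventional parameter values a=20, b=0.2, c=2π; BO benchmarks typically negate it so the needle becomes a maximum):

```python
import numpy as np

def ackley(x, a=20.0, b=0.2, c=2 * np.pi):
    """Standard d-dimensional Ackley function.

    Global minimum of 0 at the origin, surrounded by a nearly flat
    outer region riddled with local minima -- the needle-in-a-haystack
    structure that makes it a hard BO benchmark in six dimensions.
    """
    x = np.asarray(x, float)
    d = x.size
    return (-a * np.exp(-b * np.sqrt((x ** 2).sum() / d))
            - np.exp(np.cos(c * x).sum() / d) + a + np.e)
```

Because the basin around the optimum is narrow relative to the 6-D search volume, even modest observation noise can mask the needle entirely, consistent with the degradation the study reports at 10% noise.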


Detection Limits and Statistical Separability of Tree Ring Watermarks in Rectified Flow-based Text-to-Image Generation Models

arXiv.org Machine Learning

Tree-Ring Watermarking is a significant technique for authenticating AI-generated images. However, its effectiveness in rectified flow-based models remains unexplored, particularly given the inherent challenges of these models with noise latent inversion. Through extensive experimentation, we evaluated and compared the detection and separability of watermarks between SD 2.1 and FLUX.1-dev models. By analyzing various text guidance configurations and augmentation attacks, we demonstrate how inversion limitations affect both watermark recovery and the statistical separation between watermarked and unwatermarked images. Our findings provide valuable insights into the limitations of Tree-Ring Watermarking in current SOTA models and highlight the critical need for improved inversion methods to achieve reliable watermark detection and separability. The official implementation, dataset release and all experimental results are available at this \href{https://github.com/dsgiitr/flux-watermarking}{\textbf{link}}.


The Ground Cost for Optimal Transport of Angular Velocity

arXiv.org Machine Learning

We revisit the optimal transport problem over angular velocity dynamics given by the controlled Euler equation. The solution of this problem enables stochastic guidance of spin states of a rigid body (e.g., a spacecraft) under a hard deadline constraint by transferring a given initial state statistics to a desired terminal state statistics. This is an instance of generalized optimal transport over a nonlinear dynamical system. While prior work has reported existence-uniqueness and numerical solution of this dynamical optimal transport problem, here we present structural results about the equivalent Kantorovich, a.k.a. optimal coupling, formulation. Specifically, we focus on deriving the ground cost for the associated Kantorovich optimal coupling formulation. The ground cost equals the cost of transporting a unit amount of mass from a specific realization of the initial or source joint probability measure to a realization of the terminal or target joint probability measure, and determines the Kantorovich formulation. Finding the ground cost leads to solving a structured deterministic nonlinear optimal control problem, which is shown to be amenable to an analysis technique pioneered by Athans et al. We show that such techniques have broader applicability in determining the ground cost (and thus the Kantorovich formulation) for a class of generalized optimal mass transport problems involving nonlinear dynamics with translated norm-invariant drift.