Goto

Collaborating Authors

 affinity


Turning the Tables: Enabling Backward Transfer via Causal-Aware LoRA in Continual Learning

Neural Information Processing Systems

Current parameter-efficient fine-tuning (PEFT) methods have shown superior performance in continual learning. However, most existing PEFT-based methods focus on mitigating catastrophic forgetting by limiting modifications to the old task model caused by new tasks. This hinders backward knowledge transfer, as when new tasks have a strong positive correlation with old tasks, appropriately training on new tasks can transfer beneficial knowledge to old tasks. Critically, achieving backward knowledge transfer faces two fundamental challenges: (1) some parameters may be ineffective on task performance, which constrains the task solution space and model capacity; (2) since old task data are inaccessible, modeling task correlation via shared data is infeasible. To address these challenges, we propose CaLoRA, a novel causal-aware low-rank adaptation framework that is the first PEFT-based continual learning work with backward knowledge transfer. Specifically, we first propose parameter-level counterfactual attribution (PaCA) that estimates the causal effect of LoRA parameters via counterfactual reasoning, identifying effective parameters from a causal view. Second, we propose cross-task gradient adaptation (CaGA) to quantify task correlation by gradient projection and evaluate task affinity based on gradient similarity. By incorporating causal effect, task correlation, and affinity, CaGA adaptively adjusts task gradients, facilitating backward knowledge transfer without relying on data replay. Extensive experiments across multiple benchmarks and continual learning settings show that CaLoRA outperforms stateof-the-art methods.


Towards Precision Protein-Ligand Affinity Prediction Benchmark: AComplete and Modification-Aware DAVISDataset

Neural Information Processing Systems

Advancements in AI for science unlocks capabilities for critical drug discovery tasks such as protein-ligand binding affinity prediction. However, current models overfit to existing oversimplified datasets that does not represent naturally occurring and biologically relevant proteins with modifications. In this work, we curate a complete and modification-aware version of the widely used DAVIS dataset by incorporating 4,032 kinase-ligand pairs involving substitutions, insertions, deletions, and phosphorylation events. This enriched dataset enables benchmarking of predictive models under biologically realistic conditions. Based on this new dataset, we propose three benchmark settings--Augmented Dataset Prediction, Wild-Type to Modification Generalization, and Few-Shot Modification Generalization--designed to assess model robustness in the presence of protein modifications. Through extensive evaluation of both docking-free and docking-based methods, we find that docking-based model generalize better in zero-shot settings. In contrast, docking-free models tend to overfit to wild-type proteins and struggle with unseen modifications but show notable improvement when fine-tuned on a small set of modified examples. We anticipate that the curated dataset and benchmarks offer a valuable foundation for developing models that better generalize to protein modifications, ultimately advancing precision medicine in drug discovery.


Diffusion Operator Geometry of Feedforward Representations

arXiv.org Machine Learning

Neural networks transform data through learned representations whose geometry affects separation, contraction, and generalization. Recent work studies this geometry using discrete curvature on neighborhood graphs, suggesting Ricci-flow-like behavior across layers. We develop a smooth operator-theoretic alternative for feedforward representation snapshots. Each feature cloud induces a Gaussian-kernel diffusion Markov operator, and transport, spectral, label-boundary, and local-scale observables are derived from this single object via Bakry-Emery $ฮ“$-calculus. In a balanced Gaussian class-conditional snapshot model with shared covariance, the population operator has closed-form class affinities, leakage, and coarse spectra, all controlled by pairwise regularized Mahalanobis separations $c_\varepsilon^{(a,b)}$. We also prove that the resulting operator observables vary smoothly under feature perturbations, while hard neighborhood-graph diagnostics can change discontinuously. Synthetic experiments validate the closed-form Gaussian bridge, while learned MNIST experiments show that the same operator observables track training, width, and perturbation stability. Together, these results give a stable operator-geometric framework for analyzing feedforward representation geometry.


Biconvex Biclustering

arXiv.org Machine Learning

This article proposes a biconvex modification to convex biclustering in order to improve its performance in high-dimensional settings. In contrast to heuristics that discard a subset of noisy features a priori, our method jointly learns and accordingly weighs informative features while discovering biclusters. Moreover, the method is adaptive to the data, and is accompanied by an efficient algorithm based on proximal alternating minimization, complete with detailed guidance on hyperparameter tuning and efficient solutions to optimization subproblems. These contributions are theoretically grounded; we establish finite-sample bounds on the objective function under sub-Gaussian errors, and generalize these guarantees to cases where input affinities need not be uniform. Extensive simulation results reveal our method consistently recovers underlying biclusters while weighing and selecting features appropriately, outperforming peer methods. An application to a gene microarray dataset of lymphoma samples recovers biclusters matching an underlying classification, while giving additional interpretation to the mRNA samples via the column groupings and fitted weights.


Subspace Clustering via Tangent Cones

Neural Information Processing Systems

Given samples lying on any of a number of subspaces, subspace clustering is the task of grouping the samples based on the their corresponding subspaces. Many subspace clustering methods operate by assigning a measure of affinity to each pair of points and feeding these affinities into a graph clustering algorithm. This paper proposes a new paradigm for subspace clustering that computes affinities based on the corresponding conic geometry. The proposed conic subspace clustering (CSC) approach considers the convex hull of a collection of normalized data points and the corresponding tangent cones. The union of subspaces underlying the data imposes a strong association between the tangent cone at a sample $x$ and the original subspace containing $x$. In addition to describing this novel geometric perspective, this paper provides a practical algorithm for subspace clustering that leverages this perspective, where a tangent cone membership test is used to estimate the affinities. This algorithm is accompanied with deterministic and stochastic guarantees on the properties of the learned affinity matrix, on the true and false positive rates and spread, which directly translate into the overall clustering accuracy.


Affinity Clustering: Hierarchical Clustering at Scale

Neural Information Processing Systems

Graph clustering is a fundamental task in many data-mining and machine-learning pipelines. In particular, identifying a good hierarchical structure is at the same time a fundamental and challenging problem for several applications. The amount of data to analyze is increasing at an astonishing rate each day. Hence there is a need for new solutions to efficiently compute effective hierarchical clusterings on such huge data. The main focus of this paper is on minimum spanning tree (MST) based clusterings. In particular, we propose affinity, a novel hierarchical clustering based on Boruvka's MST algorithm. We prove certain theoretical guarantees for affinity (as well as some other classic algorithms) and show that in practice it is superior to several other state-of-the-art clustering algorithms.