Theoretical Performance Guarantees for Partial Domain Adaptation via Partial Optimal Transport

arXiv.org Machine Learning

In many scenarios of practical interest, labeled data from a target distribution are scarce while labeled data from a related source distribution are abundant. One particular setting of interest arises when the target label space is a subset of the source label space, leading to the framework of partial domain adaptation (PDA). Typical approaches to PDA involve minimizing a domain alignment term and a weighted empirical loss on the source data, with the aim of transferring knowledge between domains. However, a theoretical basis for this procedure is lacking, and in particular, most existing weighting schemes are heuristic. In this work, we derive generalization bounds for the PDA problem based on partial optimal transport. These bounds corroborate the use of the partial Wasserstein distance as a domain alignment term, and lead to theoretically motivated explicit expressions for the empirical source loss weights. Inspired by these bounds, we devise a practical algorithm for PDA, termed WARMPOT. Through extensive numerical experiments, we show that WARMPOT is competitive with recent approaches, and that our proposed weights improve on existing schemes.
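As a rough illustration of the partial Wasserstein distance used as an alignment term, the sketch below approximates partial optimal transport between two small 1-D samples. It uses entropic regularization (Sinkhorn iterations) and a standard dummy-point reduction, in which each side gets one extra point that absorbs the 1 - m mass left untransported. The function name, the ground cost |x - y|, and the values of eps and big are assumptions made for illustration. This is not the WARMPOT algorithm, and nothing here reproduces the source-loss weights derived in the paper.

```python
import math
import random  # not needed by the sketch; kept out of the logic below


def partial_wasserstein_entropic(xs, xt, m, eps=0.05, n_iter=500, big=10.0):
    """Entropic approximation of partial OT between two uniform 1-D samples.

    Transports a total mass m (0 < m <= 1) from xs to xt. Returns (cost, plan),
    where plan is restricted to the real (non-dummy) points.
    """
    ns, nt = len(xs), len(xt)
    # Uniform weights plus one dummy point per side holding the untransported mass.
    a = [1.0 / ns] * ns + [1.0 - m]
    b = [1.0 / nt] * nt + [1.0 - m]
    # Ground cost |x - y| between real points, 0 to a dummy, and `big` from
    # dummy to dummy so that no mass flows between the two dummies.
    C = [[0.0] * (nt + 1) for _ in range(ns + 1)]
    for i in range(ns):
        for j in range(nt):
            C[i][j] = abs(xs[i] - xt[j])
    C[ns][nt] = big
    K = [[math.exp(-c / eps) for c in row] for row in C]
    # Sinkhorn scaling: alternately match row sums to a and column sums to b.
    u = [1.0] * (ns + 1)
    v = [1.0] * (nt + 1)
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(nt + 1)) for i in range(ns + 1)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(ns + 1)) for j in range(nt + 1)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(nt)] for i in range(ns)]
    cost = sum(plan[i][j] * C[i][j] for i in range(ns) for j in range(nt))
    return cost, plan


# Source has an outlier at 5.0 with no counterpart in the target sample.
cost, plan = partial_wasserstein_entropic([0.0, 0.1, 0.2, 5.0], [0.0, 0.1, 0.2], m=0.5)
print(round(sum(map(sum, plan)), 3))  # total transported mass, close to 0.5
print(sum(plan[3]))  # mass moved from the outlier: essentially zero
```

Because only mass m has to be moved, the outlier is routed entirely to the dummy point, which mirrors how PDA can ignore source classes absent from the target. For much smaller eps, a log-domain Sinkhorn would be needed to avoid underflow in K.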


On the Need to Align Intent and Implementation in Uncertainty Quantification for Machine Learning

arXiv.org Machine Learning

Quantifying uncertainties for machine learning (ML) models is a foundational challenge in modern data analysis. This challenge is compounded by at least two key aspects of the field: (a) inconsistent terminology surrounding uncertainty and estimation across disciplines, and (b) the varying technical requirements for establishing trustworthy uncertainties in diverse problem contexts. In this position paper, we aim to clarify the depth of these challenges by identifying these inconsistencies and articulating how different contexts impose distinct epistemic demands. We examine the current landscape of estimation targets (e.g., prediction, inference, simulation-based inference), uncertainty constructs (e.g., frequentist, Bayesian, fiducial), and the approaches used to map between them. Drawing on the literature, we highlight and explain examples of problematic mappings. To help address these issues, we advocate for standards that promote alignment between the intent and implementation of uncertainty quantification (UQ) approaches. We discuss several axes of trustworthiness that are necessary (if not sufficient) for reliable UQ in ML models, and show how these axes can inform the design and evaluation of uncertainty-aware ML systems. Our practical recommendations focus on scientific ML, offering illustrative cases and use scenarios, particularly in the context of simulation-based inference (SBI).


Binary Cumulative Encoding meets Time Series Forecasting

arXiv.org Machine Learning

Recent studies in time series forecasting have explored formulating regression as a classification task. By discretizing the continuous target space into bins and predicting over a fixed set of classes, these approaches benefit from stable training, robust uncertainty modeling, and compatibility with modern deep learning architectures. However, most existing methods rely on one-hot encoding, which ignores the inherent ordinal structure of the underlying values. As a result, they fail to provide information about the relative distance between predicted and true values during training. In this paper, we propose to address this limitation by introducing binary cumulative encoding (BCE), which represents scalar targets as monotonic binary vectors. This encoding implicitly preserves order and magnitude information, allowing the model to learn distance-aware representations while still operating within a classification framework. We propose a convolutional neural network architecture specifically designed for BCE, incorporating residual and dilated convolutions to enable fast and expressive temporal modeling. Through extensive experiments on benchmark forecasting datasets, we show that our approach outperforms widely used methods in both point and probabilistic forecasting, while requiring fewer parameters and enabling faster training.
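To make the ordinal property concrete, the sketch below encodes a scalar as a cumulative (thermometer-style) binary vector over uniform bins and compares Hamming distances with one-hot encoding. The uniform binning, the function names, and the decoding rule (count bits with probability at least 0.5, then return that bin's centre) are assumptions for illustration; the paper's binning and decoding may differ.

```python
def bce_encode(y, lo, hi, n_bins):
    """Binary cumulative encoding: bit k is 1 iff y >= the k-th inner bin edge."""
    width = (hi - lo) / n_bins
    thresholds = [lo + width * k for k in range(1, n_bins)]
    return [1 if y >= t else 0 for t in thresholds]


def bce_decode(probs, lo, hi, n_bins):
    """Map per-bit probabilities back to a scalar (hypothetical rule): count the
    bits predicted as 1, then return the centre of that bin."""
    width = (hi - lo) / n_bins
    k = sum(1 for p in probs if p >= 0.5)
    return lo + width * (k + 0.5)


def one_hot(y, lo, hi, n_bins):
    """Standard one-hot bin encoding, for comparison."""
    width = (hi - lo) / n_bins
    k = min(max(int((y - lo) // width), 0), n_bins - 1)
    return [1 if i == k else 0 for i in range(n_bins)]


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


a, b = bce_encode(3.2, 0, 10, 10), bce_encode(7.9, 0, 10, 10)
print(a)              # [1, 1, 1, 0, 0, 0, 0, 0, 0]
print(hamming(a, b))  # 4: grows with the distance between the two bins
print(hamming(one_hot(3.2, 0, 10, 10), one_hot(7.9, 0, 10, 10)))  # 2 for any two distinct bins
```

The Hamming distance between two cumulative codes equals the number of bins separating the targets, whereas any two distinct one-hot codes are equally far apart, which is the "distance-aware" property the abstract refers to.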


Symmetry-Aware GFlowNets

arXiv.org Machine Learning

Generative Flow Networks (GFlowNets) offer a powerful framework for sampling graphs in proportion to their rewards. However, existing approaches suffer from systematic biases due to inaccuracies in state transition probability computations. These biases, rooted in the inherent symmetries of graphs, impact both atom-based and fragment-based generation schemes. To address this challenge, we introduce Symmetry-Aware GFlowNets (SA-GFN), a method that incorporates symmetry corrections into the learning process through reward scaling. By integrating bias correction directly into the reward structure, SA-GFN eliminates the need for explicit state transition computations. Empirical results show that SA-GFN enables unbiased sampling while enhancing diversity and consistently generating high-reward graphs that closely match the target distribution.
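Graph symmetries are formalised by automorphisms, node permutations that leave the graph unchanged. For an n-node graph, the number of distinct node orderings is n!/|Aut(G)|, so a highly symmetric graph corresponds to fewer distinct orderings than an asymmetric graph of the same size. The brute-force counter below (a hypothetical helper, exponential in n and usable only for tiny graphs) shows how that quantity can be computed. The abstract does not state SA-GFN's exact reward-scaling rule, so none is implemented here.

```python
from itertools import permutations
from math import factorial


def count_automorphisms(n, edges, labels=None):
    """Brute-force |Aut(G)| for a small undirected graph on nodes 0..n-1.

    A permutation counts if it maps every edge onto an edge and, when node
    labels (e.g. atom types) are given, preserves every node's label.
    """
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(range(n)):
        if labels is not None and any(labels[perm[i]] != labels[i] for i in range(n)):
            continue
        # perm is a bijection, so mapping edges into the same-size edge set is onto.
        if all(frozenset((perm[u], perm[v])) in edge_set for u, v in edges):
            count += 1
    return count


triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
print(count_automorphisms(3, triangle))  # 6
print(count_automorphisms(3, path))      # 2
# Distinct node orderings of the path graph: 3! / |Aut(G)|
print(factorial(3) // count_automorphisms(3, path))  # 3
```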


Graph-Based Adversarial Domain Generalization with Anatomical Correlation Knowledge for Cross-User Human Activity Recognition

arXiv.org Artificial Intelligence

Cross-user variability poses a significant challenge in sensor-based Human Activity Recognition (HAR) systems, as traditional models struggle to generalize across users due to differences in behavior, sensor placement, and data distribution. To address this, we propose GNN-ADG (Graph Neural Network with Adversarial Domain Generalization), a novel method that leverages the strengths of both Graph Neural Networks (GNNs) and adversarial learning to achieve robust cross-user generalization. GNN-ADG models spatial relationships between sensors on different anatomical body parts, extracting three types of Anatomical Units: (1) Interconnected Units, capturing inter-relations between neighboring sensors; (2) Analogous Units, grouping sensors on symmetrical or functionally similar body parts; and (3) Lateral Units, connecting sensors based on their position to capture region-specific coordination. The information from these units is fused into a unified graph structure with a cyclic training strategy, dynamically integrating spatial, functional, and lateral correlations to facilitate a holistic, user-invariant representation. This information fusion works by iteratively cycling through the edge topologies during training, allowing the model to refine its understanding of inter-sensor relationships across diverse perspectives. By representing the spatial configuration of sensors as a unified graph and incorporating adversarial learning, Information Fusion GNN-ADG effectively learns features that generalize well to unseen users without requiring target user data during training, making it practical for real-world applications.
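The cyclic training strategy can be pictured as swapping the graph's edge set from one epoch to the next. The sketch below assumes five body-worn sensors, hand-picked edges for each Anatomical Unit type, and a simple epoch-modulo schedule; all sensor names and edges are hypothetical, and the GNN and adversarial domain classifier are left out.

```python
# Hypothetical sensor placement; the sensor set used in the paper may differ.
SENSORS = ["chest", "left_wrist", "right_wrist", "left_ankle", "right_ankle"]

# Hypothetical edge sets, one per type of Anatomical Unit.
INTERCONNECTED = [("chest", "left_wrist"), ("chest", "right_wrist"),
                  ("chest", "left_ankle"), ("chest", "right_ankle")]
ANALOGOUS = [("left_wrist", "right_wrist"), ("left_ankle", "right_ankle")]
LATERAL = [("left_wrist", "left_ankle"), ("right_wrist", "right_ankle")]
TOPOLOGIES = [INTERCONNECTED, ANALOGOUS, LATERAL]


def topology_for_epoch(epoch):
    """Cycle through the edge sets: 0 -> interconnected, 1 -> analogous, 2 -> lateral, ..."""
    return TOPOLOGIES[epoch % len(TOPOLOGIES)]


def adjacency(edges, nodes=SENSORS):
    """Symmetric 0/1 adjacency matrix with self-loops, as a list of lists."""
    index = {name: i for i, name in enumerate(nodes)}
    A = [[1 if i == j else 0 for j in range(len(nodes))] for i in range(len(nodes))]
    for u, v in edges:
        A[index[u]][index[v]] = A[index[v]][index[u]] = 1
    return A


for epoch in range(4):
    A = adjacency(topology_for_epoch(epoch))
    # A GNN layer would aggregate per-sensor features over A at this point.
    print(epoch, sum(map(sum, A)))
```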


Label-shift robust federated feature screening for high-dimensional classification

arXiv.org Machine Learning

Distributed and federated learning are important tools for high-dimensional classification of large datasets. To reduce computational costs and overcome the curse of dimensionality, feature screening plays a pivotal role in eliminating irrelevant features during data preprocessing. However, data heterogeneity, particularly label shift across clients, presents significant challenges for feature screening. This paper introduces a general framework that unifies existing screening methods and proposes a novel utility, label-shift robust federated feature screening (LR-FFS), along with its federated estimation procedure. The framework facilitates a uniform analysis of methods and systematically characterizes their behaviors under label shift conditions. Building upon this framework, LR-FFS leverages conditional distribution functions and expectations to address label shift without adding computational burden, and it remains robust against model misspecification and outliers. Additionally, the federated procedure ensures computational efficiency and privacy protection while maintaining screening effectiveness comparable to centralized processing. We also provide a false discovery rate (FDR) control method for federated feature screening. Experimental results and theoretical analyses demonstrate LR-FFS's superior performance across diverse client environments, including those with varying class distributions, sample sizes, and missing categorical data.
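One way to see why class-conditional distribution functions help under label shift: if P(X | Y) is shared across clients, per-class CDFs can be estimated by pooling per-class counts, and an equal-weight mixture over classes does not depend on how label proportions differ between clients. The sketch below computes such a score from count summaries that each client sends to a server. It is an illustrative, hypothetical statistic (a mean Kolmogorov-Smirnov-type distance on a shared grid), not the LR-FFS utility, and it omits the paper's FDR control.

```python
def _empty(grids):
    return {"n": 0, "le": [[0] * len(g) for g in grids]}


def client_summary(X, y, grids):
    """Runs on a client: per class, the sample count and, for each feature j and
    grid point t, the number of samples with X[j] <= t. Only counts are shared."""
    stats = {}
    for row, label in zip(X, y):
        s = stats.setdefault(label, _empty(grids))
        s["n"] += 1
        for j, grid in enumerate(grids):
            for k, t in enumerate(grid):
                s["le"][j][k] += int(row[j] <= t)
    return stats


def server_utilities(summaries, grids):
    """Runs on the server: aggregate client counts, then score each feature."""
    total = {}
    for stats in summaries:
        for label, s in stats.items():
            agg = total.setdefault(label, _empty(grids))
            agg["n"] += s["n"]
            for j in range(len(grids)):
                for k in range(len(grids[j])):
                    agg["le"][j][k] += s["le"][j][k]
    labels = sorted(total)
    utilities = []
    for j, grid in enumerate(grids):
        # Class-conditional CDFs F_c(t) = P(X_j <= t | Y = c), pooled over clients.
        cdfs = [[total[c]["le"][j][k] / total[c]["n"] for k in range(len(grid))]
                for c in labels]
        # Equal-weight mixture over classes: unaffected by the label proportions.
        mix = [sum(F[k] for F in cdfs) / len(cdfs) for k in range(len(grid))]
        # Mean sup-distance between each class CDF and the mixture.
        utilities.append(sum(max(abs(F[k] - mix[k]) for k in range(len(grid)))
                             for F in cdfs) / len(cdfs))
    return utilities


grids = [[0.25, 0.5, 0.75, 1.0], [0.25, 0.5, 0.75]]
# Client A is mostly class 0, client B mostly class 1 (label shift).
client_a = client_summary([[0.1, 0.5], [0.2, 0.1], [0.0, 0.9], [1.1, 0.4]], [0, 0, 0, 1], grids)
client_b = client_summary([[0.9, 0.6], [1.0, 0.2], [1.2, 0.8], [0.1, 0.3]], [1, 1, 1, 0], grids)
print(server_utilities([client_a, client_b], grids))  # [0.5, 0.125]
```

Feature 0 separates the classes and scores higher than feature 1, which is noise; a screening step would keep features whose score exceeds a threshold.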


Flashbacks to Harmonize Stability and Plasticity in Continual Learning

arXiv.org Machine Learning

We introduce Flashback Learning (FL), a novel method designed to harmonize the stability and plasticity of models in Continual Learning (CL). Unlike prior approaches that primarily focus on regularizing model updates to preserve old information while learning new concepts, FL explicitly balances this trade-off through a bidirectional form of regularization. This approach effectively guides the model to swiftly incorporate new knowledge while actively retaining its old knowledge. FL operates through a two-phase training process and can be seamlessly integrated into various CL methods, including replay, parameter regularization, distillation, and dynamic architecture techniques. In designing FL, we use two distinct knowledge bases: one to enhance plasticity and another to improve stability. FL ensures a more balanced model by utilizing both knowledge bases to regularize model updates. Theoretically, we analyze how the FL mechanism enhances the stability-plasticity balance. Empirically, FL demonstrates tangible improvements over baseline methods within the same training budget. By integrating FL into at least one representative baseline from each CL category, we observed an average accuracy improvement of up to 4.91% in Class-Incremental and 3.51% in Task-Incremental settings on standard image classification benchmarks. Additionally, measurements of the stability-to-plasticity ratio confirm that FL effectively enhances this balance. FL also outperforms state-of-the-art CL methods on more challenging datasets like ImageNet.

Introduction

Our brain excels at learning new information without significantly disrupting or forgetting previously acquired knowledge. In contrast, Deep Neural Networks (DNNs) often struggle with Catastrophic Forgetting (CF) [1], where new information can overwrite and interfere with existing knowledge. This phenomenon poses a significant challenge when continuous learning of new tasks is required.
Continual Learning (CL) methods have been developed to mitigate catastrophic forgetting by enabling DNNs to retain prior knowledge while acquiring new skills. Deploying AI models in realistic settings demands that models function effectively when encountering data with varying distributions over time.
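As a rough picture of regularizing updates toward two knowledge bases, the sketch below treats each knowledge base as a parameter snapshot: a stability anchor (for example, the model before the new task) and a plasticity anchor (for example, parameters after a short first phase on the new task). During the second phase it adds a quadratic pull toward both. Treating knowledge bases as parameter vectors, the squared-distance penalties, and the names flashback_grad and phase_two are assumptions made for illustration; the paper's knowledge bases and bidirectional regularizer may be defined differently.

```python
def flashback_grad(theta, task_grad, stable_kb, plastic_kb, lam_s, lam_p):
    """Gradient of task_loss + lam_s*||theta - stable_kb||^2 + lam_p*||theta - plastic_kb||^2."""
    return [g + 2 * lam_s * (t - s) + 2 * lam_p * (t - p)
            for t, g, s, p in zip(theta, task_grad, stable_kb, plastic_kb)]


def phase_two(theta, stable_kb, plastic_kb, lam_s=1.0, lam_p=1.0, lr=0.1, steps=200,
              task_grad_fn=lambda th: [0.0] * len(th)):
    """Hypothetical second phase: gradient descent on the task loss plus both pulls.
    The default task gradient is zero so the effect of the anchors is visible."""
    for _ in range(steps):
        grad = flashback_grad(theta, task_grad_fn(theta), stable_kb, plastic_kb, lam_s, lam_p)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


stable_kb = [0.0, 2.0]   # e.g. parameters before the new task
plastic_kb = [1.0, 0.0]  # e.g. parameters after a short first phase on the new task
print(phase_two([5.0, 5.0], stable_kb, plastic_kb))  # approximately [0.5, 1.0]
print(phase_two([5.0, 5.0], stable_kb, plastic_kb, lam_s=3.0))  # approximately [0.25, 1.5]
```

With no task gradient, the second phase settles at the weighted average (lam_s * stable_kb + lam_p * plastic_kb) / (lam_s + lam_p), so the two coefficients directly trade stability against plasticity.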


Projection Pursuit Density Ratio Estimation

arXiv.org Machine Learning

Density ratio estimation (DRE) is a fundamental task in machine learning, with broad applications across multiple domains, such as covariate shift adaptation, causal inference, independence tests and beyond. Parametric methods for estimating the density ratio can lead to biased results if models are misspecified, while conventional non-parametric methods suffer from the curse of dimensionality when the data dimension is large. To address these challenges, in this paper, we propose a novel approach for DRE based on the projection pursuit (PP) approximation. The proposed method leverages PP to mitigate the impact of high dimensionality while retaining the model flexibility needed for accurate DRE. We establish the consistency and the convergence rate for the proposed estimator. Experimental results demonstrate that our proposed method outperforms existing alternatives in various applications.


Accurate Estimation of Mutual Information in High Dimensional Data

arXiv.org Machine Learning

Mutual information (MI) is a measure of statistical dependencies between two variables, widely used in data analysis. Thus, accurate methods for estimating MI from empirical data are crucial. Such estimation is a hard problem, and there are provably no estimators that are universally good for finite datasets. Common estimators struggle with high-dimensional data, which is a staple of modern experiments. Recently, promising machine learning-based MI estimation methods have emerged. Yet it remains unclear if and when they produce accurate results, depending on dataset sizes, statistical structure of the data, and hyperparameters of the estimators, such as the embedding dimensionality or the duration of training. There are also no accepted tests to signal when the estimators are inaccurate. Here, we systematically explore these gaps. We propose and validate a protocol for MI estimation that includes explicit checks ensuring reliability and statistical consistency. Contrary to accepted wisdom, we demonstrate that reliable MI estimation is achievable even with severely undersampled, high-dimensional datasets, provided these data admit accurate low-dimensional representations. These findings broaden the potential use of machine learning-based MI estimation methods in real-world data analysis and provide new insights into when and why modern high-dimensional, self-supervised algorithms perform effectively.
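To make the idea of an explicit reliability check concrete, the sketch below pairs a simple plug-in MI estimator for discrete data with a subsampling check: MI is re-estimated on disjoint subsets of decreasing size, and a systematic drift as the subsets shrink indicates sample-size bias. The plug-in estimator is a classical stand-in for the machine learning-based estimators the paper studies, and this particular check is an assumption, not the paper's protocol.

```python
import math
from collections import Counter


def plugin_mi(xs, ys):
    """Plug-in (histogram) MI estimate in nats for discrete samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # p(x, y) / (p(x) p(y)) = c * n / (count(x) * count(y))
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def subsample_check(xs, ys, fractions=(1.0, 0.5, 0.25)):
    """Average the estimate over disjoint subsets of decreasing size. Estimates
    that drift systematically as subsets shrink suggest the full-data value is
    still limited by sample size."""
    n = len(xs)
    means = {}
    for f in fractions:
        m = max(1, int(n * f))
        chunks = [(xs[i:i + m], ys[i:i + m]) for i in range(0, n - m + 1, m)]
        means[f] = sum(plugin_mi(a, b) for a, b in chunks) / len(chunks)
    return means


xs = [0, 1, 2, 3] * 25
print(plugin_mi(xs, xs))                              # about log 4 = 1.386
print(plugin_mi([0, 1] * 50, [0, 0, 1, 1] * 25))      # 0.0: independent by construction
print(subsample_check(xs, xs))
```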


"Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift

arXiv.org Machine Learning

Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing targeted corrective actions that mitigate decay for the most affected subgroups while minimizing any unintended effects. Current approaches do not provide such detailed insight, as they either (i) explain how average performance shifts arise or (ii) identify adversely affected subgroups without insight into how this occurred. To this end, we introduce a Subgroup-scanning Hierarchical Inference Framework for performance drifT (SHIFT). SHIFT first asks "Is there any subgroup with unacceptably large performance decay due to covariate/outcome shifts?" (Where?) and, if so, dives deeper to ask "Can we explain this using more detailed variable(subset)-specific shifts?" (How?). In real-world experiments, we find that SHIFT identifies interpretable subgroups affected by performance decay, and suggests targeted actions that effectively mitigate the decay.
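The "Where?" step can be pictured as scanning subgroups for an accuracy drop between the source and target contexts. The sketch below flags subgroups whose drop exceeds a fixed tolerance; the record format, the tolerance, the function name, and the use of raw accuracy differences are assumptions. SHIFT frames this as hierarchical statistical inference and continues to the "How?" step of attributing decay to variable-specific shifts, neither of which is implemented here.

```python
def flag_decayed_subgroups(records, tolerance):
    """records: iterable of (subgroup, domain, correct), domain in {"source", "target"}.

    Returns {subgroup: accuracy drop} for subgroups whose source-to-target
    accuracy drop exceeds `tolerance`.
    """
    stats = {}
    for group, domain, correct in records:
        # [number correct, number of records] per subgroup and domain
        hits_total = stats.setdefault(group, {"source": [0, 0], "target": [0, 0]})[domain]
        hits_total[0] += int(correct)
        hits_total[1] += 1
    flagged = {}
    for group, s in stats.items():
        (hs, ns), (ht, nt) = s["source"], s["target"]
        if ns == 0 or nt == 0:
            continue  # cannot compare without data from both domains
        drop = hs / ns - ht / nt
        if drop > tolerance:
            flagged[group] = drop
    return flagged


records = ([("A", "source", i < 9) for i in range(10)] + [("A", "target", i < 5) for i in range(10)]
           + [("B", "source", i < 8) for i in range(10)] + [("B", "target", i < 8) for i in range(10)])
print(flag_decayed_subgroups(records, tolerance=0.1))  # only subgroup "A" (drop of about 0.4)
```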