Goto

Collaborating Authors

 Bayesian Learning


Bayesian preference elicitation for decision support in multiobjective optimization

arXiv.org Machine Learning

We present a novel approach to help decision-makers efficiently identify preferred solutions from the Pareto set of a multi-objective optimization problem. Our method uses a Bayesian model to estimate the decision-maker's utility function based on pairwise comparisons. Aided by this model, a principled elicitation strategy selects queries interactively to balance exploration and exploitation, guiding the discovery of high-utility solutions. The approach is flexible: it can be used interactively or a posteriori after estimating the Pareto front through standard multi-objective optimization techniques. Additionally, at the end of the elicitation phase, it generates a reduced menu of high-quality solutions, simplifying the decision-making process. Through experiments on test problems with up to nine objectives, our method demonstrates superior performance in finding high-utility solutions with a small number of queries. We also provide an open-source implementation of our method to support its adoption by the broader community.


Sequential Bayesian Design for Efficient Surrogate Construction in the Inversion of Darcy Flows

arXiv.org Machine Learning

Inverse problems governed by partial differential equations (PDEs) play a crucial role in various fields, including computational science, image processing, and engineering. Particularly, Darcy flow equation is a fundamental equation in fluid mechanics, which plays a crucial role in understanding fluid flow through porous media. Bayesian methods provide an effective approach for solving PDEs inverse problems, while their numerical implementation requires numerous evaluations of computationally expensive forward solvers. Therefore, the adoption of surrogate models with lower computational costs is essential. However, constructing a globally accurate surrogate model for high-dimensional complex problems demands high model capacity and large amounts of data. To address this challenge, this study proposes an efficient locally accurate surrogate that focuses on the high-probability regions of the true likelihood in inverse problems, with relatively low model complexity and few training data requirements. Additionally, we introduce a sequential Bayesian design strategy to acquire the proposed surrogate since the high-probability region of the likelihood is unknown. The strategy treats the posterior evolution process of sequential Bayesian design as a Gaussian process, enabling algorithmic acceleration through one-step ahead prior. The complete algorithmic framework is referred to as Sequential Bayesian design for locally accurate surrogate (SBD-LAS). Finally, three experiments based the Darcy flow equation demonstrate the advantages of the proposed method in terms of both inversion accuracy and computational speed.


Debiased maximum-likelihood estimators for hazard ratios under machine-learning adjustment

arXiv.org Machine Learning

Previous studies have shown that hazard ratios between treatment groups estimated with the Cox model are uninterpretable because the indefinite baseline hazard of the model fails to identify temporal change in the risk set composition due to treatment assignment and unobserved factors among multiple, contradictory scenarios. To alleviate this problem, especially in studies based on observational data with uncontrolled dynamic treatment and real-time measurement of many covariates, we propose abandoning the baseline hazard and using machine learning to explicitly model the change in the risk set with or without latent variables. For this framework, we clarify the context in which hazard ratios can be causally interpreted, and then develop a method based on Neyman orthogonality to compute debiased maximum-likelihood estimators of hazard ratios. Computing the constructed estimators is more efficient than computing those based on weighted regression with marginal structural Cox models. Numerical simulations confirm that the proposed method identifies the ground truth with minimal bias. These results lay the foundation for developing a useful, alternative method for causal inference with uncontrolled, observational data in modern epidemiology.


Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation

arXiv.org Artificial Intelligence

Most existing methods for adapting models to out-of-distribution (OOD) domains rely on invariant representation learning to eliminate the influence of biased features. However, should bias always be eliminated -- and if not, when should it be retained, and how can it be leveraged? To address these questions, we first present a theoretical analysis that explores the conditions under which biased features can be identified and effectively utilized. Building on this theoretical foundation, we introduce a novel framework that strategically leverages bias to complement invariant representations during inference. The framework comprises two key components that leverage bias in both direct and indirect ways: (1) using invariance as guidance to extract predictive ingredients from bias, and (2) exploiting identified bias to estimate the environmental condition and then use it to explore appropriate bias-aware predictors to alleviate environment gaps. We validate our approach through experiments on both synthetic datasets and standard domain generalization benchmarks. Results consistently demonstrate that our method outperforms existing approaches, underscoring its robustness and adaptability.


Decentralized Federated Learning of Probabilistic Generative Classifiers

arXiv.org Artificial Intelligence

--Federated learning is a paradigm of increasing relevance in real world applications, aimed at building a global model across a network of heterogeneous users without requiring the sharing of private data. We focus on model learning over decentralized architectures, where users collaborate directly to update the global model without relying on a central server . In this context, the current paper proposes a novel approach to collaboratively learn probabilistic generative classifiers with a parametric form. The framework is composed by a communication network over a set of local nodes, each of one having its own local data, and a local updating rule. The proposal involves sharing local statistics with neighboring nodes, where each node aggregates the neighbors' information and iteratively learns its own local classifier, which progressively converges to a global model. Extensive experiments demonstrate that the algorithm consistently converges to a globally competitive model across a wide range of network topologies, network sizes, local dataset sizes, and extreme non-i.i.d. In recent years, federated learning (FL) [1], [2] has gained increasing attention from both the research community [3], [4] and private companies [5], [6], as it enables the development of machine learning models across multiple users without requiring data centralization. This design inherently offers a fundamental layer of privacy while reducing the costs associated with massive data storage. FL traditionally achieves this by using a user-server architecture, where users train local models and share updates with a central server that aggregates them to build a global model [7], [8]. In contrast, decentralized FL [4], [9], [10] eliminates the need for a central server by enabling users to communicate directly and collaboratively train machine learning models.


Probabilistic Graphical Models: A Concise Tutorial

arXiv.org Artificial Intelligence

Probabilistic graphical modeling is a branch of machine learning that uses probability distributions to describe the world, make predictions, and support decision-making under uncertainty. Underlying this modeling framework is an elegant body of theory that bridges two mathematical traditions: probability and graph theory. This framework provides compact yet expressive representations of joint probability distributions, yielding powerful generative models for probabilistic reasoning. This tutorial provides a concise introduction to the formalisms, methods, and applications of this modeling framework. After a review of basic probability and graph theory, we explore three dominant themes: (1) the representation of multivariate distributions in the intuitive visual language of graphs, (2) algorithms for learning model parameters and graphical structures from data, and (3) algorithms for inference, both exact and approximate.


A Distributional View of High Dimensional Optimization

arXiv.org Machine Learning

This PhD thesis presents a distributional view of optimization in place of a worst-case perspective. We motivate this view with an investigation of the failure point of classical optimization. Subsequently we consider the optimization of a randomly drawn objective function. This is the setting of Bayesian Optimization. After a review of Bayesian optimization we outline how such a distributional view may explain predictable progress of optimization in high dimension. It further turns out that this distributional view provides insights into optimal step size control of gradient descent. To enable these results, we develop mathematical tools to deal with random input to random functions and a characterization of non-stationary isotropic covariance kernels. Finally, we outline how assumptions about the data, specifically exchangability, can lead to random objective functions in machine learning and analyze their landscape.


The Joys of Categorical Conformal Prediction

arXiv.org Machine Learning

Conformal prediction (CP) is an Uncertainty Representation technique that delivers finite-sample calibrated prediction regions for any underlying Machine Learning model. Its status as an Uncertainty Quantification (UQ) tool, though, has remained conceptually opaque: While Conformal Prediction Regions (CPRs) give an ordinal representation of uncertainty (larger regions typically indicate higher uncertainty), they lack the capability to cardinally quantify it (twice as large regions do not imply twice the uncertainty). We adopt a category-theoretic approach to CP -- framing it as a morphism, embedded in a commuting diagram, of two newly-defined categories -- that brings us three joys. First, we show that -- under minimal assumptions -- CP is intrinsically a UQ mechanism, that is, its cardinal UQ capabilities are a structural feature of the method. Second, we demonstrate that CP bridges (and perhaps subsumes) the Bayesian, frequentist, and imprecise probabilistic approaches to predictive statistical reasoning. Finally, we show that a CPR is the image of a covariant functor. This observation is relevant to AI privacy: It implies that privacy noise added locally does not break the global coverage guarantee.


Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis

arXiv.org Machine Learning

Regularized linear discriminant analysis (RLDA) is a widely used tool for classification and dimensionality reduction, but its performance in high-dimensional scenarios is inconsistent. Existing theoretical analyses of RLDA often lack clear insight into how data structure affects classification performance. To address this issue, we derive a non-asymptotic approximation of the misclassification rate and thus analyze the structural effect and structural adjustment strategies of RLDA. Based on this, we propose the Spectral Enhanced Discriminant Analysis (SEDA) algorithm, which optimizes the data structure by adjusting the spiked eigenvalues of the population covariance matrix. By developing a new theoretical result on eigenvectors in random matrix theory, we derive an asymptotic approximation on the misclassification rate of SEDA. The bias correction algorithm and parameter selection strategy are then obtained. Experiments on synthetic and real datasets show that SEDA achieves higher classification accuracy and dimensionality reduction compared to existing LDA methods.


Learning Text Styles: A Study on Transfer, Attribution, and Verification

arXiv.org Artificial Intelligence

This thesis advances the computational understanding and manipulation of text styles through three interconnected pillars: (1) Text Style Transfer (TST), which alters stylistic properties (e.g., sentiment, formality) while preserving content; (2)Authorship Attribution (AA), identifying the author of a text via stylistic fingerprints; and (3) Authorship V erification (A V), determining whether two texts share the same authorship. We address critical challenges in these areas by leveraging parameter-efficient adaptation of large language models (LLMs), contrastive disentanglement of stylistic features, and instruction-based fine-tuning for explainable verification. First, for TST, we conduct a comprehensive survey and reproducibility study of 19 state-of-the-art algorithms, establishing benchmarks across diverse datasets. Building on these insights, we introduce LLM-Adapters, a unified framework for parameter-efficient fine-tuning (PEFT) that enables cost-effective adaptation of LLMs for style-centric tasks. This culminates in Adapter-TST, a novel architecture that models multiple stylistic attributes (e.g., sentiment, tense) using lightweight neural adapters. Adapter-TST achieves superior performance in multi-attribute transfer and compositional editing while reducing computational costs by 80% compared to full fine-tuning. For AA, we propose ContrastDistAA, a contrastive learning framework that disentangles content and style features to address performance degradation under topic shifts. Our method advances both individual-level attribution and regional linguistic analysis, achieving state-of-the-art accuracy by isolating culturally influenced stylistic patterns.