AITopics

2505.05842

Country:

Europe > Kosovo > District of Gjilan > Kamenica (0.04)
Europe > Italy > Lazio > Rome (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(7 more...)

Genre: Instructional Material > Online (0.81)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Justin, Nathan, Sun, Qingshi, Gómez, Andrés, Vayanos, Phebe

Mixed-Integer Optimization for Responsible Machine Learning

arXiv.org Machine LearningMay-12-2025

In the last few decades, Machine Learning (ML) has achieved significant success across domains ranging from healthcare, sustainability, and the social sciences, to criminal justice and finance. But its deployment in increasingly sophisticated, critical, and sensitive areas affecting individuals, the groups they belong to, and society as a whole raises critical concerns around fairness, transparency, robustness, and privacy, among others. As the complexity and scale of ML systems and of the settings in which they are deployed grow, so does the need for responsible ML methods that address these challenges while providing guaranteed performance in deployment. Mixed-integer optimization (MIO) offers a powerful framework for embedding responsible ML considerations directly into the learning process while maintaining performance. For example, it enables learning of inherently transparent models that can conveniently incorporate fairness or other domain specific constraints. This tutorial paper provides an accessible and comprehensive introduction to this topic discussing both theoretical and practical aspects. It outlines some of the core principles of responsible ML, their importance in applications, and the practical utility of MIO for building ML models that align with these principles. Through examples and mathematical formulations, it illustrates practical strategies and available tools for efficiently solving MIO problems for responsible ML. It concludes with a discussion on current limitations and open research questions, providing suggestions for future work.

artificial intelligence, machine learning, mixed-integer optimization, (14 more...)

2505.05857

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Transportation (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(4 more...)

Ye, Changkun, Tsuchida, Russell, Petersson, Lars, Barnes, Nick

Open Set Label Shift with Test Time Out-of-Distribution Reference

arXiv.org Artificial IntelligenceMay-12-2025

Open set label shift (OSLS) occurs when label distributions change from a source to a target distribution, and the target distribution has an additional out-of-distribution (OOD) class. In this work, we build estimators for both source and target open set label distributions using a source domain in-distribution (ID) classifier and an ID/OOD classifier . With reasonable assumptions on the ID/OOD classifier, the estimators are assembled into a sequence of three stages: 1) an estimate of the source label distribution of the OOD class, 2) an EM algorithm for Maximum Likelihood estimates (MLE) of the target label distribution, and 3) an estimate of the target label distribution of OOD class under relaxed assumptions on the OOD classifier . The sampling errors of estimates in 1) and 3) are quantified with a concentration inequality. The estimation result allows us to correct the ID classifier trained on the source distribution to the target distribution without retraining. Experiments on a variety of open set label shift settings demonstrate the effectiveness of our model.

classifier, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.05868

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
(2 more...)

Sankoh, Aroon, Wickerhauser, Victor

Comparing statistical and deep learning techniques for parameter estimation of continuous-time stochastic differentiable equations

arXiv.org Artificial IntelligenceMay-8-2025

--Stochastic differential equations such as the Ornstein-Uhlenbeck process have long been used to model real-world probablistic events such as stock prices and temperature fluctuations. While statistical methods such as Maximum Likelihood Estimation (MLE), Kalman Filtering, Inverse V ariable Method, and more have historically been used to estimate the parameters of stochastic differential equations, the recent explosion of deep learning technology suggests that models such as a Recurrent Neural Network (RNN) could produce more precise estimators. We present a series of experiments that compare the estimation accuracy and computational expensiveness of a statistical method (MLE) with a deep learning model (RNN) for the parameters of the Ornstein-Uhlenbeck process. I NTRODUCTION In section I, we will define the Ornstein-Uhlenbeck (OU) stochastic process and explore some of the theory behind its solution. After introducing useful properties of the OU process, we can define the likelihood function to optimize for MLE estimation and search algorithm(s) we will use to obtain estimates.

artificial intelligence, estimation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.0398

Country: North America > United States > Missouri > St. Louis County > St. Louis (0.04)

Genre: Research Report (0.83)

Industry: Banking & Finance (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)

Yi, Yinzhuang, Cortés, Jorge, Atanasov, Nikolay

Variational Formulation of the Particle Flow Particle Filter

This paper provides a formulation of the particle flow particle filter from the perspective of variational inference. We show that the transient density used to derive the particle flow particle filter follows a time-scaled trajectory of the Fisher-Rao gradient flow in the space of probability densities. The Fisher-Rao gradient flow is obtained as a continuous-time algorithm for variational inference, minimizing the Kullback-Leibler divergence between a variational density and the true posterior density.

artificial intelligence, bayesian inference, machine learning, (15 more...)

2505.04007

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Bayesian Estimation of Extreme Quantiles and the Exceedance Distribution for Paretian Tails

Johnston, Douglas E.

Estimating extreme quantiles is an important task in many applications, including financial risk management and climatology. More important than estimating the quantile itself is to insure zero coverage error, which implies the quantile estimate should, on average, reflect the desired probability of exceedance. In this research, we show that for unconditional distributions isomorphic to the exponential, a Bayesian quantile estimate results in zero coverage error. This compares to the traditional maximum likelihood method, where the coverage error can be significant under small sample sizes even though the quantile estimate is unbiased. More generally, we prove a sufficient condition for an unbiased quantile estimator to result in coverage error. Interestingly, our results hold by virtue of using a Jeffreys prior for the unknown parameters and is independent of the true prior. We also derive an expression for the distribution, and moments, of future exceedances which is vital for risk assessment. We extend our results to the conditional tail of distributions with asymptotic Paretian tails and, in particular, those in the Fréchet maximum domain of attraction. We illustrate our results using simulations for a variety of light and heavy-tailed distributions.

artificial intelligence, exceedance, machine learning, (17 more...)

2505.04501

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report > New Finding (0.75)

Industry:

Banking & Finance (0.86)
Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Machine Learning: a Lecture Note

Cho, Kyunghyun

This lecture note is intended to prepare early-year master's and PhD students in data science or a related discipline with foundational ideas in machine learning. It starts with basic ideas in modern machine learning with classification as a main target task. These basic ideas include loss formulation, backpropagation, stochastic gradient descent, generalization, model selection as well as fundamental blocks of artificial neural networks. Based on these basic ideas, the lecture note explores in depth the probablistic approach to unsupervised learning, covering directed latent variable models, product of experts, generative adversarial networks and autoregressive models. Finally, the note ends by covering a diverse set of further topics, such as reinforcement learning, ensemble methods and meta-learning. After reading this lecture note, a student should be ready to embark on studying and researching more advanced topics in machine learning and more broadly artificial intelligence.

artificial intelligence, energy function, machine learning, (17 more...)

2505.03861

Country:

North America > Canada > Ontario > Toronto (0.13)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Lu, Wenhui Sophia, Wong, Wing Hung

Likelihood-Free Adaptive Bayesian Inference via Nonparametric Distribution Matching

When the likelihood is analytically unavailable and computationally intractable, approximate Bayesian computation (ABC) has emerged as a widely used methodology for approximate posterior inference; however, it suffers from severe computational inefficiency in high-dimensional settings or under diffuse priors. To overcome these limitations, we propose Adaptive Bayesian Inference (ABI), a framework that bypasses traditional data-space discrepancies and instead compares distributions directly in posterior space through nonparametric distribution matching. By leveraging a novel Marginally-augmented Sliced Wasserstein (MSW) distance on posterior measures and exploiting its quantile representation, ABI transforms the challenging problem of measuring divergence between posterior distributions into a tractable sequence of one-dimensional conditional quantile regression tasks. Moreover, we introduce a new adaptive rejection sampling scheme that iteratively refines the posterior approximation by updating the proposal distribution via generative density estimation. Theoretically, we establish parametric convergence rates for the trimmed MSW distance and prove that the ABI posterior converges to the true posterior as the tolerance threshold vanishes. Through extensive empirical evaluation, we demonstrate that ABI significantly outperforms data-based Wasserstein ABC, summary-based ABC, and state-of-the-art likelihood-free simulators, especially in high-dimensional or dependent observation regimes.

artificial intelligence, machine learning, posterior, (17 more...)

2505.04603

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

PAC-Bayesian risk bounds for fully connected deep neural network with Gaussian priors

Mai, The Tien

Deep neural networks (DNNs) have emerged as a powerful methodology with significant practical successes in fields such as computer vision and natural language processing. Recent works have demonstrated that sparsely connected DNNs with carefully designed architectures can achieve minimax estimation rates under classical smoothness assumptions. However, subsequent studies revealed that simple fully connected DNNs can achieve comparable convergence rates, challenging the necessity of sparsity. Theoretical advances in Bayesian neural networks (BNNs) have been more fragmented. Much of those work has concentrated on sparse networks, leaving the theoretical properties of fully connected BNNs underexplored. In this paper, we address this gap by investigating fully connected Bayesian DNNs with Gaussian prior using PAC-Bayes bounds. We establish upper bounds on the prediction risk for a probabilistic deep neural network method, showing that these bounds match (up to logarithmic factors) the minimax-optimal rates in Besov space, for both nonparametric regression and binary classification with logistic loss. Importantly, our results hold for a broad class of practical activation functions that are Lipschitz continuous.

artificial intelligence, machine learning, neural network, (16 more...)

2505.04341

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
Europe > France (0.04)
(4 more...)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Ohl, Louis, Mattei, Pierre-Alexandre, Precioso, Frédéric

A Tutorial on Discriminative Clustering and Mutual Information

To cluster data is to separate samples into distinctive groups that should ideally have some cohesive properties. Today, numerous clustering algorithms exist, and their differences lie essentially in what can be perceived as ``cohesive properties''. Therefore, hypotheses on the nature of clusters must be set: they can be either generative or discriminative. As the last decade witnessed the impressive growth of deep clustering methods that involve neural networks to handle high-dimensional data often in a discriminative manner; we concentrate mainly on the discriminative hypotheses. In this paper, our aim is to provide an accessible historical perspective on the evolution of discriminative clustering methods and notably how the nature of assumptions of the discriminative models changed over time: from decision boundaries to invariance critics. We notably highlight how mutual information has been a historical cornerstone of the progress of (deep) discriminative clustering methods. We also show some known limitations of mutual information and how discriminative clustering methods tried to circumvent those. We then discuss the challenges that discriminative clustering faces with respect to the selection of the number of clusters. Finally, we showcase these techniques using the dedicated Python package, GemClus, that we have developed for discriminative clustering.

artificial intelligence, machine learning, mutual information, (16 more...)

2505.04484

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)
(3 more...)

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Overview (0.92)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Diagnostic Medicine > Imaging (0.45)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)