arXiv.org Machine Learning
Adaptive sparse variational approximations for Gaussian process regression
Department of Decision Sciences, Bocconi Institute for Data Science and Analytics, Bocconi University, Milan

Abstract: Accurate tuning of hyperparameters is crucial to ensure that models generalise effectively across different settings. We construct a variational approximation to a hierarchical Bayes procedure and derive upper bounds for the contraction rate of the variational posterior in an abstract setting. The theory is applied to various Gaussian process priors and variational classes, resulting in minimax optimal rates. Our theoretical results are accompanied by numerical analyses on both synthetic and real-world data sets.

Keywords: variational inference, Bayesian model selection, Gaussian processes, nonparametric regression, adaptation, posterior contraction rates

1 Introduction

A core challenge in Bayesian statistics is scalability, i.e. the computation of the posterior for large sample sizes. Variational Bayes approximation is a standard approach to speed up inference. Variational posteriors are random probability measures that minimise, over a suitable class of distributions, the Kullback-Leibler divergence to the otherwise hard-to-compute posterior. Typically, the variational class of distributions over which the optimisation takes place does not contain the original posterior, hence the variational procedure can be viewed as a projection onto this class. The projected variational distribution then approximates the posterior. Since the approximation inevitably loses information, it is important to characterise the accuracy of the approach. Despite the wide use of variational approximations, their theoretical underpinnings have started to emerge only recently; see for instance Alquier and Ridgway (2020); Yang et al. (2020); Zhang and Gao (2020a); Ray and Szabó (2022). In a Bayesian procedure, the choice of prior reflects the presumed properties of the unknown parameter. In contrast to regular parametric models, where in view of the Bernstein-von Mises theorem the posterior is asymptotically normal, in nonparametric models the prior plays a crucial role in the asymptotic behaviour of the posterior. In fact, the large-sample behaviour of the posterior typically depends intricately on the choice of prior hyperparameters, so it is vital that these are tuned correctly. The two classical approaches are hierarchical and empirical Bayes methods.
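For concreteness, here is a minimal numpy sketch of the kind of sparse variational approximation the paper studies: inducing-point Gaussian process regression with a collapsed variational posterior in the style of Titsias (2009). The RBF kernel, toy data, and all hyperparameter values are illustrative assumptions; the hyperparameters are held fixed here, whereas tuning them adaptively is precisely the problem the paper addresses.

```python
import numpy as np

def rbf(a, b, ls=1.0, var=1.0):
    # Squared-exponential kernel matrix between row vectors in a and b.
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return var * np.exp(-0.5 * d2 / ls**2)

def svgp_predict(X, y, Z, Xs, ls=1.0, var=1.0, noise=0.1):
    """Collapsed sparse variational GP predictions (Titsias-style).

    X, y: training inputs/targets; Z: inducing inputs; Xs: test inputs.
    Returns the variational predictive mean and variance at Xs.
    """
    Kzz = rbf(Z, Z, ls, var) + 1e-8 * np.eye(len(Z))
    Kzf = rbf(Z, X, ls, var)
    Ksz = rbf(Xs, Z, ls, var)
    Sigma = np.linalg.inv(Kzz + Kzf @ Kzf.T / noise**2)
    mean = Ksz @ Sigma @ Kzf @ y / noise**2
    Kzz_inv = np.linalg.inv(Kzz)
    var_s = var - np.sum(Ksz @ (Kzz_inv - Sigma) * Ksz, axis=1)
    return mean, var_s

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
Z = np.linspace(-3, 3, 15)[:, None]   # 15 inducing points summarize 200 data points
Xs = np.linspace(-3, 3, 5)[:, None]
mu, v = svgp_predict(X, y, Z, Xs)
```

The cost drops from cubic in the sample size to cubic in the (much smaller) number of inducing points, which is what makes the variational approximation attractive at scale.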
A New Approach to Controlling Linear Dynamical Systems
Brahmbhatt, Anand, Buzaglo, Gon, Druchyna, Sofiia, Hazan, Elad
We propose a new method for controlling linear dynamical systems under adversarial disturbances and cost functions. Our algorithm achieves a running time that scales polylogarithmically with the inverse of the stability margin, improving upon prior methods with polynomial dependence while maintaining the same regret guarantees. The technique, which may be of independent interest, is based on a novel convex relaxation that approximates linear control policies using spectral filters constructed from the eigenvectors of a specific Hankel matrix.
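For intuition about the construction: in earlier spectral-filtering work (Hazan et al., 2017), the Hankel matrix has entries $Z_{ij} = 2/((i+j)^3 - (i+j))$ and the filters are its top eigenvectors, whose eigenvalues decay geometrically, so only a handful of filters are needed. The sketch below computes those filters; the specific Hankel matrix used in this paper may differ.

```python
import numpy as np

def spectral_filters(T, k):
    """Top-k eigenvectors of the Hankel matrix from earlier
    spectral-filtering work: Z[i, j] = 2 / ((i + j)^3 - (i + j))
    with 1-based indices i, j in 1..T."""
    idx = np.arange(1, T + 1)
    s = idx[:, None] + idx[None, :]
    Z = 2.0 / (s**3 - s)
    eigvals, eigvecs = np.linalg.eigh(Z)             # ascending order
    return eigvals[-k:][::-1], eigvecs[:, -k:][:, ::-1]

# Eigenvalues decay geometrically, so k ~ 10 filters already capture Z well.
vals, filters = spectral_filters(T=100, k=10)
```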
Operator Learning: A Statistical Perspective
Operator learning has emerged as a powerful tool in scientific computing for approximating mappings between infinite-dimensional function spaces. A primary application of operator learning is the development of surrogate models for the solution operators of partial differential equations (PDEs). These methods can also be used to develop black-box simulators that model system behavior from experimental data, even without a known mathematical model. In this article, we begin by formalizing operator learning as a function-to-function regression problem and review some recent developments in the field. We also discuss PDE-specific operator learning, outlining strategies for incorporating physical and mathematical constraints into architecture design and training processes. Finally, we highlight key future directions such as active data collection and the development of rigorous uncertainty quantification frameworks.
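A bare-bones instance of the function-to-function regression framing, as a hedged illustration: discretize input and output functions on a shared grid and fit a linear (ridge) map between them, standing in for a neural operator. The target operator (the antiderivative), the grid, and the data-generating process are all toy assumptions.

```python
import numpy as np

# Learn the antiderivative operator G: f -> u, u(x) = integral of f on [0, x],
# from (f, u) pairs discretized on a shared grid of m points.
m, n = 64, 500
x = np.linspace(0, 1, m)
rng = np.random.default_rng(0)

# Random smooth input functions: truncated random Fourier series.
K = 8
coeffs = rng.standard_normal((n, K)) / np.arange(1, K + 1)
F = coeffs @ np.sin(np.outer(np.arange(1, K + 1), np.pi * x))  # (n, m) inputs
U = np.cumsum(F, axis=1) / m                                   # (n, m) outputs

# Ridge-regularized linear operator estimate in R^{m x m}.
lam = 1e-6
G = np.linalg.solve(F.T @ F + lam * np.eye(m), F.T @ U)
u_hat = F[:5] @ G   # predicted output functions for the first 5 inputs
```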
Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces
Hemdanou, Abderrafik Laakel, Achtoun, Youssef, Sefian, Mohammed Lamarti, Tahiri, Ismail, Afia, Abdellatif El
Existing clustering approaches remain largely constrained by traditional distance metrics, limiting their effectiveness in handling random data. In this work, we introduce the first k-means variant in the literature that operates within a probabilistic metric space, replacing conventional distance measures with a well-defined distance distribution function. This approach enables more flexible and robust clustering of both deterministic and random datasets, establishing a new foundation for clustering in stochastic environments. By adopting a probabilistic perspective, our method not only introduces a fresh paradigm but also establishes a rigorous theoretical framework that is expected to serve as a key reference for future clustering research involving random data. Extensive experiments on diverse real and synthetic datasets assess our model's effectiveness using widely recognized evaluation metrics, including Silhouette, Davies-Bouldin, Calinski-Harabasz, the adjusted Rand index, and distortion. Comparative analyses against established methods such as k-means++, fuzzy c-means, and kernel probabilistic k-means demonstrate the superior performance of our proposed random normed k-means (RNKM) algorithm. Notably, RNKM exhibits a remarkable ability to identify nonlinearly separable structures, making it highly effective in complex clustering scenarios. These findings position RNKM as a significant advancement in clustering research, offering a powerful alternative to traditional techniques while addressing a long-standing gap in the literature. By bridging probabilistic metrics with clustering, this study provides a foundational reference for future developments and opens new avenues for advanced data analysis in dynamic, data-driven applications.
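As a loose illustration of clustering random data (not the RNKM algorithm itself, whose distance distribution function and update rules are specified in the paper), one can represent each observation by a batch of draws and assign clusters via a functional of the distance distribution, here simply its expectation:

```python
import numpy as np

def kmeans_random_data(samples, k, iters=50, seed=0):
    """k-means-style clustering of random observations.

    samples: array (n, s, d) -- each of the n observations is itself a
    random vector given by s draws. Dissimilarity between an observation
    and a centroid is the mean distance over draws, i.e. a functional of
    the distance distribution. Toy sketch: assumes no cluster empties out.
    """
    rng = np.random.default_rng(seed)
    n, s, d = samples.shape
    centers = samples[rng.choice(n, k, replace=False)].mean(axis=1)   # (k, d)
    for _ in range(iters):
        # Expected distance of each observation's draws to each centroid.
        dists = np.linalg.norm(samples[:, :, None, :] - centers[None, None],
                               axis=-1)                               # (n, s, k)
        labels = dists.mean(axis=1).argmin(axis=1)
        centers = np.stack([samples[labels == j].mean(axis=(0, 1))
                            for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
means = rng.uniform(-5, 5, (3, 2))
pts = means[rng.integers(0, 3, 300)]                    # latent cluster means
samples = pts[:, None, :] + 0.3 * rng.standard_normal((300, 20, 2))
labels, centers = kmeans_random_data(samples, k=3)
```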
High-dimensional ridge regression with random features for non-identically distributed data with a variance profile
Dabo, Issa-Mbenard, Bigot, Jérémie
The behavior of the random feature model in the high-dimensional regression framework has become a popular topic of interest in the machine learning literature. This model is generally considered for feature vectors $x_i = \Sigma^{1/2} x_i'$, where $x_i'$ is a random vector made of independent and identically distributed (iid) entries, and $\Sigma$ is a positive definite matrix representing the covariance of the features. In this paper, we move beyond this standard assumption by studying the performance of the random features model in the setting of non-iid feature vectors. Our approach is related to the analysis of the spectrum of large random matrices through random matrix theory (RMT) and free probability results. We turn to the analysis of non-iid data by using the notion of a variance profile, which is well studied in RMT. Our main contribution is then the study of the limits of the training and prediction risks associated to the ridge estimator in the random features model when its dimensions grow. We provide asymptotic equivalents of these risks that capture the behavior of ridge regression with random features in a high-dimensional framework. These asymptotic equivalents, which prove to be sharp in numerical experiments, are retrieved by adapting, to our setting, established results from operator-valued free probability theory. Moreover, for various classes of random feature vectors that have not been considered so far in the literature, our approach allows us to show the appearance of the double descent phenomenon when the ridge regularization parameter is small enough.
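The double descent phenomenon mentioned at the end is easy to reproduce empirically in the classical iid-feature setting that the paper generalizes away from. A hedged sketch, with the tanh feature map, dimensions, and small ridge penalty chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 300, 50, 1000
beta = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ beta + 0.3 * rng.standard_normal(n)
Xt = rng.standard_normal((n_test, d))
yt = Xt @ beta + 0.3 * rng.standard_normal(n_test)

lam = 1e-4                                   # small ridge penalty
for p in [50, 150, 280, 320, 600, 2000]:     # sweep feature count through p ~ n
    W = rng.standard_normal((d, p)) / np.sqrt(d)
    Phi, Phit = np.tanh(X @ W), np.tanh(Xt @ W)
    a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)
    # Test risk typically peaks near the interpolation threshold p = n,
    # then decreases again: the double descent curve.
    print(p, np.mean((Phit @ a - yt) ** 2))
```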
Knowledge Graph Completion with Mixed Geometry Tensor Factorization
Yusupov, Viacheslav, Rakhuba, Maxim, Frolov, Evgeny
Abstract: In this paper, we propose a new geometric approach for knowledge graph completion via low-rank tensor approximation. We augment a pretrained and well-established Euclidean model based on a Tucker tensor decomposition with a novel hyperbolic interaction term. This correction enables a more nuanced capture of distributional properties in data, better aligned with real-world knowledge graphs. By combining the two geometries, our approach improves the expressivity of the resulting model, achieving new state-of-the-art link prediction accuracy with a significantly lower number of parameters compared to previous Euclidean and hyperbolic models.

1 INTRODUCTION

Most of the information in the world can be expressed in terms of entities and the relationships between them. This information is effectively represented in the form of a knowledge graph (d'Amato, 2021; Peng et al., 2023), which serves as a repository for storing various forms of relational data with their interconnections. Particular examples include storing user profiles on social networking platforms (Xu et al., 2018), organizing Internet resources and the links between them, and constructing knowledge bases that capture user preferences to enhance the functionality of recommender systems (Wang et al., 2019a; Guo et al., 2020). With the recent emergence of large language models (LLMs), knowledge graphs have become an essential tool for improving the consistency and trustworthiness of linguistic models. Among notable examples of their application are fact checking (Pan et al., 2024), hallucination mitigation (Agrawal et al., 2023), retrieval-augmented generation (Lewis et al., 2020), and generation of corpora for LLM pretraining (Agarwal et al., 2021). This utilization underscores the versatility and utility of knowledge graphs in managing complex datasets and facilitating the manipulation of interconnected information in various domains and downstream tasks. On the other hand, knowledge graphs may present an incomplete view of the world. Relations can evolve and change over time, and be subject to errors, processing limitations, and gaps in available information.
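For readers unfamiliar with the Euclidean backbone: Tucker-decomposition models for knowledge graph completion score a triple $(s, r, o)$ as a trilinear form of the two entity embeddings and the relation embedding against a shared core tensor. A minimal numpy sketch of this scoring (the paper's hyperbolic interaction term is omitted, and all dimensions and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_ent, n_rel, de, dr = 1000, 20, 32, 16
E = rng.standard_normal((n_ent, de))     # entity embeddings
R = rng.standard_normal((n_rel, dr))     # relation embeddings
W = rng.standard_normal((de, dr, de))    # shared Tucker core tensor

def score(s, r, o):
    """Tucker-style triple score: W x_1 e_s x_2 w_r x_3 e_o."""
    return np.einsum('ijk,i,j,k->', W, E[s], R[r], E[o])

def scores_all_objects(s, r):
    # Score (s, r, ?) against every entity at once, as in link prediction.
    return np.einsum('ijk,i,j,nk->n', W, E[s], R[r], E)
```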
Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting
Probabilistic electricity price forecasting (PEPF) is a key task for market participants in short-term electricity markets. The increasing availability of high-frequency data and the need for real-time decision-making in energy markets require online estimation methods for efficient model updating. We present an online, multivariate, regularized distributional regression model, allowing for the modeling of all distribution parameters conditional on explanatory variables. Our approach combines multivariate distributional regression with an efficient online learning algorithm based on online coordinate descent for LASSO-type regularization. Additionally, we propose to regularize the estimation along a path of increasingly complex dependence structures of the multivariate distribution, allowing for parsimonious estimation and early stopping. We validate our approach in one of the first forecasting studies to focus on multivariate probabilistic forecasting in the German day-ahead electricity market while using only online estimation methods. We compare our approach to online LASSO-ARX models with adaptive marginal distributions and to online univariate distributional models combined with an adaptive copula. We show that multivariate distributional regression, which allows modeling all distribution parameters (including the mean and the dependence structure) conditional on explanatory variables such as renewable in-feed or past prices, provides superior forecasting performance compared to modeling the marginals only with a static/unconditional dependence structure. Additionally, online estimation yields a speed-up by a factor of 80 to over 400 relative to batch fitting.
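The building block referenced above, online coordinate descent for LASSO-type regularization, can be sketched as follows: maintain exponentially weighted sufficient statistics and run a few soft-thresholding sweeps after each observation. This is an illustrative sketch under an assumed forgetting-factor scheme, not the paper's estimator:

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

class OnlineLassoCD:
    """Online coordinate descent for exponentially weighted LASSO regression.

    Maintains G = sum of w_t x_t x_t^T and c = sum of w_t y_t x_t with
    forgetting factor `forget`, then performs coordinate-descent sweeps
    on 0.5 b'Gb - c'b + lam * ||b||_1 after each observation.
    """
    def __init__(self, p, lam=0.1, forget=0.99, sweeps=3):
        self.G = np.zeros((p, p)); self.c = np.zeros(p)
        self.beta = np.zeros(p)
        self.lam, self.forget, self.sweeps = lam, forget, sweeps

    def update(self, x, y):
        self.G = self.forget * self.G + np.outer(x, x)
        self.c = self.forget * self.c + y * x
        for _ in range(self.sweeps):
            for j in range(len(self.beta)):
                # Partial residual with coordinate j removed.
                r = self.c[j] - self.G[j] @ self.beta + self.G[j, j] * self.beta[j]
                self.beta[j] = soft_threshold(r, self.lam) / (self.G[j, j] + 1e-12)
        return self.beta

rng = np.random.default_rng(0)
model = OnlineLassoCD(p=5)
beta_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
for _ in range(500):
    x = rng.standard_normal(5)
    model.update(x, x @ beta_true + 0.1 * rng.standard_normal())
```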
Dynamic Assortment Selection and Pricing with Censored Preference Feedback
In this study, we investigate the problem of dynamic multi-product selection and pricing by introducing a novel framework based on a \textit{censored multinomial logit} (C-MNL) choice model. In this model, sellers present a set of products with prices, and buyers filter out products priced above their valuation, purchasing at most one product from the remaining options based on their preferences. The goal is to maximize seller revenue by dynamically adjusting product offerings and prices, while learning both product valuations and buyer preferences through purchase feedback. To achieve this, we propose a Lower Confidence Bound (LCB) pricing strategy. By combining this pricing strategy with either an Upper Confidence Bound (UCB) or Thompson Sampling (TS) product selection approach, our algorithms achieve regret bounds of $\tilde{O}(d^{\frac{3}{2}}\sqrt{T/\kappa})$ and $\tilde{O}(d^{2}\sqrt{T/\kappa})$, respectively. Finally, we validate the performance of our methods through simulations, demonstrating their effectiveness.
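The censoring mechanism described above is straightforward to write down: products priced above the buyer's valuation are removed before a standard MNL choice among the survivors plus a no-purchase option. In the sketch below, the quasilinear utility form $u_i - p_i$ and all numbers are illustrative assumptions:

```python
import numpy as np

def censored_mnl_probs(utilities, prices, valuations):
    """Purchase probabilities under a censored MNL choice model.

    Products priced above the buyer's valuation are filtered out; the
    buyer then chooses among the survivors, or the no-purchase option
    (utility 0), via a standard MNL with utilities u_i - p_i.
    """
    keep = prices <= valuations                  # censoring step
    expu = np.where(keep, np.exp(utilities - prices), 0.0)
    denom = 1.0 + expu.sum()                     # 1.0 is the no-purchase option
    return expu / denom                          # P(buy product i)

probs = censored_mnl_probs(
    utilities=np.array([1.0, 0.5, 2.0]),
    prices=np.array([0.8, 0.3, 1.5]),
    valuations=np.array([1.0, 1.0, 1.0]),        # product 3 is censored out
)
```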
Steering Large Agent Populations using Mean-Field Schrodinger Bridges with Gaussian Mixture Models
Rapakoulias, George, Pedram, Ali Reza, Tsiotras, Panagiotis
The Mean-Field Schrodinger Bridge (MFSB) problem is an optimization problem aiming to find the minimum-effort control policy that drives a McKean-Vlasov stochastic differential equation from one probability measure to another. In the context of multiagent control, the objective is to control the configuration of a swarm of identical, interacting cooperative agents, as captured by the time-varying probability measure of their state. Available methods for solving this problem for distributions with continuous support rely either on spatial discretizations of the problem's domain or on approximating optimal solutions using neural networks trained through stochastic optimization schemes. For agents following Linear Time-Varying dynamics, and for Gaussian Mixture Model boundary distributions, we propose a highly efficient parameterization that approximates the solutions of the corresponding MFSB in closed form, without any learning steps. Our proposed approach consists of a mixture of elementary policies, each solving a Gaussian-to-Gaussian Covariance Steering problem from a component of the initial mixture to a component of the terminal mixture. Leveraging the semidefinite formulation of the Covariance Steering problem, our proposed solver can handle probabilistic hard constraints on the system's state while maintaining numerical tractability. We illustrate our approach on a variety of numerical examples.
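One natural way to decide how much weight each component-to-component steering policy receives is a discrete optimal transport plan coupling the initial and terminal mixture weights; the paper's actual weighting scheme may differ, so treat this scipy sketch as a hedged illustration:

```python
import numpy as np
from scipy.optimize import linprog

def mixture_plan(w0, w1, cost):
    """Discrete OT plan lam[i, j] coupling initial mixture weights w0 to
    terminal weights w1, minimizing sum(lam * cost) subject to marginals."""
    m, n = len(w0), len(w1)
    A_eq = []
    for i in range(m):                       # row sums: sum_j lam[i, j] = w0[i]
        row = np.zeros((m, n)); row[i, :] = 1; A_eq.append(row.ravel())
    for j in range(n):                       # column sums: sum_i lam[i, j] = w1[j]
        col = np.zeros((m, n)); col[:, j] = 1; A_eq.append(col.ravel())
    res = linprog(cost.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([w0, w1]), bounds=(0, None))
    return res.x.reshape(m, n)

# Cost: squared distance between component means (illustrative choice).
w0, w1 = np.array([0.5, 0.5]), np.array([0.3, 0.7])
mu0, mu1 = np.array([[0.0], [2.0]]), np.array([[1.0], [3.0]])
cost = np.linalg.norm(mu0[:, None] - mu1[None], axis=-1) ** 2
lam = mixture_plan(w0, w1, cost)   # lam[i, j]: weight of the i -> j policy
```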
Optimal Invariant Bases for Atomistic Machine Learning
Allen, Alice E. A., Shinkle, Emily, Bujack, Roxana, Lubbers, Nicholas
The representation of atomic configurations for machine learning models has led to the development of numerous descriptors, often designed to describe the local environment of atoms. However, many of these representations are incomplete and/or functionally dependent. Incomplete descriptor sets are unable to represent all meaningful changes in the atomic environment. Complete constructions of atomic environment descriptors, on the other hand, often suffer from a high degree of functional dependence, where some descriptors can be written as functions of the others. These redundant descriptors do not provide additional power to discriminate between different atomic environments and increase the computational burden. By applying techniques from the pattern recognition literature to existing atomistic representations, we remove descriptors that are functions of other descriptors, producing the smallest possible set that satisfies completeness. We apply this in two ways: first, we refine an existing description, the Atomic Cluster Expansion, and show that this yields a more efficient subset of descriptors. Second, we augment an incomplete construction based on a scalar neural network, yielding a new message-passing network architecture that can recognize up to 5-body patterns in each neuron by taking advantage of an optimal set of Cartesian tensor invariants. This architecture shows strong accuracy on state-of-the-art benchmarks while retaining low computational cost. Our results not only yield improved models but also point the way to classes of invariant bases that minimize cost while maximizing expressivity for a host of applications.
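The pruning idea can be illustrated numerically: if a descriptor is a function of the others, its gradient lies in the span of their gradients at every configuration, so it never raises the pointwise Jacobian rank. A hedged sketch of this criterion (central-difference Jacobians, the toy descriptor map, and tolerances are all assumptions, not the paper's exact procedure):

```python
import numpy as np

def jacobian_cd(f, x, eps=1e-5):
    """Central-difference Jacobian of a descriptor map f: R^d -> R^m at x."""
    m, d = len(f(x)), len(x)
    J = np.zeros((m, d))
    for i in range(d):
        dx = np.zeros(d); dx[i] = eps
        J[:, i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

def independent_descriptors(f, samples, tol=1e-6):
    """Greedy pruning of functionally dependent descriptors.

    A descriptor that is a function of the kept ones never increases the
    pointwise Jacobian rank at any sampled configuration, so it is dropped.
    """
    Js = [jacobian_cd(f, x) for x in samples]   # one (m, d) Jacobian per sample
    keep = []
    for k in range(Js[0].shape[0]):
        dependent = keep and all(
            np.linalg.matrix_rank(J[keep + [k]], tol=tol)
            == np.linalg.matrix_rank(J[keep], tol=tol)
            for J in Js
        )
        if not dependent:
            keep.append(k)
    return keep

# Toy check: the third descriptor is (first)^2, hence functionally dependent.
f = lambda x: np.array([x[0] + x[1], x[0] * x[1], (x[0] + x[1]) ** 2])
rng = np.random.default_rng(0)
print(independent_descriptors(f, rng.standard_normal((20, 2))))  # -> [0, 1]
```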