AITopics

2511.21223

Country:

Europe > France > Hauts-de-France > Nord > Lille (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Bonnasse-Gahot, Laurent, Nadal, Jean-Pierre

Category learning in deep neural networks: Information content and geometry of internal representations

In humans and other animals, category learning enhances discrimination between stimuli close to the category boundary. This phenomenon, called categorical perception, was also empirically observed in artificial neural networks trained on classification tasks. In previous modeling works based on neuroscience data, we show that this expansion/compression is a necessary outcome of efficient learning. Here we extend our theoretical framework to artificial networks. We show that minimizing the Bayes cost (mean of the cross-entropy loss) implies maximizing the mutual information between the set of categories and the neural activities prior to the decision layer. Considering structured data with an underlying feature space of small dimension, we show that maximizing the mutual information implies (i) finding an appropriate projection space, and, (ii) building a neural representation with the appropriate metric. The latter is based on a Fisher information matrix measuring the sensitivity of the neural activity to changes in the projection space. Optimal learning makes this neural Fisher information follow a category-specific Fisher information, measuring the sensitivity of the category membership. Category learning thus induces an expansion of neural space near decision boundaries. We characterize the properties of the categorical Fisher information, showing that its eigenvectors give the most discriminant directions at each point of the projection space. We find that, unexpectedly, its maxima are in general not exactly at, but near, the class boundaries. Considering toy models and the MNIST dataset, we numerically illustrate how after learning the two Fisher information matrices match, and essentially align with the category boundaries. Finally, we relate our approach to the Information Bottleneck one, and we exhibit a bias-variance decomposition of the Bayes cost, of interest on its own.

artificial intelligence, information, machine learning, (19 more...)

doi: 10.1103/mp35-bdx5

2510.19021

Country:

Europe (0.92)
North America > United States (0.92)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.45)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Learning Multi-Order Block Structure in Higher-Order Networks

Nakajima, Kazuki, Sasaki, Yuya, Uno, Takeaki, Aida, Masaki

Higher-order networks, naturally described as hypergraphs, are essential for modeling real-world systems involving interactions among three or more entities. Stochastic block models offer a principled framework for characterizing mesoscale organization, yet their extension to hypergraphs involves a trade-off between expressive power and computational complexity. A recent simplification, a single-order model, mitigates this complexity by assuming a single affinity pattern governs interactions of all orders. This universal assumption, however, may overlook order-dependent structural details. Here, we propose a framework that relaxes this assumption by introducing a multi-order block structure, in which different affinity patterns govern distinct subsets of interaction orders. Our framework is based on a multi-order stochastic block model and searches for the optimal partition of the set of interaction orders that maximizes out-of-sample hyperlink prediction performance. Analyzing a diverse range of real-world networks, we find that multi-order block structures are prevalent. Accounting for them not only yields better predictive performance over the single-order model but also uncovers sharper, more interpretable mesoscale organization. Our findings reveal that order-dependent mechanisms are a key feature of the mesoscale organization of real-world higher-order networks.

data mining, information retrieval, machine learning, (21 more...)

2511.2135

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
(10 more...)

Data-Driven Methods and AI in Engineering Design: A Systematic Literature Review Focusing on Challenges and Opportunities

Afifi, Nehal, Wittig, Christoph, Paehler, Lukas, Lindenmann, Andreas, Wolter, Kai, Leitenberger, Felix, Dogru, Melih, Grauberger, Patric, Düser, Tobias, Albers, Albert, Matthiesen, Sven

The increasing availability of data and advancements in computational intelligence have accelerated the adoption of data-driven methods (DDMs) in product development. However, their integration into product development remains fragmented. This fragmentation stems from uncertainty, particularly the lack of clarity on what types of DDMs to use and when to employ them across the product development lifecycle. To address this, a necessary first step is to investigate the usage of DDM in engineering design by identifying which methods are being used, at which development stages, and for what application. This paper presents a PRISMA systematic literature review. The V-model as a product development framework was adopted and simplified into four stages: system design, system implementation, system integration, and validation. A structured search across Scopus, Web of Science, and IEEE Xplore (2014--2024) retrieved 1{,}689 records. After screening, 114 publications underwent full-text analysis. Findings show that machine learning (ML) and statistical methods dominate current practice, whereas deep learning (DL), though still less common, exhibits a clear upward trend in adoption. Additionally, supervised learning, clustering, regression analysis, and surrogate modeling are prevalent in design, implementation, and integration system stages but contributions to validation remain limited. Key challenges in existing applications include limited model interpretability, poor cross-stage traceability, and insufficient validation under real-world conditions. Additionally, it highlights key limitations and opportunities such as the need for interpretable hybrid models. This review is a first step toward design-stage guidelines; a follow-up synthesis should map computer science algorithms to engineering design problems and activities.

data mining, machine learning, natural language, (22 more...)

2511.2073

Country: Europe (0.46)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Transportation > Air (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)
(4 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(5 more...)

Tasdighi, Bahareh, Haussmann, Manuel, Wu, Yi-Shan, Masegosa, Andres R., Kandemir, Melih

Deep Actor-Critics with Tight Risk Certificates

Deep actor-critic algorithms have reached a level where they influence everyday life. They are a driving force behind continual improvement of large language models through user feedback. However, their deployment in physical systems is not yet widely adopted, mainly because no validation scheme fully quantifies their risk of malfunction. We demonstrate that it is possible to develop tight risk certificates for deep actor-critic algorithms that predict generalization performance from validation-time observations. Our key insight centers on the effectiveness of minimal evaluation data. A small feasible set of evaluation roll-outs collected from a pretrained policy suffices to produce accurate risk certificates when combined with a simple adaptation of PAC-Bayes theory. Specifically, we adopt a recently introduced recursive PAC-Bayes approach, which splits validation data into portions and recursively builds PAC-Bayes bounds on the excess loss of each portion's predictor, using the predictor from the previous portion as a data-informed prior. Our empirical results across multiple locomotion tasks, actor-critic methods, and policy expertise levels demonstrate risk certificates tight enough to be considered for practical use.

large language model, machine learning, policy seed 01, (16 more...)

2505.19682

Country: Europe > Denmark (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Variational bagging: a robust approach for Bayesian uncertainty quantification

Fan, Shitao, Ohn, Ilsang, Dunson, David, Lin, Lizhen

Variational Bayes methods are popular due to their computational efficiency and adaptability to diverse applications. In specifying the variational family, mean-field classes are commonly used, which enables efficient algorithms such as coordinate ascent variational inference (CAVI) but fails to capture parameter dependence and typically underestimates uncertainty. In this work, we introduce a variational bagging approach that integrates a bagging procedure with variational Bayes, resulting in a bagged variational posterior for improved inference. We establish strong theoretical guarantees, including posterior contraction rates for general models and a Bernstein-von Mises (BVM) type theorem that ensures valid uncertainty quantification. Notably, our results show that even when using a mean-field variational family, our approach can recover off-diagonal elements of the limiting covariance structure and provide proper uncertainty quantification. In addition, variational bagging is robust to model misspecification, with covariance structures matching those of the target covariance. We illustrate our variational bagging method in numerical studies through applications to parametric models, finite mixture models, deep neural networks, and variational autoencoders (VAEs).

dunson and lin variational, imsart-generic ver, variational, (12 more...)

2511.20594

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Kilic, Afra, Batselier, Kim

A Fully Probabilistic Tensor Network for Regularized Volterra System Identification

Modeling nonlinear systems with Volterra series is challenging because the number of kernel coefficients grows exponentially with the model order. This work introduces Bayesian Tensor Network Volterra kernel machines (BTN-V), extending the Bayesian Tensor Network framework to Volterra system identification. BTN-V represents Volterra kernels using canonical polyadic decomposition, reducing model complexity from O(I^D) to O(DIR). By treating all tensor components and hyperparameters as random variables, BTN-V provides predictive uncertainty estimation at no additional computational cost. Sparsity-inducing hierarchical priors enable automatic rank determination and the learning of fading-memory behavior directly from data, improving interpretability and preventing overfitting. Empirical results demonstrate competitive accuracy, enhanced uncertainty quantification, and reduced computational cost.

factor matrix, identification, regularization, (12 more...)

2511.20457

Country:

Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

PAC-Bayes Meets Online Contextual Optimization

Xie, Zhuojun, Abdin, Adam, Fang, Yiping

The predict-then-optimize paradigm bridges online learning and contextual optimization in dynamic environments. Previous works have investigated the sequential updating of predictors using feedback from downstream decisions to minimize regret in the full-information settings. However, existing approaches are predominantly frequentist, rely heavily on gradient-based strategies, and employ deterministic predictors that could yield high variance in practice despite their asymptotic guarantees. This work introduces, to the best of our knowledge, the first Bayesian online contextual optimization framework. Grounded in PAC-Bayes theory and general Bayesian updating principles, our framework achieves $\mathcal{O}(\sqrt{T})$ regret for bounded and mixable losses via a Gibbs posterior, eliminates the dependence on gradients through sequential Monte Carlo samplers, and thereby accommodates nondifferentiable problems. Theoretical developments and numerical experiments substantiate our claims.

gibbs posterior, optimization, posterior, (12 more...)

2511.20413

Country:

Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Overview (0.67)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)

Ghattas, Badih, San-Benito, Alvaro Sanchez

Clustering Approaches for Mixed-Type Data: A Comparative Study

Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study presents the state-of-the-art of these approaches and compares them using various simulation models. The compared methods include the distance-based approaches k-prototypes, PDQ, and convex k-means, and the probabilistic methods KAy-means for MIxed LArge data (KAMILA), the mixture of Bayesian networks (MBNs), and latent class model (LCM). The aim is to provide insights into the behavior of different methods across a wide range of scenarios by varying some experimental factors such as the number of clusters, cluster overlap, sample size, dimension, proportion of continuous variables in the dataset, and clusters' distribution. The degree of cluster overlap and the proportion of continuous variables in the dataset and the sample size have a significant impact on the observed performances. When strong interactions exist between variables alongside an explicit dependence on cluster membership, none of the evaluated methods demonstrated satisfactory performance. In our experiments KAMILA, LCM, and k-prototypes exhibited the best performance, with respect to the adjusted rand index (ARI). All the methods are available in R.

categorical variable, continuous variable, dataset, (14 more...)

doi: 10.1155/jpas/2242100

2511.19755

Country:

Europe > Austria > Vienna (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
North America > United States > New Jersey (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Industry: Aerospace & Defense (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Lakhani, Jared N., Pienaar, Etienne

Optimization and Regularization Under Arbitrary Objectives

This study investigates the limitations of applying Markov Chain Monte Carlo (MCMC) methods to arbitrary objective functions, focusing on a two-block MCMC framework which alternates between Metropolis-Hastings and Gibbs sampling. While such approaches are often considered advantageous for enabling data-driven regularization, we show that their performance critically depends on the sharpness of the employed likelihood form. By introducing a sharpness parameter and exploring alternative likelihood formulations proportional to the target objective function, we demonstrate how likelihood curvature governs both in-sample performance and the degree of regularization inferred by the training data. Empirical applications are conducted on reinforcement learning tasks: including a navigation problem and the game of tic-tac-toe. The study concludes with a separate analysis examining the implications of extreme likelihood sharpness on arbitrary objective functions stemming from the classic game of blackjack, where the first block of the two-block MCMC framework is replaced with an iterative optimization step. The resulting hybrid approach achieves performance nearly identical to the original MCMC framework, indicating that excessive likelihood sharpness effectively collapses posterior mass onto a single dominant mode.

likelihood, posterior, regularization, (15 more...)

2511.19628

Country: Africa > South Africa > Western Cape > Cape Town (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Tic-Tac-Toe (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)