AITopics

2501.18059

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(5 more...)

Loaiza-Ganem, Gabriel, Villecroze, Valentin, Wang, Yixin

Deep Ensembles Secretly Perform Empirical Bayes

arXiv.org Machine LearningJan-29-2025

Quantifying uncertainty in neural networks is a highly relevant problem which is essential to many applications. The two predominant paradigms to tackle this task are Bayesian neural networks (BNNs) and deep ensembles. Despite some similarities between these two approaches, they are typically surmised to lack a formal connection and are thus understood as fundamentally different. BNNs are often touted as more principled due to their reliance on the Bayesian paradigm, whereas ensembles are perceived as more ad-hoc; yet, deep ensembles tend to empirically outperform BNNs, with no satisfying explanation as to why this is the case. In this work we bridge this gap by showing that deep ensembles perform exact Bayesian averaging with a posterior obtained with an implicitly learned data-dependent prior. In other words deep ensembles are Bayesian, or more specifically, they implement an empirical Bayes procedure wherein the prior is learned from the data. This perspective offers two main benefits: (i) it theoretically justifies deep ensembles and thus provides an explanation for their strong empirical performance; and (ii) inspection of the learned prior reveals it is given by a mixture of point masses -- the use of such a strong prior helps elucidate observed phenomena about ensembles. Overall, our work delivers a newfound understanding of deep ensembles which is not only of interest in it of itself, but which is also likely to generate future insights that drive empirical improvements for these models.

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Machine Learning

2501.17917

Country:

North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Modi, Mirage, Sule, Shashank, Palumbo, Jonathan, Rozowski, Michael, Bouhrara, Mustapha, Czaja, Wojciech, Spencer, Richard G.

Input layer regularization and automated regularization hyperparameter tuning for myelin water estimation using deep learning

arXiv.org Machine LearningJan-29-2025

We propose a novel deep learning method which combines classical regularization with data augmentation for estimating myelin water fraction (MWF) in the brain via biexponential analysis. Our aim is to design an accurate deep learning technique for analysis of signals arising in magnetic resonance relaxometry. In particular, we study the biexponential model, one of the signal models used for MWF estimation. We greatly extend our previous work on \emph{input layer regularization (ILR)} in several ways. We now incorporate optimal regularization parameter selection via a dedicated neural network or generalized cross validation (GCV) on a signal-by-signal, or pixel-by-pixel, basis to form the augmented input signal, and now incorporate estimation of MWF, rather than just exponential time constants, into the analysis. On synthetically generated data, our proposed deep learning architecture outperformed both classical methods and a conventional multi-layer perceptron. On in vivo brain data, our architecture again outperformed other comparison methods, with GCV proving to be somewhat superior to a NN for regularization parameter selection. Thus, ILR improves estimation of MWF within the biexponential model. In addition, classical methods such as GCV may be combined with deep learning to optimize MWF imaging in the human brain.

artificial intelligence, estimation, machine learning, (18 more...)

arXiv.org Machine Learning

2501.18074

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
Europe > Netherlands (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Yoo, Se-Wook, Seo, Seung-Woo

DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical Systems

arXiv.org Artificial IntelligenceJan-29-2025

Safe reinforcement learning has traditionally relied on predefined constraint functions to ensure safety in complex real-world tasks, such as autonomous driving. However, defining these functions accurately for varied tasks is a persistent challenge. Recent research highlights the potential of leveraging pre-acquired task-agnostic knowledge to enhance both safety and sample efficiency in related tasks. Building on this insight, we propose a novel method to learn shared constraint distributions across multiple tasks. Our approach identifies the shared constraints through imitation learning and then adapts to new tasks by adjusting risk levels within these learned distributions. This adaptability addresses variations in risk sensitivity stemming from expert-specific biases, ensuring consistent adherence to general safety principles even with imperfect demonstrations. Our method can be applied to control and navigation domains, including multi-task and meta-task scenarios, accommodating constraints such as maintaining safe distances or adhering to speed limits. Experimental results validate the efficacy of our approach, demonstrating superior safety performance and success rates compared to baselines, all without requiring task-specific constraint definitions. These findings underscore the versatility and practicality of our method across a wide range of real-world tasks.

constraint, machine learning, reinforcement learning, (16 more...)

2501.18086

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Industry: Transportation > Ground > Road (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Swaroop, Siddharth, Khan, Mohammad Emtiyaz, Doshi-Velez, Finale

Connecting Federated ADMM to Bayes

arXiv.org Machine LearningJan-28-2025

We provide new connections between two distinct federated learning approaches based on (i) ADMM and (ii) Variational Bayes (VB), and propose new variants by combining their complementary strengths. Specifically, we show that the dual variables in ADMM naturally emerge through the "site" parameters used in VB with isotropic Gaussian covariances. Using this, we derive two versions of ADMM from VB that use flexible covariances and functional regularisation, respectively. Through numerical experiments, we validate the improvements obtained in performance. The work shows connection between two fields that are believed to be fundamentally different and combines them to improve federated learning. The goal of federated learning is to train a global model in the central server by using the data distributed over many local clients (McMahan et al., 2016). Such distributed learning improves privacy, security, and robustness, but is challenging due to frequent communication needed to synchronise training among nodes. This is especially true when the data quality differs drastically from client to client and needs to be appropriately weighted. Designing new methods to deal with such challenges is an active area of research in federated learning. We focus on two distinct federated-learning approaches based on the Alternating Direction Method of Multipliers (ADMM) and Variational Bayes (VB), respectively. The ADMM approach synchronises the global and local models by using constrained optimisation and updates both primal and dual variables simultaneously.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2501.17325

Country:

Europe > Austria > Vienna (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Virginia (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Reuter, Arik, Rudner, Tim G. J., Fortuin, Vincent, Rügamer, David

Can Transformers Learn Full Bayesian Inference in Context?

Transformers have emerged as the dominant architecture in the field of deep learning, with a broad range of applications and remarkable in-context learning (ICL) capabilities. While not yet fully understood, ICL has already proved to be an intriguing phenomenon, allowing transformers to learn in context -- without requiring further training. In this paper, we further advance the understanding of ICL by demonstrating that transformers can perform full Bayesian inference for commonly used statistical models in context. More specifically, we introduce a general framework that builds on ideas from prior fitted networks and continuous normalizing flows which enables us to infer complex posterior distributions for methods such as generalized linear models and latent factor models. Extensive experiments on real-world datasets demonstrate that our ICL approach yields posterior samples that are similar in quality to state-of-the-art MCMC or variational inference methods not operating in context.

artificial intelligence, bayesian inference, machine learning, (13 more...)

2501.16825

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > New York (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Erden, Zeki Doruk, Faltings, Boi

Agential AI for Integrated Continual Learning, Deliberative Behavior, and Comprehensible Models

Contemporary machine learning paradigm excels in statistical data analysis, solving problems that classical AI couldn't. However, it faces key limitations, such as a lack of integration with planning, incomprehensible internal structure, and inability to learn continually. We present the initial design for an AI system, Agential AI (AAI), in principle operating independently or on top of statistical methods, designed to overcome these issues. AAI's core is a learning method that models temporal dynamics with guarantees of completeness, minimality, and continual learning, using component-level variation and selection to learn the structure of the environment. It integrates this with a behavior algorithm that plans on a learned model and encapsulates high-level behavior patterns. Preliminary experiments on a simple environment show AAI's effectiveness and potential.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2501.16922

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Mohr, Felix, van Rijn, Jan N.

Learning Curves for Decision Making in Supervised Machine Learning: A Survey

Learning curves are a concept from social sciences that has been adopted in the context of machine learning to assess the performance of a learning algorithm with respect to a certain resource, e.g., the number of training examples or the number of training iterations. Learning curves have important applications in several machine learning contexts, most notably in data acquisition, early stopping of model training, and model selection. For instance, learning curves can be used to model the performance of the combination of an algorithm and its hyperparameter configuration, providing insights into their potential suitability at an early stage and often expediting the algorithm selection process. Various learning curve models have been proposed to use learning curves for decision making. Some of these models answer the binary decision question of whether a given algorithm at a certain budget will outperform a certain reference performance, whereas more complex models predict the entire learning curve of an algorithm. We contribute a framework that categorises learning curve approaches using three criteria: the decision-making situation they address, the intrinsic learning curve question they answer and the type of resources they use. We survey papers from the literature and classify them into this framework.

artificial intelligence, learner, machine learning, (18 more...)

doi: 10.1007/s10994-024-06619-7

2201.1215

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
Asia > Middle East > Jordan (0.04)
South America > Colombia > Cundinamarca Department (0.04)
North America > United States > Wisconsin (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Education (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.45)
Health & Medicine > Therapeutic Area (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
(3 more...)

RAINER: A Robust Ensemble Learning Grid Search-Tuned Framework for Rainfall Patterns Prediction

Li, Zhenqi, Zhong, Junhao, Wang, Hewei, Xu, Jinfeng, Li, Yijie, You, Jinjiang, Zhang, Jiayi, Wu, Runzhi, Dev, Soumyabrata

Rainfall prediction remains a persistent challenge due to the highly nonlinear and complex nature of meteorological data. Existing approaches lack systematic utilization of grid search for optimal hyperparameter tuning, relying instead on heuristic or manual selection, frequently resulting in sub-optimal results. Additionally, these methods rarely incorporate newly constructed meteorological features such as differences between temperature and humidity to capture critical weather dynamics. Furthermore, there is a lack of systematic evaluation of ensemble learning techniques and limited exploration of diverse advanced models introduced in the past one or two years. To address these limitations, we propose a robust ensemble learning grid search-tuned framework (RAINER) for rainfall prediction. RAINER incorporates a comprehensive feature engineering pipeline, including outlier removal, imputation of missing values, feature reconstruction, and dimensionality reduction via Principal Component Analysis (PCA). The framework integrates novel meteorological features to capture dynamic weather patterns and systematically evaluates non-learning mathematical-based methods and a variety of machine learning models, from weak classifiers to advanced neural networks such as Kolmogorov-Arnold Networks (KAN). By leveraging grid search for hyperparameter tuning and ensemble voting techniques, RAINER achieves promising results within real-world datasets.

artificial intelligence, data mining, machine learning, (19 more...)

2501.169

Country:

Oceania > Australia (0.28)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(7 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Energy > Renewable (0.68)
Education (0.66)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Neural Information Processing SystemsJan-27-2025, 13:08:06 GMT

Review for NeurIPS paper: Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach

Weaknesses: - Can we interpret the results as follows: If the TAR assumption is satisfied with positive limits, and we use MLE, then temporal interference does not cause bias. If this interpretation is correct, then it would be illuminating if the authors provide the intuitive connection between the TAR assumption and temporal interference. It is not clear if the estimations that the authors have required are feasible if the state space is large. The next natural question is how robust the results are if we use other methods for estimation. This could have been shown by providing some simulations, which is a part missing from the manuscript.

adaptive experimental design, maximum likelihood approach, temporal interference, (3 more...)

Neural Information Processing Systems

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)