AITopics

2503.05538

Country:

North America > United States (0.14)
Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

arXiv.org Machine LearningMar-5-2025

Paths and Ambient Spaces in Neural Loss Landscapes

Dold, Daniel, Kobialka, Julius, Palm, Nicolai, Sommer, Emanuel, Rügamer, David, Dürr, Oliver

Understanding the structure of neural network loss surfaces, particularly the emergence of low-loss tunnels, is critical for advancing neural network theory and practice. In this paper, we propose a novel approach to directly embed loss tunnels into the loss landscape of neural networks. Exploring the properties of these loss tunnels offers new insights into their length and structure and sheds light on some common misconceptions. We then apply our approach to Bayesian neural networks, where we improve subspace inference by identifying pitfalls and proposing a more natural prior that better guides the sampling procedure.

artificial intelligence, dataset, machine learning, (16 more...)

2503.03382

Country: Europe (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceFeb-11-2025

Calibrating LLMs with Information-Theoretic Evidential Deep Learning

Li, Yawei, Rügamer, David, Bischl, Bernd, Rezaei, Mina

Fine-tuned large language models (LLMs) often exhibit overconfidence, particularly when trained on small datasets, resulting in poor calibration and inaccurate uncertainty estimates. Evidential Deep Learning (EDL), an uncertainty-aware approach, enables uncertainty estimation in a single forward pass, making it a promising method for calibrating fine-tuned LLMs. However, despite its computational efficiency, EDL is prone to overfitting, as its training objective can result in overly concentrated probability distributions. To mitigate this, we propose regularizing EDL by incorporating an information bottleneck (IB). Our approach IB-EDL suppresses spurious information in the evidence generated by the model and encourages truly predictive information to influence both the predictions and uncertainty estimates. Extensive experiments across various fine-tuned LLMs and tasks demonstrate that IB-EDL outperforms both existing EDL and non-EDL approaches. By improving the trustworthiness of LLMs, IB-EDL facilitates their broader adoption in domains requiring high levels of confidence calibration. Code is available at https://github.com/sandylaker/ib-edl.

information-theoretic evidential deep learning, large language model, natural language, (2 more...)

2502.06351

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

arXiv.org Artificial IntelligenceFeb-10-2025

Microcanonical Langevin Ensembles: Advancing the Sampling of Bayesian Neural Networks

Sommer, Emanuel, Robnik, Jakob, Nozadze, Giorgi, Seljak, Uros, Rügamer, David

Despite recent advances, sampling-based inference for Bayesian Neural Networks (BNNs) remains a significant challenge in probabilistic deep learning. While sampling-based approaches do not require a variational distribution assumption, current state-of-the-art samplers still struggle to navigate the complex and highly multimodal posteriors of BNNs. As a consequence, sampling still requires considerably longer inference times than non-Bayesian methods even for small neural networks, despite recent advances in making software implementations more efficient. Besides the difficulty of finding high-probability regions, the time until samplers provide sufficient exploration of these areas remains unpredictable. To tackle these challenges, we introduce an ensembling approach that leverages strategies from optimization and a recently proposed sampler called Microcanonical Langevin Monte Carlo (MCLMC) for efficient, robust and predictable sampling performance. Compared to approaches based on the state-of-the-art No-U-Turn Sampler, our approach delivers substantial speedups up to an order of magnitude, while maintaining or improving predictive performance and uncertainty quantification across diverse tasks and data modalities. The suggested Microcanonical Langevin Ensembles and modifications to MCLMC additionally enhance the method's predictability in resource requirements, facilitating easier parallelization. All in all, the proposed method offers a promising direction for practical, scalable inference for BNNs.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2502.06335

Country: North America > United States > Oregon (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)

arXiv.org Machine LearningFeb-7-2025

Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries

Kolb, Chris, Weber, Tobias, Bischl, Bernd, Rügamer, David

Sparse regularization techniques are well-established in machine learning, yet their application in neural networks remains challenging due to the non-differentiability of penalties like the $L_1$ norm, which is incompatible with stochastic gradient descent. A promising alternative is shallow weight factorization, where weights are decomposed into two factors, allowing for smooth optimization of $L_1$-penalized neural networks by adding differentiable $L_2$ regularization to the factors. In this work, we introduce deep weight factorization, extending previous shallow approaches to more than two factors. We theoretically establish equivalence of our deep factorization with non-convex sparse regularization and analyze its impact on training dynamics and optimization. Due to the limitations posed by standard training practices, we propose a tailored initialization scheme and identify important learning rate requirements necessary for training factorized networks. We demonstrate the effectiveness of our deep weight factorization through experiments on various architectures and datasets, consistently outperforming its shallow counterpart and widely used pruning methods.

artificial intelligence, deep learning, machine learning, (18 more...)

2502.02496

Country:

Europe (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

arXiv.org Artificial IntelligenceJan-28-2025

Can Transformers Learn Full Bayesian Inference in Context?

Reuter, Arik, Rudner, Tim G. J., Fortuin, Vincent, Rügamer, David

Transformers have emerged as the dominant architecture in the field of deep learning, with a broad range of applications and remarkable in-context learning (ICL) capabilities. While not yet fully understood, ICL has already proved to be an intriguing phenomenon, allowing transformers to learn in context -- without requiring further training. In this paper, we further advance the understanding of ICL by demonstrating that transformers can perform full Bayesian inference for commonly used statistical models in context. More specifically, we introduce a general framework that builds on ideas from prior fitted networks and continuous normalizing flows which enables us to infer complex posterior distributions for methods such as generalized linear models and latent factor models. Extensive experiments on real-world datasets demonstrate that our ICL approach yields posterior samples that are similar in quality to state-of-the-art MCMC or variational inference methods not operating in context.

large language model, machine learning, natural language, (15 more...)

2501.16825

Country: North America > United States (0.27)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine (0.46)

arXiv.org Machine LearningOct-13-2024

A Functional Extension of Semi-Structured Networks

Rügamer, David, Liew, Bernard X. W., Altai, Zainab, Stöcker, Almond

Semi-structured networks (SSNs) merge the structures familiar from additive models with deep neural networks, allowing the modeling of interpretable partial feature effects while capturing higher-order non-linearities at the same time. A significant challenge in this integration is maintaining the interpretability of the additive model component. Inspired by large-scale biomechanics datasets, this paper explores extending SSNs to functional data. Existing methods in functional data analysis are promising but often not expressive enough to account for all interactions and non-linearities and do not scale well to large datasets. Although the SSN approach presents a compelling potential solution, its adaptation to functional data remains complex. In this work, we propose a functional SSN method that retains the advantageous properties of classical functional regression approaches while also improving scalability. Our numerical experiments demonstrate that this approach accurately recovers underlying signals, enhances predictive performance, and performs favorably compared to competing methods.

artificial intelligence, machine learning, neural network, (20 more...)

2410.0543

Country:

Europe (0.93)
North America > United States (0.28)

Genre:

Research Report > Experimental Study (0.46)
Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Musculoskeletal (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

arXiv.org Machine LearningJul-10-2024

How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression

Kook, Lucas, Kolb, Chris, Schiele, Philipp, Dold, Daniel, Arpogaus, Marcel, Fritz, Cornelius, Baumann, Philipp F., Kopper, Philipp, Pielok, Tobias, Dorigatti, Emilio, Rügamer, David

Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks opening up new avenues in both statistical modeling and deep learning.

artificial intelligence, machine learning, neural network, (17 more...)

2405.05429

Country:

Europe > United Kingdom > England (0.14)
Europe > Spain (0.14)
Asia > China (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJun-2-2024

Generalizing Orthogonalization for Models with Non-Linearities

Rügamer, David, Kolb, Chris, Weber, Tobias, Kook, Lucas, Nagler, Thomas

The complexity of black-box algorithms can lead to various challenges, including the introduction of biases. These biases present immediate risks in the algorithms' application. It was, for instance, shown that neural networks can deduce racial information solely from a patient's X-ray scan, a task beyond the capability of medical experts. If this fact is not known to the medical expert, automatic decision-making based on this algorithm could lead to prescribing a treatment (purely) based on racial information. While current methodologies allow for the "orthogonalization" or "normalization" of neural networks with respect to such information, existing approaches are grounded in linear models. Our paper advances the discourse by introducing corrections for non-linearities such as ReLU activations. Our approach also encompasses scalar and tensor-valued predictions, facilitating its integration into neural network architectures. Through extensive experiments, we validate our method's effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes.

artificial intelligence, machine learning, orthogonalization, (19 more...)

2405.02475

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Machine LearningMay-25-2024

Position: Why We Must Rethink Empirical Research in Machine Learning

Herrmann, Moritz, Lange, F. Julian D., Eggensperger, Katharina, Casalicchio, Giuseppe, Wever, Marcel, Feurer, Matthias, Rügamer, David, Hüllermeier, Eyke, Boulesteix, Anne-Laure, Bischl, Bernd

In practice, that leads to non-replicable results, makes it may jeopardize applied empirical researchers' confidence findings unreliable, and threatens to undermine in experimental results and discourage them from applying progress in the field. To overcome this alarming ML methods, even though these novel approaches might be situation, we call for more awareness of the beneficial. For example, ML is increasingly being used in plurality of ways of gaining knowledge experimentally the medical domain, and this is often promising in terms of but also of some epistemic limitations.

artificial intelligence, deep learning, machine learning, (15 more...)

2405.022

Country:

Europe (1.00)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Education (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)