AITopics | Krämer, Michael

Collaborating Authors

Krämer, Michael

From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning

Rubin, Noa, Fischer, Kirsten, Lindner, Javed, Dahmen, David, Seroussi, Inbar, Ringel, Zohar, Krämer, Michael, Helias, Moritz

arXiv.org Machine Learning, Feb-5-2025

Theoretically describing feature learning in neural networks is crucial for understanding their expressive power and inductive biases, motivating various approaches. Some approaches describe network behavior after training through a simple change in kernel scale from initialization, resulting in a generalization power comparable to a Gaussian process. Conversely, in other approaches training results in the adaptation of the kernel to the data, involving complex directional changes to the kernel. While these approaches capture different facets of network behavior, their relationship and respective strengths across scaling regimes remain an open question. This work presents a theoretical framework of multi-scale adaptive feature learning bridging these approaches. Using methods from statistical mechanics, we derive analytical expressions for network output statistics which are valid across scaling regimes and in the continuum between them. A systematic expansion of the network's probability distribution reveals that mean-field scaling requires only a saddle-point approximation, while standard scaling necessitates additional correction terms. Remarkably, we find across regimes that kernel adaptation can be reduced to an effective kernel rescaling when predicting the mean network output of a linear network. However, even in this case, the multi-scale adaptive approach captures directional feature learning effects, providing richer insights than what could be recovered from a rescaling of the kernel alone.

artificial intelligence, machine learning, multi-scale adaptive theory

arXiv:2502.0321

Country:

Asia > Middle East > Israel (0.28)
North America > United States > Texas > Clay County (0.25)

Genre: Research Report (0.65)

Industry:

Information Technology > Networks (0.68)
Telecommunications > Networks (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models

Barman, Kristian G., Caron, Sascha, Sullivan, Emily, de Regt, Henk W., de Austri, Roberto Ruiz, Boon, Mieke, Färber, Michael, Fröse, Stefan, Hasibi, Faegheh, Ipp, Andreas, Kapoor, Rukshak, Kasieczka, Gregor, Kostić, Daniel, Krämer, Michael, Golling, Tobias, Lopez, Luis G., Marco, Jesus, Otten, Sydney, Pawlowski, Pawel, Vischia, Pietro, Weber, Erik, Weniger, Christoph

arXiv.org Artificial Intelligence, Jan-9-2025

This paper explores ideas and provides a potential roadmap for the development and evaluation of physics-specific large-scale AI models, which we call Large Physics Models (LPMs). These models, based on foundation models such as Large Language Models (LLMs) - trained on broad data - are tailored to address the demands of physics research. LPMs can function independently or as part of an integrated framework. This framework can incorporate specialized tools, including symbolic reasoning modules for mathematical manipulations, frameworks to analyse specific experimental and simulated data, and mechanisms for synthesizing theories and scientific literature. We begin by examining whether the physics community should actively develop and refine dedicated models, rather than relying solely on commercial LLMs. We then outline how LPMs can be realized through interdisciplinary collaboration among experts in physics, computer science, and philosophy of science. To integrate these models effectively, we identify three key pillars: Development, Evaluation, and Philosophical Reflection. Development focuses on constructing models capable of processing physics texts, mathematical formulations, and diverse physical data. Evaluation assesses accuracy and reliability by testing and benchmarking. Finally, Philosophical Reflection encompasses the analysis of broader implications of LLMs in physics, including their potential to generate new scientific understanding and what novel collaboration dynamics might arise in research. Inspired by the organizational structure of experimental collaborations in particle physics, we propose a similarly interdisciplinary and collaborative approach to building and refining Large Physics Models. This roadmap provides specific objectives, defines pathways to achieve them, and identifies challenges that must be addressed to realise physics-specific large-scale AI models.

arxiv preprint arxiv, large language model, machine learning

arXiv:2501.05382

Country:

Europe > Germany (0.28)
Europe > Netherlands > South Holland (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report > Promising Solution (0.46)

Industry:

Information Technology (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Aspen Open Jets: Unlocking LHC Data for Foundation Models in Particle Physics

Amram, Oz, Anzalone, Luca, Birk, Joschka, Faroughy, Darius A., Hallin, Anna, Kasieczka, Gregor, Krämer, Michael, Pang, Ian, Reyes-Gonzalez, Humberto, Shih, David

arXiv.org Machine Learning, Dec-13-2024

Foundation models are deep learning models pre-trained on large amounts of data which are capable of generalizing to multiple datasets and/or downstream tasks. This work demonstrates how data collected by the CMS experiment at the Large Hadron Collider can be useful in pre-training foundation models for high-energy physics (HEP). Specifically, we introduce the AspenOpenJets dataset, consisting of approximately 180M high-$p_T$ jets derived from CMS 2016 Open Data. We show how pre-training the OmniJet-$\alpha$ foundation model on AspenOpenJets improves performance on generative tasks with significant domain shift: generating boosted top and QCD jets from the simulated JetClass dataset. In addition to demonstrating the power of pre-training of a jet-based foundation model on actual proton-proton collision data, we provide the ML-ready derived AspenOpenJets dataset for further public use.

artificial intelligence, deep learning, machine learning

arXiv:2412.10504

Country:

North America > United States (1.00)
Europe (0.69)

Genre: Research Report (0.64)

Industry:

Energy (0.68)
Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A theory of data variability in Neural Network Bayesian inference

Lindner, Javed, Dahmen, David, Krämer, Michael, Helias, Moritz

arXiv.org Machine Learning, Nov-9-2023

Bayesian inference and kernel methods are well established in machine learning. The neural network Gaussian process in particular provides a concept to investigate neural networks in the limit of infinitely wide hidden layers by using kernel and inference methods. Here we build upon this limit and provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks. We systematically compute generalization properties of linear, non-linear, and deep non-linear networks for kernel matrices with heterogeneous entries. In contrast to currently employed spectral methods we derive the generalization properties from the statistical properties of the input, elucidating the interplay of input dimensionality, size of the training data set, and variability of the data. We show that data variability leads to a non-Gaussian action reminiscent of a ($\varphi^3+\varphi^4$)-theory. Using our formalism on a synthetic task and on MNIST we obtain a homogeneous kernel matrix approximation for the learning curve as well as corrections due to data variability which allow the estimation of the generalization properties and exact results for the bounds of the learning curves in the case of infinitely many training data points.

artificial intelligence, bayesian inference, machine learning

arXiv:2307.16695

Country:

North America > United States (0.45)
Europe > Germany (0.28)
Europe > United Kingdom > England (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Unified Field Theory for Deep and Recurrent Neural Networks

Segadlo, Kai, Epping, Bastian, van Meegen, Alexander, Dahmen, David, Krämer, Michael, Helias, Moritz

arXiv.org Machine Learning, Jan-7-2022

Understanding capabilities and limitations of different network architectures is of fundamental importance to machine learning. Bayesian inference on Gaussian processes has proven to be a viable approach for studying recurrent and deep networks in the limit of infinite layer width, $n\to\infty$. Here we present a unified and systematic derivation of the mean-field theory for both architectures that starts from first principles by employing established methods from statistical physics of disordered systems. The theory elucidates that while the mean-field equations are different with regard to their temporal structure, they nevertheless yield identical Gaussian kernels when readouts are taken at a single time point or layer, respectively. Bayesian inference applied to classification then predicts identical performance and capabilities for the two architectures. Numerically, we find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks and the convergence speed depends non-trivially on the parameters of the weight prior as well as the depth or number of time steps, respectively. Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$. The formalism thus paves the way to investigate the fundamental differences between recurrent and deep architectures at finite widths $n$.

artificial intelligence, machine learning, rnn

arXiv:2112.05589

Country:

North America > United States (0.28)
Europe > Germany > North Rhine-Westphalia > Cologne Region (0.14)

Genre: Research Report (0.50)

Industry: Government > Regional Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)