
Collaborating Authors

 Petersen, Philipp Christian


Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks

arXiv.org Artificial Intelligence

Despite their superior convergence rates compared to first-order methods, (approximate) second-order methods are still rarely used -- and as such, underexplored -- for training large-scale machine learning and neural network (NN) models. This is due to their prohibitive computational and memory costs at each iteration. Some past and recent works have, however, made efforts to reduce this overhead by proposing different approximations to the Hessian of the loss function, which the methods ultimately exploit to achieve their impressive convergence properties (see e.g., [1, 2, 3, 4, 5, 6, 7, 8, 9]). One of the most appealing approximations to the Hessian matrix within the context of practical deep learning and nonlinear optimization in general is the generalized Gauss-Newton (GGN) approximation of [10], which uses a positive semi-definite (PSD) matrix to model the curvature of an arbitrary convex loss function. In fact, the Fisher information matrix (FIM) -- a curvature-approximating matrix which most other approximate second-order methods seek to estimate -- is shown to have direct connections with the GGN matrix in many practical cases [4, 11]. Despite its close connection with the GGN matrix, the FIM, unlike the GGN matrix, potentially over-approximates the second-order terms in more general loss functions, throwing away relevant curvature information [10]. In addition to the desirable property of maintaining positive-definiteness throughout the training procedure, other favorable properties of the GGN matrix, in comparison with the Hessian matrix, are discussed in [12, Section 8.1]; see also [13] for discussions in the context of nonlinear least-squares estimation and [14] for efficient training of (deep) recurrent neural networks with a GGN approach.
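
To make the flavor of such methods concrete, below is a minimal, hedged NumPy sketch of a damped (regularized) Gauss-Newton step for a least-squares loss; the residual model, the damping parameter `lam`, and the toy exponential-fitting problem are illustrative assumptions, not the algorithm proposed in the paper.

```python
import numpy as np

def regularized_gauss_newton_step(theta, residual_fn, jacobian_fn, lam=1e-3):
    """One damped Gauss-Newton update for L(theta) = 0.5 * ||r(theta)||^2.

    J^T J is the (PSD) Gauss-Newton surrogate for the Hessian; the
    regularizer lam * I keeps the linear system well conditioned.
    """
    r = residual_fn(theta)                    # residuals, shape (m,)
    J = jacobian_fn(theta)                    # Jacobian dr/dtheta, shape (m, n)
    G = J.T @ J + lam * np.eye(theta.size)    # regularized Gauss-Newton matrix
    g = J.T @ r                               # gradient of the loss
    return theta - np.linalg.solve(G, g)

# Toy usage (illustrative): fit y = a * exp(b * x) to noisy data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(0.5 * x) + 0.01 * rng.standard_normal(x.size)

def residual_fn(theta):
    a, b = theta
    return a * np.exp(b * x) - y

def jacobian_fn(theta):
    a, b = theta
    return np.stack([np.exp(b * x), a * x * np.exp(b * x)], axis=1)

theta = np.array([1.0, 0.0])
for _ in range(20):
    theta = regularized_gauss_newton_step(theta, residual_fn, jacobian_fn)
print(theta)   # should approach approximately [2.0, 0.5]
```

For a convex loss other than squared error, `J.T @ J` would be replaced by the GGN matrix `J.T @ H_L @ J`, where `H_L` denotes the Hessian of the loss with respect to the model output.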


Efficient Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders

arXiv.org Machine Learning

Deep learning [6, 29] is a technology that has revolutionized many areas of modern life. The term describes the gradient-based training of deep neural networks. Since its breakthrough in image classification in 2012 [28], deep learning has been essentially the only viable technology for this application. Moreover, it is the basis of multiple recent breakthroughs in science [25] and even mathematical research [14]. Recently, deep learning has received wide public attention through the advent of generative AI in the form of large language models such as ChatGPT [39]. It is well documented that deep learning in modern applications can place extreme demands on computational resources, and that the hardware requirements scale in an unsustainable way [52]. In constrained settings, this can become a serious bottleneck that prevents the deployment of deep learning methods. In addition, these extensive computations come with an immense environmental cost.


Limitations of neural network training due to numerical instability of backpropagation

arXiv.org Machine Learning

Deep learning is a machine learning technique based on artificial neural networks that are trained by gradient-based methods and have a large number of layers. This technique has been tremendously successful in a wide range of applications [26, 24, 44, 41]. Of particular interest to applied mathematicians are recent developments in which deep neural networks are applied to tasks of numerical analysis, such as the numerical solution of inverse problems [1, 34, 27, 20, 38] or of (parametric) partial differential equations [7, 12, 39, 9, 40, 25, 29, 3]. The appeal of deep neural networks for these applications is due to their exceptional efficiency in representing functions from several approximation classes that underlie well-established numerical methods. In terms of approximation accuracy with respect to the number of approximation parameters, deep neural networks have been theoretically proven to achieve approximation rates that are at least as good as those of finite elements [15, 35, 30], local Taylor polynomials or splines [47, 11], wavelets [42] and, more generally, affine systems [5]. In the sequel, we consider neural networks with the rectified-linear-unit (ReLU) activation function, which is standard in most applications. In this case, the neural-network approximations are piecewise-affine functions. We point out that all state-of-the-art results on the rates of approximation with deep ReLU neural networks that achieve higher-order polynomial approximation rates are based on explicit constructions in which the number of affine pieces grows exponentially with the number of layers; see, e.g., [47, 46]. In this work, we argue that this central building block, functions with exponentially many affine pieces, cannot be learned with state-of-the-art techniques.
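
To make the notion of exponentially many affine pieces concrete, the classical sawtooth construction composes a two-piece ReLU "hat" function with itself: k compositions (depth growing linearly in k) produce 2^k affine pieces. The following self-contained NumPy sketch is an illustrative example in this spirit, not code from the paper; it counts the pieces numerically by detecting slope changes on a fine grid.

```python
import numpy as np

def hat(x):
    """Two-piece ReLU hat function g(x) = 2*relu(x) - 4*relu(x - 0.5) on [0, 1]."""
    relu = lambda t: np.maximum(t, 0.0)
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, depth):
    """Compose the hat function `depth` times; yields 2**depth affine pieces on [0, 1]."""
    for _ in range(depth):
        x = hat(x)
    return x

# Count affine pieces by counting slope changes on a fine grid.
xs = np.linspace(0.0, 1.0, 200001)
for depth in range(1, 6):
    ys = sawtooth(xs, depth)
    slopes = np.diff(ys) / np.diff(xs)
    pieces = 1 + np.count_nonzero(np.abs(np.diff(slopes)) > 1e-6)
    print(depth, pieces)   # prints 2, 4, 8, 16, 32
```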


Deep neural networks can stably solve high-dimensional, noisy, non-linear inverse problems

arXiv.org Machine Learning

We study the problem of reconstructing solutions of inverse problems when only noisy measurements are available. We assume that the problem can be modeled with an infinite-dimensional forward operator that is not continuously invertible. Then, we restrict this forward operator to finite-dimensional spaces so that the inverse is Lipschitz continuous. For the inverse operator, we demonstrate that there exists a neural network which is a robust-to-noise approximation of the operator. In addition, we show that these neural networks can be learned from appropriately perturbed training data. We demonstrate the admissibility of this approach to a wide range of inverse problems of practical interest. Numerical examples are given that support the theoretical findings.
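
As a rough illustration of learning a reconstruction map from perturbed training data, the sketch below restricts a hypothetical smoothing forward operator to a finite-dimensional matrix A, generates noisy measurements, and fits an off-the-shelf multilayer perceptron to map measurements back to signals. The operator, noise level, and network size are illustrative assumptions, not the construction analyzed in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Finite-dimensional restriction of a (hypothetical) smoothing forward operator.
n = 20
A = np.array([[np.exp(-abs(i - j) / 3.0) for j in range(n)] for i in range(n)])

# Training data: signals x and noisy ("perturbed") measurements y = A x + noise.
X_train = rng.standard_normal((2000, n))
Y_train = X_train @ A.T + 0.01 * rng.standard_normal((2000, n))

# Learn an approximate inverse map y -> x from the perturbed data.
net = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
net.fit(Y_train, X_train)

# Check robustness on fresh noisy measurements.
X_test = rng.standard_normal((200, n))
Y_test = X_test @ A.T + 0.01 * rng.standard_normal((200, n))
err = np.linalg.norm(net.predict(Y_test) - X_test) / np.linalg.norm(X_test)
print(f"relative reconstruction error: {err:.3f}")
```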


Mathematical Capabilities of ChatGPT

arXiv.org Artificial Intelligence

We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. These datasets also test whether ChatGPT and GPT-4 can be helpful assistants to professional mathematicians by emulating use cases that arise in the daily professional activities of mathematicians. We benchmark the models on a range of fine-grained performance metrics. For advanced mathematics, this is the most detailed evaluation effort to date. We find that ChatGPT can be used most successfully as a mathematical assistant for querying facts, acting as a mathematical search engine and knowledge base interface. GPT-4 can additionally be used for undergraduate-level mathematics but fails on graduate-level difficulty. Contrary to many positive reports in the media about GPT-4 and ChatGPT's exam-solving abilities (a potential case of selection bias), their overall mathematical performance is well below the level of a graduate student. Hence, if your goal is to use ChatGPT to pass a graduate-level math exam, you would be better off copying from your average peer!


VC dimensions of group convolutional neural networks

arXiv.org Artificial Intelligence

Due to impressive results in image recognition, convolutional neural networks (CNNs) have become one of the most widely used neural network architectures [12, 13]. It is believed that one of the main reasons for the efficiency of CNNs is their ability to convert the translation symmetry of the data into a built-in translation-equivariance property of the neural network, without having to learn the equivariance from the data [4, 15]. Based on this intuition, other data symmetries have recently been incorporated into neural network architectures. Group convolutional neural networks (G-CNNs) are a natural generalization of CNNs that can be equivariant with respect to rotation [5, 24, 23, 9], scale [21, 20, 1], and other symmetries defined by matrix groups [7]. Moreover, every neural network that is equivariant to the action of a group on its input is a G-CNN, where the convolutions are taken with respect to the group [11] (see Theorem 2.10 below).
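
As a toy example of a group convolution, the sketch below lifts an image to the rotation group C4 by correlating it with the four rotated copies of a filter; rotating the input image then rotates each feature map and cyclically permutes the group (channel) axis, which is exactly the equivariance property described above. The choice of group and the NumPy/SciPy implementation are illustrative and not taken from [11].

```python
import numpy as np
from scipy.signal import correlate2d

def c4_lifting_convolution(image, base_filter):
    """Lift a 2D image to the rotation group C4: one output channel per
    group element, obtained by correlating with each rotated copy of the filter."""
    return np.stack([
        correlate2d(image, np.rot90(base_filter, k), mode="same")
        for k in range(4)   # group elements: rotations by 0, 90, 180, 270 degrees
    ])

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
filt = rng.standard_normal((3, 3))

out = c4_lifting_convolution(image, filt)
out_rot = c4_lifting_convolution(np.rot90(image), filt)

# Equivariance check: rotating the input rotates each feature map and
# cyclically shifts the group (channel) axis.
expected = np.stack([np.rot90(out[(k - 1) % 4]) for k in range(4)])
print(np.allclose(out_rot, expected))   # prints True
```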


The Oracle of DLphi

arXiv.org Machine Learning

This paper takes aim at achieving nothing less than the impossible. To be more precise, we seek to predict labels of unknown data from entirely uncorrelated labelled training data. This will be accomplished by an application of an algorithm based on deep learning, as well as by invoking one of the most fundamental concepts of set theory. Estimating the behaviour of a system in unknown situations is one of the central problems of humanity. Indeed, we are constantly trying to produce predictions for future events in order to prepare ourselves.