AITopics | Borgelt, Christian

Borgelt, Christian

Convolutional Differentiable Logic Gate Networks

Petersen, Felix, Kuehne, Hilde, Borgelt, Christian, Welzel, Julian, Ermon, Stefano

arXiv.org Artificial Intelligence, Nov-7-2024

With the increasing inference cost of machine learning models, there is a growing interest in models with fast and efficient inference. Recently, an approach for learning logic gate networks directly via a differentiable relaxation was proposed. Logic gate networks are faster than conventional neural network approaches because their inference only requires logic gate operators such as NAND, OR, and XOR, which are the underlying building blocks of current hardware and can be efficiently executed. We build on this idea, extending it by deep logic gate tree convolutions, logical OR pooling, and residual initializations. This allows scaling logic gate networks up by over one order of magnitude and utilizing the paradigm of convolution. On CIFAR-10, we achieve an accuracy of 86.29% using only 61 million logic gates, which improves over the SOTA while being 29x smaller.
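The differentiable relaxation mentioned in this abstract can be illustrated with a small sketch. The abstract does not state how the relaxation is built, so everything here is an assumption: the gates are relaxed with the usual probabilistic formulas, each neuron holds a softmax-weighted mixture over four candidate gates during training, and `max` stands in for logical OR pooling. The names `GATES`, `soft_gate`, `hard_gate`, and `or_pool` are hypothetical.

```python
import math

# Real-valued relaxations of two-input gates (an assumed, common choice):
# inputs a, b in [0, 1] are read as independent probabilities of being 1.
GATES = {
    "AND":  lambda a, b: a * b,
    "OR":   lambda a, b: a + b - a * b,
    "XOR":  lambda a, b: a + b - 2 * a * b,
    "NAND": lambda a, b: 1 - a * b,
}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_gate(a, b, logits):
    """Training-time neuron: softmax-weighted mixture over the candidate gates."""
    weights = softmax(logits)
    return sum(w * gate(a, b) for w, gate in zip(weights, GATES.values()))

def hard_gate(a, b, logits):
    """Inference-time neuron: keep only the gate with the largest logit."""
    names = list(GATES)
    best = max(range(len(names)), key=lambda i: logits[i])
    return GATES[names[best]](a, b)

def or_pool(values):
    """OR pooling; for values in [0, 1], max is one possible relaxation (assumed)."""
    return max(values)
```

On binary inputs each relaxed gate reproduces its truth table, so a trained neuron can be replaced by a single hard gate at inference time.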

artificial intelligence, logic gate, machine learning, (17 more...)

2411.04732

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

TrAct: Making First-layer Pre-Activations Trainable

Petersen, Felix, Borgelt, Christian, Ermon, Stefano

arXiv.org Artificial Intelligence, Oct-31-2024

We consider the training of the first layer of vision models and notice the clear relationship between pixel values and gradient update magnitudes: the gradients arriving at the weights of a first layer are by definition directly proportional to (normalized) input pixel values. Thus, an image with low contrast has a smaller impact on learning than an image with higher contrast, and a very bright or very dark image has a stronger impact on the weights than an image with moderate brightness. In this work, we propose performing gradient descent on the embeddings produced by the first layer of the model. However, switching to discrete inputs with an embedding layer is not a reasonable option for vision models. Thus, we propose the conceptual procedure of (i) a gradient descent step on first layer activations to construct an activation proposal, and (ii) finding the optimal weights of the first layer, i.e., those weights which minimize the squared distance to the activation proposal. We provide a closed form solution of the procedure and adjust it for robust stochastic training while computing everything efficiently. Empirically, we find that TrAct (Training Activations) speeds up training by factors between 1.25x and 4x while requiring only a small computational overhead. We demonstrate the utility of TrAct with different optimizers for a range of different vision models including convolutional and transformer architectures.
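The two-step procedure in this abstract can be sketched for the simplest possible case. This is only an illustration under strong assumptions, not the closed-form solution the authors provide: the first layer is a single scalar weight `w`, so that pre-activations are `z_i = w * x_i`, and no regularization or stochastic adjustment is applied. The function name `first_layer_updates` is hypothetical.

```python
def first_layer_updates(w, xs, grads, lr):
    """Compare a plain gradient step with the proposal-then-fit step.

    xs:    input values x_i (at least one must be nonzero)
    grads: dL/dz_i for the pre-activations z_i = w * x_i
    """
    # Plain gradient descent on the weight: dL/dw = sum_i x_i * dL/dz_i,
    # so the step size scales with the magnitude of the inputs.
    w_gd = w - lr * sum(x * g for x, g in zip(xs, grads))

    # (i) Activation proposal: a gradient step on the pre-activations.
    proposal = [w * x - lr * g for x, g in zip(xs, grads)]

    # (ii) Weight minimizing the squared distance to the proposal:
    # argmin_v sum_i (v * x_i - proposal_i)^2 = sum(x * p) / sum(x * x).
    w_fit = sum(x * p for x, p in zip(xs, proposal)) / sum(x * x for x in xs)
    return w_gd, w_fit
```

In this scalar case the fitted weight equals `w - lr * sum(x * g) / sum(x * x)`, which shows how dividing by the input energy removes the dependence of the update on input magnitude.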

artificial intelligence, machine learning, tract, (15 more...)

2410.23970

Country: Europe > Spain (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms

Petersen, Felix, Borgelt, Christian, Sutter, Tobias, Kuehne, Hilde, Deussen, Oliver, Ermon, Stefano

arXiv.org Artificial Intelligence, Oct-24-2024

When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that they are, per se, non-differentiable. A popular approach is to continuously relax the objectives to provide gradients, enabling learning. However, such differentiable relaxations are often non-convex and can exhibit vanishing and exploding gradients, making them (already in isolation) hard to optimize. Here, the loss function poses the bottleneck when training a deep neural network. We present Newton Losses, a method for improving the performance of existing hard-to-optimize losses by exploiting their second-order information via their empirical Fisher and Hessian matrices. Instead of training the neural network with second-order techniques, we only utilize the loss function's second-order information to replace it by a Newton Loss, while training the network with gradient descent. This makes our method computationally efficient. We apply Newton Losses to eight differentiable algorithms for sorting and shortest-paths, achieving significant improvements for less-optimized differentiable algorithms, and consistent improvements, even for well-optimized differentiable algorithms.
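The idea of using curvature only inside the loss can be sketched for a scalar network output. This is a minimal illustration under assumptions, not the authors' formulation: the output `z` is one-dimensional, the curvature is the exact second derivative (the abstract also mentions an empirical Fisher variant, which is not shown), and a small `damping` term is added to keep the division well defined. The names `newton_target` and `newton_loss_grad` are hypothetical.

```python
def newton_target(z, grad, hess, damping=1e-3):
    """One damped Newton step on the loss with respect to the output z.

    grad and hess are the first and second derivatives of the original loss at z.
    Assumes hess + damping > 0.
    """
    return z - grad / (hess + damping)

def newton_loss_grad(z, target):
    """Gradient of the surrogate 0.5 * (z - target)**2 with respect to z.

    The target is treated as a constant, so the network itself is still
    trained with ordinary first-order gradient descent.
    """
    return z - target
```

For a quadratic loss such as `2 * (z - 3)**2` evaluated at `z = 5`, the undamped target is the minimizer `3`, and the surrogate gradient is the distance to it.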

artificial intelligence, deep learning, machine learning, (14 more...)

2410.19055

Country: Europe (0.46)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation

Petersen, Felix, Borgelt, Christian, Mishra, Aashwin, Ermon, Stefano

arXiv.org Machine Learning, Oct-10-2024

We deal with the problem of gradient estimation for stochastic differentiable relaxations of algorithms, operators, simulators, and other non-differentiable functions. Stochastic smoothing conventionally perturbs the input of a non-differentiable function with a differentiable density distribution with full support, smoothing it and enabling gradient estimation. Our theory starts at first principles to derive stochastic smoothing with reduced assumptions, requiring neither a differentiable density nor full support, and we present a general framework for relaxation and gradient estimation of non-differentiable black-box functions $f:\mathbb{R}^n\to\mathbb{R}^m$. We develop variance reduction for gradient estimation from 3 orthogonal perspectives. Empirically, we benchmark 6 distributions and up to 24 variance reduction strategies for differentiable sorting and ranking, differentiable shortest-paths on graphs, differentiable rendering for pose estimation, as well as differentiable cryo-ET simulations.
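The conventional setting described in this abstract can be sketched in one dimension. The sketch assumes Gaussian perturbations, which is the classical baseline rather than the generalized framework of the paper, and uses antithetic sampling as one simple example of variance reduction; the abstract does not say which strategies the authors benchmark. The names `smoothed_grad` and `step` are hypothetical.

```python
import random

def smoothed_grad(f, x, sigma=1.0, n=20000, antithetic=True, seed=0):
    """Monte Carlo estimate of d/dx E[f(x + sigma * eps)], eps ~ N(0, 1).

    Uses the identity d/dx E[f(x + sigma * eps)] = E[f(x + sigma * eps) * eps] / sigma,
    which only needs black-box evaluations of f.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        if antithetic:
            # Pair each sample with its mirror image to reduce variance.
            total += (f(x + sigma * eps) - f(x - sigma * eps)) * eps / (2.0 * sigma)
        else:
            total += f(x + sigma * eps) * eps / sigma
    return total / n

def step(x):
    """A non-differentiable test function: the Heaviside step."""
    return 1.0 if x > 0 else 0.0
```

For the step function at `x = 0` with `sigma = 1`, the smoothed function is the standard normal CDF, so the estimate should be close to the standard normal density at zero, roughly 0.399.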

artificial intelligence, machine learning, rqmc, (13 more...)

2410.08125

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Uncertainty Quantification via Stable Distribution Propagation

Petersen, Felix, Mishra, Aashwin, Kuehne, Hilde, Borgelt, Christian, Deussen, Oliver, Yurochkin, Mikhail

arXiv.org Artificial Intelligence, Feb-13-2024

We propose a new approach for propagating stable probability distributions through neural networks. Our method is based on local linearization, which we show to be an optimal approximation in terms of total variation distance for the ReLU non-linearity. This allows propagating Gaussian and Cauchy input uncertainties through neural networks to quantify their output uncertainties. To demonstrate the utility of propagating distributions, we apply the proposed method to predicting calibrated confidence intervals and selective prediction on out-of-distribution data. The results demonstrate a broad applicability of propagating distributions and show the advantages of our method over other approaches such as moment matching.
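Propagation by local linearization can be sketched for a single neuron. The sketch assumes independent input components, which the abstract does not state, and relies on two standard facts about stable distributions: a weighted sum of independent Gaussians has variance equal to the sum of squared weights times variances, and a weighted sum of independent Cauchy variables has scale equal to the sum of absolute weights times scales. The function names are hypothetical.

```python
import math

def affine_gaussian(mu, sigma, weights, bias):
    """Push independent N(mu_i, sigma_i^2) inputs through w . x + b."""
    mean = sum(w * m for w, m in zip(weights, mu)) + bias
    std = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sigma)))
    return mean, std

def affine_cauchy(loc, scale, weights, bias):
    """Push independent Cauchy(loc_i, scale_i) inputs through w . x + b."""
    new_loc = sum(w * l for w, l in zip(weights, loc)) + bias
    new_scale = sum(abs(w) * s for w, s in zip(weights, scale))
    return new_loc, new_scale

def relu_linearized(loc, scale):
    """Linearize ReLU at the location: slope 1 if loc > 0, else 0."""
    slope = 1.0 if loc > 0 else 0.0
    return max(loc, 0.0), slope * scale
```

Because an affine map keeps a stable distribution within its family, alternating these two steps layer by layer yields a location and a scale for every output.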

artificial intelligence, machine learning, neural network, (17 more...)

2402.08324

Country:

North America > United States > Arizona (0.14)
North America > Puerto Rico (0.14)
Europe > Sweden (0.14)
Europe > Spain (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

ISAAC Newton: Input-based Approximate Curvature for Newton's Method

Petersen, Felix, Sutter, Tobias, Borgelt, Christian, Huh, Dongsung, Kuehne, Hilde, Sun, Yuekai, Deussen, Oliver

arXiv.org Artificial Intelligence, Apr-30-2023

We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. We show that it is possible to compute a good conditioner based on only the input to a respective layer without a substantial computational overhead. The proposed method allows effective training even in small-batch stochastic regimes, which makes it competitive with first-order as well as second-order methods. While second-order optimization methods are traditionally much less explored than first-order methods in large-scale machine learning (ML) applications due to their memory requirements and prohibitive computational cost per iteration, they have recently become more popular in ML mainly due to their fast convergence properties when compared to first-order methods [1]. The expensive computation of an inverse Hessian (also known as a pre-conditioning matrix) in the Newton step has also been tackled via estimating the curvature from the change in gradients.
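Why an input-based conditioner can be cheap when the batch is smaller than the layer width can be illustrated with a batch of size one. The conditioner used here, the inverse of `lam * I + x x^T`, is an assumption chosen for illustration and is not taken from the paper. For a single input vector `x`, standard linear algebra gives `(lam * I + x x^T)^{-1} x = x / (lam + ||x||^2)`, so no n-by-n matrix has to be formed or inverted. The function names are hypothetical.

```python
def conditioned_gradient(x, dz, lam):
    """Condition the weight gradient x * dz of one output unit.

    x:   layer input (list of floats), batch size one
    dz:  dL/dz for the output unit
    lam: regularization strength, must be positive
    Returns (lam * I + x x^T)^{-1} (x * dz) in O(n) time.
    """
    sq_norm = sum(v * v for v in x)
    return [v * dz / (lam + sq_norm) for v in x]

def apply_regularized_gram(x, vec, lam):
    """Compute (lam * I + x x^T) @ vec without building the matrix."""
    dot = sum(a * b for a, b in zip(x, vec))
    return [lam * v + a * dot for a, v in zip(x, vec)]
```

Applying `apply_regularized_gram` to the conditioned gradient recovers the raw gradient `x * dz`, which confirms that the shortcut is the exact inverse for this assumed conditioner.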

approximation, machine learning, natural language, (18 more...)

2305.00604

Country: North America > United States (0.67)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Learning with Algorithmic Supervision via Continuous Relaxations

Petersen, Felix, Borgelt, Christian, Kuehne, Hilde, Deussen, Oliver

arXiv.org Machine Learning, Oct-25-2021

The integration of algorithmic components into neural architectures has gained increased attention recently, as it allows training neural networks with new forms of supervision such as ordering constraints or silhouettes instead of using ground truth labels. Many approaches in the field focus on the continuous relaxation of a specific task and show promising results in this context. But the focus on single tasks also limits the applicability of the proposed concepts to a narrow range of applications. In this work, we build on those ideas to propose an approach that allows integrating algorithms into end-to-end trainable neural network architectures based on a general approximation of discrete conditions. To this end, we relax these conditions in control structures such as conditional statements, loops, and indexing, so that resulting algorithms are smoothly differentiable. To obtain meaningful gradients, each relevant variable is perturbed via logistic distributions and the expectation value under this perturbation is approximated. We evaluate the proposed continuous relaxation model on four challenging tasks and show that it can keep up with relaxations specifically designed for each individual task.
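The relaxation of a conditional statement described in this abstract can be sketched as follows. If the comparison `a < b` is evaluated after adding logistic noise with scale `beta` to `a`, the probability that it holds is the logistic sigmoid of `(b - a) / beta`, and the expected result of the branch is the corresponding weighted average. The sketch covers only a single `if`, not loops or indexing; the names `soft_if_less` and `soft_min` and the parameter `beta` are hypothetical.

```python
import math

def sigmoid(t):
    """Logistic sigmoid, the CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-t))

def soft_if_less(a, b, then_value, else_value, beta=1.0):
    """Relaxed form of: then_value if a < b else else_value.

    beta > 0 controls the smoothness; small beta approaches the hard branch.
    """
    p = sigmoid((b - a) / beta)  # probability that the condition holds
    return p * then_value + (1.0 - p) * else_value

def soft_min(a, b, beta=1.0):
    """Relaxed minimum built from the relaxed conditional."""
    return soft_if_less(a, b, a, b, beta)
```

Both branches contribute to the output, so gradients flow to `a`, `b`, and both branch values, while a small `beta` recovers the behavior of the discrete condition.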

artificial intelligence, machine learning, neural network, (15 more...)

2110.05651

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
