Perceptrons
MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)
Brea, Johanni, Martinelli, Flavio, Şimşek, Berfin, Gerstner, Wulfram
MLPGradientFlow is a software package to solve numerically the gradient flow differential equation $\dot \theta = -\nabla \mathcal L(\theta; \mathcal D)$, where $\theta$ are the parameters of a multi-layer perceptron, $\mathcal D$ is some data set, and $\nabla \mathcal L$ is the gradient of a loss function. We show numerically that adaptive first- or higher-order integration methods based on Runge-Kutta schemes have better accuracy and convergence speed than gradient descent with the Adam optimizer. However, we find Newton's method and approximations like BFGS preferable to find fixed points (local and global minima of $\mathcal L$) efficiently and accurately. For small networks and data sets, gradients are usually computed faster than in PyTorch and Hessians are computed at least $5\times$ faster. Additionally, the package features an integrator for a teacher-student setup with bias-free, two-layer networks trained with standard Gaussian input in the limit of infinite data. The code is accessible at https://github.com/jbrea/MLPGradientFlow.jl.
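As an illustration of the gradient flow equation above, here is a minimal Python sketch. It is not the API of MLPGradientFlow.jl (a Julia package). It uses a fixed-step classical Runge-Kutta (RK4) integrator rather than the adaptive schemes the abstract describes. The one-hidden-neuron network $f(x) = a \tanh(w x)$, the toy teacher data, and all function names are assumptions made for this example.

```python
import math

# Toy data from a hypothetical teacher f(x) = 2 * tanh(1 * x) (assumption for this sketch).
XS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
YS = [2.0 * math.tanh(x) for x in XS]


def loss(theta):
    """Mean squared-error loss (with factor 1/2) of f(x) = a * tanh(w * x)."""
    w, a = theta
    return sum(0.5 * (a * math.tanh(w * x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def grad(theta):
    """Analytic gradient of the loss with respect to theta = (w, a)."""
    w, a = theta
    gw = ga = 0.0
    for x, y in zip(XS, YS):
        h = math.tanh(w * x)
        r = a * h - y                      # residual
        gw += r * a * (1.0 - h * h) * x    # d loss / d w
        ga += r * h                        # d loss / d a
    n = len(XS)
    return [gw / n, ga / n]


def flow(theta):
    """Right-hand side of the gradient flow ODE: d theta / dt = -grad L(theta)."""
    return [-g for g in grad(theta)]


def rk4_step(theta, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = flow(theta)
    k2 = flow([t + 0.5 * dt * k for t, k in zip(theta, k1)])
    k3 = flow([t + 0.5 * dt * k for t, k in zip(theta, k2)])
    k4 = flow([t + dt * k for t, k in zip(theta, k3)])
    return [
        t + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for t, a, b, c, d in zip(theta, k1, k2, k3, k4)
    ]


def integrate(theta0, dt=0.1, steps=5000):
    """Integrate the gradient flow from theta0 for steps * dt units of time."""
    theta = list(theta0)
    for _ in range(steps):
        theta = rk4_step(theta, dt)
    return theta


if __name__ == "__main__":
    start = [0.5, 0.5]
    end = integrate(start)
    print("initial loss:", loss(start))
    print("final parameters:", end, "final loss:", loss(end))
```

A fixed step keeps the sketch short. An adaptive method would instead choose `dt` from a local error estimate, which matters when the loss landscape has very different curvature scales.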
AQuaMaM: An Autoregressive, Quaternion Manifold Model for Rapidly Estimating Complex SO(3) Distributions
Accurately modeling complex, multimodal distributions is necessary for optimal decision-making, but doing so for rotations in three dimensions, i.e., the SO(3) group, is challenging due to the curvature of the rotation manifold. The recently described implicit-PDF (IPDF) is a simple, elegant, and effective approach for learning arbitrary distributions on SO(3) up to a given precision. However, inference with IPDF requires $N$ forward passes through the network's final multilayer perceptron (where $N$ places an upper bound on the likelihood that can be calculated by the model), which is prohibitively slow for those without the computational resources necessary to parallelize the queries. In this paper, I introduce AQuaMaM, a neural network capable of both learning complex distributions on the rotation manifold and calculating exact likelihoods for query rotations in a single forward pass. Specifically, AQuaMaM autoregressively models the projected components of unit quaternions as mixtures of uniform distributions that partition their geometrically-restricted domain of values. When trained on an "infinite" toy dataset with ambiguous viewpoints, AQuaMaM rapidly converges to a sampling distribution closely matching the true data distribution. In contrast, the sampling distribution for IPDF dramatically diverges from the true data distribution, despite IPDF approaching its theoretical minimum evaluation loss during training. When trained on a constructed dataset of 500,000 renders of a die in different rotations, AQuaMaM reaches a test log-likelihood 14% higher than IPDF. Further, compared to IPDF, AQuaMaM uses 24% fewer parameters, has a prediction throughput 52$\times$ faster on a single GPU, and converges in a similar amount of time during training.
Predicting the Masses of Exotic Hadrons with Data Augmentation Using Multilayer Perceptron
Recently, there have been significant developments in neural networks, which have led to their frequent use in the physics literature. This work focuses on predicting the masses of exotic hadrons and of doubly charmed and doubly bottomed baryons using neural networks trained on experimentally determined meson and baryon masses. The original data set has been extended using recently proposed artificial data augmentation methods. We observe that the neural network's predictive ability increases with the use of augmented data. The results indicate that data augmentation techniques play an essential role in improving neural network predictions; moreover, neural networks can make reasonable predictions for exotic hadrons and for doubly charmed and doubly bottomed baryons. The results are also comparable to those of the Gaussian Process and the Constituent Quark Model.
Holistically Explainable Vision Transformers
Böhle, Moritz, Fritz, Mario, Schiele, Bernt
Transformers increasingly dominate the machine learning landscape across many tasks and domains, which increases the importance of understanding their outputs. While their attention modules provide partial insight into their inner workings, the attention scores have been shown to be insufficient for explaining the models as a whole. To address this, we propose B-cos transformers, which inherently provide holistic explanations for their decisions. Specifically, we formulate each model component (such as the multi-layer perceptrons, attention layers, and the tokenisation module) to be dynamic linear, which allows us to faithfully summarise the entire transformer via a single linear transform. We apply our proposed design to Vision Transformers (ViTs) and show that the resulting models, dubbed Bcos-ViTs, are highly interpretable and perform competitively with baseline ViTs on ImageNet. Code will be made available soon. Recently, however, CNNs are often surpassed by transformers (Vaswani et al., 2017), which, if the current development is any indication, will replace them for ever more tasks and domains. Transformers are thus bound to impact many aspects of our lives, from healthcare to judicial decisions to autonomous driving. Given the sensitive nature of such areas, it is of utmost importance to ensure that we can explain the underlying models, which remains a challenge for transformers. To explain transformers, prior work often focused on the models' attention layers (Jain & Wallace, 2019; Serrano & Smith, 2019; Abnar & Zuidema, 2020; Barkan et al., 2021), as they inherently compute their output in an interpretable manner. For a detailed discussion, see the supplement. The model components are: a tokenisation module, a mechanism for providing positional information to the model, multi-layer perceptrons (MLPs), and normalisation and attention layers; see Figure 1a.
By addressing the interpretability of each component individually, we obtain transformers that inherently explain their decisions; see, for example, Figure 1 and Figure 1b. In detail, our approach is based on the idea of designing each component to be dynamic linear (Böhle et al., 2021; 2022), such that it computes an input-dependent linear transform and can be summarised by a single linear transform for each input.
An Artificial Intelligence-based model for cell killing prediction: development, validation and explainability analysis of the ANAKIN model
Cordoni, Francesco G., Missiaggia, Marta, Scifoni, Emanuele, La Tessa, Chiara
The present work develops ANAKIN: an Artificial iNtelligence bAsed model for (radiation induced) cell KIlliNg prediction. ANAKIN is trained and tested on 513 cell survival experiments with different types of radiation contained in the publicly available PIDE database. We show that ANAKIN accurately predicts several relevant biological endpoints over a wide range of ion beams and for a large number of cell lines. We compare the predictions of ANAKIN to the only two radiobiological models for RBE prediction used in clinics, namely the Microdosimetric Kinetic Model (MKM) and the Local Effect Model (LEM version III), showing that ANAKIN has higher accuracy over all considered biological endpoints. Finally, via modern techniques of Explainable Artificial Intelligence (XAI), we show how ANAKIN predictions can be understood and explained, highlighting that ANAKIN is in fact able to reproduce relevant well-known biological patterns, such as the overkilling effect.
Perceptron: Let's break down thinking into calculations
The perceptron is the most fundamental model of how the brain thinks. It is fundamental in the sense that it is the minimal function required to perform brain-like thinking. Thinking can be viewed as calculation, and calculation is best described in terms of functions. With anything simpler than the perceptron, the full functionality of the brain would not be possible.
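To make the idea of "thinking as a function" concrete, here is a minimal sketch of a classic perceptron in Python: a weighted sum of inputs followed by a threshold, trained with the standard perceptron learning rule. The logical AND task, the function names, and the hyperparameters are assumptions chosen for illustration, not something the article specifies.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train(samples, epochs=20, lr=1):
    """Perceptron learning rule: nudge weights toward misclassified targets."""
    n_inputs = len(samples[0][0])
    weights = [0] * n_inputs
    bias = 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so the learning rule converges on it.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train(AND_SAMPLES)
    for x, target in AND_SAMPLES:
        print(x, "->", predict(weights, bias, x), "(target", target, ")")
```

A single perceptron can only separate classes with a straight line (or hyperplane), so the same code would not learn XOR. That limitation is what motivates stacking perceptrons into multi-layer networks.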
Transformers in Video Understanding
Videos are everywhere and they are only increasing over time. One way to solve problems related to videos is by using individual frames for classification. For working on space and time, machine learning researchers have proposed many solutions and one of the recent techniques is using transformers. Transformers were introduced in Natural Language Processing. Now transformers are almost everywhere.
Rig Inversion by Training a Differentiable Rig Function
Bolduc, Mathieu Marquis, Phan, Hau Nghiep
Rig inversion is the problem of finding the rig parameter vector that best approximates a given input mesh. In this paper, we propose to solve this problem by first training a multi-layer perceptron to approximate the rig function, which yields a differentiable rig function. This differentiable rig function can then be used to train a deep learning model of rig inversion.
FOLD-SE: An Efficient Rule-based Machine Learning Algorithm with Scalable Explainability
We present FOLD-SE, an efficient, explainable machine learning algorithm for classification tasks given tabular data containing numerical and categorical values. FOLD-SE generates a set of default rules (essentially a stratified normal logic program) as an explainable trained model. The explainability provided by FOLD-SE is scalable, meaning that regardless of the size of the dataset, the number of learned rules and learned literals stays quite small while good classification accuracy is maintained. A model with a smaller number of rules and literals is easier for human beings to understand. FOLD-SE is competitive with state-of-the-art machine learning algorithms such as XGBoost and Multi-Layer Perceptrons (MLP) with respect to prediction accuracy. However, unlike XGBoost and MLP, the FOLD-SE algorithm is explainable. The FOLD-SE algorithm builds upon our earlier work on developing the explainable FOLD-R++ machine learning algorithm for binary classification and inherits all of its positive features. Thus, pre-processing of the dataset, using techniques such as one-hot encoding, is not needed. Like FOLD-R++, FOLD-SE uses prefix sums to speed up computations, resulting in FOLD-SE being an order of magnitude faster than XGBoost and MLP in execution speed. The FOLD-SE algorithm outperforms FOLD-R++ as well as other rule-learning algorithms such as RIPPER in efficiency, performance and scalability, especially for large datasets. A major reason for the scalable explainability of FOLD-SE is the use of a literal selection heuristic based on Gini Impurity, as opposed to the Information Gain used in FOLD-R++. A multi-category classification version of FOLD-SE is also presented.
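The two ingredients named above, Gini impurity and prefix sums, can be combined as in the following Python sketch, which picks a threshold on one numerical feature for binary labels. This is a generic illustration and not the FOLD-SE literal selection heuristic itself. The function names and the choice of a weighted two-sided impurity score are assumptions.

```python
def gini(pos, total):
    """Binary Gini impurity 1 - p^2 - (1 - p)^2 = 2p(1 - p) for pos positives out of total."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Find t minimising the weighted Gini impurity of the split (value <= t, value > t).

    labels are 0/1. Returns (impurity, threshold); threshold is None if no split exists.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)

    # prefix[i] = number of positive labels among the first i sorted examples,
    # so each candidate split is scored in O(1) instead of rescanning the data.
    prefix = [0]
    for _, y in pairs:
        prefix.append(prefix[-1] + y)
    total_pos = prefix[-1]

    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        left_pos = prefix[i]
        right_pos = total_pos - left_pos
        score = (i * gini(left_pos, i) + (n - i) * gini(right_pos, n - i)) / n
        if score < best[0]:
            best = (score, pairs[i - 1][0])
    return best


if __name__ == "__main__":
    print(best_threshold([10, 1, 11, 2, 12, 3], [1, 0, 1, 0, 1, 0]))
```

After the initial sort, every candidate threshold is evaluated in constant time from the prefix sums, which is the kind of saving that makes this approach attractive on large datasets.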
Optimal Power Flow Based on Physical-Model-Integrated Neural Network with Worth-Learning Data Generation
Fast and reliable solvers for optimal power flow (OPF) problems are attracting surging research interest. As surrogates of physical-model-based OPF solvers, neural network (NN) solvers can accelerate the solving process. However, they may be unreliable for "unseen" inputs when the training dataset is unrepresentative. Enhancing the representativeness of the training dataset for NN solvers is indispensable but is not well studied in the literature. To tackle this challenge, we propose an OPF solver based on a physical-model-integrated NN with worth-learning data generation. The designed NN is a combination of a conventional multi-layer perceptron (MLP) and an OPF-model module, which outputs not only the optimal decision variables of the OPF problem but also the degree of constraint violation. Based on this NN, the worth-learning data generation method can identify feasible samples that are not well generalized by the NN. By iteratively applying this method and including the newly identified worth-learning samples in the training set, the representativeness of the training set can be significantly enhanced. Therefore, the solution reliability of the NN solver can be remarkably improved. Experimental results show that the proposed method reduces constraint violations and optimality loss by over 50% compared to conventional NN solvers.