 Perceptrons


Characteristics of Monte Carlo Dropout in Wide Neural Networks

arXiv.org Machine Learning

Monte Carlo (MC) dropout is one of the state-of-the-art approaches for uncertainty estimation in neural networks (NNs). It has been interpreted as approximately performing Bayesian inference. Based on previous work on the approximation of Gaussian processes by wide and deep neural networks with random weights, we study the limiting distribution of wide untrained NNs under dropout more rigorously and prove that they also converge to Gaussian processes for fixed sets of weights and biases. We sketch an argument that this property might also hold for infinitely wide feed-forward networks that are trained with (full-batch) gradient descent. The theory is contrasted with an empirical analysis in which we find correlations and non-Gaussian behaviour for the pre-activations of finite-width NNs. We therefore investigate how (strongly) correlated pre-activations can induce non-Gaussian behaviour in NNs with strongly correlated weights.
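As a rough illustration of what MC dropout does in practice, the sketch below keeps dropout switched on at prediction time and summarises many stochastic forward passes by their mean and standard deviation. The network (one hidden layer of 64 ReLU units with random, untrained weights), the dropout rate, and every function name are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical untrained network: 1 input, 64 hidden units, 1 output.
W1 = rng.normal(0.0, 1.0, size=(1, 64))
b1 = np.zeros(64)
W2 = rng.normal(0.0, 1.0 / np.sqrt(64), size=(64, 1))
b2 = np.zeros(1)

def forward_with_dropout(x, p_drop, rng):
    """One stochastic forward pass with dropout applied to the hidden layer."""
    h = np.maximum(0.0, x @ W1 + b1)      # ReLU hidden activations
    keep = rng.random(h.shape) >= p_drop  # sample a fresh dropout mask
    h = h * keep / (1.0 - p_drop)         # inverted-dropout rescaling
    return h @ W2 + b2

def mc_dropout_predict(x, n_samples=200, p_drop=0.5, rng=rng):
    """Mean and standard deviation over repeated stochastic forward passes."""
    samples = np.stack(
        [forward_with_dropout(x, p_drop, rng) for _ in range(n_samples)]
    )
    return samples.mean(axis=0), samples.std(axis=0)

x = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
mean, std = mc_dropout_predict(x)
print(np.hstack([x, mean, std]))
```

The standard deviation across passes is what is usually read as the uncertainty estimate; collecting the raw samples instead of their summary would allow a check of how Gaussian they look for a given width.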


Neural Networks in Python

#artificialintelligence

In this tutorial, we will implement a multi-layered perceptron (a type of feed-forward neural network) in Python using three different libraries. We'll start off with the most basic example possible, going to more complex and flexible frameworks with the aim of increasing our understanding of how to implement neural networks in Python. Quoting from the scikit-learn documentation [1], "A Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns a function f: Rᵐ → Rᵒ by training on a dataset, where m is the number of dimensions for input and o is the number of dimensions for output. Given a set of features X = x₁, x₂, …, xₘ and a target y, it can learn a non-linear function approximator for either classification or regression. It is different from logistic regression, in that between the input and the output layer, there can be one or more non-linear layers, called hidden layers".
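A minimal sketch of the scikit-learn route might look like the following. It uses `MLPClassifier` from scikit-learn; the toy dataset (`make_moons`), the single hidden layer of 16 units, and the other hyperparameters are assumptions chosen for illustration and are not taken from the tutorial.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy two-class dataset that is not linearly separable (illustrative choice).
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One hidden layer with 16 ReLU units between the input and output layers.
clf = MLPClassifier(
    hidden_layer_sizes=(16,),
    activation="relu",
    max_iter=2000,
    random_state=0,
)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

Here m = 2 (two input features) and the output is a single class label; changing `hidden_layer_sizes` to a longer tuple adds further hidden layers.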


Introduction to Machine Learning

#artificialintelligence

This class will teach you the end-to-end process of investigating data through a machine learning lens. This course will provide you with a foundational understanding of machine learning models (logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc.) as well as demonstrate how these models can solve complex problems in a variety of industries, from medical diagnostics to image recognition to text prediction. In addition, we have designed practice exercises that will give you hands-on experience implementing these data science models on data sets. These practice exercises will teach you how to implement machine learning algorithms with PyTorch, an open-source library used by leading tech companies in the machine learning field (e.g., Google, NVIDIA, CocaCola, eBay, Snapchat, Uber and many more). Duke University has about 13,000 undergraduate and graduate students and a world-class faculty helping to expand the frontiers of knowledge.


11 Essential Neural Network Architectures, Visualized & Explained

#artificialintelligence

The perceptron is the most basic of all neural networks, being a fundamental building block of more complex neural networks. It simply connects an input cell and an output cell. The feed-forward network is a collection of perceptrons, in which there are three fundamental types of layers -- input layers, hidden layers, and output layers. At each connection, the signal from the previous layer is multiplied by a weight, added to a bias, and passed through an activation function. Feed-forward networks use backpropagation to iteratively update the parameters until the network achieves a desirable level of performance.
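The "multiply by a weight, add a bias, apply an activation" step can be sketched in a few lines of NumPy. The layer sizes, the hand-picked weights, and the choice of ReLU and sigmoid activations are assumptions made for this example; training via backpropagation is deliberately left out.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, W, b, activation):
    """One layer: weighted sum of the inputs, plus a bias, then an activation."""
    return activation(x @ W + b)

# Hypothetical network: 3 inputs -> 4 hidden units -> 1 output.
x = np.array([[0.5, -1.0, 2.0]])   # one sample with three features
W_hidden = np.full((3, 4), 0.1)    # input-to-hidden weights
b_hidden = np.zeros(4)
W_out = np.full((4, 1), 1.0)       # hidden-to-output weights
b_out = np.array([-0.6])

hidden = dense(x, W_hidden, b_hidden, relu)
output = dense(hidden, W_out, b_out, sigmoid)
print(hidden, output)
```

Stacking more `dense` calls gives a deeper feed-forward network; backpropagation would then adjust `W_hidden`, `b_hidden`, `W_out`, and `b_out` to reduce a loss computed from `output`.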


AI Academy #3: Learn Artificial Neural Networks from A-Z

#artificialintelligence

Would you like to learn how to forecast economic time series such as stock prices or indexes with high accuracy? Would you like to know how to predict weather data such as temperature and wind speed with a few lines of code? If so, read on. Artificial neural networks (ANNs) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules. In this course you will learn about multilayer perceptron (MLP) neural networks using the scikit-learn and Keras libraries in Python. You will learn how to classify datasets with an MLP classifier to find the correct classes for them.


Simple and Scalable Parallelized Bayesian Optimization

arXiv.org Machine Learning

In recent years, leveraging parallel and distributed computational resources has become essential to solve problems of high computational cost. Bayesian optimization (BO) has shown attractive results in those expensive-to-evaluate problems such as hyperparameter optimization of machine learning algorithms. While many parallel BO methods have been developed to search efficiently utilizing these computational resources, these methods assumed synchronous settings or were not scalable. In this paper, we propose a simple and scalable BO method for asynchronous parallel settings. Experiments are carried out with a benchmark function and hyperparameter optimization of multi-layer perceptrons, which demonstrate the promising performance of the proposed method.


Long-Term Prediction of Lane Change Maneuver Through a Multilayer Perceptron

arXiv.org Artificial Intelligence

Behavior prediction plays an essential role in both autonomous driving systems and Advanced Driver Assistance Systems (ADAS), since it enhances the vehicle's awareness of imminent hazards in the surrounding environment. Many existing lane change prediction models take as input lateral or angle information and make short-term (< 5 seconds) maneuver predictions. In this study, we propose a longer-term (5 to 10 seconds) prediction model without any lateral or angle information. Three prediction models are introduced, including a logistic regression model, a multilayer perceptron (MLP) model, and a recurrent neural network (RNN) model, and their performances are compared by using the real-world NGSIM dataset. To properly label the trajectory data, this study proposes a new time-window labeling scheme by adding a time gap between positive and negative samples. Two approaches are also proposed to address the unstable prediction issue, where the aggressive approach propagates each positive prediction for a certain number of seconds, while the conservative approach adopts a rolling-window average to smooth the prediction. Evaluation results show that the developed prediction model is able to capture 75% of real lane change maneuvers with an average advanced prediction time of 8.05 seconds.
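The two stabilisation ideas mentioned in the abstract can be sketched as follows. The abstract does not specify window lengths, thresholds, or whether the window is trailing or centred, so the trailing window, the step counts, and both function names below are assumptions made for this example rather than the authors' implementation.

```python
def propagate_positives(preds, hold_steps):
    """Aggressive variant: each positive prediction stays active for
    `hold_steps` time steps, counting the step on which it fired."""
    out = [0] * len(preds)
    remaining = 0
    for i, p in enumerate(preds):
        if p == 1:
            remaining = hold_steps  # (re)start the hold period
        if remaining > 0:
            out[i] = 1
            remaining -= 1
    return out

def rolling_average(probs, window):
    """Conservative variant: trailing moving average of predicted probabilities."""
    out = []
    for i in range(len(probs)):
        start = max(0, i - window + 1)
        chunk = probs[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

raw_labels = [0, 1, 0, 0, 0, 1, 0]
held = propagate_positives(raw_labels, hold_steps=3)

raw_probs = [0.0, 1.0, 0.0, 1.0]
smoothed = rolling_average(raw_probs, window=2)
print(held, smoothed)
```

The first function trades extra false positives for fewer dropped detections, while the second suppresses isolated spikes at the cost of reacting more slowly.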


A Neural Network for Determination of Latent Dimensionality in Nonnegative Matrix Factorization

arXiv.org Machine Learning

Non-negative Matrix Factorization (NMF) has proven to be a powerful unsupervised learning method for uncovering hidden features in complex and noisy data sets with applications in data mining, text recognition, dimension reduction, face recognition, anomaly detection, blind source separation, and many other fields. An important input for NMF is the latent dimensionality of the data, that is, the number of hidden features, K, present in the explored data set. Unfortunately, this quantity is rarely known a priori. We utilize a supervised machine learning approach in combination with a recent method for model determination, called NMFk, to determine the number of hidden features automatically. NMFk performs a set of NMF simulations on an ensemble of matrices, obtained by bootstrapping the initial data set, and determines which K produces stable groups of latent features that reconstruct the initial data set well. We then train a Multi-Layer Perceptron (MLP) classifier network to determine the correct number of latent features utilizing the statistics and characteristics of the NMF solutions, obtained from NMFk. In order to train the MLP classifier, a training set of 58,660 matrices with predetermined latent features was factorized with NMFk. The MLP classifier in conjunction with NMFk maintains a greater than 95% success rate when applied to a held-out test set. Additionally, when applied to two well-known benchmark data sets, the swimmer and MIT face data, NMFk/MLP correctly recovered the established number of hidden features. Finally, we compared the accuracy of our method to the ARD, AIC and Stability-based methods.


PAC-Bayesian Generalization Bounds for MultiLayer Perceptrons

arXiv.org Machine Learning

We study PAC-Bayesian generalization bounds for Multilayer Perceptrons (MLPs) with the cross entropy loss. Above all, we introduce probabilistic explanations for MLPs in two aspects: (i) MLPs formulate a family of Gibbs distributions, and (ii) minimizing the cross-entropy loss for MLPs is equivalent to Bayesian variational inference, which establishes a solid probabilistic foundation for studying PAC-Bayesian bounds on MLPs. Furthermore, based on the Evidence Lower Bound (ELBO), we prove that MLPs with the cross entropy loss inherently guarantee PAC-Bayesian generalization bounds, and minimizing PAC-Bayesian generalization bounds for MLPs is equivalent to maximizing the ELBO. Finally, we validate the proposed PAC-Bayesian generalization bound on benchmark datasets.


Flatness is a False Friend

arXiv.org Machine Learning

Hessian based measures of flatness, such as the trace, Frobenius and spectral norms, have been argued, used and shown to relate to generalisation. In this paper we demonstrate that for feed forward neural networks under the cross entropy loss, we would expect low loss solutions with large weights to have small Hessian based measures of flatness. This implies that solutions obtained using $L_2$ regularisation should in principle be sharper than those without, despite generalising better. We show this to be true for logistic regression, multi-layer perceptrons, simple convolutional, pre-activated and wide residual networks on the MNIST and CIFAR-$100$ datasets. Furthermore, we show that adaptive optimisation algorithms using iterate averaging, on the VGG-$16$ network and CIFAR-$100$ dataset, achieve superior generalisation to SGD but are $30 \times$ sharper. This theoretical finding, along with experimental results, raises serious questions about the validity of Hessian based sharpness measures in the discussion of generalisation. We further show that the Hessian rank can be bounded by a constant times the number of neurons multiplied by the number of classes, which in practice is often a small fraction of the network parameters. This explains the curious observation that many Hessian eigenvalues are either zero or very near zero, which has been reported in the literature.
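For the simplest case named in the abstract, logistic regression, the effect of weight scale on Hessian-based flatness measures can be checked directly. The sketch below uses the standard closed-form Hessian of the mean cross-entropy loss for binary logistic regression, H = (1/n) Xᵀ diag(p(1 − p)) X; the synthetic data, the weight direction, and the two scales compared are assumptions made for this example and do not reproduce the paper's experiments.

```python
import numpy as np

def logistic_hessian(X, w):
    """Hessian of the mean cross-entropy loss for binary logistic regression."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    s = p * (1.0 - p)                   # per-sample curvature weights
    return (X.T * s) @ X / len(X)

def flatness_measures(H):
    """Trace, Frobenius norm, and spectral norm of a symmetric matrix."""
    eigvals = np.linalg.eigvalsh(H)
    return {
        "trace": float(np.trace(H)),
        "frobenius": float(np.linalg.norm(H, "fro")),
        "spectral": float(np.max(np.abs(eigvals))),
    }

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))        # synthetic inputs (illustrative)
direction = np.array([1.0, -1.0])    # hypothetical weight direction

small = flatness_measures(logistic_hessian(X, 1.0 * direction))
large = flatness_measures(logistic_hessian(X, 10.0 * direction))
print("small weights:", small)
print("large weights:", large)
```

Scaling the weights up saturates the sigmoid, so p(1 − p) shrinks for every sample and all three measures fall, even though the decision boundary is unchanged. This is the mechanism by which large-weight solutions can look flat under these measures.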