 Perceptrons


Artificial Neural Networks

arXiv.org Machine Learning

The term neural networks refers to networks of neurons in the mammalian brain. Neurons are its fundamental units of computation. In the brain they are connected together in networks to process data. This can be a very complex task, and the dynamics of neural networks in the mammalian brain in response to external stimuli can therefore be quite intricate. Inputs and outputs of each neuron vary as functions of time, in the form of so-called spike trains, but the network itself also changes. We learn and improve our data-processing capacities by establishing reconnections between neurons. Neural-network algorithms are inspired by the architecture and the dynamics of networks of neurons in the brain. Yet the algorithms use neuron models that are highly simplified compared with real neurons. Nevertheless, the fundamental principle is the same: artificial neural networks learn by reconnection.


Activation Functions for Generalized Learning Vector Quantization - A Performance Comparison

arXiv.org Machine Learning

An appropriate choice of the activation function (like ReLU, sigmoid or swish) plays an important role in the performance of (deep) multilayer perceptrons (MLP) for classification and regression learning. Prototype-based classification learning methods like (generalized) learning vector quantization (GLVQ) are powerful alternatives. These models also deal with activation functions but here they are applied to the so-called classifier function instead. In this paper we investigate successful candidates of activation functions known for MLPs for application in GLVQ and their influence on the performance.
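As a rough illustration of the activation functions named above, here is a minimal pure-Python sketch. The `glvq_classifier` helper is an assumption: it follows the commonly cited GLVQ classifier function, mu = (d_plus - d_minus) / (d_plus + d_minus), and is not taken from this paper. The function names and the `beta` parameter are likewise illustrative.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(z, beta=1.0):
    """Swish: the input scaled by a sigmoid gate (beta is illustrative)."""
    return z * sigmoid(beta * z)


def glvq_classifier(d_plus, d_minus):
    """Assumed standard GLVQ classifier function, with values in [-1, 1].

    d_plus:  distance to the closest prototype of the correct class
    d_minus: distance to the closest prototype of any other class
    A negative value means the sample is classified correctly.
    """
    return (d_plus - d_minus) / (d_plus + d_minus)


def glvq_local_loss(d_plus, d_minus, activation=sigmoid):
    """Apply an activation to the classifier function rather than to a neuron."""
    return activation(glvq_classifier(d_plus, d_minus))
```

Swapping `activation=swish` or `activation=relu` into `glvq_local_loss` is the kind of substitution the abstract describes, although the paper's exact setup may differ.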


Applying SVGD to Bayesian Neural Networks for Cyclical Time-Series Prediction and Inference

arXiv.org Machine Learning

A regression-based BNN model is proposed to predict spatiotemporal quantities like hourly rider demand with calibrated uncertainties. The main contributions of this paper are (i) a feed-forward deterministic neural network (DetNN) architecture that predicts cyclical time series data with sensitivity to anomalous forecasting events; and (ii) a Bayesian framework applying SVGD to train large neural networks for such tasks, capable of producing time series predictions as well as measures of uncertainty surrounding the predictions. Experiments show that the proposed BNN reduces average estimation error by 10% across 8 U.S. cities compared to a fine-tuned multilayer perceptron (MLP), and by 4% compared to the same network architecture trained without SVGD.


Weightless Neural Network with Transfer Learning to Detect Distress in Asphalt

arXiv.org Machine Learning

Abstract: The present paper shows a solution to the problem of automatic distress detection, more precisely the detection of holes in paved roads. To do so, the proposed solution uses a weightless neural network known as Wisard to decide whether an image of a road has any kind of cracks. In addition, the proposed architecture also shows how the use of transfer learning was able to improve the overall accuracy of the decision system. As a verification step of the research, an experiment was carried out using images from the streets at the Federal University of Tocantins, Brazil. The architecture of the developed solution achieves 85.71% accuracy on the dataset, proving to be superior to state-of-the-art approaches. I. INTRODUCTION. In Brazil, most of the traffic is driven on asphalt roads.


Why cannot one find the zero in the delta rule for sigmoid? (No closed form to find weights in one-layer perceptron neural network?)

#artificialintelligence

I know that finding the weights of a neural network requires gradient descent as there is no closed form available. I know this from the books, and not knowing exactly why the derivative w.r.t. the weights is not zero-able led me to try to do it. Let's consider the traditional sigmoid MLP, with just one layer and just one datapoint $ \mathbf{x},t $. The gradient vector of the MSE loss function w.r.t. the weights is: Now, how does one solve for the zero of the gradient expression? What I could do is analyze the various factors and see where each of them individually becomes zero.
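For reference, the factor-by-factor analysis can be written out under some assumptions that the snippet does not state: a single sigmoid unit with no bias term, output $ y $, and the loss taken as half the squared error. The symbols $ y $, $ E $ and $ \mathbf{w} $ are introduced here for illustration only.

```latex
\begin{align}
y &= \sigma(\mathbf{w}^\top \mathbf{x}), \qquad \sigma(a) = \frac{1}{1 + e^{-a}}, \\
E(\mathbf{w}) &= \tfrac{1}{2}\,(y - t)^2, \\
\nabla_{\mathbf{w}} E &= (y - t)\,\sigma'(\mathbf{w}^\top \mathbf{x})\,\mathbf{x}
                      = (y - t)\,y\,(1 - y)\,\mathbf{x}.
\end{align}
```

For finite weights, $ 0 < y < 1 $, so the factor $ y(1 - y) $ is strictly positive. Under these assumptions the gradient vanishes only if $ \mathbf{x} = \mathbf{0} $ or $ y = t $. With several datapoints the gradient is a sum of such terms, and the factors can no longer be zeroed one at a time.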


Recurrent Relational Networks

Neural Information Processing Systems

This paper is concerned with learning to solve tasks that require a chain of interdependent steps of relational inference, like answering complex questions about the relationships between objects, or solving puzzles where the smaller elements of a solution mutually constrain each other. We introduce the recurrent relational network, a general purpose module that operates on a graph representation of objects. As a generalization of Santoro et al. [2017]’s relational network, it can augment any neural network model with the capacity to do many-step relational reasoning. We achieve state of the art results on the bAbI textual question-answering dataset with the recurrent relational network, consistently solving 20/20 tasks. As bAbI is not particularly challenging from a relational reasoning point of view, we introduce Pretty-CLEVR, a new diagnostic dataset for relational reasoning. In the Pretty-CLEVR set-up, we can vary the question to control for the number of relational reasoning steps that are required to obtain the answer. Using Pretty-CLEVR, we probe the limitations of multi-layer perceptrons, relational and recurrent relational networks. Finally, we show how recurrent relational networks can learn to solve Sudoku puzzles from supervised training data, a challenging task requiring upwards of 64 steps of relational reasoning. We achieve state-of-the-art results amongst comparable methods by solving 96.6% of the hardest Sudoku puzzles.


Throwing everything - including the kitchen sink - at a machine learning problem

#artificialintelligence

It seems the more I read, the more confused I get - models, algorithms, surrogates; my head is spinning. Assume the dataset is in perfect condition - pure as the driven snow, no correlated features, no null in sight, nothing; and it has "enough" observations. To simplify, let's say we are looking at binary classification. Let's also say that we want to try four different algorithms: for example - logistic regression, naive Bayes, gradient boosted tree and multilayer perceptron. And, finally, let's assume that (since all this is for educational purposes), we have no issues with time, efficiency, computing power, computing budget and whatnot; we don't care if this is an overkill or if we're going after a fly with an elephant gun: we want to throw everything, including the kitchen sink, at the problem so we can extract every last ounce of performance when it's time to make predictions on totally unseen data.


Dropout Regularization in Hierarchical Mixture of Experts

arXiv.org Machine Learning

Dropout is a very effective method for preventing overfitting and has become the go-to regularizer for multi-layer neural networks in recent years. Hierarchical mixture of experts is a hierarchically gated model that defines a soft decision tree where leaves correspond to experts and decision nodes correspond to gating models that softly choose between their children, and as such, the model defines a soft hierarchical partitioning of the input space. In this work, we propose a variant of dropout for hierarchical mixture of experts that is faithful to the tree hierarchy defined by the model, as opposed to having a flat, unitwise independent application of dropout as one has with multi-layer perceptrons. We show that on synthetic regression data and on the MNIST and CIFAR-10 datasets, our proposed dropout mechanism prevents overfitting on trees with many levels, improving generalization and providing smoother fits.
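For contrast, the flat, unitwise independent dropout that the abstract attributes to multi-layer perceptrons can be sketched as follows. This is only the baseline, not the tree-faithful variant the paper proposes. The function name, the inverted-dropout scaling and the `rng` parameter are assumptions made for illustration.

```python
import random


def flat_dropout(activations, rate, training=True, rng=random):
    """Flat, unit-wise (inverted) dropout over one layer's activations.

    Each unit is dropped independently with probability `rate`. Kept units
    are scaled by 1 / (1 - rate) so the expected activation is unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time (or with rate 0) the layer is left untouched.
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Passing a seeded `random.Random` instance as `rng` makes the mask reproducible, for example `flat_dropout(h, 0.5, rng=random.Random(0))`.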


Deep Autoencoder for Recommender Systems: Parameter Influence Analysis

arXiv.org Machine Learning

Recommender systems have recently attracted many researchers in the deep learning community. The state-of-the-art deep neural network models used in recommender systems are typically the multilayer perceptron and the deep Autoencoder (DAE), among which DAE usually shows better performance due to its superior capability to reconstruct the inputs. However, we found that existing DAE recommendation systems with similar implementations on similar datasets result in vastly different parameter settings. In this work, we have built a flexible DAE model, named FlexEncoder, that uses configurable parameters and unique features to analyse the influence of parameters on the prediction accuracy of recommender systems. This will help us identify the best-performing parameters for a given dataset. Extensive evaluations on the MovieLens datasets are conducted, which drive our conclusions on the influences of DAE parameters. Specifically, we find that DAE parameters strongly affect the prediction accuracy of the recommender systems, and the effect is transferable to similar datasets of a larger size. We release our code to the public, which could benefit both new DAE users, who can quickly understand how DAE works for recommendation systems, and experienced DAE users, for whom it becomes easier to tune the parameters on different datasets.