Perceptrons
Neural Network Design for Energy-Autonomous AI Applications using Temporal Encoding
Mileiko, Sergey, Bunnam, Thanasin, Xia, Fei, Shafik, Rishad, Yakovlev, Alex, Das, Shidhartha
Neural Networks (NNs) are steering a new generation of artificial intelligence (AI) applications at the micro-edge. Examples include wireless sensors, wearables and cybernetic systems that collect data and process them to support real-world decisions and controls. For energy autonomy, these applications are typically powered by energy harvesters. As harvesters and other power sources which provide energy autonomy inevitably have power variations, the circuits need to robustly operate over a dynamic power envelope. In other words, the NN hardware needs to be able to function correctly under unpredictable and variable supply voltages. In this paper, we propose a novel NN design approach using the principle of pulse width modulation (PWM). PWM signals represent information with their duty cycle values which may be made independent of the voltages and frequencies of the carrier signals. We design a PWM-based perceptron which can serve as the fundamental building block for NNs, by using an entirely new method of realising arithmetic in the PWM domain. We analyse the proposed approach building from a 3x3 perceptron circuit to a complex multi-layer NN. Using handwritten character recognition as an exemplar of AI applications, we demonstrate the power elasticity, resilience and efficiency of the proposed NN design in the presence of functional and parametric variations including large voltage variations in the power supply.
Professor's perceptron paved the way for AI – 60 years too soon Cornell Chronicle
In July 1958, the U.S. Office of Naval Research unveiled a remarkable invention. An IBM 704 – a 5-ton computer the size of a room – was fed a series of punch cards. After 50 trials, the computer taught itself to distinguish cards marked on the left from cards marked on the right. It was a demonstration of the "perceptron" – "the first machine which is capable of having an original idea," according to its creator, Frank Rosenblatt '50, Ph.D. '56. At the time, Rosenblatt – who later became an associate professor of neurobiology and behavior in Cornell's Division of Biological Sciences – was a research psychologist and project engineer at the Cornell Aeronautical Laboratory in Buffalo, New York.
The Basics of Recurrent Neural Networks (RNNs)
Recurrent Neural Networks (RNNs) are widely used for data with some kind of sequential structure. For instance, time series data has an intrinsic ordering based on time. Sentences are also sequential, "I love dogs" has a different meaning than "Dogs I love." Simply put, if the semantics of your data is altered by random permutation, you have a sequential dataset and RNNs may be used for your problem! RNNs are different than the classical multi-layer perceptron (MLP) networks because of two main reasons: 1) They take into account what happened previously and 2) they share parameters/weights.
Model Fusion via Optimal Transport
Singh, Sidak Pal, Jaggi, Martin
Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this approach is often rendered infeasible by given resource constraints in terms of memory and computation, which grow linearly with the number of models. We present a layer-wise model fusion procedure for neural networks that utilizes optimal transport to (soft-) align neurons across the models before averaging their associated parameters. We discuss two main algorithms for fusing neural networks in this "one-shot" manner, without requiring any retraining. Finally, we illustrate on CIFAR10 and MNIST how this significantly outperforms vanilla averaging on convolutional networks, such as VGG11 and multi-layer perceptrons, and for transfer tasks even surpasses the performance of both original models.
Variational Auto-encoder Based Bayesian Poisson Tensor Factorization for Sparse and Imbalanced Count Data
Jin, Yuan, Du, Lan, Gao, Longxiang, Xiang, Yong, Li, Yunfeng, Xu, Ruohua
Non-negative tensor factorization models enable predictive analysis on count data. Among them, Bayesian Poisson-Gamma models are able to derive full posterior distributions of latent factors and are less sensitive to sparse count data. However, current inference methods for these Bayesian models adopt restricted update rules for the posterior parameters. They also fail to share the update information to better cope with the data sparsity. Moreover, these models are not endowed with a component that handles the imbalance in count data values. In this paper, we propose a novel variational auto-encoder framework called VAE-BPTF which addresses the above issues. It uses multi-layer perceptron networks to encode and share complex update information. The encoded information is then reweighted per data instance to penalize common data values before aggregated to compute the posterior parameters for the latent factors. Under synthetic data evaluation, VAE-BPTF tended to recover the right number of latent factors and posterior parameter values. It also outperformed current models in both reconstruction errors and latent factor (semantic) coherence across five real-world datasets. Furthermore, the latent factors inferred by VAE-BPTF are perceived to be meaningful and coherent under a qualitative analysis.
Automatic Construction of Multi-layer Perceptron Network from Streaming Examples
Pratama, Mahardhika, Za'in, Choiru, Ashfahani, Andri, Ong, Yew Soon, Ding, Weiping
Autonomous construction of deep neural network (DNNs) is desired for data streams because it potentially offers two advantages: proper model's capacity and quick reaction to drift and shift. While the self-organizing mechanism of DNNs remains an open issue, this task is even more challenging to be developed for standard multi-layer DNNs than that using the different-depth structures, because the addition of a new layer results in information loss of previously trained knowledge. A Neural Network with Dynamically Evolved Capacity (NADINE) is proposed in this paper. NADINE features a fully open structure where its network structure, depth and width, can be automatically evolved from scratch in an online manner and without the use of problem-specific thresholds. NADINE is structured under a standard MLP architecture and the catastrophic forgetting issue during the hidden layer addition phase is resolved using the proposal of soft-forgetting and adaptive memory methods. The advantage of NADINE, namely elastic structure and online learning trait, is numerically validated using nine data stream classification and regression problems where it demonstrates performance improvement over prominent algorithms in all problems. In addition, it is capable of dealing with data stream regression and classification problems equally well.
Auto-Rotating Perceptrons
Saromo, Daniel, Villota, Elizabeth, Villanueva, Edwin
This paper proposes an improved design of the perceptron unit to mitigate the vanishing gradient problem. This nuisance appears when training deep multilayer perceptron networks with bounded activation functions. The new neuron design, named auto-rotating perceptron (ARP), has a mechanism to ensure that the node always operates in the dynamic region of the activation function, by avoiding saturation of the perceptron. The proposed method does not change the inference structure learned at each neuron. We test the effect of using ARP units in some network architectures which use the sigmoid activation function. The results support our hypothesis that neural networks with ARP units can achieve better learning performance than equivalent models with classic perceptrons.
Multiplierless and Sparse Machine Learning based on Margin Propagation Networks
M., Nazreen P., Chakrabartty, Shantanu, Thakur, Chetan Singh
The new generation of machine learning processors have evolved from multi-core and parallel architectures (for example graphical processing units) that were designed to efficiently implement matrix-vector-multiplications (MVMs). This is because at the fundamental level, neural network and machine learning operations extensively use MVM operations and hardware compilers exploit the inherent parallelism in MVM operations to achieve hardware acceleration on GPUs, TPUs and FPGAs. A natural question to ask is whether MVM operations are even necessary to implement ML algorithms and whether simpler hardware primitives can be used to implement an ultra-energy-efficient ML processor/architecture. In this paper we propose an alternate hardware-software codesign of ML and neural network architectures where instead of using MVM operations and non-linear activation functions, the architecture only uses simple addition and thresholding operations to implement inference and learning. At the core of the proposed approach is margin-propagation based computation that maps multiplications into additions and additions into a dynamic rectifying-linear-unit (ReLU) operations. This mapping results in significant improvement in computational and hence energy cost. The training of a margin-propagation (MP) network involves optimizing an $L_1$ cost function, which in conjunction with ReLU operations leads to network sparsity and weight updates using only Boolean predicates. In this paper, we show how the MP network formulation can be applied for designing linear classifiers, multi-layer perceptrons and for designing support vector networks.
Multi-Party Computation on Machine Learning - Security Boulevard
During my internship this summer, I built a multi-party computation (MPC) tool that implements a 3-party computation protocol for perceptron and support vector machine (SVM) algorithms. MPC enables multiple parties to perform analyses on private datasets without sharing them with each other. I developed a technique that lets three parties obtain the results of machine learning across non-public datasets. It is now possible to perform data analytics on private datasets that was previously impossible due to data privacy constraints. For MPC protocols, a group of parties, each with their own set of secret data, xi, share an input function, f, and each is able to obtain the output of f(x1,…,xn) without learning the private data of other parties.
Neural networks are $\textit{a priori}$ biased towards Boolean functions with low entropy
Mingard, Chris, Skalse, Joar, Valle-Pérez, Guillermo, Martínez-Rubio, David, Mikulik, Vladimir, Louis, Ard A.
Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks -- a single-layer perceptron with $n$ input neurons, one output neuron, and no threshold bias term -- we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies $t$ points in $\{0,1\}^n$ as $1$ has a remarkably simple form: $ P(t) = 2^{-n} \,\, {\rm for} \,\, 0\leq t < 2^n$. Since a perceptron can express far fewer Boolean functions with small or large values of $t$ (low "entropy") than with intermediate values of $t$ (high "entropy") there is, on average, a strong intrinsic a-priori bias towards individual functions with low entropy. Furthermore, within a class of functions with fixed $t$, we often observe a further intrinsic bias towards functions of lower complexity. Finally, we prove that, regardless of the distribution of inputs, the bias towards low entropy becomes monotonically stronger upon adding ReLU layers, and empirically show that increasing the variance of the bias term has a similar effect.