Perceptrons
Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation
Zhang, Shichang, Liu, Yozen, Sun, Yizhou, Shah, Neil
Graph Neural Networks (GNNs) have recently become popular for graph machine learning and have shown great results on a wide range of node classification tasks. Yet, GNNs are less popular for practical deployments in industry owing to the scalability challenges incurred by data dependency. Namely, GNN inference depends on neighbor nodes multiple hops away from the target, and fetching these nodes burdens latency-constrained applications. Existing inference acceleration methods like pruning and quantization can speed up GNNs to some extent by reducing Multiplication-and-ACcumulation (MAC) operations. However, their improvements are limited because the data dependency is not resolved. Conversely, multi-layer perceptrons (MLPs) have no dependency on graph data and infer much faster than GNNs, even though they are less accurate than GNNs for node classification in general. Motivated by these complementary strengths and weaknesses, we bring GNNs and MLPs together via knowledge distillation (KD). Our work shows that the performance of MLPs can be improved by large margins with GNN KD. We call the distilled MLPs Graph-less Neural Networks (GLNNs) as they have no inference graph dependency. We show that GLNNs with competitive performance infer faster than GNNs by 146X-273X and faster than other acceleration methods by 14X-27X. Meanwhile, under a production setting involving both transductive and inductive predictions across 7 datasets, GLNN accuracies improve over standalone MLPs by 12.36% on average and match GNNs on 6/7 datasets. A comprehensive analysis of GLNN shows when and why GLNN can achieve results competitive with GNNs and suggests GLNN as a handy choice for latency-constrained applications.
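To make the distillation idea concrete, here is a minimal sketch of a per-node loss that mixes a hard-label term with a soft-target term from a teacher. The weighting parameter `lam`, the use of KL divergence for the soft term, and all function names are assumptions made for illustration; the abstract does not spell out the exact objective.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label=None, lam=0.5):
    """Per-node loss for a student MLP distilled from a teacher GNN.

    Assumed form: lam * CE(student, label) + (1 - lam) * KL(teacher || student).
    Nodes without a ground-truth label contribute only the soft-target term.
    """
    student = softmax(student_logits)
    teacher = softmax(teacher_logits)
    soft = kl_divergence(teacher, student)
    if label is None:
        return soft
    return lam * cross_entropy(student, label) + (1 - lam) * soft
```

Because the student consumes only node features, the teacher's logits can be precomputed once on the graph, and inference with the student afterwards needs no neighbor fetching.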
Two-argument activation functions learn soft XOR operations like cortical neurons
Yoon, Kijung, Orhan, Emin, Kim, Juhyun, Pitkow, Xaq
Neurons in the brain are complex machines with distinct functional compartments that interact nonlinearly. In contrast, neurons in artificial neural networks abstract away this complexity, typically down to a scalar activation function of a weighted sum of inputs. Here we emulate more biologically realistic neurons by learning canonical activation functions with two input arguments, analogous to basal and apical dendrites. We use a network-in-network architecture where each neuron is modeled as a multilayer perceptron with two inputs and a single output. This inner perceptron is shared by all units in the outer network. Remarkably, the resultant nonlinearities often produce soft XOR functions, consistent with recent experimental observations about interactions between inputs in human cortical neurons. When hyperparameters are optimized, networks with these nonlinearities learn faster and perform better than conventional ReLU nonlinearities with matched parameter counts, and they are more robust to natural and adversarial perturbations.
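The network-in-network arrangement described above can be sketched as follows: every outer unit forms two weighted sums of its inputs and passes them through one shared inner perceptron. The hidden width, the `tanh` inner nonlinearity, the random initialization, and all names are assumptions for illustration; training of the inner and outer weights is omitted.

```python
import math
import random

def make_inner_mlp(hidden=8, seed=0):
    """Build a two-input, one-output MLP used as a shared activation function.

    Weights are random here; in the setting described above they would be learned.
    """
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0, 1) for _ in range(hidden)]

    def activation(a, b):
        # Hidden layer mixes the two arguments, then a linear readout.
        h = [math.tanh(w[0] * a + w[1] * b + bias) for w, bias in zip(w1, b1)]
        return sum(wo * hi for wo, hi in zip(w2, h))

    return activation

def outer_layer(x, weights_a, weights_b, activation):
    """One outer layer: each unit has two weight vectors, one per argument."""
    out = []
    for wa, wb in zip(weights_a, weights_b):
        a = sum(w * xi for w, xi in zip(wa, x))  # first argument (e.g. "basal")
        b = sum(w * xi for w, xi in zip(wb, x))  # second argument (e.g. "apical")
        out.append(activation(a, b))  # the same inner MLP for every unit
    return out
```

Sharing one inner perceptron across all units keeps the added parameter count small, since only the two per-unit weight vectors grow with the layer size.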
A Novel Clustering-Based Algorithm for Continuous and Non-invasive Cuff-Less Blood Pressure Estimation
Farki, Ali, Kazemzadeh, Reza Baradaran, Noughabi, Elham Akhondzadeh
Continuous blood pressure (BP) measurements can reflect a body's response to diseases and serve as a predictor of cardiovascular and other health conditions. Current cuff-based BP measurement methods cannot provide continuous BP readings, while invasive BP monitoring methods tend to cause patient dissatisfaction and can potentially cause infection. In this research, we developed a method for estimating blood pressure based on features extracted from Electrocardiogram (ECG) and Photoplethysmogram (PPG) signals and on Arterial Blood Pressure (ABP) data. The vector of features extracted from the preprocessed ECG and PPG signals, which includes Pulse Transit Time (PTT), PPG Intensity Ratio (PIR), and Heart Rate (HR), is used as the input of a clustering algorithm; separate regression models, such as Random Forest Regression, Gradient Boosting Regression, and Multilayer Perceptron Regression, are then developed for each resulting cluster. We applied the clustering approach, identified the optimal number of clusters, and evaluated and compared the resulting models to select the prediction model with the highest accuracy. The paper compares the results obtained with and without this clustering. The results show that the proposed clustering approach helps obtain more accurate estimates of Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP). Given the inconsistency, high dispersion, and multitude of trends in the datasets for different features, using the clustering approach improved the estimation accuracy by 50-60%.
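The cluster-then-regress pattern can be sketched in a few lines. This is not the authors' pipeline: plain k-means and a single-feature least-squares line stand in for the clustering algorithm and the regressors named above, the initialization from the first `k` points is an assumption chosen for reproducibility, and all function names are hypothetical.

```python
def nearest(centers, p):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(centers[j], p)))

def kmeans(points, k, iters=20):
    """Plain k-means; centers start at the first k points (assumed distinct)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return slope, my - slope * mx

def fit_clustered(points, targets, k):
    """Cluster the feature vectors, then fit one regressor per cluster."""
    centers = kmeans(points, k)
    models = {}
    for j in range(k):
        idx = [i for i, p in enumerate(points) if nearest(centers, p) == j]
        if idx:
            # Simplification: regress on the first feature only.
            models[j] = fit_line([points[i][0] for i in idx],
                                 [targets[i] for i in idx])
    return centers, models

def predict_clustered(centers, models, p):
    """Route a new point to its cluster and apply that cluster's model."""
    slope, intercept = models[nearest(centers, p)]
    return slope * p[0] + intercept
```

When the data follow different trends in different regions of feature space, one model per cluster can fit each trend separately, which a single global model of the same form cannot do.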
Improving Generalization of Deep Reinforcement Learning-based TSP Solvers
Ouyang, Wenbin, Wang, Yisen, Han, Shaochen, Jin, Zhejian, Weng, Paul
Recent work applying deep reinforcement learning (DRL) to solve traveling salesman problems (TSP) has shown that DRL-based solvers can be fast and competitive with TSP heuristics for small instances, but do not generalize well to larger instances. In this work, we propose a novel approach named MAGIC that includes a deep learning architecture and a DRL training method. Our architecture, which integrates a multilayer perceptron, a graph neural network, and an attention model, defines a stochastic policy that sequentially generates a TSP solution. Our training method includes several innovations: (1) we interleave DRL policy gradient updates with local search (using a new local search technique), (2) we use a novel simple baseline, and (3) we apply curriculum learning. Finally, we empirically demonstrate that MAGIC is superior to other DRL-based methods on random TSP instances, both in terms of performance and generalizability. Moreover, our method compares favorably against TSP heuristics and other state-of-the-art approaches in terms of performance and computational time.
A Theoretical Perspective on Hyperdimensional Computing
Thomas, Anthony, Dasgupta, Sanjoy, Rosing, Tajana
Hyperdimensional (HD) computing is a set of neurally inspired methods for obtaining high-dimensional, low-precision, distributed representations of data. These representations can be combined with simple, neurally plausible algorithms to effect a variety of information processing tasks. HD computing has recently garnered significant interest from the computer hardware community as an energy-efficient, low-latency, and noise-robust tool for solving learning problems. In this review, we present a unified treatment of the theoretical foundations of HD computing with a focus on the suitability of representations for learning.
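As a small illustration of what such representations look like in practice, the sketch below uses random bipolar vectors (entries of +1 or -1) with elementwise multiplication for binding and a majority vote for bundling. This is one common instantiation of HD computing, assumed here for illustration; the abstract does not commit to a particular encoding, and the function names are hypothetical.

```python
import random

def random_hypervector(dim, rng):
    """Random bipolar vector: each entry is +1 or -1 with equal probability."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def bind(a, b):
    """Elementwise product; for bipolar vectors, binding twice with b undoes it."""
    return [x * y for x, y in zip(a, b)]

def bundle(vectors):
    """Elementwise majority vote (ties, possible with an even count, go to +1)."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]

def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)
```

In high dimensions, two independent random vectors are nearly orthogonal, while a bundle stays measurably similar to each of its members, so set membership can be tested with a single dot product.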
Neural Knitworks: Patched Neural Implicit Representation Networks
Czerkawski, Mikolaj, Cardona, Javier, Atkinson, Robert, Michie, Craig, Andonovic, Ivan, Clemente, Carmine, Tachtatzis, Christos
Coordinate-based Multilayer Perceptron (MLP) networks, despite being capable of learning neural implicit representations, are not performant for internal image synthesis applications. Convolutional Neural Networks (CNNs) are typically used instead for a variety of internal generative tasks, at the cost of a larger model. We propose Neural Knitwork, an architecture for neural implicit representation learning of natural images that achieves image synthesis by optimizing the distribution of image patches in an adversarial manner and by enforcing consistency between the patch predictions. To the best of our knowledge, this is the first implementation of a coordinate-based MLP tailored for synthesis tasks such as image inpainting, super-resolution, and denoising. We demonstrate the utility of the proposed technique by training on these three tasks. The results show that modeling natural images using patches, rather than pixels, produces results of higher fidelity. The resulting model requires 80% fewer parameters than alternative CNN-based solutions while achieving comparable performance and training time.
Perceptron: A Basic Neural Network Model for Deep Learning
The world has witnessed an explosion in machine learning and deep learning technology over the last decade, in everything from personal to professional activities. If you have followed these technologies, you have likely heard or read the term "perceptron", which is probably the first concept one encounters when starting to learn about neural networks. This article therefore aims to show what a perceptron is and how it works. The American scientist Rosenblatt was strongly inspired by the biological neuron and its ability to learn, and he introduced the term perceptron around 1957. Rosenblatt's perceptron has one or more inputs, a processor, and only one output.
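A minimal sketch of such a unit, assuming the classic formulation: the "processor" is taken to be a weighted sum followed by a step threshold, and the weights are adjusted with the standard perceptron learning rule. The function names, learning rate, and epoch limit are illustrative choices, not details from the article.

```python
def predict(weights, bias, x):
    """Single output: 1 if the weighted sum of inputs exceeds zero, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule on (inputs, target) pairs with 0/1 targets."""
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            err = target - predict(weights, bias, x)
            if err != 0:
                # Move the decision boundary toward the misclassified point.
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return weights, bias

# Usage: learn the logical AND of two binary inputs.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
```

A single perceptron of this kind can only separate classes with a straight line (or hyperplane), so it can learn AND but not XOR.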
Weighted Conditional $\mathcal{EL}^{\bot}$ Knowledge Bases with Integer Weights: an ASP Approach
Giordano, Laura, Dupré, Daniele Theseider
Weighted knowledge bases for description logics with typicality have been recently considered under a "concept-wise" multipreference semantics (in both the two-valued and fuzzy case), as the basis of a logical semantics of Multilayer Perceptrons. In this paper we consider weighted conditional $\mathcal{EL}^{\bot}$ knowledge bases in the two-valued case, and exploit ASP and asprin for encoding concept-wise multipreference entailment for weighted KBs with integer weights.
EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation
Dong, Chenhe, Wang, Guangrun, Xu, Hang, Peng, Jiefeng, Ren, Xiaozhe, Liang, Xiaodan
Pre-trained language models have shown remarkable results on various NLP tasks. Nevertheless, due to their bulky size and slow inference speed, it is hard to deploy them on edge devices. In this paper, we offer the critical insight that improving the feed-forward network (FFN) in BERT yields a higher gain than improving the multi-head attention (MHA), since the computational cost of FFN is 2$\sim$3 times larger than that of MHA. Hence, to compact BERT, we focus on designing an efficient FFN, as opposed to previous works that pay attention to MHA. Since FFN comprises a multilayer perceptron (MLP) that is essential in BERT optimization, we further design a thorough search space towards an advanced MLP and perform a coarse-to-fine mechanism to search for an efficient BERT architecture. Moreover, to accelerate searching and enhance model transferability, we employ a novel warm-up knowledge distillation strategy at each search stage. Extensive experiments show our searched EfficientBERT is 6.9$\times$ smaller and 4.4$\times$ faster than BERT$_{\rm BASE}$, and achieves competitive performance on the GLUE and SQuAD benchmarks. Concretely, EfficientBERT attains a 77.7 average score on the GLUE test set, 0.7 higher than MobileBERT$_{\rm TINY}$, and achieves an 85.3/74.5 F1 score on the SQuAD v1.1/v2.0 dev set, 3.2/2.7 higher than TinyBERT$_4$ even without data augmentation. The code is released at https://github.com/cheneydon/efficient-bert.
BotSpot: Deep Learning Classification of Bot Accounts within Twitter
Braker, Christopher, Shiaeles, Stavros, Bendiab, Gueltoum, Savage, Nick, Limniotis, Konstantinos
Twitter's openness allows programs to generate and control Twitter accounts automatically via the Twitter API. These accounts, which are known as bots, can automatically perform actions such as tweeting, re-tweeting, following, unfollowing, or direct messaging other accounts, just like real people. They can also conduct malicious tasks such as spreading fake news, spam, and malicious software, and committing other cyber-crimes. In this paper, we introduce a novel bot detection approach using deep learning, based on a Multilayer Perceptron Neural Network and nine features of a bot account. A web crawler is developed to automatically collect data from public Twitter accounts and build the testing and training datasets, with 860 samples of human and bot accounts. After the initial training, the Multilayer Perceptron Neural Network achieved an overall accuracy rate of 92%, which supports the effectiveness of the proposed approach.