Goto

Collaborating Authors

 Perceptrons


Incremental Verification of Fixed-Point Implementations of Neural Networks

arXiv.org Artificial Intelligence

Implementations of artificial neural networks (ANNs) might lead to failures, which are hardly predicted in the design phase since ANNs are highly parallel and their parameters are barely interpretable. Here, we develop and evaluate a novel symbolic verification framework using incremental bounded model checking (BMC), satisfiability modulo theories (SMT), and invariant inference, to obtain adversarial cases and validate coverage methods in a multi-layer perceptron (MLP). We exploit incremental BMC based on interval analysis to compute boundaries from a neuron's input. Then, the latter are propagated to effectively find a neuron's output since it is the input of the next one. This paper describes the first bit-precise symbolic verification framework to reason over actual implementations of ANNs in CUDA, based on invariant inference, therefore providing further guarantees about finite-precision arithmetic and its rounding errors, which are routinely ignored in the existing literature. We have implemented the proposed approach on top of the efficient SMT-based bounded model checker (ESBMC), and its experimental results show that it can successfully verify safety properties, in actual implementations of ANNs, and generate real adversarial cases in MLPs. Our approach was able to verify and produce adversarial examples for 85.8% of 21 test cases considering different input images, and 100% of the properties related to covering methods. Although our verification time is higher than existing approaches, our methodology can consider fixed-point implementation aspects that are disregarded by the state-of-the-art verification methodologies.


AdnFM: An Attentive DenseNet based Factorization Machine for CTR Prediction

arXiv.org Artificial Intelligence

In this paper, we consider the Click-Through-Rate (CTR) prediction problem. Factorization Machines and their variants consider pair-wise feature interactions, but normally we won't do high-order feature interactions using FM due to high time complexity. Given the success of deep neural networks (DNNs) in many fields, researchers have proposed several DNN-based models to learn high-order feature interactions. Multi-layer perceptrons (MLP) have been widely employed to learn reliable mappings from feature embeddings to final logits. In this paper, we aim to explore more about these high-order features interactions. However, high-order feature interaction deserves more attention and further development. Inspired by the great achievements of Densely Connected Convolutional Networks (DenseNet) in computer vision, we propose a novel model called Attentive DenseNet based Factorization Machines (AdnFM). AdnFM can extract more comprehensive deep features by using all the hidden layers from a feed-forward neural network as implicit high-order features, then selects dominant features via an attention mechanism. Also, high-order interactions in the implicit way using DNNs are more cost-efficient than in the explicit way, for example in FM. Extensive experiments on two real-world datasets show that the proposed model can effectively improve the performance of CTR prediction.


On the Power of Localized Perceptron for Label-Optimal Learning of Halfspaces with Adversarial Noise

arXiv.org Machine Learning

We study {\em online} active learning of homogeneous $s$-sparse halfspaces in $\mathbb{R}^d$ with adversarial noise \cite{kearns1992toward}, where the overall probability of a noisy label is constrained to be at most $\nu$ and the marginal distribution over unlabeled data is unchanged. Our main contribution is a state-of-the-art online active learning algorithm that achieves near-optimal attribute efficiency, label and sample complexity under mild distributional assumptions. In particular, under the conditions that the marginal distribution is isotropic log-concave and $\nu = \Omega(\epsilon)$, where $\epsilon \in (0, 1)$ is the target error rate, we show that our algorithm PAC learns the underlying halfspace in polynomial time with near-optimal label complexity bound of $\tilde{O}\big(s \cdot polylog(d, \frac{1}{\epsilon})\big)$ and sample complexity bound of $\tilde{O}\big(\frac{s}{\epsilon} \cdot polylog(d)\big)$. Prior to this work, existing online algorithms designed for tolerating the adversarial noise are either subject to label complexity polynomial in $d$ or $\frac{1}{\epsilon}$, or work under the restrictive uniform marginal distribution. As an immediate corollary of our main result, we show that under the more challenging agnostic model \cite{kearns1992toward} where no assumption is made on the noise rate, our active learner achieves an error rate of $O(OPT) + \epsilon$ with the same running time and label and sample complexity, where $OPT$ is the best possible error rate achievable by any homogeneous $s$-sparse halfspace. Our algorithm builds upon the celebrated Perceptron while leveraging novel localized sampling and semi-random gradient update to tolerate the adversarial noise. We believe that our algorithmic design and analysis are of independent interest, and may shed light on learning halfspaces with broader noise models.


Perceptron Theory for Predicting the Accuracy of Neural Networks

arXiv.org Artificial Intelligence

Many neural network models have been successful at classification problems, but their operation is still treated as a black box. Here, we developed a theory for one-layer perceptrons that can predict performance on classification tasks. This theory is a generalization of an existing theory for predicting the performance of Echo State Networks and connectionist models for symbolic reasoning known as Vector Symbolic Architectures. In this paper, we first show that the proposed perceptron theory can predict the performance of Echo State Networks, which could not be described by the previous theory. Second, we apply our perceptron theory to the last layers of shallow randomly connected and deep multi-layer networks. The full theory is based on Gaussian statistics, but it is analytically intractable. We explore numerical methods to predict network performance for problems with a small number of classes. For problems with a large number of classes, we investigate stochastic sampling methods and a tractable approximation to the full theory. The quality of predictions is assessed in three experimental settings, using reservoir computing networks on a memorization task, shallow randomly connected networks on a collection of classification datasets, and deep convolutional networks with the ImageNet dataset. This study offers a simple, bipartite approach to understand deep neural networks: the input is encoded by the last-but-one layers into a high-dimensional representation. This representation is mapped through the weights of the last layer into the postsynaptic sums of the output neurons. Specifically, the proposed perceptron theory uses the mean vector and covariance matrix of the postsynaptic sums to compute classification accuracies for the different classes. The first two moments of the distribution of the postsynaptic sums can predict the overall network performance quite accurately.


Perceptron Algorithm for Classification in Python

#artificialintelligence

The Perceptron is a linear machine learning algorithm for binary classification tasks. It may be considered one of the first and one of the simplest types of artificial neural networks. It is definitely not "deep" learning but is an important building block. Like logistic regression, it can quickly learn a linear separation in feature space for two-class classification tasks, although unlike logistic regression, it learns using the stochastic gradient descent optimization algorithm and does not predict calibrated probabilities. In this tutorial, you will discover the Perceptron classification machine learning algorithm.


Advanced Neural Networks in R - A Practical Approach

#artificialintelligence

Advanced Neural Networks in R - A Practical Approach Boost your data science skills - learn to build and train complex neural network using the R program. Neural networks are powerful predictive tools that can be used for almost any machine learning problem with very good results. If you want to break into deep learning and artificial intelligence, learning neural networks is the first crucial step. This course contains four comprehensive sections. Learn to use multilayer perceptrons to make predictions for both categorical and continuous variables.


How to Manually Optimize Neural Network Models

#artificialintelligence

Deep learning neural network models are fit on training data using the stochastic gradient descent optimization algorithm. Updates to the weights of the model are made, using the backpropagation of error algorithm. The combination of the optimization and weight update algorithm was carefully chosen and is the most efficient approach known to fit neural networks. Nevertheless, it is possible to use alternate optimization algorithms to fit a neural network model to a training dataset. This can be a useful exercise to learn more about how neural networks function and the central nature of optimization in applied machine learning. It may also be required for neural networks with unconventional model architectures and non-differentiable transfer functions.


Obtain Employee Turnover Rate and Optimal Reduction Strategy Based On Neural Network and Reinforcement Learning

arXiv.org Artificial Intelligence

Nowadays, human resource is an important part of various resources of enterprises. For enterprises, high-loyalty and high-quality talented persons are often the core competitiveness of enterprises. Therefore, it is of great practical significance to predict whether employees leave and reduce the turnover rate of employees. First, this paper established a multi-layer perceptron predictive model of employee turnover rate. A model based on Sarsa which is a kind of reinforcement learning algorithm is proposed to automatically generate a set of strategies to reduce the employee turnover rate. These strategies are a collection of strategies that can reduce the employee turnover rate the most and cost less from the perspective of the enterprise, and can be used as a reference plan for the enterprise to optimize the employee system. The experimental results show that the algorithm can indeed improve the efficiency and accuracy of the specific strategy.


Every Model Learned by Gradient Descent Is Approximately a Kernel Machine

arXiv.org Machine Learning

Deep learning's successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel. This improved understanding should lead to better learning algorithms.


DADNN: Multi-Scene CTR Prediction via Domain-Aware Deep Neural Network

arXiv.org Artificial Intelligence

Click through rate(CTR) prediction is a core task in advertising systems. The booming e-commerce business in our company, results in a growing number of scenes. Most of them are so-called long-tail scenes, which means that the traffic of a single scene is limited, but the overall traffic is considerable. Typical studies mainly focus on serving a single scene with a well designed model. However, this method brings excessive resource consumption both on offline training and online serving. Besides, simply training a single model with data from multiple scenes ignores the characteristics of their own. To address these challenges, we propose a novel but practical model named Domain-Aware Deep Neural Network(DADNN) by serving multiple scenes with only one model. Specifically, shared bottom block among all scenes is applied to learn a common representation, while domain-specific heads maintain the characteristics of every scene. Besides, knowledge transfer is introduced to enhance the opportunity of knowledge sharing among different scenes. In this paper, we study two instances of DADNN where its shared bottom block is multilayer perceptron(MLP) and Multi-gate Mixture-of-Experts(MMoE) respectively, for which we denote as DADNN-MLP and DADNN-MMoE.Comprehensive offline experiments on a real production dataset from our company show that DADNN outperforms several state-of-the-art methods for multi-scene CTR prediction. Extensive online A/B tests reveal that DADNN-MLP contributes up to 6.7% CTR and 3.0% CPM(Cost Per Mille) promotion compared with a well-engineered DCN model. Furthermore, DADNN-MMoE outperforms DADNN-MLP with a relative improvement of 2.2% and 2.7% on CTR and CPM respectively. More importantly, DADNN utilizes a single model for multiple scenes which saves a lot of offline training and online serving resources.