Scetbon, Meyer, Harchaoui, Zaid

The renewed interest in convolutional neural networks [12, 15] in computer vision and signal processing has lead to a major leap in generalization performance on common task benchmarks, supported by the recent advances in graphical processing hardware and the collection of huge labelled datasets for training and evaluation. Convolutional neural networks pose major a challenge to statistical learning theory. First and foremost a convolutional network learns from data, jointly, both a feature representation through its hidden layers and a prediction function through its ultimate layer. A convolutional neural network implements a function unfolding as a composition of basic functions (respectively nonlinearity, convolution, and pooling), which appear to model well visual information in images. Yet the relevant function spaces to analyze their statistical performance remain unclear. The analysis of convolutional neural networks (CNNs) has been an active research topic. Different viewpoints have been developed. A straightforward viewpoint is to dismiss completely the grid-or latticestructure of images and analyze a multi-layer perceptron (MLP) instead acting on vectorized images, which has the downside the set aside the most interesting property CNNs which is to model well images that is data with a 2D lattice structure.

Comprehensive surveys on GNNs were provided by Hamilton et al. (2017a), Zhou et al. (2018), and Wu et al. (2019). Despite GNNs' empirical successes in various fields, Xu et al. (2019) and Morris et al. (2019) demonstrated that GNNs cannot distinguish some pairs of graphs. This indicates that GNNs cannot correctly classify these graphs with any parameters unless the labels of these graphs are the same. This result contrasts with the universal approximation power of multi layer perceptrons (Cybenko, 1989; Hornik et al., 1989; Hornik, 1991). Furthermore, Sato et al. (2019a) showed that GNNs are at most as powerful as distributed local algorithms (Angluin, 1980; Suomela, 2013). Thus there are many combinatorial problems that GNNs cannot solve other than the graph isomorphism problem. Consequently, various provably powerful GNN models have been proposed to overcome the limitations of GNNs. This survey provides an extensive overview of the expressive power of GNNs and various GNN models to overcome these limitations.

Pizarroso, J., Portela, J., Muñoz, A.

Neural networks are important tools for data-intensive analysis and are commonly applied to model non-linear relationships between dependent and independent variables. However, neural networks are usually seen as "black boxes" that offer minimal information about how the input variables are used to predict the response in a fitted model. This article describes the \pkg{NeuralSens} package that can be used to perform sensitivity analysis of neural networks using the partial derivatives method. Functions in the package can be used to obtain the sensitivities of the output with respect to the input variables, evaluate variable importance based on sensitivity measures and characterize relationships between input and output variables. Methods to calculate sensitivities are provided for objects from common neural network packages in \proglang{R}, including \pkg{neuralnet}, \pkg{nnet}, \pkg{RSNNS}, \pkg{h2o}, \pkg{neural}, \pkg{forecast} and \pkg{caret}. The article presents an overview of the techniques for obtaining information from neural network models, a theoretical foundation of how are calculated the partial derivatives of the output with respect to the inputs of a multi-layer perceptron model, a description of the package structure and functions, and applied examples to compare \pkg{NeuralSens} functions with analogous functions from other available \proglang{R} packages.

Wang, Xiaying, Magno, Michele, Cavigelli, Lukas, Benini, Luca

The growing number of low-power smart devices in the Internet of Things is coupled with the concept of "Edge Computing", that is moving some of the intelligence, especially machine learning, towards the edge of the network. Enabling machine learning algorithms to run on resource-constrained hardware, typically on low-power smart devices, is challenging in terms of hardware (optimized and energy-efficient integrated circuits), algorithmic and firmware implementations. This paper presents FANN-on-MCU, an open-source toolkit built upon the Fast Artificial Neural Network (FANN) library to run lightweight and energy-efficient neural networks on microcontrollers based on both the ARM Cortex-M series and the novel RISC-V-based Parallel Ultra-Low-Power (PULP) platform. The toolkit takes multi-layer perceptrons trained with FANN and generates code targeted at execution on low-power microcontrollers either with a floating-point unit (i.e., ARM Cortex-M4F and M7F) or without (i.e., ARM Cortex M0-M3 or PULP-based processors). This paper also provides an architectural performance evaluation of neural networks on the most popular ARM Cortex-M family and the parallel RISC-V processor called Mr. Wolf. The evaluation includes experimental results for three different applications using a self-sustainable wearable multi-sensor bracelet. Experimental results show a measured latency in the order of only a few microseconds and a power consumption of few milliwatts while keeping the memory requirements below the limitations of the targeted microcontrollers. In particular, the parallel implementation on the octa-core RISC-V platform reaches a speedup of 22x and a 69% reduction in energy consumption with respect to a single-core implementation on Cortex-M4 for continuous real-time classification.

He, Yuzi, Burghardt, Keith, Lerman, Kristina

To reduce human error and prejudice, many high-stakes decisions have been turned over to machine algorithms. However, recent research suggests that this does not remove discrimination, and can perpetuate harmful stereotypes. While algorithms have been developed to improve fairness, they typically face at least one of three shortcomings: they are not interpretable, they lose significant accuracy compared to unbiased equivalents, or they are not transferable across models. To address these issues, we propose a geometric method that removes correlations between data and any number of protected variables. Further, we can control the strength of debi-asing through an adjustable parameter to address the tradeoff between model accuracy and fairness. The resulting features are interpretable and can be used with many popular models, such as linear regression, random forest and multilayer perceptrons. The resulting predictions are found to be more accurate and fair than several comparable fair AI algorithms across a variety of benchmark datasets. Our work shows that debiasing data is a simple and effective solution toward improving fairness.

Federated learning is an emerging technique used to prevent the leakage of private information. Unlike centralized learning that needs to collect data from users and store them collectively on a cloud server, federated learning makes it possible to learn a global model while the data are distributed on the users' devices. However, compared with the traditional centralized approach, the federated setting consumes considerable communication resources of the clients, which is indispensable for updating global models and prevents this technique from being widely used. In this paper, we aim to optimize the structure of the neural network models in federated learning using a multi-objective evolutionary algorithm to simultaneously minimize the communication costs and the global model test errors. A scalable method for encoding network connectivity is adapted to federated learning to enhance the efficiency in evolving deep neural networks. Experimental results on both multilayer perceptrons and convolutional neural networks indicate that the proposed optimization method is able to find optimized neural network models that can not only significantly reduce communication costs but also improve the learning performance of federated learning compared with the standard fully connected neural networks.

In this paper we propose a synergistic melting of neural networks and decision trees (DT) we call neural decision trees (NDT). NDT is an architecture a la decision tree where each splitting node is an independent multilayer perceptron allowing oblique decision functions or arbritrary nonlinear decision function if more than one layer is used. This way, each MLP can be seen as a node of the tree. We then show that with the weight sharing asumption among those units, we end up with a Hashing Neural Network (HNN) which is a multilayer perceptron with sigmoid activation function for the last layer as opposed to the standard softmax. The output units then jointly represent the probability to be in a particular region. The proposed framework allows for global optimization as opposed to greedy in DT and differentiability w.r.t. all parameters and the input, allowing easy integration in any learnable pipeline, for example after CNNs for computer vision tasks. We also demonstrate the modeling power of HNN allowing to learn union of disjoint regions for final clustering or classification making it more general and powerful than standard softmax MLP requiring linear separability thus reducing the need on the inner layer to perform complex data transformations. We finally show experiments for supervised, semi-suppervised and unsupervised tasks and compare results with standard DTs and MLPs.

Over the past few years, neural networks have re-emerged as powerful machine-learning models, yielding state-of-the-art results in fields such as image recognition and speech processing. More recently, neural network models started to be applied also to textual natural language signals, again with very promising results. This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques. The tutorial covers input encoding for natural language tasks, feed-forward networks, convolutional networks, recurrent networks and recursive networks, as well as the computation graph abstraction for automatic gradient computation.

That's an interesting question, and I try to answer this is a very general way. The tl;dr version of this is: Deep learning is essentially a set of techniques that help we to parameterize deep neural network structures, neural networks with many, many layers and parameters. And if we are interested, a more concrete example: Let's start with multi-layer perceptrons (MLPs) … On a tangent: The term "perceptron" in MLPs may be a bit confusing since we don't really want only linear neurons in our network. Using MLPs, we want to learn complex functions to solve non-linear problems. Thus, our network is conventionally composed of one or multiple "hidden" layers that connect the input and output layer.

In theory, the Winnow multiplicative update has certain advantages over the Perceptron additive update when there are many irrelevant attributes. Recently, there has been much effort on enhancing the Perceptron algorithm by using regularization, leading to a class of linear classification methods called support vector machines. Similarly, it is also possible to apply the regularization idea to the Winnow algorithm, which gives methods we call regularized Winnows. We show that the resulting methods compare with the basic Winnows in a similar way that a support vector machine compares with the Perceptron. We investigate algorithmic issues and learning properties of the derived methods. Some experimental results will also be provided to illustrate different methods. 1 Introduction In this paper, we consider the binary classification problem that is to determine a label y E {-1, 1} associated with an input vector x. A useful method for solving this problem is through linear discriminant functions, which consist of linear combinations of the components of the input variable.