Perceptrons
On Graph Neural Networks versus Graph-Augmented MLPs
Chen, Lei, Chen, Zhengdao, Bruna, Joan
While multi-layer Graph Neural Networks (GNNs) have gained popularity for their applications in various fields, recently authors have started to investigate what their true advantages over baselines are, and whether they can be simplified. On one hand, GNNs based on neighborhood-aggregation allows the combination of information present at different nodes, and by increasing the depth of such GNNs, we increase the size of the receptive field. On the other hand, it has been pointed out that deep GNNs can suffer from issues including over-smoothing, exploding or vanishing gradients in training as well as bottleneck effects (Kipf & Welling, 2016; Li et al., 2018; Luan et al., 2019; Oono & Suzuki, 2020; Rossi et al., 2020; Alon & Yahav, 2020). Recently, a series of models have attempted at relieving these issues of deep GNNs while retaining their benefit of combining information across nodes, using the approach of firstly augmenting the node features by propagating the original node features through powers of graph operators such as the (normalized) adjacency matrix, and secondly applying a node-wise function to the augmented node features, usually realized by a Multi-Layer Perceptron (MLP) (Wu et al., 2019; NT & Maehara, 2019; Chen et al., 2019a; Rossi et al., 2020). Because of the usage of graph operators for augmenting the node features, we will refer to such models as Graph-Augmented MLPs (GA-MLPs). These models have achieved competitive performances on various tasks, and moreover enjoy better scalability since the augmented node features can be computed during preprocessing (Rossi et al., 2020). Thus, it becomes natural to ask what advantages GNNs have over GA-MLPs. In this work, we ask whether GA-MLPs sacrifice expressive power compared to GNNs while gaining these advantages. A popular measure of the expressive power of GNNs is their ability to distinguish nonisomorphic graphs (Hamilton et al., 2017; Xu et al., 2019; Morris et al., 2019).
The Teaching Dimension of Kernel Perceptron
Kumar, Akash, Zhang, Hanqi, Singla, Adish, Chen, Yuxin
Algorithmic machine teaching has been studied under the linear setting where exact teaching is possible. However, little is known for teaching nonlinear learners. Here, we establish the sample complexity of teaching, aka teaching dimension, for kernelized perceptrons for different families of feature maps. As a warm-up, we show that the teaching complexity is $\Theta(d)$ for the exact teaching of linear perceptrons in $\mathbb{R}^d$, and $\Theta(d^k)$ for kernel perceptron with a polynomial kernel of order $k$. Furthermore, under certain smooth assumptions on the data distribution, we establish a rigorous bound on the complexity for approximately teaching a Gaussian kernel perceptron. We provide numerical examples of the optimal (approximate) teaching set under several canonical settings for linear, polynomial and Gaussian kernel perceptrons.
Hybrid Models for Learning to Branch
Gupta, Prateek, Gasse, Maxime, Khalil, Elias B., Kumar, M. Pawan, Lodi, Andrea, Bengio, Yoshua
A recent Graph Neural Network (GNN) approach for learning to branch has been shown to successfully reduce the running time of branch-and-bound algorithms for Mixed Integer Linear Programming (MILP). While the GNN relies on a GPU for inference, MILP solvers are purely CPU-based. This severely limits its application as many practitioners may not have access to high-end GPUs. In this work, we ask two key questions. First, in a more realistic setting where only a CPU is available, is the GNN model still competitive? Second, can we devise an alternate computationally inexpensive model that retains the predictive power of the GNN architecture? We answer the first question in the negative, and address the second question by proposing a new hybrid architecture for efficient branching on CPU machines. The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching. We evaluate our methods on four classes of MILP problems, and show that they lead to up to 26% reduction in solver running time compared to state-of-the-art methods without a GPU, while extrapolating to harder problems than it was trained on. The code for this project is publicly available at https://github.com/pg2455/Hybrid-learn2branch.
Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs
Zhu, Jiong, Yan, Yujun, Zhao, Lingxiao, Heimann, Mark, Akoglu, Leman, Koutra, Danai
We investigate the representation power of graph neural networks in the semi-supervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features. Many popular GNNs fail to generalize to this setting, and are even outperformed by models that ignore the graph structure (e.g., multilayer perceptrons). Motivated by this limitation, we identify a set of key designs -- ego- and neighbor-embedding separation, higher-order neighborhoods, and combination of intermediate representations -- that boost learning from the graph structure under heterophily. We combine them into a graph neural network, H2GCN, which we use as the base method to empirically evaluate the effectiveness of the identified designs. Going beyond the traditional benchmarks with strong homophily, our empirical analysis shows that the identified designs increase the accuracy of GNNs by up to 40% and 27% over models without them on synthetic and real networks with heterophily, respectively, and yield competitive performance under homophily.
A Multilinear Sampling Algorithm to Estimate Shapley Values
Shapley values are great analytical tools in game theory to measure the importance of a player in a game. Due to their axiomatic and desirable properties such as efficiency, they have become popular for feature importance analysis in data science and machine learning. However, the time complexity to compute Shapley values based on the original formula is exponential, and as the number of features increases, this becomes infeasible. Castro et al. [1] developed a sampling algorithm, to estimate Shapley values. In this work, we propose a new sampling method based on a multilinear extension technique as applied in game theory. The aim is to provide a more efficient (sampling) method for estimating Shapley values. Our method is applicable to any machine learning model, in particular for either multi-class classifications or regression problems. We apply the method to estimate Shapley values for multilayer perceptrons (MLPs) and through experimentation on two datasets, we demonstrate that our method provides more accurate estimations of the Shapley values by reducing the variance of the sampling statistics.
Contrastive Self-Supervised Learning for Wireless Power Control
We propose a new approach for power control in wireless networks using self-supervised learning. We partition a multi-layer perceptron that takes as input the channel matrix and outputs the power control decisions into a backbone and a head, and we show how we can use contrastive learning to pre-train the backbone so that it produces similar embeddings at its output for similar channel matrices and vice versa, where similarity is defined in an information-theoretic sense by identifying the interference links that can be optimally treated as noise. The backbone and the head are then fine-tuned using a limited number of labeled samples. Simulation results show the effectiveness of the proposed approach, demonstrating significant gains over pure supervised learning methods in both sum-throughput and sample efficiency.
Investigating the Scalability and Biological Plausibility of the Activation Relaxation Algorithm
Millidge, Beren, Tschantz, Alexander, Seth, Anil, Buckley, Christopher L
The recently proposed Activation Relaxation (AR) algorithm provides a simple and robust approach for approximating the backpropagation of error algorithm using only local learning rules. We have previously shown that the algorithm can be further simplified and made more biologically plausible by (i) introducing a learnable set of backwards weights, which overcomes the weight-transport problem, and (ii) avoiding the computation of nonlinear derivatives at each neuron. However, tthe efficacy of these simplifications has, so far, only been tested on simple multi-layer-perceptron (MLP) networks. Here, we show that these simplifications still maintain performance using more complex CNN architectures and challenging datasets, which have proven difficult for other biologically-plausible schemes to scale to. We also investigate whether another biologically implausible assumption of the original AR algorithm - the frozen feedforward pass - can be relaxed without damaging performance. The backpropagation of error algorithm (backprop) has been the engine driving the successes of modern machine learning with deep neural networks.
Similarity Based Stratified Splitting: an approach to train better classifiers
Farias, Felipe, Ludermir, Teresa, Bastos-Filho, Carmelo
We propose a Similarity-Based Stratified Splitting (SBSS) technique, which uses both the output and input space information to split the data. The splits are generated using similarity functions among samples to place similar samples in different splits. This approach allows for a better representation of the data in the training phase. This strategy leads to a more realistic performance estimation when used in real-world applications. We evaluate our proposal in twenty-two benchmark datasets with classifiers such as Multi-Layer Perceptron, Support Vector Machine, Random Forest and K-Nearest Neighbors, and five similarity functions Cityblock, Chebyshev, Cosine, Correlation, and Euclidean. According to the Wilcoxon Sign-Rank test, our approach consistently outperformed ordinary stratified 10-fold cross-validation in 75\% of the assessed scenarios.
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
Xu, Keyulu, Li, Jingling, Zhang, Mozhi, Du, Simon S., Kawarabayashi, Ken-ichi, Jegelka, Stefanie
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while multilayer perceptrons (MLPs) do not extrapolate well in certain simple tasks, Graph Neural Network (GNN), a structured network with MLP modules, has shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most non-linear functions. But, they can provably learn a linear target function when the training distribution is sufficiently "diverse". Second, in connection to analyzing successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features.
HANDWRITING RECOGNITION USING CNN - AI PROJECTS
Machine Learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. It can also be defined as the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit human intervention, relying on patterns and inference instead. Similarly, a mathematical equation is a statement that defines the equality of two expression which can be used to define almost all the remaining mathematical theorems and science theories. Neural Networks are simply an artificial model of the human brain which are generally composed of perceptron which are further composed of structures known as nodes and weights. These nodes can activate or deactivate with inputs and further activate more nodes further levels down the neural path. This is the basic concepts by which neural network works.