Perceptrons
On Local Aggregation in Heterophilic Graphs
Mostafa, Hesham, Nassar, Marcel, Majumdar, Somdeb
Many recent works have studied the performance of Graph Neural Networks (GNNs) in the context of graph homophily, a label-dependent measure of connectivity. Traditional GNNs generate node embeddings by aggregating information from a node's neighbors in the graph. Recent results in node classification tasks show that this local aggregation approach performs poorly in graphs with low homophily (heterophilic graphs). Several mechanisms have been proposed to improve the accuracy of GNNs on such graphs by increasing the aggregation range of a GNN layer, either through multi-hop aggregation or through long-range aggregation from distant nodes. In this paper, we show that properly tuned classical GNNs and multi-layer perceptrons match or exceed the accuracy of recent long-range aggregation methods on heterophilic graphs. Our results therefore highlight the need for alternative datasets to benchmark long-range GNN aggregation mechanisms. We also show that homophily is a poor measure of the information in a node's local neighborhood and propose the Neighborhood Information Content (NIC) metric, a novel information-theoretic graph metric. We argue that NIC is more relevant for the local aggregation methods used by GNNs, and we show empirically that it correlates better than homophily with GNN accuracy in node classification tasks.
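The abstract does not define homophily formally. As a hedged illustration, one common definition is the edge homophily ratio: the fraction of edges whose two endpoints share a class label. The sketch below computes it on a hypothetical toy graph; it does not implement the paper's NIC metric, which the abstract does not specify.

```python
from typing import Sequence, Tuple


def edge_homophily(edges: Sequence[Tuple[int, int]], labels: Sequence[int]) -> float:
    """Edge homophily ratio: fraction of edges joining nodes with the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Hypothetical toy graph: nodes 0 and 1 have label 0, nodes 2 and 3 have label 1.
labels = [0, 0, 1, 1]
homophilic = [(0, 1), (2, 3), (1, 2)]              # 2 of 3 edges join same-label nodes
heterophilic = [(0, 2), (0, 3), (1, 2), (1, 3)]    # every edge crosses classes

print(edge_homophily(homophilic, labels))    # about 0.667
print(edge_homophily(heterophilic, labels))  # 0.0
```

In the heterophilic graph every neighbor of a node carries the other label, which is the setting where, according to the abstract, plain local aggregation has been reported to struggle.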
Under the Hood of Modern Machine and Deep Learning
In this chapter, we investigate whether unique, optimal decision boundaries can be found. To do so, we first have to revisit several fundamental mathematical principles. Regularization is a mathematical tool that allows us to find unique solutions even for highly ill-posed problems. To use this trick, we review norms and how they can be used to steer regression problems. Rosenblatt's Perceptron and Multi-Layer Perceptrons, which are also called Artificial Neural Networks, inherently suffer from this ill-posedness.
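A minimal sketch of the regularization idea, assuming Tikhonov (ridge, L2-norm) regularization as the example; the chapter's own formulation may differ. The toy system has more unknowns than equations, so X^T X is singular and ordinary least squares has no unique solution, while adding lam * I makes the system uniquely solvable. The matrix, targets, and lam value are illustrative.

```python
import numpy as np

# Ill-posed toy problem: 2 equations, 3 unknowns, so infinitely many w satisfy X @ w = y.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
y = np.array([1.0, 2.0])

gram = X.T @ X
print(np.linalg.matrix_rank(gram))  # 2 (< 3 unknowns), so X^T X cannot be inverted


def ridge(X, y, lam):
    """Tikhonov/ridge solution w = (X^T X + lam * I)^(-1) X^T y, unique for lam > 0."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


w = ridge(X, y, lam=1e-6)
print(w)      # a single, well-defined solution
print(X @ w)  # close to y
```

As lam shrinks toward zero, the ridge solution approaches the minimum-L2-norm solution of X @ w = y (the pseudoinverse solution), which illustrates how a norm can select one answer among infinitely many.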
On the KLM properties of a fuzzy DL with Typicality
The paper investigates the properties of a fuzzy logic of typicality. The extension of fuzzy logic with a typicality operator was proposed in recent work to define a fuzzy multipreference semantics for Multilayer Perceptrons, by regarding the deep neural network as a conditional knowledge base. In this paper, we study its properties. First, a monotonic extension of a fuzzy ALC with typicality is considered (called ALCFT), and a reformulation of the KLM properties of a preferential consequence relation for this logic is devised. Most of the properties are satisfied, depending on the reformulation and on the fuzzy combination functions considered. We then strengthen ALCFT with a closure construction by introducing a notion of faithful model of a weighted knowledge base, which generalizes the previously introduced notion of coherent model of a conditional knowledge base, and we study its properties.
Machine Learning with Docker and Kubernetes: Batch Inference
You can find all the files used in this chapter on GitHub. Compared to our previous Dockerfile, we just added inference.py. Here, we download our previously trained models (linear discriminant analysis and a multi-layer perceptron neural network) stored in a specified directory (/home/xavi/output) on a remote server (192.168.1.11). Once the image is successfully uploaded to the registry, we connect to kubmaster, go to our project directory, and create a configuration file, inference.yaml. We are finally ready to get our application running on Kubernetes.
Inclusion of Domain-Knowledge into GNNs using Mode-Directed Inverse Entailment
Dash, Tirtharaj, Srinivasan, Ashwin, Baskar, A
We present a general technique for constructing Graph Neural Networks (GNNs) capable of using multi-relational domain knowledge. The technique is based on mode-directed inverse entailment (MDIE) developed in Inductive Logic Programming (ILP). Given a data instance $e$ and background knowledge $B$, MDIE identifies a most-specific logical formula $\bot_B(e)$ that contains all the relational information in $B$ that is related to $e$. We transform $\bot_B(e)$ into a corresponding "bottom-graph" that can be processed for use by standard GNN implementations. This transformation allows a principled way of incorporating generic background knowledge into GNNs: we use the term `BotGNN' for this form of graph neural networks. For several GNN variants, using real-world datasets with substantial background knowledge, we show that BotGNNs perform significantly better than both GNNs without background knowledge and a recently proposed simplified technique for including domain knowledge into GNNs. We also provide experimental evidence comparing BotGNNs favourably to multi-layer perceptrons (MLPs) that use features representing a "propositionalised" form of the background knowledge; and BotGNNs to a standard ILP based on the use of most-specific clauses. Taken together, these results point to BotGNNs as capable of combining the computational efficacy of GNNs with the representational versatility of ILP.
Optimizing Neural Network Weights using Nature-Inspired Algorithms
Korani, Wael, Mouhoub, Malek, Sadaoui, Samira
This study aims to optimize the training of Deep Feedforward Neural Networks (DFNNs) using nature-inspired optimization algorithms, namely PSO, MTO, and its variant MTOCL. We show how these algorithms efficiently update the weights of DFNNs when learning from data. We evaluate the performance of DFNNs fused with these optimization algorithms on three Wisconsin breast cancer datasets (Original, Diagnostic, and Prognostic) under different experimental scenarios. The empirical analysis demonstrates that MTOCL performs best in most scenarios across the three datasets. MTOCL is also comparable to past weight optimization algorithms on the Original dataset and superior on the other datasets, especially on the challenging Prognostic dataset.
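The abstract does not describe MTO or MTOCL, so the sketch below shows only the general idea with standard global-best particle swarm optimization (PSO): each particle is a flat weight vector for a tiny feedforward network, and the swarm minimizes training error without gradients. The network size, the XOR data, and the hyperparameters (inertia 0.7, c1 = c2 = 1.5) are illustrative assumptions, not the study's settings.

```python
import numpy as np


def forward(w, X, n_hidden):
    """Tiny 1-hidden-layer network; w is a flat weight vector unpacked in order."""
    n_in = X.shape[1]
    i = 0
    W1 = w[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = w[i:i + n_hidden]; i += n_hidden
    W2 = w[i:i + n_hidden]; i += n_hidden
    b2 = w[i]
    h = np.tanh(X @ W1 + b1)
    z = np.clip(h @ W2 + b2, -50.0, 50.0)  # clip logits to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid output


def mse(w, X, y, n_hidden):
    return float(np.mean((forward(w, X, n_hidden) - y) ** 2))


def pso(loss, dim, n_particles=30, iters=200, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns the best position and the best-loss history."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # each particle's best position
    pbest_loss = np.array([loss(p) for p in pos])
    g = pbest[np.argmin(pbest_loss)].copy()             # swarm's best position
    history = [pbest_loss.min()]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        cur = np.array([loss(p) for p in pos])
        improved = cur < pbest_loss
        pbest[improved] = pos[improved]
        pbest_loss[improved] = cur[improved]
        g = pbest[np.argmin(pbest_loss)].copy()
        history.append(pbest_loss.min())
    return g, history


# Hypothetical usage: fit XOR with a 2-3-1 network.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
n_hidden = 3
dim = X.shape[1] * n_hidden + n_hidden + n_hidden + 1
best_w, history = pso(lambda w: mse(w, X, y, n_hidden), dim)
print(history[0], history[-1])  # best training MSE before and after optimization
```

Because each particle only keeps a personal best that it never discards, the swarm's best loss can only stay the same or fall across iterations.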
Machine Learning Models to Predict 30-Day Mortality in Mechanically Ventilated Patients
Previous scoring models, such as the Acute Physiology and Chronic Health Evaluation II (APACHE II) score, do not adequately predict the mortality of patients receiving mechanical ventilation in the intensive care unit. Therefore, this study aimed to apply machine learning algorithms to improve the prediction accuracy for 30-day mortality of mechanically ventilated patients. The data of 16,940 mechanically ventilated patients were divided into training-validation (83%, n = 13,988) and test (17%, n = 2,952) sets. Machine learning algorithms including balanced random forest, light gradient boosting machine, extreme gradient boosting, multilayer perceptron, and logistic regression were used. We compared the areas under the receiver operating characteristic curves (AUCs) of the machine learning algorithms with those of the APACHE II and ProVent scores. The extreme gradient boosting model showed the highest AUC (0.79 (0.77–0.80)) for 30-day mortality prediction, followed by the balanced random forest model (0.78 (0.76–0.80)). The AUCs of these machine learning models were higher than those achieved by the APACHE II and ProVent scores (0.67 (0.65–0.69) and 0.69 (0.67–0.71), respectively). The most important variables in developing each machine learning model were the APACHE II score, the Charlson comorbidity index, and norepinephrine. The machine learning models have a higher AUC than conventional scoring systems and can thus better predict the 30-day mortality of mechanically ventilated patients.
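For readers comparing the AUC figures above, a minimal sketch of how an AUC can be computed: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores below are hypothetical, not from the study, and the study's confidence intervals are not reproduced.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison (Mann-Whitney form); O(P*N), fine for small data."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted mortality risks (1 = died within 30 days).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the gap between 0.79 and 0.67 reflects a meaningfully better ranking of patients by risk.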
Introduction to Attention Mechanism
Let us go through one whole step to explain what is happening. At t = 1, we use the decoder state s_{t-1} (here s_0) to compute alignment scores. To compute the alignment score for every encoder state, we use a function called the alignment function, which is just an MLP (multi-layer perceptron). The alignment score for h1, for example, can be read as "how useful h1 is for predicting the output in state s0". The alignment function outputs a scalar (a real number), and we cannot use these raw values directly; we first normalize them with the softmax function. The outputs of the softmax function are non-negative and sum to 1.
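A minimal NumPy sketch of this step, assuming the additive (Bahdanau-style) form of the alignment MLP, score_i = v^T tanh(W_s s_{t-1} + W_h h_i). The weight matrices and sizes are hypothetical random values; in a trained model they are learned jointly with the encoder and decoder.

```python
import numpy as np


def softmax(z):
    z = z - z.max()   # shift for numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()


def alignment_scores(s_prev, H, W_s, W_h, v):
    """Additive (MLP) alignment: one scalar score per encoder state h_i."""
    return np.tanh(s_prev @ W_s + H @ W_h) @ v


rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 6, 4      # hypothetical sizes
H = rng.normal(size=(T, d_enc))           # encoder states h_1..h_T (one per row)
s0 = rng.normal(size=d_dec)               # previous decoder state s_{t-1} = s_0
W_s = rng.normal(size=(d_dec, d_att))
W_h = rng.normal(size=(d_enc, d_att))
v = rng.normal(size=d_att)

scores = alignment_scores(s0, H, W_s, W_h, v)  # shape (T,): raw real-valued scores
weights = softmax(scores)                       # non-negative, sums to 1
context = weights @ H                           # weighted sum of encoder states
print(weights, weights.sum())
```

The resulting context vector, a weighted average of the encoder states, is what the decoder then uses at this time step.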
A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs
Bisla, Devansh, Saridena, Apoorva Nandini, Choromanska, Anna
This paper focuses on understanding how the generalization error of deep neural networks (DNNs) scales with the amount of training data. Existing techniques in statistical learning require the computation of capacity measures, such as the VC dimension, to provably bound this error. It is, however, unclear how to extend these measures to DNNs, and therefore existing analyses apply only to simple neural networks that are not used in practice, e.g., linear or shallow networks, or multi-layer perceptrons. Moreover, many theoretical error bounds are not empirically verifiable. We derive estimates of the generalization error that hold for deep networks and do not rely on unattainable capacity measures. Our approach hinges on two major assumptions: i) the network achieves zero training error, and ii) the probability of making an error on a test point is proportional to the distance between this point and its nearest training point in the feature space, and saturates at a certain maximal distance (which we call the radius). Based on these assumptions, we estimate the generalization error of DNNs. The obtained estimate scales as O(1/(\delta N^{1/d})), where N is the size of the training data, and is parameterized by two quantities, the effective dimensionality of the data as perceived by the network (d) and the aforementioned radius (\delta), both of which we find empirically. We show that our estimates match the experimentally observed behavior of the error on multiple learning tasks using benchmark datasets and realistic models. Estimating training data requirements is essential for deploying safety-critical applications such as autonomous driving. Furthermore, collecting and annotating training data requires a huge amount of financial, computational, and human resources. Our empirical estimates will help to allocate these resources efficiently.
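To make the scaling in the estimate concrete, the sketch below evaluates c / (\delta N^{1/d}), where c is a hypothetical constant hidden by the O() notation and the values of d and \delta are illustrative (the paper finds both empirically). Rearranging c / (\delta N'^{1/d}) = (1/k) c / (\delta N^{1/d}) gives N' = k^d N: dividing the estimate by a factor k, with d and \delta fixed, requires k^d times more training data.

```python
def error_estimate(n, d, delta, c=1.0):
    """Scaling form c / (delta * n**(1/d)); c is a hypothetical constant absorbed by O()."""
    return c / (delta * n ** (1.0 / d))


def samples_to_reduce(n, d, factor):
    """Training-set size needed to divide the estimate by `factor`, with d and delta fixed."""
    return n * factor ** d


# Illustrative values only: effective dimensionality d = 10, radius delta = 0.5.
n0 = 1000
print(error_estimate(n0, 10, 0.5))
print(samples_to_reduce(n0, 10, 2))  # 1024000
```

With d = 10, halving the estimate needs 2^10 = 1024 times more data, which is why the effective dimensionality d dominates the data requirements in this kind of estimate.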