Deep introduction to LSTMs


We define a set of inputs (x(1), x(2), …, x(m)). Each of these numbers is multiplied by a corresponding weight, and the results are added together to form the internal state of the perceptron, z. A perceptron can take multiple inputs, and since we're interested here in sequence modeling, we can think of these inputs as coming from a single time-step of our sequence. We can also extend a single perceptron to a layer of perceptrons to yield multi-dimensional outputs. But with sequential data, the output or label at a later time-step very likely depends on the inputs at prior time-steps. By treating the time-steps as isolated from one another, we miss the relationship, inherent to sequence data, between inputs earlier in the sequence and what we predict later on. What we really need is a way to relate the computation the network does at a particular time-step to both the input at that time-step and the history of its computation from prior time-steps.
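To make the idea concrete, here is a minimal sketch in plain Python. It is not taken from the original post: the function names, the scalar hidden state, the bias term, and the tanh non-linearity are all assumptions chosen for illustration. The first function computes the perceptron's internal state z for one time-step; the second feeds the previous state back in, which is one simple way to connect a time-step to the history before it.

```python
import math

def perceptron_state(x, w, b=0.0):
    """Internal state z = sum_i w[i] * x[i] + b for the inputs of one time-step."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def recurrent_step(x_t, h_prev, w_x, w_h, b=0.0):
    """One recurrent update (assumed form): h_t = tanh(w_x . x_t + w_h * h_prev + b)."""
    return math.tanh(perceptron_state(x_t, w_x, b) + w_h * h_prev)

def run_sequence(xs, w_x, w_h, b=0.0, h0=0.0):
    """Apply recurrent_step over a sequence, carrying the state forward."""
    h = h0
    states = []
    for x_t in xs:
        h = recurrent_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# With w_h = 0.0 every time-step is treated in isolation;
# with w_h != 0.0 the second state also depends on the first input.
isolated = run_sequence([[1.0], [0.0]], w_x=[1.0], w_h=0.0)
connected = run_sequence([[1.0], [0.0]], w_x=[1.0], w_h=1.0)
```

Setting the recurrent weight to zero recovers the isolated-time-step behaviour described above, so comparing `isolated` and `connected` shows exactly what the feedback term adds.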

Introduction to Multilayer Perceptrons (MLPs)


I'll give a basic introduction to Multilayer Perceptrons in this essay. The initial step in deep learning is to understand MLPs. It's a pretty simple concept, and it's based on something we all learned in high school. So, rookies, get your skates on. Ever wondered how your brain works?

How to learn the maths of Data Science using your high school maths knowledge


This post is part of my forthcoming book on the mathematical foundations of Data Science. In this post, we use the Perceptron algorithm to bridge the gap between high school maths and deep learning. As part of my role as course director of the Artificial Intelligence: Cloud and Edge Computing at the University…, I see more students who are familiar with programming than with mathematics. They last studied maths years ago at university. Then, when they start learning Data Science, they suddenly encounter matrices, linear algebra, and so on.

Robustness Verification for Attention Networks using Mixed Integer Programming Artificial Intelligence

Attention networks such as transformers have been shown to be powerful in many applications ranging from natural language processing to object recognition. This paper further considers their robustness properties from both theoretical and empirical perspectives. Theoretically, we formulate a variant of attention networks containing linearized layer normalization and sparsemax activation, and reduce its robustness verification to a Mixed Integer Programming problem. Apart from a naïve encoding, we derive tight intervals from admissible perturbation regions and examine several heuristics to speed up the verification process. More specifically, we find a novel bounding technique for sparsemax activation, which is also applicable to softmax activation in general neural networks. Empirically, we evaluate our proposed techniques with a case study on lane departure warning and demonstrate a performance gain of approximately an order of magnitude. Furthermore, although attention networks typically deliver higher accuracy than general neural networks, contrasting their robustness against that of a similar-sized multi-layer perceptron surprisingly shows that they are not necessarily more robust.
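For readers unfamiliar with sparsemax, the sketch below contrasts it with softmax in plain Python. It follows the standard sort-and-threshold definition of sparsemax (the Euclidean projection of the scores onto the probability simplex). It is not the paper's code and says nothing about the paper's bounding technique or its Mixed Integer Programming encoding; the function names are illustrative assumptions.

```python
import math

def softmax(z):
    """Standard softmax; every output is strictly positive."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(z):
    """Sparsemax: project z onto the probability simplex (can output exact zeros)."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, v in enumerate(z_sorted, start=1):
        cumsum += v
        # The support is the largest j with 1 + j * z_(j) > sum of the top-j scores.
        if 1.0 + j * v > cumsum:
            tau = (cumsum - 1.0) / j
    # Shift by the threshold tau and clip negatives to zero.
    return [max(v - tau, 0.0) for v in z]

scores = [1.0, 0.5, -1.0]
dense = softmax(scores)     # all three entries are non-zero
sparse = sparsemax(scores)  # the lowest score gets exactly zero weight
```

Both functions return weights that sum to one, but only sparsemax can assign exactly zero to low-scoring entries. It is also piecewise linear, which is one plausible reason it lends itself to an exact encoding of the kind the abstract describes.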

gMLP: Winning over Transformers?


Alright, we all know that transformers are cool. At least in NLP, these architectures are considered state-of-the-art (SOTA) for language modelling, and they help us perform beautifully on various downstream tasks, such as named-entity recognition (NER), question answering (QA), part-of-speech tagging (POS), etc. But in this tutorial, we will dive into another architecture called the Gated Multilayer Perceptron (gMLP), proposed by the Google Research team. As I mentioned above, transformer architectures are very powerful, and if you want to achieve really high performance on your particular task, you should consider using a pre-trained transformer. You can usually find them on Huggingface.

SFMGNet: A Physics-based Neural Network To Predict Pedestrian Trajectories Artificial Intelligence

Autonomous robots and vehicles are expected to soon become an integral part of our environment. Unsatisfactory issues regarding interaction with existing road users, performance in mixed-traffic areas and lack of interpretable behavior remain key obstacles. To address these, we present a physics-based neural network, based on a hybrid approach combining a social force model extended by group force (SFMG) with a Multi-Layer Perceptron (MLP), to predict pedestrian trajectories considering their interaction with static obstacles, other pedestrians and pedestrian groups. We quantitatively and qualitatively evaluate the model with respect to realistic prediction, prediction performance and prediction "interpretability". Initial results suggest that the model, even when trained solely on a synthetic dataset, can predict realistic and interpretable trajectories with better than state-of-the-art accuracy.

An ASP approach for reasoning on neural networks under a finitely many-valued semantics for weighted conditional knowledge bases Artificial Intelligence

Weighted knowledge bases for description logics with typicality have been recently considered under a "concept-wise" multipreference semantics (in both the two-valued and fuzzy case), as the basis of a logical semantics of MultiLayer Perceptrons (MLPs). In this paper we consider weighted conditional ALC knowledge bases with typicality in the finitely many-valued case, through three different semantic constructions, based on coherent, faithful and phi-coherent interpretations. For the boolean fragment LC of ALC we exploit ASP and "asprin" for reasoning with the concept-wise multipreference entailment under a phi-coherent semantics, suitable to characterize the stationary states of MLPs. As a proof of concept, we experiment with the proposed approach for checking properties of trained MLPs.

Real-Time Facial Expression Recognition using Facial Landmarks and Neural Networks Artificial Intelligence

This paper presents a lightweight algorithm for feature extraction, classification of seven different emotions, and facial expression recognition in a real-time manner based on static images of the human face. In this regard, a Multi-Layer Perceptron (MLP) neural network is trained based on the foregoing algorithm. In order to classify human faces, first, some pre-processing is applied to the input image, which can localize and cut out faces from it. In the next step, a facial landmark detection library is used, which can detect the landmarks of each face. Then, the human face is split into upper and lower faces, which enables the extraction of the desired features from each part. In the proposed model, both geometric and texture-based feature types are taken into account. After the feature extraction phase, a normalized vector of features is created. A 3-layer MLP is trained using these feature vectors, leading to 96% accuracy on the test set.

Directed Weight Neural Networks for Protein Structure Representation Learning Artificial Intelligence

A protein performs biological functions by folding to a particular 3D structure. To accurately model the protein structures, both the overall geometric topology and local fine-grained relations between amino acids (e.g. side-chain torsion angles and inter-amino-acid orientations) should be carefully considered. In this work, we propose the Directed Weight Neural Network for better capturing geometric relations among different amino acids. Extending a single weight from a scalar to a 3D directed vector, our new framework supports a rich set of geometric operations on both classical and SO(3)-representation features, on top of which we construct a perceptron unit for processing amino-acid information. In addition, we introduce an equivariant message passing paradigm on proteins for plugging the directed weight perceptrons into existing Graph Neural Networks, showing superior versatility in maintaining SO(3)-equivariance at the global scale. Experiments show that our network has remarkably better expressiveness in representing geometric relations in comparison to classical neural networks and the (globally) equivariant networks. It also achieves state-of-the-art performance on various computational biology applications related to protein 3D structures.