This post is part of my forthcoming book on Mathematical foundations of Data Science. In this post, we use the Perceptron algorithm to bridge the gap between high school maths and deep learning. As course director of the Artificial Intelligence: Cloud and Edge Computing at the University…, I see more students who are familiar with programming than with mathematics. Many of them last studied maths years ago at university, and then suddenly encounter matrices, linear algebra, and similar topics when they start learning Data Science.
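To make the bridge concrete, here is a minimal sketch of the classic perceptron learning rule in plain Python. The function names (`train_perceptron`, `predict`), the learning rate, and the toy AND dataset are illustrative assumptions, not taken from the book.

```python
def predict(w, b, x):
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge weights by the prediction error."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(w, b, x)  # -1, 0, or +1
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b


# Toy data (assumption): the logical AND function, which is linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
predictions = [predict(w, b, x) for x in X]
```

The only maths involved is a weighted sum and a comparison with zero, which is the same straight-line equation taught in school, applied to more than one input.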
Attention networks such as transformers have been shown to be powerful in many applications ranging from natural language processing to object recognition. This paper further considers their robustness properties from both theoretical and empirical perspectives. Theoretically, we formulate a variant of attention networks containing linearized layer normalization and sparsemax activation, and reduce its robustness verification to a Mixed Integer Programming problem. Apart from a naïve encoding, we derive tight intervals from admissible perturbation regions and examine several heuristics to speed up the verification process. More specifically, we find a novel bounding technique for sparsemax activation, which is also applicable to softmax activation in general neural networks. Empirically, we evaluate our proposed techniques with a case study on lane departure warning and demonstrate a performance gain of approximately an order of magnitude. Furthermore, although attention networks typically deliver higher accuracy than general neural networks, contrasting their robustness against a similar-sized multi-layer perceptron surprisingly shows that they are not necessarily more robust.
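For readers unfamiliar with sparsemax, the following is a minimal sketch of its standard definition (the Euclidean projection of a score vector onto the probability simplex). It only illustrates the activation itself; it is not the bounding technique or the verification encoding proposed in the paper, and the function name is an assumption.

```python
def sparsemax(z):
    """Project the score vector z onto the probability simplex.

    Unlike softmax, the result can contain exact zeros.
    """
    z_sorted = sorted(z, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumulative += z_k
        # The k largest scores stay in the support while this holds.
        if 1 + k * z_k > cumulative:
            tau = (cumulative - 1) / k
    return [max(z_i - tau, 0.0) for z_i in z]


probs = sparsemax([0.1, 1.1, 0.2])
one_hot = sparsemax([1.0, 2.0, 3.0])
```

Because the output is piecewise linear in the input, sparsemax is a more natural fit for integer programming encodings than the exponential in softmax.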
This paper presents a lightweight algorithm for feature extraction, classification of seven different emotions, and facial expression recognition in a real-time manner based on static images of the human face. In this regard, a Multi-Layer Perceptron (MLP) neural network is trained based on the foregoing algorithm. In order to classify human faces, first, some pre-processing is applied to the input image, which can localize and cut out faces from it. In the next step, a facial landmark detection library is used, which can detect the landmarks of each face. Then, the human face is split into upper and lower faces, which enables the extraction of the desired features from each part. In the proposed model, both geometric and texture-based feature types are taken into account. After the feature extraction phase, a normalized vector of features is created. A 3-layer MLP is trained using these feature vectors, leading to 96% accuracy on the test set.
Linear prediction is the cornerstone of a significant group of statistical learning algorithms including linear regression, Support Vector Machines (SVM), regularized regressions (such as ridge, elastic net, lasso, and its variants), logistic regression, Poisson regression, probit models, single-layer perceptrons, and tensor regression, just to name a few. Thus, developing a deeper understanding of the pertinent linear prediction models and generalizing the methods to provide unified theoretical bounds is of critical importance to the machine learning community. For the past few decades, researchers have unveiled different aspects of these linear models. Bartlett and Shawe-Taylor (1999) obtained high confidence generalization error bounds for SVMs and other learning algorithms such as boosting and Bayesian posterior classifier. Vapnik-Chervonenkis (VC) theory (Vapnik, 2013) and Rademacher complexity (Bartlett and Mendelson, 2001, 2002) have been instrumental in the machine learning literature to provide generalization bounds (Shalev-Shwartz and Ben-David, 2014). Theoretical properties of the multiple-instance extensions of SVM were analyzed by Doran and Ray (2014). Joint first authors contributed equally to this work.
Multilayer Perceptrons (MLPs) are complex algorithms that take a lot of compute power and a *ton* of data in order to produce satisfactory results in reasonable timeframes. Let's start with what they're not: neural networks, despite the name and every blog post and intro to machine learning textbook you've probably read up till now, are not analogs of the human brain. There are some *very* surface-level similarities, but the actual functionality of a neural network has almost nothing in common with the neurons that make up the approximately three pounds of meat that sits between your ears and defines everything you do and how you experience reality. Just like a lot of other machine learning algorithms, they use the formula "label equals weight times data value plus offset" (or y = w*x + b) to define where they draw their lines/hyperplanes for making predictions. In machine learning, the slope in that formula is called a weight.
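The "label equals weight times data value plus offset" formula can be sketched in a few lines of Python. The function names and the example numbers below are made up for illustration; with several inputs, the single product simply becomes a sum of products.

```python
def linear_predict(weights, x, offset):
    """Compute y = w*x + b, generalised to several inputs."""
    return sum(w * x_i for w, x_i in zip(weights, x)) + offset


def classify(weights, x, offset):
    """Which side of the line/hyperplane does x fall on?"""
    return 1 if linear_predict(weights, x, offset) > 0 else 0


# One input: weight (slope) 2, offset 1, data value 3.
y = linear_predict([2.0], [3.0], 1.0)

# Two inputs: the line x1 = x2 splits the plane into two classes.
above = classify([1.0, -1.0], (2.0, 1.0), 0.0)
below = classify([1.0, -1.0], (1.0, 2.0), 0.0)
```

Training a model amounts to finding the weights and offset that put the line in a useful place.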
In this paper, we discover a two-phase phenomenon in the learning of multi-layer perceptrons (MLPs). Specifically, in the first phase, the training loss does not decrease significantly, but the similarity of features between different samples keeps increasing, which hurts the feature diversity. We explain this two-phase phenomenon in terms of the learning dynamics of the MLP. Furthermore, we propose two normalization operations to eliminate the two-phase phenomenon, which avoid the decrease of the feature diversity and speed up the training process.
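One way to picture "similarity of features between different samples" is the mean pairwise cosine similarity of the feature vectors. The sketch below assumes cosine similarity as the metric and uses hand-written toy vectors; the abstract does not specify the measure, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(features):
    """Average cosine similarity over all pairs of samples."""
    pairs = list(combinations(features, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy features (assumption): diverse directions vs. nearly collapsed ones.
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
collapsed = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

diverse_score = mean_pairwise_similarity(diverse)
collapsed_score = mean_pairwise_similarity(collapsed)
```

A score near 1 means the features of different samples point in almost the same direction, which is the loss of diversity the abstract describes.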
The pervasiveness of GPS-enabled mobile devices and the widespread use of location-based services have resulted in the generation of massive amounts of geo-tagged data. More recently, data analysis has gained access to additional sources, including reviews, news, and images, which raises questions about the reliability of Point-of-Interest (POI) data sources. While previous research attempted to detect fake POI data through various security mechanisms, the current work attempts to capture fake POI data in a much simpler way. The proposed work focuses on supervised learning methods and their capability to find hidden patterns in location-based data. The ground truth labels are obtained from real-world data, and the fake data is generated using an API, so the resulting dataset contains both real and fake labels for the location data. The objective is to predict whether a POI is genuine using the Multi-Layer Perceptron (MLP) method. In the proposed work, an MLP-based data classification technique is used to classify location data accurately. The proposed method is compared with traditional classification methods and with recent, robust deep neural methods. The results show that the proposed method outperforms the baseline methods.
We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are directly learned from data using machine learning, and the trained models are embedded in an optimization formulation. We exploit the mixed-integer optimization-representability of many machine learning methods, including linear models, decision trees, ensembles, and multi-layer perceptrons. The consideration of multiple methods allows us to capture various underlying relationships between decisions, contextual variables, and outcomes. We also characterize a decision trust region using the convex hull of the observations, to ensure credible recommendations and avoid extrapolation. We efficiently incorporate this representation using column generation and clustering. In combination with domain-driven constraints and objective terms, the embedded models and trust region define a mixed-integer optimization problem for prescription generation. We implement this framework as a Python package (OptiCL) for practitioners. We demonstrate the method in both chemotherapy optimization and World Food Programme planning. The case studies illustrate the benefit of the framework in generating high-quality prescriptions, the value added by the trust region, the incorporation of multiple machine learning methods, and the inclusion of multiple learned constraints.
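As an illustration of what mixed-integer optimization-representability means for a multi-layer perceptron, the following is the standard big-M encoding of a single ReLU unit $v = \max(0, a)$. It is a textbook sketch, not necessarily the exact formulation used in the package; the symbols $w$, $x$, $b$, $a$, $v$, $z$ and the pre-activation bounds $L$, $U$ are introduced here for illustration.

```latex
\begin{align}
a &= w^\top x + b, \qquad L \le a \le U, \quad L < 0 < U, \\
v &\ge a, \qquad v \ge 0, \\
v &\le a - L\,(1 - z), \\
v &\le U z, \qquad z \in \{0, 1\}.
\end{align}
```

When $z = 1$ the constraints force $v = a \ge 0$, and when $z = 0$ they force $v = 0$ with $a \le 0$, so the binary variable selects the active piece of the ReLU. Stacking this encoding unit by unit embeds a trained network in a linear mixed-integer model.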