Top 10 Machine Learning Algorithms for Data Science


For most newcomers, machine learning algorithms may seem too dull and complicated a subject to master. To some extent, this is true. In most cases, you stumble upon a description several pages long for each algorithm, and it is hard to find the time and energy to work through every detail. However, if you truly, madly, deeply want to be an ML expert, you have to brush up on your knowledge of these algorithms; there is no other way. But relax: today I will try to simplify this task and explain the core principles of the 10 most common algorithms in simple words (each includes a brief description, guides, and useful links).

In praise of the autoencoder


When you consider all the machine learning (ML) algorithms, you'll find there is a subset of very pragmatic ones: neural networks. They usually require no statistical hypothesis and no specific data preparation except for normalization. The power of each network lies in its architecture, its activation functions, its regularization terms, plus a few other features. When you consider architectures for neural networks, there is a very versatile one that can serve a variety of purposes, two in particular: detection of unknown, unexpected events and dimensionality reduction of the input space. This neural network is called an autoencoder.
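As a rough illustration of the dimensionality-reduction use, here is a minimal sketch of an autoencoder in Python with NumPy. Everything in it is an assumption made for illustration rather than something taken from the article: the toy data, the single linear layer on each side, the learning rate, and the function name train_linear_autoencoder. It squeezes 3 normalized features through a 1-value bottleneck and learns to rebuild them.

```python
import numpy as np

def train_linear_autoencoder(X, code_dim, lr=0.05, epochs=500, seed=0):
    """Fit a linear encoder and decoder by gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, code_dim))
    W_dec = rng.normal(scale=0.1, size=(code_dim, n_features))
    losses = []
    for _ in range(epochs):
        code = X @ W_enc                    # encoder: input -> bottleneck
        recon = code @ W_dec                # decoder: bottleneck -> input space
        err = recon - X
        losses.append(float(np.mean(err ** 2)))
        grad_recon = 2.0 * err / err.size   # d(loss) / d(recon)
        grad_dec = code.T @ grad_recon
        grad_enc = X.T @ (grad_recon @ W_dec.T)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses

# Toy data (hypothetical): 3 features driven by one hidden factor plus noise.
rng = np.random.default_rng(42)
factor = rng.normal(size=(200, 1))
raw = factor @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))

# Normalization, the one preparation step the text mentions.
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)

W_enc, W_dec, losses = train_linear_autoencoder(X, code_dim=1)
codes = X @ W_enc                           # the reduced, 1-D representation
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

A real autoencoder would normally add nonlinear activation functions and more layers; the linear version is used here only because it keeps the encode, bottleneck, decode structure visible in a few lines.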

Deep fakes, falsification of reality - Hello Future Orange


Deep fakes pose a threat to democracy (manipulation of public opinion, aggravation of social or community-based tensions, etc.), an invasion of a person's privacy or a violation of their dignity, a risk of fraud or scams, and even a headache for future researchers seeking the truth. These technologies, which use deep learning to replace someone's face with that of another person in a video, are worrying. Ever more sophisticated, they are now available to almost anyone thanks to relatively easy-to-use tools. What follows is a chat about these "weapons of mass falsification" with Vincent Nozick, a teacher and researcher at the LIGM and co-author of a publication presenting an efficient method to detect deep fakes (MesoNet: a Compact Facial Video Forgery Detection Network, Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen, 2018). There are several methods for tampering with faces, some of which do use deep learning, like Deepfake, which is one of the best known. Deepfake is a program that belongs to the GAN (Generative Adversarial Network) family and makes it possible to transfer facial expressions onto video.

Neural Anomaly Detection Using Keras -- Visual Studio Magazine


An advantage of using a neural technique compared to a standard clustering technique is that neural techniques can handle non-numeric data by encoding that data. Anomaly detection, also called outlier detection, is the process of finding rare items in a dataset. Examples include finding fraudulent login events and fake news items. Take a look at the demo program in Figure 1. The demo examines a 1,000-item subset of the well-known MNIST (Modified National Institute of Standards and Technology) dataset.
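The excerpt does not show the demo's code, so the following is only a sketch of one common way to find rare items: reconstruct every item from a compressed representation and flag the item with the largest reconstruction error. To stay short and deterministic it makes several substitutions, all of them assumptions: synthetic data instead of MNIST, and principal components computed with an SVD as a linear stand-in for a trained Keras network. The function name reconstruction_errors is hypothetical.

```python
import numpy as np

def reconstruction_errors(X, n_components):
    """Per-row squared error after projecting onto the top principal directions."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]               # shape: (n_components, n_features)
    codes = centered @ basis.T              # "encode" to a few numbers per row
    recon = codes @ basis + mean            # "decode" back to the input space
    return np.sum((X - recon) ** 2, axis=1)

rng = np.random.default_rng(0)
# 200 "normal" rows that vary along one hidden direction, plus small noise.
t = rng.normal(size=(200, 1))
normal = t @ np.array([[1.0, 2.0, -1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 4))
# One hand-made outlier that lies far off that direction (row index 200).
outlier = np.array([[3.0, -3.0, 3.0, 3.0]])
X = np.vstack([normal, outlier])

errors = reconstruction_errors(X, n_components=1)
most_anomalous = int(np.argmax(errors))
print("row with the largest reconstruction error:", most_anomalous)
```

The same scoring step applies unchanged when the reconstruction comes from a neural autoencoder: only the encode and decode lines would be replaced by the network's predictions.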

Self-adversarial Variational Autoencoder with Gaussian Anomaly Prior Distribution for Anomaly Detection Artificial Intelligence

Recently, deep generative models have become increasingly popular in unsupervised anomaly detection. However, deep generative models aim at recovering the data distribution rather than detecting anomalies. Moreover, deep generative models risk overfitting the training samples, which has disastrous effects on anomaly detection performance. To solve these two problems, we propose a Self-adversarial Variational Autoencoder with a Gaussian anomaly prior assumption. We assume that both the anomalous and the normal prior distributions are Gaussian and overlap in the latent space. Therefore, a Gaussian transformer net T is trained to synthesize anomalous but near-normal latent variables. In addition to keeping the original training objective of the Variational Autoencoder, the generator G tries to distinguish between the normal latent variables and the anomalous ones synthesized by T, and the encoder E is trained to discriminate whether the output of G is real. These added objectives not only give both G and E the ability to discriminate but also introduce additional regularization to prevent overfitting. Compared with the SOTA baselines, the proposed model achieves significant improvements in extensive experiments. Datasets and our model are available at a GitHub repository.
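For reference, the training objective of a standard variational autoencoder combines a reconstruction term with a KL divergence between the encoder's output distribution and a prior. The sketch below covers only that standard KL term and the usual reparameterized sampling step, assuming a diagonal Gaussian encoder output and a standard normal prior; it does not attempt the paper's transformer net T or its adversarial objectives. The function names and the example values are hypothetical.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

# Hypothetical encoder outputs for one input, with a 2-D latent space.
mu, log_var = [1.0, 0.0], [0.0, 0.0]
kl = kl_to_standard_normal(mu, log_var)   # only the shifted mean contributes here
z = sample_latent(mu, log_var, random.Random(0))
print("KL term:", kl, "sampled latent:", z)
```

The KL term is zero exactly when the encoder output matches the prior (zero mean, unit variance) and grows as the two distributions move apart, which is what lets it act as a regularizer on the latent space.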

Deep Learning Multidimensional Projections Machine Learning

Dimensionality reduction methods, also known as projections, are frequently used for exploring multidimensional data in machine learning, data science, and information visualization. Among these, t-SNE and its variants have become very popular for their ability to visually separate distinct data clusters. However, such methods are computationally expensive for large datasets, suffer from stability problems, and cannot directly handle out-of-sample data. We propose a learning approach to construct such projections. We train a deep neural network based on a collection of samples from a given data universe, and their corresponding projections, and next use the network to infer projections of data from the same, or similar, universes. Our approach generates projections with similar characteristics as the learned ones, is computationally two to three orders of magnitude faster than SNE-class methods, has no complex-to-set user parameters, handles out-of-sample data in a stable manner, and can be used to learn any projection technique. We demonstrate our proposal on several real-world high dimensional datasets from machine learning.
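The idea of learning a projection from examples can be sketched in a few lines. This is not the authors' implementation: as labeled assumptions, a PCA projection computed with an SVD stands in for t-SNE as the technique being learned, a small one-hidden-layer network trained by plain gradient descent stands in for the deep network, and the data is synthetic. The names reference_projection and train_projection_net are hypothetical.

```python
import numpy as np

def reference_projection(X, dim=2):
    """Stand-in for t-SNE: project centered data onto its top principal directions."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

def train_projection_net(X, P, hidden=16, lr=0.02, epochs=1000, seed=0):
    """Train a one-hidden-layer tanh network to regress projections P from samples X."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.3, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.3, size=(hidden, P.shape[1]))
    b2 = np.zeros(P.shape[1])
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        err = (H @ W2 + b2) - P
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size                # d(loss) / d(output)
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)     # backprop through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)

    def infer(X_new):
        """Project new samples with the trained network only (no refit)."""
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return infer, losses

rng = np.random.default_rng(1)
scales = np.array([1.5, 1.0, 0.5, 0.3, 0.2])
train = rng.normal(size=(300, 5)) * scales          # samples from the "data universe"
P = reference_projection(train)                     # their 2-D projections
infer, losses = train_projection_net(train, P)

unseen = rng.normal(size=(5, 5)) * scales           # out-of-sample points
unseen_2d = infer(unseen)
print(f"training loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Once trained, projecting new points costs one forward pass, which is where the out-of-sample handling and the speed advantage described above come from.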

Deep learning approach based on dimensionality reduction for designing electromagnetic nanostructures Machine Learning

In this paper, we demonstrate a computationally efficient new approach based on deep learning (DL) techniques for analysis, design, and optimization of electromagnetic (EM) nanostructures. We use the strong correlation among features of a generic EM problem to considerably reduce the dimensionality of the problem and thus, the computational complexity, without imposing considerable errors. By employing the dimensionality reduction concept using the more recently demonstrated autoencoder technique, we redefine the conventional many-to-one design problem in EM nanostructures into a one-to-one problem plus a much simpler many-to-one problem, which can be simply solved using an analytic formulation. This approach reduces the computational complexity in solving both the forward problem (i.e., analysis) and the inverse problem (i.e., design) by orders of magnitude compared to conventional approaches. In addition, it provides analytic formulations that, despite their complexity, can be used to obtain intuitive understanding of the physics and dynamics of EM wave interaction with nanostructures with minimal computation requirements. As a proof-of-concept, we applied such an efficacious method to design a new class of on-demand reconfigurable optical metasurfaces based on phase-change materials (PCM). We envision that the integration of such a DL-based technique with full-wave commercial software packages offers a powerful toolkit to facilitate the analysis, design, and optimization of the EM nanostructures as well as explaining, understanding, and predicting the observed responses in such structures.

Deep Learning Based Autoencoder for Interference Channel Machine Learning

Deep learning (DL) based autoencoders have shown great potential to significantly enhance physical layer performance. In this paper, we present a DL based autoencoder for the interference channel. Based on a characterization of a k-user Gaussian interference channel, where the interference is classified into levels from weak to very strong according to a coupling parameter α, a DL neural network (NN) based autoencoder is designed to train on the data set and decode the received signals. The performance of such a DL autoencoder is studied for different interference scenarios, with α known or partially known, where we assume that α is predictable but varies by up to 10% at the training stage. The results demonstrate that the DL based approach has a significant capability to mitigate the effect induced by a poor signal-to-noise ratio (SNR) and a high interference-to-noise ratio (INR). However, the enhancement depends on the knowledge of α as well as the interference levels. The proposed DL approach performs well with an α offset of up to 10% for the weak interference level. For strong and very strong interference channels, the offset of α needs to be constrained to less than 5% and 2%, respectively, to maintain performance similar to that obtained when α is known.

Mitigation of Adversarial Examples in RF Deep Classifiers Utilizing AutoEncoder Pre-training Machine Learning

Adversarial examples in machine learning for images are widely publicized and explored. Illustrations of misclassifications caused by slightly perturbed inputs are abundant and commonly known (e.g., a picture of a panda imperceptibly perturbed to fool the classifier into incorrectly labeling it as a gibbon). Similar attacks on deep learning (DL) for radio frequency (RF) signals and their mitigation strategies are scarcely addressed in published work. Yet, RF adversarial examples (AdExs) with minimal waveform perturbations can cause drastic, targeted misclassification results, particularly against spectrum sensing/survey applications (e.g., BPSK is mistaken for 8-PSK). Our research on deep learning AdExs and our proposed defense mechanisms are RF-centric and incorporate physical world, over-the-air (OTA) effects. We herein present defense mechanisms based on pre-training the target classifier using an autoencoder. Our results validate this approach as a viable mitigation method to subvert adversarial attacks against deep learning-based communications and radar sensing systems.

Adversarially Approximated Autoencoder for Image Generation and Manipulation Machine Learning

Regularized autoencoders learn latent codes whose structure is regularized under a distribution, which gives them the capability to infer the latent codes given observations and to generate new samples given the codes. However, they are sometimes ambiguous, as they tend to produce reconstructions that are not necessarily faithful reproductions of the inputs. The main reason is that the learned latent code distribution is forced to match a prior distribution while the true distribution remains unknown. To improve the reconstruction quality and learn a manifold structure for the latent space, this work presents a novel approach using the adversarially approximated autoencoder (AAAE) to investigate the latent codes with adversarial approximation. Instead of regularizing the latent codes by penalizing the distance between the distributions of the model and the target, AAAE learns the autoencoder flexibly and approximates the latent space with a simpler generator. The ratio is estimated using a generative adversarial network (GAN) to enforce the similarity of the distributions. Additionally, the image space is regularized with an additional adversarial regularizer. The proposed approach unifies two deep generative models for both latent space inference and diverse generation. The learning scheme is realized without regularization on the latent codes, which also encourages faithful reconstruction. Extensive validation experiments on four real-world datasets demonstrate the superior performance of AAAE. In comparison to the state-of-the-art approaches, AAAE generates samples with better quality and shares the properties of regularized autoencoders, with a nice latent manifold structure.