

Is Python or Perl faster than R?


A lot of statistical / machine learning algorithms are now being implemented in Python - see the Python and R articles - and it seems that Python is more appropriate for production code and big data flowing in real time, while R is often used for EDA (exploratory data analysis) in manual mode. My question is: if you make a true apples-to-apples comparison, what kinds of computations does Python perform much faster than R (or the other way around), depending on data size / memory size? Here I have in mind algorithms such as classifying millions of keywords, something requiring trillions of operations, not easy to do with Hadoop, and requiring very efficient algorithms designed for sparse data (sometimes called sparse computing). For instance, the following article (see data science book pp 118-122) shows a Perl script running 10 times faster than the R equivalent to produce R videos, but it's not a language or compiler issue: the Perl version pre-computes all video frames very fast and loads them into memory, then the video is displayed (using R, ironically), while the R version produces (and displays) one frame at a time and does the whole job in R. What about accelerating tools, such as the CUDA accelerator for R?
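To make the "sparse computing" point concrete, here is a minimal Python sketch (sizes and data are hypothetical, and it uses NumPy/SciPy rather than any of the tools mentioned above) showing why the representation, not the language, often dominates cost when almost all entries of a keyword-style matrix are zero:

import numpy as np
from scipy import sparse

# Hypothetical keyword x feature matrix: 100,000 rows and 10,000 columns,
# but only about a million non-zero entries (~0.1% density).
rng = np.random.default_rng(0)
rows = rng.integers(0, 100_000, size=1_000_000)
cols = rng.integers(0, 10_000, size=1_000_000)
vals = rng.random(1_000_000)

X = sparse.csr_matrix((vals, (rows, cols)), shape=(100_000, 10_000))
w = rng.random(10_000)

# A matrix-vector product only touches the stored non-zeros, so the cost is
# proportional to nnz rather than rows * cols, and the matrix fits in memory
# where a dense float64 copy (about 8 GB) might not.
scores = X @ w
print(X.nnz, "non-zeros instead of", 100_000 * 10_000, "dense entries")

In tasks like large-scale keyword classification, this kind of data-structure choice usually matters far more than whether the driver script is written in Python, Perl or R.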


Deep Learning Research Review Week 1: Generative Adversarial Nets


This week, I'll be doing a new series called Deep Learning Research Review. The way the authors combat the difficulty of generating sharp, high-resolution images in a single pass is by using multiple CNN models to sequentially generate images at increasing scales. For text-to-image synthesis, the approach the authors take is training a GAN that is conditioned on text features created by a recurrent text encoder (I won't go too much into this, but here's the paper for those interested). In order to create these versatile models, the authors train with three types of data: {real image, right text}, {fake image, right text}, and {real image, wrong text}.
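As a rough sketch of that three-term training signal (not the authors' code; the discriminator D, the image batches and the encoded text features are assumed to exist), the discriminator loss in PyTorch could look like this:

import torch
import torch.nn.functional as F

def discriminator_loss(D, real_img, fake_img, right_txt, wrong_txt):
    # D(image, text) returns one logit per example.
    # Targets: {real image, right text} -> 1,
    #          {fake image, right text} -> 0,
    #          {real image, wrong text} -> 0.
    ones = torch.ones(real_img.size(0), 1)
    zeros = torch.zeros(real_img.size(0), 1)
    loss = (
        F.binary_cross_entropy_with_logits(D(real_img, right_txt), ones)
        + F.binary_cross_entropy_with_logits(D(fake_img, right_txt), zeros)
        + F.binary_cross_entropy_with_logits(D(real_img, wrong_txt), zeros)
    )
    return loss

The third term is what forces the discriminator to check that the image actually matches the text, rather than only checking that the image looks real.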


YOLO: Core ML versus MPSNNGraph


The Core ML conversion tools do not support Darknet, so we'll first convert the Darknet files to Keras format. However, as I'm writing this, the Core ML conversion tools only support Keras version 1.2.2. Now that we have YOLO in a format the Core ML conversion tools support, we can write a Python script to turn it into a .mlmodel file. (Note: you do not need to perform these steps if you just want to run the demo app.) This means we need to put our input images into a CVPixelBuffer object somehow, and also resize this pixel buffer to 416×416 pixels -- or Core ML won't accept it.
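A sketch of what such a conversion script might look like, assuming the network has already been re-saved as a Keras 1.2.2 model in a file called yolo.h5 (the file and input/output names here are illustrative, not the exact ones from the post):

import coremltools

# Convert the Keras model to Core ML. image_input_names makes Core ML treat
# the input as an image, and image_scale maps 0-255 pixel values into 0-1.
coreml_model = coremltools.converters.keras.convert(
    'yolo.h5',
    input_names='image',
    image_input_names='image',
    output_names='grid',
    image_scale=1 / 255.0)

coreml_model.save('TinyYOLO.mlmodel')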


A Semi-Supervised Classification Algorithm using Markov Chain and Random Walk in R


From each of the unlabeled points (Markov states), a random walk with a Markov transition matrix (computed from the row-stochastic kernelized distance matrix) is started; the walk ends in one of the labeled states, which are the absorbing states of the Markov chain. As can be seen, with increasing iterations the probability that the walk ends in the particular red absorbing state with state index 323 increases. In the second barplot, the length of a bar represents the probability after an iteration, and the colors distinguish the two absorbing states from the unlabeled states; the w vector shown contains 1000 states, since the number of datapoints is 1000. Each time a new unlabeled (black) point is selected, a random walk is started with the underlying Markov transition matrix, and the power iteration is continued until the walk terminates in one of the absorbing states with high probability. Since there are only two absorbing states, the point is finally labeled with the label (red or blue) of the absorbing state in which the random walk is more likely to terminate.
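The original implementation is in R; purely as a rough Python/NumPy sketch of the same procedure (the Gaussian kernel width, data and labels are placeholders), the absorbing-state power iteration can be written as:

import numpy as np

def semi_supervised_label(X, labels, start_idx, sigma=1.0, n_iter=200):
    # labels: -1 for unlabeled points, otherwise a class id.
    # Returns the label of the absorbing state the walk started at
    # start_idx is most likely to terminate in.

    # Row-stochastic kernelized distance matrix (Gaussian kernel).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-d2 / (2 * sigma ** 2))
    P /= P.sum(axis=1, keepdims=True)

    # Labeled points become absorbing states: once entered, never left.
    abs_idx = np.where(labels >= 0)[0]
    P[abs_idx, :] = 0.0
    P[abs_idx, abs_idx] = 1.0

    # Power iteration: start the walk at the unlabeled point start_idx.
    w = np.zeros(len(X))
    w[start_idx] = 1.0
    for _ in range(n_iter):
        w = w @ P

    # Label with the absorbing state that captured the most probability mass.
    return labels[abs_idx[np.argmax(w[abs_idx])]]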


The world's first protein database for Machine Learning and AI


I am incredibly proud and excited to present the very first public product of Peptone, the Database of Structural Propensities of Proteins. The Database of Structural Propensities of Proteins (dSPP) is the world's first interactive repository of structural and dynamic features of proteins, with seamless integration for the leading Machine Learning frameworks Keras and Tensorflow. As opposed to the binary (logits) secondary structure assignments available in other protein datasets for experimentalists and the machine learning community, dSPP data report on protein structure and local dynamics at the residue level with atomic resolution, as gauged from a continuous structural propensity assignment in the range -1.0 to 1.0. Seamless dSPP integration with the Keras and Tensorflow machine learning frameworks is achieved via the dspp-keras Python package, available for download and setup in under 60 seconds.
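The dspp-keras loading API itself is not shown here; purely as an illustration of the kind of model the data format suggests, a hypothetical Keras regressor that predicts a per-residue propensity in the -1.0 to 1.0 range (hence the tanh output) might look like this, with placeholder data standing in for the real sequences and assignments:

import numpy as np
from tensorflow import keras

# Placeholder data: 256 sequences one-hot encoded over 20 amino acids,
# padded to length 128; targets are per-residue propensities in [-1, 1].
seq_len, n_aa = 128, 20
X = np.random.rand(256, seq_len, n_aa).astype("float32")
y = np.random.uniform(-1.0, 1.0, size=(256, seq_len, 1)).astype("float32")

model = keras.Sequential([
    keras.layers.Bidirectional(
        keras.layers.LSTM(32, return_sequences=True),
        input_shape=(seq_len, n_aa)),
    # tanh keeps each residue's prediction inside the -1.0 .. 1.0 range.
    keras.layers.TimeDistributed(keras.layers.Dense(1, activation="tanh")),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32)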


Sentiment Analysis of Movie Reviews (2): word2vec


If we want to find similarities between words, we have to look at a corpus of texts, build a co-occurrence matrix and perform dimensionality reduction (using, e.g., singular value decomposition). Using so-called distributed representations, a word can be represented as a vector of (say 100, 200, … whatever works best) real numbers. And as we will see, with this representation it is possible to model semantic relationships between words! This makes a lot of sense: amazing comes out as the most similar word, followed by words like excellent and outstanding.
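A minimal gensim sketch of the word2vec side (the two sentences below stand in for the tokenized review corpus; in gensim versions before 4.0 the vector_size argument is called size):

from gensim.models import Word2Vec

# Placeholder corpus; in practice this is the full list of tokenized reviews.
sentences = [
    ["this", "movie", "was", "amazing", "and", "outstanding"],
    ["an", "excellent", "film", "with", "an", "amazing", "cast"],
]

# Each word is mapped to a dense vector of 100 real numbers,
# the "distributed representation" described above.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

# Nearest neighbours in vector space; on a real corpus "amazing" lands
# next to words like "excellent" and "outstanding".
print(model.wv.most_similar("amazing", topn=5))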


Some Image and Video Processing: Motion Estimation with Block-Matching in Videos, Noisy and Motion-blurred Image Restoration with Inverse Filter in Python and OpenCV


The following figure shows how the quality of the transformed image, measured in terms of PSNR against the original image, degrades as n (the LPF kernel width) increases when an n×n LPF is applied. Likewise, as we keep increasing the kernel size, the quality of the final image obtained by down/up-sampling the original image decreases, as shown in the following figure. The first input is a video of some students walking along a university corridor, obtained from YouTube: we extract some consecutive frames, mark a face in one frame, and use that image to mark all the faces in the remaining consecutive frames, thereby marking the entire video and estimating the motion using the simple block-matching technique only. The following figure shows the frame with the face marked; we then use this image and the block-matching technique to estimate the motion of the student in the video, by marking his face in all the consecutive frames and reconstructing the video, as shown below. As can be seen from the following figure, the optimal median filter size is 5×5, which generates the highest-quality output when compared to the original image.
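For the block-matching part, a bare-bones Python/NumPy sketch (the block size and search radius are illustrative) of the exhaustive sum-of-absolute-differences search used to estimate a block's motion vector between two frames:

import numpy as np

def match_block(prev_frame, next_frame, top, left, block=16, search=8):
    # Find the displacement of the block at (top, left) in prev_frame that
    # best matches next_frame, using an exhaustive SAD search.
    ref = prev_frame[top:top + block, left:left + block].astype(np.int32)
    best, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > next_frame.shape[0] or x + block > next_frame.shape[1]:
                continue
            cand = next_frame[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(ref - cand).sum()
            if sad < best:
                best, best_dy, best_dx = sad, dy, dx
    return best_dy, best_dx  # motion vector of the block between the two frames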


Python: Implementing a k-means algorithm with sklearn


The purpose of k-means clustering is to partition the observations in a dataset into a specific number of clusters in order to aid analysis of the data. Specifically, the k-means scatter plot will illustrate the clustering of specific stock returns according to their dividend yield. We devise a range from 1 to 20 (which represents our candidate numbers of clusters), and our score variable denotes the percentage of variance explained by each number of clusters. Therefore, we set n_clusters equal to 3 and, upon generating the k-means output, use the data originally transformed with PCA in order to plot the clusters. From the above, we see that the clustering algorithm demonstrates an overall positive correlation between stock returns and dividend yields, implying that stocks paying higher dividend yields can be expected to have higher overall returns.
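A condensed scikit-learn sketch of that workflow (the feature matrix here is random placeholder data standing in for the returns/dividend-yield dataset):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

# Placeholder features, e.g. returns, dividend yield and a few fundamentals.
X = scale(np.random.rand(200, 5))

# Elbow method: score() is the negative within-cluster sum of squares,
# inspected for k = 1..20 to choose the number of clusters.
ks = range(1, 21)
scores = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).score(X) for k in ks]

# With k = 3 chosen from the elbow, plot the clusters in 2-D PCA space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
pca_2d = PCA(n_components=2).fit_transform(X)
plt.scatter(pca_2d[:, 0], pca_2d[:, 1], c=labels)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()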


GDPR – A Change in the Making


Organizations all over the EU must be aware by now that the Data Protection Act (DPA) is being replaced by the General Data Protection Regulation (GDPR). The GDPR was drafted to ensure that the privacy rights of EU citizens aren't threatened in any way. Moreover, without client consent to access their data, companies cannot use personal client information to gain business insights and improve the Customer Experience. In the long term, there is an opportunity to differentiate your organization from the competition and secure a competitive advantage by gaining client consent to use personal data and improve the Customer Experience.