The recent explosion in data set size, in both number of records and number of attributes, has triggered the development of a number of big data platforms as well as parallel data analytics algorithms. At the same time, it has driven the adoption of dimensionality reduction procedures. Indeed, more is not always better: large amounts of data can sometimes produce worse performance in data analytics applications.
This cheat sheet was produced by DataCamp, and it is based on the Keras library. Keras is an easy-to-use yet powerful library, running on top of Theano and TensorFlow, that provides a high-level neural networks API for developing and evaluating deep learning models. Originally posted here in PDF format.
In the 1990s, there was a popular book called Re-engineering the Corporation. Looking back now, re-engineering has certainly had mixed success, but it did have an impact over the last two decades. ERP deployments, led by SAP and others, were a direct result of the business process re-engineering phenomenon.
I always enjoy these industry-spanning infographics; they sometimes point me to companies I want to understand in greater depth. That said, the inclusion of SAS merely as a BI enterprise system, and the total absence of IBM SPSS from the data science category, are huge red flags. These two companies alone control at least one-third of the data science platform market among the roughly 8,000 global companies with more than $1 billion in revenue.
Think Sets and Functions, rather than manipulation of arrays or rectangles of numbers: Linear Algebra is often introduced at the high-school level as a collection of computations one can perform on vectors and matrices - matrix multiplication, Gaussian elimination, determinants, sometimes even eigenvalue calculations - and I believe this introduction is quite detrimental to one's understanding of Linear Algebra. This computational approach continues in many undergraduate (and sometimes graduate) courses in engineering and the social sciences. In fact, many computer scientists deal with Linear Algebra for decades of their professional lives with this narrow (and, in my opinion, harmful) view. I believe the right way to learn Linear Algebra is to view vectors as elements of a Set (a Vector Space), and matrices as functions from one vector space to another. A vector of n numbers is an element of the vector space R^n, and an m x n matrix is a function from R^n to R^m. Beyond this, all one needs to understand is that ...
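The "matrix as a function" view above can be sketched in a few lines of plain Python; the helper name `as_linear_map` is illustrative, not a standard API:

```python
def as_linear_map(A):
    """Turn an m x n matrix (list of m rows of n numbers) into the
    function f: R^n -> R^m that it represents, f(v) = A v."""
    def f(v):
        return [sum(a * x for a, x in zip(row, v)) for row in A]
    return f

# A 2 x 3 matrix defines a map from R^3 down to R^2.
A = [[1, 0, 2],
     [0, 1, -1]]
f = as_linear_map(A)

print(f([1, 2, 3]))  # a vector in R^3 mapped into R^2 -> [7, -1]

# Linearity, f(u + v) = f(u) + f(v), is the defining property of such maps.
u, v = [1, 0, 0], [0, 1, 1]
lhs = f([a + b for a, b in zip(u, v)])
rhs = [a + b for a, b in zip(f(u), f(v))]
print(lhs == rhs)  # -> True
```

Once matrices are seen this way, matrix multiplication is simply function composition, which is arguably the cleanest motivation for its otherwise odd-looking definition.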
Here is a short overview of how PCA (principal component analysis) works for dimension reduction, that is, to build k features (also called variables) from a larger set of n features, with k much smaller than n. This smaller set of k features built with PCA is optimal in the sense that it minimizes the variance of the residual noise when fitting the data to a linear model. Note that PCA does not select a subset of the original features: it transforms them into new ones that are linear combinations of the original features.
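As a minimal sketch of the idea, the following pure-Python example reduces a toy 2-D data set (an assumed, made-up sample lying roughly along a line) to k = 1 dimension by projecting onto the leading eigenvector of the covariance matrix; the eigen-decomposition uses the closed form for a symmetric 2 x 2 matrix:

```python
import math

# Toy 2-D data set, roughly along y = 2x, so the first principal
# component should point in that direction (illustrative values only).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Centre the data, then form the 2x2 sample covariance matrix.
centred = [(x - mean_x, y - mean_y) for x, y in data]
cxx = sum(x * x for x, _ in centred) / (n - 1)
cyy = sum(y * y for _, y in centred) / (n - 1)
cxy = sum(x * y for x, y in centred) / (n - 1)

# Eigenvalues of the symmetric matrix [[cxx, cxy], [cxy, cyy]].
trace = cxx + cyy
det = cxx * cyy - cxy * cxy
disc = math.sqrt(trace * trace / 4 - det)
lam1 = trace / 2 + disc  # largest eigenvalue = variance along PC1

# Eigenvector for lam1: the direction of the first principal component.
vx, vy = cxy, lam1 - cxx
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)

# Project each centred point onto PC1: the 1-D reduced representation.
projected = [x * pc1[0] + y * pc1[1] for x, y in centred]

explained = lam1 / trace  # fraction of total variance kept by PC1
print(f"PC1 direction: ({pc1[0]:.3f}, {pc1[1]:.3f})")
print(f"variance explained: {explained:.3f}")
```

Because the points sit almost exactly on a line, the single PCA feature retains nearly all the variance, which is exactly the "minimal residual noise" property described above; for real, higher-dimensional data one would use a library eigen-solver rather than this 2x2 closed form.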