Data Science blends tools, algorithms, and machine learning principles to discover hidden patterns in raw data. What distinguishes it from statistics is that data scientists use advanced machine learning algorithms to predict the occurrence of future events. A data scientist will look at the data from many angles, sometimes angles not considered earlier. Data visualization is one of the most important branches of data science and one of the main tools for analyzing and studying relationships between different variables.

Can you become a data scientist with little or no math background? What essential math skills matter in data science? There are many good packages for building predictive models and producing data visualizations, and thanks to them anyone can build a model or produce a chart. However, a solid background in mathematics is essential for fine-tuning your models so that they are reliable and perform optimally.

There is no such thing as a perfect machine learning model. A model's overall error incorporates contributions from several sources. The predictive power of a model therefore depends on the data scientist's experience in dealing with these sources of error. In this blog, we discuss 12 common errors in machine learning. Data collection alone can introduce errors at several levels.

A Support Vector Machine (SVM) is a classification method that searches for hyperplanes separating two or more classes. The hyperplane with the largest margin is retained, where "margin" is defined as the minimum distance from the sample points to the hyperplane. The sample points that define the margin are called support vectors and determine the final SVM model.

Bayes classifiers are based on a statistical model (Bayes' theorem: computing posterior probabilities from the prior probability and the so-called likelihood). A Naive Bayes classifier assumes that all attributes are conditionally independent given the class, so computing the likelihood simplifies to the product of the conditional probabilities of observing the individual attributes given a particular class label.

Artificial Neural Networks (ANNs) are graph-like classifiers that mimic the structure of a human or animal brain, with interconnected nodes representing the neurons.

Decision tree classifiers are tree-like graphs in which nodes test conditions on a particular set of features and branches route the decision toward the leaf nodes. Leaves form the lowest level of the graph and determine the class labels. An optimal tree is trained by minimizing Gini impurity or maximizing information gain.
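As a minimal sketch of the classifiers described above, the following compares a linear-kernel SVM, a Gaussian Naive Bayes model, and a Gini-based decision tree using scikit-learn on the built-in iris dataset. The dataset, package, and hyperparameters are illustrative assumptions, not part of the original discussion.

```python
# Illustrative comparison of three classifiers discussed above.
# Assumes scikit-learn is installed; dataset choice (iris) is arbitrary.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "SVM": SVC(kernel="linear"),         # maximum-margin hyperplane
    "Naive Bayes": GaussianNB(),         # conditional-independence assumption
    "Decision tree": DecisionTreeClassifier(criterion="gini"),  # Gini splits
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.2f}")
```

On simple, well-separated data like iris, all three models achieve high accuracy; their differences show up on larger, noisier datasets, where the modeling assumptions above matter.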