Statistical Learning


The best metric to measure accuracy of classification models – CleverTap

#artificialintelligence

Unlike evaluating the accuracy of models that predict a continuous or discrete dependent variable, such as Linear Regression models, evaluating the accuracy of a classification model can be more complex and time-consuming. Before measuring the accuracy of a classification model, an analyst would first measure its robustness with the help of metrics such as AIC-BIC, AUC-ROC, AUC-PR, the Kolmogorov-Smirnov chart, etc. The next logical step is to measure its accuracy.
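
As a rough illustration of those robustness metrics, here is a minimal Python sketch, assuming scikit-learn and SciPy; the breast-cancer dataset and the logistic-regression classifier are placeholders for the example, not anything the article specifies:

import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score, accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder binary-classification problem
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

print("AUC-ROC :", roc_auc_score(y_test, scores))            # ranking quality across all thresholds
print("AUC-PR  :", average_precision_score(y_test, scores))  # precision/recall trade-off
# Kolmogorov-Smirnov statistic: maximum separation between the score
# distributions of the positive and negative classes
print("KS stat :", ks_2samp(scores[y_test == 1], scores[y_test == 0]).statistic)
# Only after these robustness checks would you look at plain accuracy
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))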


6 ways hackers will use machine learning to launch attacks

#artificialintelligence

Defined as the "ability for (computers) to learn without being explicitly programmed," machine learning is huge news for the information security industry. It's a technology that can potentially help security analysts with everything from malware and log analysis to identifying and closing vulnerabilities earlier. It could also improve endpoint security, automate repetitive tasks, and even reduce the likelihood of attacks resulting in data exfiltration. Naturally, this has led to the belief that these intelligent security solutions will spot - and stop - the next WannaCry attack much faster than traditional, legacy tools.


Removing Outliers Using Standard Deviation in Python

@machinelearnbot

However, the first dataset has values closer to the mean and the second dataset has values more spread out. To be more precise, the standard deviation for the first dataset is 3.13 and for the second set is 14.67. However, it's not easy to wrap your head around numbers like 3.13 or 14.67. Right now, we only know that the second data set is more "spread out" than the first one. Let's put this to a more practical use.
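
A minimal sketch of that technique in Python, assuming NumPy; the cut-off of two standard deviations and the sample values are hypothetical, not the article's:

import numpy as np

def remove_outliers(values, num_std=2.0):
    # Keep only the points within num_std standard deviations of the mean
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std()
    return values[np.abs(values - mean) < num_std * std]

# Hypothetical "spread out" sample, similar in spirit to the article's second dataset
data = [2, 31, 8, 50, 2, 7, 9, 3, 8, 10, 58, 6, 9, 11]
print(remove_outliers(data))  # the extreme values 50 and 58 are dropped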


AI Eats Insurance – Lemonade Stories

#artificialintelligence

When food delivery services talk breathlessly about machine learning, feel free to roll your eyes: it's baked salmon they're dropping off, not Bayesian statistics. Insurance is another kettle of fish altogether. The birth of statistics is usually dated to 1662, when John Graunt calculated the probabilities of Londoners surviving to a given age. Lloyd's of London started shortly thereafter, and advances in statistics and insurance have been inseparable ever since. Insurers, of course, have machines too, but the machine's 'secret power' is its ability to extract prophetic insights from inhuman quantities of data.


Linear Regression -- Machine Learning with TensorFlow and Oracle JET UI Explained

#artificialintelligence

Machine learning is definitely a popular topic these days. Some people have the wrong assumptions about it -- they think a machine can learn by itself, as if by magic. The truth is -- there is no magic, just math behind it. A machine will learn the way the math model is defined for the learning process. In my opinion, the best solution is a combination of machine learning math and algorithms.
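
To make that concrete, here is a minimal linear-regression sketch in TensorFlow 2-style Python; the synthetic data, learning rate, and step count are illustrative assumptions, not the article's Oracle JET example:

import numpy as np
import tensorflow as tf

# Synthetic data: y = 3x + 2 plus noise (illustrative, not the article's dataset)
x = np.random.rand(200).astype(np.float32)
y = 3.0 * x + 2.0 + np.random.normal(0.0, 0.1, 200).astype(np.float32)

w = tf.Variable(0.0)  # slope to be learned
b = tf.Variable(0.0)  # intercept to be learned
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(500):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x + b - y))  # mean squared error
    grads = tape.gradient(loss, [w, b])
    optimizer.apply_gradients(zip(grads, [w, b]))

print("learned w:", w.numpy(), "learned b:", b.numpy())  # should approach 3 and 2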


A Tour of The Top 10 Algorithms for Machine Learning Newbies

#artificialintelligence

In machine learning, there's something called the "No Free Lunch" theorem. In a nutshell, it states that no one algorithm works best for every problem, and it's especially relevant for supervised learning. For example, you can't say that neural networks are always better than decision trees, or vice versa. There are many factors at play, such as the size and structure of your dataset. As a result, you should try many different algorithms for your problem, while using a hold-out "test set" of data to evaluate performance and select the winner.
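
A minimal sketch of that "try several algorithms and compare on a hold-out test set" workflow, assuming scikit-learn; the dataset and the three candidate models are placeholders for the example:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Hold out a test set that no model sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=5000),
    "k-nearest neighbors": KNeighborsClassifier(),
}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))  # pick the winner on the hold-out set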


Pairs Trading Analysis with R – Udemy

@machinelearnbot

It explores the main concepts from basic to expert level, which can help you achieve better grades, develop your academic career, apply your knowledge at work, or do your own research as an experienced investor. Learning pairs trading analysis is indispensable for finance careers in areas such as quantitative research, quantitative development, and quantitative trading, mainly within investment banks and hedge funds. It is also essential for academic careers in quantitative finance, and it is necessary for experienced investors' quantitative trading research and development. But since the learning curve can become steep as complexity grows, this course helps by leading you step by step, using historical MSCI Countries Indexes ETF prices for back-testing, to achieve greater effectiveness.


Which Machine Learning Algo will continue to be in use in year 2118?

#artificialintelligence

So what were the answers popping into your head? Random forest, SVM, k-means, kNN, or even deep learning and its variants? Now some of you might laugh and ask how on earth anyone can predict so far ahead -- predicting things 100 years into the future is crazy. Well, the answer is the Lindy effect. Yes, the heuristic I am using to make this prediction is the Lindy effect.


Gradient Boosting in TensorFlow vs XGBoost

@machinelearnbot

TensorFlow 1.4 was released a few weeks ago with an implementation of Gradient Boosting, called TensorFlow Boosted Trees (TFBT). Unfortunately, the paper does not have any benchmarks, so I ran some against XGBoost. For many Kaggle-style data mining problems, XGBoost has been the go-to solution since its release in 2014. It's probably as close to an out-of-the-box machine learning algorithm as you can get today, as it gracefully handles un-normalized or missing data while being accurate and fast to train. The code to reproduce the results in this article is on GitHub.
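
The article's own benchmark code is linked from GitHub; as a stand-alone illustration of the XGBoost side, here is a minimal sketch, assuming the xgboost package; the dataset, the injected missing values, and the hyperparameters are placeholders, not the article's benchmark setup:

import numpy as np
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X = X.copy()
X[np.random.rand(*X.shape) < 0.1] = np.nan  # XGBoost handles missing values natively
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Placeholder hyperparameters; no normalization of features is needed
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print("hold-out accuracy:", (model.predict(X_test) == y_test).mean())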


Understanding Principal Component Analysis – Hacker Noon

#artificialintelligence

The purpose of this post is to give the reader a detailed understanding of Principal Component Analysis, with the necessary mathematical proofs. In real-world data analysis tasks we analyze complex, i.e. multi-dimensional, data. We plot the data and find various patterns in it, or use it to train some machine learning models. One way to think about dimensions: suppose you have a data point x; if we consider this data point as a physical object, then dimensions are merely a basis of view, like where the data is located when observed from the horizontal or vertical axis. As the dimensionality of the data increases, the difficulty of visualizing it and performing computations on it also increases.
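
As a small illustration of the mechanics behind PCA (the full proofs are in the post itself), here is a sketch in Python with NumPy; the toy 3-D dataset and the choice of two retained components are assumptions for the example:

import numpy as np

# Toy 3-D data with very different variance along each axis (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])

X_centered = X - X.mean(axis=0)          # center each feature
cov = np.cov(X_centered, rowvar=False)   # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition (ascending eigenvalues)
order = np.argsort(eigvals)[::-1]        # sort components by explained variance
components = eigvecs[:, order[:2]]       # keep the top 2 principal components
X_reduced = X_centered @ components      # project onto the 2-D subspace

print("explained variance ratio:", eigvals[order][:2] / eigvals.sum())
print("reduced shape:", X_reduced.shape)  # (200, 2)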