Statistical Learning


NYC Data Science Academy

#artificialintelligence

This 20-hour Machine Learning with Python course covers all the basic machine learning methods and the Python modules (especially Scikit-Learn) for implementing them. The five sessions cover: simple and multiple linear regression; classification methods including logistic regression, discriminant analysis, naive Bayes, support vector machines (SVMs), and tree-based methods; cross-validation and feature selection; regularization; and principal component analysis (PCA) and clustering algorithms. After successfully completing this course, you will be able to explain the principles of machine learning algorithms and implement these methods to analyze complex datasets and make predictions in Python.
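As a taste of the Scikit-Learn workflow the course covers, here is a minimal sketch combining two of the listed topics, logistic regression and cross-validation. The dataset and model settings are illustrative assumptions, not course material:

```python
# A minimal Scikit-Learn sketch: logistic regression scored with
# 5-fold cross-validation. Dataset choice is illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)     # raise max_iter for convergence
scores = cross_val_score(model, X, y, cv=5)   # accuracy on 5 held-out folds
print(scores.mean())                          # average cross-validated accuracy
```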


State of the Art Survey of Deep Learning and Machine Learning Models for Smart Cities and Urban Sustainability

#artificialintelligence

Deep learning (DL) and machine learning (ML) methods have recently contributed to the advancement of models in the various aspects of prediction, planning, and uncertainty analysis of smart cities and urban development. This paper presents the state of the art of DL and ML methods used in this realm. Through a novel taxonomy, the advances in model development and new application domains in urban sustainability and smart cities are presented. Findings reveal that five families of DL and ML methods have been applied most often to the different aspects of smart cities: artificial neural networks; support vector machines; decision trees; ensemble, Bayesian, hybrid, and neuro-fuzzy methods; and deep learning.


Interpretability: Cracking open the black box – Part III

#artificialintelligence

Previously, we looked at the pitfalls of the default "feature importance" in tree-based models and talked about permutation importance, LOOC importance, and Partial Dependence Plots. Now let's switch lanes and look at a few model-agnostic techniques that take a bottom-up approach to explaining predictions. Instead of looking at the model and trying to come up with global explanations like feature importance, this set of methods looks at each single prediction and then tries to explain it. As the name suggests, this is a model-agnostic technique for generating local explanations of the model. The core idea behind the technique is quite intuitive. Suppose we have a complex classifier with a highly non-linear decision boundary.
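The excerpt reads like the setup for a LIME-style local surrogate (an inference from context; the blurb does not name the technique). The core idea can be sketched in a few lines; the perturbation scale, proximity kernel, and toy black box below are assumptions for illustration:

```python
# A minimal sketch of the local-surrogate idea (LIME-style), not any
# particular library's API: perturb one instance, weight neighbors by
# proximity, and fit an interpretable linear model to the black box.
import numpy as np
from sklearn.linear_model import Ridge

def local_explanation(black_box, x, n_samples=5000, scale=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Perturb the instance of interest with Gaussian noise.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    preds = black_box(Z)                       # query the complex model
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / (2 * scale ** 2))   # closer samples weigh more
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=w)
    return surrogate.coef_                     # local feature effects at x

# Toy black box with a highly non-linear response surface.
f = lambda Z: np.sin(Z[:, 0]) + Z[:, 1] ** 2
print(local_explanation(f, np.array([0.0, 1.0])))
```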


71 Data Science Interview Questions and Answers - Crack Technical Interview Now! - DataFlair

#artificialintelligence

DataFlair has published a series of top data science interview questions and answers containing 130 questions across all levels. This is the second part of the Data Science Interview Questions and Answers series. In the first part, we discussed some basic-level questions that could be asked in your next interview, especially if you are a fresher in Data Science. Today, I am sharing the top 71 Data Science Interview Questions and Answers. This is the part where you will get the best scenario-based interview questions for data scientist interviews. A Data Science Interview is not just a test of your knowledge, but of your ability to apply it at the right time. Every data science interview has many Python-related questions, so if you really want to crack your next data science interview, you need to master Python. Q.1 What is a lambda expression in Python? A lambda expression lets you create an anonymous function. Unlike conventional functions, lambda functions occupy a single line of code. (The article's inline example produces the output 25; see the sketch below.) Q.2 How will you measure the Euclidean distance between two arrays in NumPy?
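The inline code for these two questions did not survive extraction. Here is a minimal reconstruction consistent with the quoted output of 25; the squaring lambda and the sample arrays are assumptions:

```python
# Q.1: a single-line anonymous function (lambda) that yields the
# article's quoted output of 25; Q.2: Euclidean distance in NumPy.
import numpy as np

square = lambda x: x ** 2       # anonymous, single-line function
print(square(5))                # -> 25

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 8.0])
print(np.linalg.norm(a - b))    # Euclidean distance between the arrays
```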


Classify A Rare Event Using 5 Machine Learning Algorithms - KDnuggets

#artificialintelligence

Supervised Learning is the crown jewel of Machine Learning. A couple of years ago, Harvard Business Review released an article titled "Data Scientist: The Sexiest Job of the 21st Century." Ever since its release, Data Science and Statistics departments have become widely pursued by college students, and Data Scientists (nerds), for the first time, are referred to as sexy. In some industries, Data Scientists have reshaped the corporate structure and reallocated much of the decision-making to "front-line" workers. Being able to generate useful business insights from data has never been so easy.


Artificial Intelligence - Changing the way we diagnose cancer

#artificialintelligence

Thanks to technological advances in areas like genetics and imaging, cancer is now more likely to be caught at an earlier stage than it was decades ago. Still, the accuracy of medical imaging diagnosis remains low, with professionals reporting 20-30 percent false negatives in chest X-rays and mammograms. AI can help prevent this, and the fact that healthcare is data-rich is an added benefit. The more data these algorithms can see, the more likely they are to uncover the hidden patterns that can be used to perform diagnosis. Over time, many machine learning algorithms have been introduced, but traditional forms, like logistic regression, have demonstrated the most usefulness in clinical oncology research.


Maria Schuld: "Innovating machine learning with near-term quantum computing"

#artificialintelligence

Machine Learning for Physics and the Physics of Learning 2019, Workshop IV: Using Physical Insights for Machine Learning. "Innovating machine learning with near-term quantum computing," Maria Schuld, University of KwaZulu-Natal & Xanadu. Abstract: Algorithms that run on quantum computers, so-called quantum circuits, obey different laws of information processing than conventional computations. By optimizing the physical parameters of quantum circuits we can turn these algorithms into trainable models which learn to generalize from data. This talk highlights different aspects of such "variational quantum machine learning algorithms," including their role in the development of near-term quantum technologies, their interpretation as a cross-breed of neural networks and support vector machines, strategies for automatic differentiation, and how to integrate quantum circuits with machine learning frameworks such as PyTorch and TensorFlow using open-source software.
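The open-source software the abstract alludes to is plausibly Xanadu's PennyLane (an assumption; the abstract does not name it). A minimal sketch of a variational quantum circuit trained by gradient descent, with toy data and an arbitrary circuit ansatz, might look like this:

```python
# A minimal variational quantum model, sketched with PennyLane.
# Circuit ansatz, data, and hyperparameters are illustrative assumptions.
import pennylane as qml
from pennylane import numpy as np   # autograd-aware NumPy wrapper

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(weights, x):
    qml.RX(x, wires=0)               # encode the data point
    qml.RY(weights[0], wires=0)      # trainable rotations
    qml.RX(weights[1], wires=0)
    return qml.expval(qml.PauliZ(0)) # model output in [-1, 1]

def cost(weights, X, y):
    loss = 0.0
    for x_i, y_i in zip(X, y):       # mean squared error over the data
        loss = loss + (circuit(weights, x_i) - y_i) ** 2
    return loss / len(X)

weights = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.3)
X, y = np.array([0.0, 1.0, 2.0]), np.array([1.0, 0.5, -0.4])
for _ in range(50):                  # optimize the physical parameters
    weights = opt.step(lambda w: cost(w, X, y), weights)
print(cost(weights, X, y))
```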


Deception detection on the Bag-of-lies dataset

#artificialintelligence

Lie detection has been a topic of interest since the beginning of the 20th century, and since then many different methods have been used to try to achieve it, such as changes in the inspiration-expiration ratio, increases in systolic blood pressure, dilation of the pupils, heart rate, etc. Usually, when people think about lie detection, the most common method that comes to mind is the polygraph. This method combines various techniques to detect autonomic reactions: changes in body functions that are not easily controlled by the conscious mind. However, it still requires a large amount of training, achieved through control questions whose answers are known, so that the subject's reactions can later be compared. The polygraph offers an accuracy of around 70% in the general population⁴, which is better than trained humans can achieve by just looking at a person. However, this doesn't mean the method is infallible, since people have found ways to cheat the system through training or by using drugs to suppress these reactions. In general, these methods have not offered results good enough to be used in court in most countries.


Bayesian Product Ranking at Wayfair Wayfair

#artificialintelligence

Given sufficient data, we could just use the logistic regression model without further changes. Wayfair handled more than 9 million orders last quarter alone, which initially might sound like more than enough. However, those orders were spread out among millions of products, yielding just a few orders per product at most. Small integers like these can be extremely noisy, so we always have to worry that one product simply seems better than another because of random chance. For example, it is hard to tell if a product that happened to attract three orders is actually any better than one that happened to attract two, or if it just got lucky.
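The worry the blurb raises (is three orders really better than two, or just luck?) is exactly what a Bayesian treatment quantifies. Here is a minimal sketch of the idea, not Wayfair's actual model; the Beta prior and the impression and order counts are illustrative assumptions:

```python
# A minimal Beta-Binomial sketch: shrink noisy per-product conversion
# rates toward a prior and compare products via posterior samples.
import numpy as np

rng = np.random.default_rng(0)

def posterior_samples(orders, impressions, a=1.0, b=50.0, n=100_000):
    # Beta(a, b) prior on the conversion rate; conjugate Beta posterior.
    return rng.beta(a + orders, b + impressions - orders, size=n)

p_a = posterior_samples(orders=3, impressions=500)
p_b = posterior_samples(orders=2, impressions=500)
print("P(product A converts better than B):", (p_a > p_b).mean())
# With so little data the probability sits near 0.5: the posteriors
# overlap heavily, so "3 orders vs. 2" is weak evidence of a real gap.
```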