Collaborating Authors


DeepDream: How Alexander Mordvintsev Excavated the Computer's Hidden Layers


Early in the morning on May 18, 2015, Alexander Mordvintsev made an amazing discovery. He had been having trouble sleeping. Just after midnight, he awoke with a start. He was sure he'd heard a noise in the Zurich apartment where he lived with his wife and child. Afraid that he hadn't locked the door to the terrace, he ran out of the bedroom to check if there was an intruder. All was fine; the terrace door was locked, and there was no intruder.

How to master Machine Learning during lockdown?


Machine learning is one of the most intriguing applications of Artificial Intelligence. One can start from scratch and learn it by mastering Python, Anaconda, Jupyter, scikit-learn, NumPy, Matplotlib, and OpenCV.

Why Do So Many Practicing Data Scientists Not Understand Logistic Regression?


The U.S. Weather Service has always phrased rain forecasts as probabilities. I do not want a classification of "it will rain today." There is a slight loss/disutility to carrying an umbrella, and I want to be the one to make that tradeoff. From personal experience across multiple contexts, it seems that many data scientists simply do not understand logistic regression, or binomial and multinomial models in general. The problem arises from logistic regression often being taught as a "classification" algorithm in the machine learning world.
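The point above can be illustrated with a short scikit-learn sketch (my example, not the article's): logistic regression produces probabilities, and turning them into a yes/no decision is a separate step that should reflect the user's own cost tradeoff. The umbrella costs below are hypothetical.

```python
# Illustrative sketch: logistic regression outputs probabilities; the
# classification threshold is a separate, user-chosen decision.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X[:1])[0, 1]   # probability of "rain"

# Hypothetical disutilities: carrying an umbrella vs. getting soaked.
umbrella_cost, soaked_cost = 1.0, 10.0
carry = proba > umbrella_cost / soaked_cost  # carry iff expected loss says so
print(f"P(rain) = {proba:.2f}, carry umbrella: {carry}")
```

The threshold `umbrella_cost / soaked_cost` is exactly the tradeoff the author wants to make for himself, rather than having a default 0.5 cutoff baked into a "classifier."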

How to tell if your model is over-fit using unlabeled data


In many settings, unlabeled data is plentiful (think images, text, etc.), while sufficient labeled data for supervised learning can be harder to obtain. In these situations, it can be difficult to determine how well a model will generalize. Most methods for assessing model performance, such as held-out validation sets, rely on labeled data alone, and without enough labeled data they can be unreliable. Is there anything more we can learn about the model's ability to generalize from unlabeled data? In this article, I demonstrate how unlabeled data can frequently be used to bound test loss.
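One simple signal along these lines can be sketched as follows. This is not the article's bound, just a common heuristic of my own construction: a memorizing model tends to be more confident on its training points than on fresh unlabeled data from the same distribution.

```python
# Heuristic sketch (NOT the article's method): compare mean predicted-class
# confidence on training data vs. an unlabeled pool as an over-fitting signal.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_informative=5, random_state=0)
X_train, y_train = X[:100], y[:100]
X_unlabeled = X[100:]                     # labels of this pool are never used

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)               # deep trees tend to memorize

def mean_confidence(Z):
    """Average probability assigned to the predicted class."""
    return model.predict_proba(Z).max(axis=1).mean()

gap = float(mean_confidence(X_train) - mean_confidence(X_unlabeled))
print(f"confidence gap (train vs unlabeled): {gap:.3f}")
# A large positive gap is a warning sign of over-fitting.
```

No labels are needed for the unlabeled pool, which is the spirit of the article; the article itself derives an actual bound on test loss rather than this informal gap.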

Handling Missing Data For Advanced Machine Learning


Throughout this article, you will learn to spot, understand, and impute missing data. We demonstrate various imputation techniques on a real-world logistic regression task using Python. Properly handling missing data improves both inferences and predictions, so it should not be ignored. The first part of this article presents a framework for understanding missing data.
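As a minimal taste of the kind of imputation the article demonstrates, here is mean imputation with scikit-learn (one of the simplest techniques; the article covers more advanced ones as well):

```python
# Mean imputation: replace each NaN with the mean of its column.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
# Column means: (1 + 7) / 2 = 4.0 and (2 + 3) / 2 = 2.5
```

The same fitted imputer can then be applied to test data with `imputer.transform`, so the test set is filled with the training-set means rather than its own.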

Advantages and Disadvantages of Logistic Regression


Logistic Regression is a supervised machine learning algorithm used for classification, i.e., to predict a discrete-valued outcome. It is a statistical approach that predicts the outcome of a dependent variable based on observations in the training set. Logistic Regression is one of the simplest machine learning algorithms; it is easy to implement yet provides great training efficiency in some cases. For the same reasons, training a model with this algorithm does not require high computation power. The predicted parameters (trained weights) give inference about the importance of each feature.
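The last point, reading feature importance off the trained weights, can be sketched in scikit-learn. Standardizing the inputs first (my addition, not stated in the blurb) makes the weight magnitudes comparable across features:

```python
# Sketch: inspect logistic regression weights as a rough feature-importance
# measure. Inputs are standardized so weight magnitudes are comparable.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)
model = LogisticRegression(max_iter=1000).fit(X, data.target)

# One weight per feature; larger |weight| -> more influence on the log-odds.
weights = dict(zip(data.feature_names, model.coef_[0]))
top = max(weights, key=lambda name: abs(weights[name]))
print(f"most influential feature for this model: {top}")
```

Note the usual caveat: correlated features share influence, so the weights give inference about importance within this particular fitted model, not causal importance in the data.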

Data Science questions for interview prep (Machine Learning Concepts) -Part I


I recently finished watching this Machine Learning playlist (StatQuest by Josh Starmer) on YouTube and thought of summarizing each concept into a Q/A. As I prepare for more data science interviews, I thought it would be a good exercise to make sure that I am communicating my thoughts clearly and concisely during an interview. Let me know in the comments if I am not doing a good job of explaining any of the concepts. NOTE: This article is not aimed at teaching concepts to beginners. It assumes that the reader has sufficient background in data science concepts.

Data-crunching AI in Japan predicts one's chances of developing 20 diseases

The Japan Times

Health researchers have put artificial intelligence to work in crunching big data, allowing them to develop technology that can predict the future onset of around 20 diseases so people can make preventative lifestyle changes. The model developed at Hirosaki University and Kyoto University calculates one's probability of developing a disease within three years based on data obtained from voluntary health checkups on about 20,000 people in Japan. If a patient agrees to disclose data on some 20 categories collected during checkups, the model can project the potential development of arteriosclerosis, hypertension, chronic kidney disease, osteoporosis, coronary heart disease and obesity, among other conditions. The team set up two groups of people for each disease -- those whose data suggested they could develop the ailment in the future and a control group -- and crunched their health data to predict whether they would actually develop the disease. "We made correct predictions on whether individuals will develop the diseases within three years with high accuracy," said Yasushi Okuno, professor at Kyoto University's Graduate School of Medicine.

Bayes' Theorem in Layman's Terms


If you have difficulty understanding Bayes' theorem, trust me, you are not alone. In this tutorial, I'll help you cross that bridge step by step. Suppose Alex and Brenda are two people in your office. While working, you saw someone walk past you, but you didn't notice who it was. Consider a scenario where Alex comes to the office three days a week and Brenda comes one day a week. Now suppose I give you an extra piece of information and we recalculate the probabilities: the probability that Alex is the person who passed by becomes 2/5, and the probability that Brenda is the person who passed by becomes 3/5. Probabilities calculated before the new information are called priors, and probabilities calculated after the new information are called posteriors.
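The update above can be worked through numerically. The attendance figures give priors of 3/4 for Alex and 1/4 for Brenda; the blurb does not state the likelihoods of the new observation, so the values 0.2 and 0.9 below are hypothetical, chosen only because they reproduce the quoted posteriors of 2/5 and 3/5:

```python
# Worked Bayes update. Priors come from attendance (Alex 3 days/week,
# Brenda 1 day/week). The likelihoods 0.2 and 0.9 are HYPOTHETICAL values
# for P(observed clue | person), chosen to match the stated posteriors.
prior_alex, prior_brenda = 3 / 4, 1 / 4
lik_alex, lik_brenda = 0.2, 0.9

# Bayes' theorem: posterior = prior * likelihood / evidence
evidence = prior_alex * lik_alex + prior_brenda * lik_brenda
post_alex = prior_alex * lik_alex / evidence
post_brenda = prior_brenda * lik_brenda / evidence

print(post_alex, post_brenda)   # 0.4 and 0.6, i.e. 2/5 and 3/5
```

Note how the new information reverses the ranking: before it, Alex was the more likely passer-by (3/4 vs 1/4); after it, Brenda is (3/5 vs 2/5).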