CORRELATION


Naive Principal Component Analysis in R

@machinelearnbot

Principal Component Analysis (PCA) is a technique used to find the core components that underlie different variables. Identify the number of components (aka factors): in this stage, principal components (formally called 'factors' at this stage) are identified among the set of variables. Cumulative var: the variance added up consecutively through the last component. Cumulative proportion: the proportion of explained variance accumulated up to each component.
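
The relationship between components and explained variance can be made concrete with a short sketch. The article works in R; the snippet below is only a Python illustration of the same quantities, using NumPy and scikit-learn on made-up data (an assumption for the example, not the article's own code).

```python
# A minimal sketch of the PCA output terms described above (assumed libraries:
# NumPy and scikit-learn; the article itself works in R).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # hypothetical data: 100 observations, 5 variables

pca = PCA()
pca.fit(X)

explained = pca.explained_variance_ratio_   # proportion of variance per component
cumulative = np.cumsum(explained)           # "cumulative proportion" up to each component

for i, (p, c) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: proportion={p:.3f}, cumulative proportion={c:.3f}")
```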


IT pros get a handle on machine learning and big data

#artificialintelligence

Even as an IT generalist, it pays to at least get comfortable with the matrix of machine learning outcomes, expressed with quadrants for the counts of true positives, true negatives, false positives (items falsely identified as positive) and false negatives (positives that were missed). For example, overall accuracy is usually defined as the number of instances that were correctly labeled (true positives plus true negatives) divided by the total instances. If you want to know how many of the actual positive instances you are identifying, sensitivity (or recall) is the number of true positives found divided by the total number of actual positives (true positives plus false negatives). Precision is often important too: it is the number of true positives divided by all items labeled positive (true positives plus false positives).
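
As a quick illustration of these definitions, here is a minimal sketch that computes accuracy, sensitivity/recall, and precision from the four quadrant counts (the counts themselves are invented for the example).

```python
# Hypothetical confusion-matrix counts.
tp, tn, fp, fn = 90, 50, 10, 30

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # correctly labeled / all instances
recall    = tp / (tp + fn)                    # sensitivity: share of actual positives found
precision = tp / (tp + fp)                    # share of positive labels that are right

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}, precision={precision:.2f}")
```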


Deep learning vs. machine learning: The difference starts with data

#artificialintelligence

The answer to the question of what makes deep learning different from traditional machine learning may have a lot... For example, he pointed out that conventional machine learning algorithms often plateau on analytics performance after processing a certain amount of data. Comcast is also applying computer vision, audio analysis and closed-caption text analysis to video content to break movies and TV shows into "chapters" and automatically generate natural-language summaries for each chapter. Essa said that forward-thinking enterprises will find ways to leverage deep learning to develop new business models, while traditional machine learning is essentially relegated to helping businesses perform existing operations more efficiently.


Airbnb in NYC - Spatial Analysis of Illegal Activity

@machinelearnbot

Airbnb boasts almost two million listings in 34,000 cities, and according to data from Inside Airbnb, an independent data analysis website, it listed about 36,000 apartments in New York as of July 5, 2016. This data exploration sets out to visualize how Airbnb operates in New York City. Airbnb's presence in NYC has been clouded in controversy from the beginning, with lawmakers arguing that Airbnb drives up rents for New York residents and facilitates a great deal of illegal hosting activity, all while not paying any of the fees hotels are subject to. Rents are driven up when landlords decide to rent apartments to short-term guests at higher rates rather than signing tenants to yearlong leases. In a study conducted in 2014, the New York State Attorney General concluded that 72% of all units used as private short-term rentals on Airbnb from 2010 through mid-2014 appeared to violate both state and local New York laws.


Jackknife logistic and linear regression for clustering and predictions

@machinelearnbot

This article discusses a far more general version of the technique described in our article The best kept secret about regression. Here we adapt our methodology so that it applies to data sets with a more complex structure, in particular with highly correlated independent variables. Our goal is to produce a regression tool that can be used as a black box, that is very robust and parameter-free, and that non-statisticians can easily use and interpret. It is part of a bigger project: automating many fundamental data science tasks to make them easy, scalable, and cheap for data consumers, not just for data experts. Readers are invited to further formalize the technology outlined here and to challenge my proposed methodology.
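
The article does not spell out its algorithm in this summary, so the following is only a generic sketch of the jackknife (leave-one-out) idea applied to linear-regression coefficients, with synthetic, highly correlated predictors; it illustrates the resampling mechanics rather than the author's exact method.

```python
# Generic jackknife sketch for regression coefficients (not the article's
# specific methodology); data and coefficients are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # two highly correlated predictors
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=0.1, size=n)

def fit(Xs, ys):
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

full = fit(X, y)
# One refit per omitted observation (leave-one-out).
loo = np.array([fit(np.delete(X, i, axis=0), np.delete(y, i)) for i in range(n)])
loo_mean = loo.mean(axis=0)
jackknife_se = np.sqrt((n - 1) / n * ((loo - loo_mean) ** 2).sum(axis=0))

print("full-sample coefficients:", np.round(full, 3))
print("jackknife std. errors:   ", np.round(jackknife_se, 3))
```

The inflated standard errors on the two correlated predictors show why a robust, resampling-based tool is attractive in exactly the setting the article targets.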


Big Data and Machine Learning: Building a Recommendation Engine

#artificialintelligence

In the previous blog on machine learning, we learned about applying machine learning techniques to recommendation engines and got an overview of the collaborative filtering (CF) algorithms implemented in Apache Mahout. In this post, we'll discuss how to build a recommendation engine using Mahout. Let us take the example of a movie rating application that allows users to rate movies and suggests other movies that they might like. The following could be a data set where some users have rated some movies on a scale of 1 to 5 (5 being the highest). The empty cells denote that the user has not rated the movie.
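
The article builds its recommender with Apache Mahout; as a language-neutral illustration of the underlying idea, the sketch below performs user-based collaborative filtering by hand in Python on a toy rating matrix (the ratings and the cosine-similarity choice are assumptions for the example, not the article's data or Mahout's API).

```python
# Toy user-based collaborative filtering: rows = users, columns = movies,
# ratings on a 1-5 scale, 0 = movie not rated (hypothetical data).
import numpy as np

R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine(u, v):
    mask = (u > 0) & (v > 0)                  # compare only co-rated movies
    if not mask.any():
        return 0.0
    return float(u[mask] @ v[mask] / (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask])))

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for this movie."""
    num = den = 0.0
    for other in range(R.shape[0]):
        if other == user or R[other, movie] == 0:
            continue
        s = cosine(R[user], R[other])
        num += s * R[other, movie]
        den += abs(s)
    return num / den if den else 0.0

print("predicted rating of user 0 for movie 2:", round(predict(0, 2), 2))
```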


Ten Myths About Machine Learning, by Pedro Domingos

#artificialintelligence

Machine learning used to take place behind the scenes: Amazon mined your clicks and purchases for recommendations, Google mined your searches for ad placement, and Facebook mined your social network to choose which posts to show you. But now machine learning is on the front pages of newspapers, and the subject of heated debate. Learning algorithms drive cars, translate speech, and win at Jeopardy! What can and can't they do? Are they the beginning of the end of privacy, work, even the human race?


Put Away Your Machine Learning Hammer, Criminality Is Not A Nail

#artificialintelligence

Earlier this month, researchers claimed to have found evidence that criminality can be predicted from facial features. In "Automated Inference on Criminality using Face Images," Xiaolin Wu and Xi Zhang describe how they trained classifiers, using various machine learning techniques, that were able to distinguish photos of criminals from photos of non-criminals with a high level of accuracy. The result can be interpreted differently depending on the assumptions you bring to it and the question you're interested in answering. The authors simply assume there is no bias in the criminal justice system, and thus that the criminals they have photos of are a representative sample of the criminals in the wider population (including those who have never been caught or convicted for their crimes). The question they are interested in is whether there is a correlation between facial features and criminality.


Data Science Has Been Using Rebel Statistics for a Long Time

@machinelearnbot

Many of those who call themselves statisticians just won't admit that data science heavily relies on and uses (heretical, rule-breaking) statistical science, or they don't recognize the true statistical nature of these data science techniques (some are 15 years old), or they are opposed to modernizing their statistical arsenal. They already missed the train when machine learning became a popular discipline (also heavily based on statistics) more than 15 years ago. Now machine learning professionals, who are statistical practitioners working on problems such as clustering, far outnumber statisticians. Many times I have interacted with statisticians who think that anyone who does not call himself a statistician knows little or nothing about statistics; see my recent bio published here, or visit the LinkedIn profiles of many data scientists, to debunk this myth. Any statistical technique that is not in their old books is considered heretical at best, non-statistics at worst, or, most of the time, simply not understood.