This KDnuggets post will get your feet wet in the world of autonomous vehicle algorithms, providing introductory insight into what to expect if you travel down this road (pun intended). Recent years have witnessed remarkable progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. The purpose of this project is to use Python to play Grand Theft Auto 5. Check out the accompanying code here: Explorations of Using Python to Play Grand Theft Auto 5.

Indeed, the Google Trends chart above clearly shows that the growing trend of "人工知能" ("artificial intelligence" in Japanese) is steeper than that of "artificial intelligence" in English, while "Data Scientist" is now fading from public attention in Japan, even though data scientist remains a major role in the global market, spreading data science, including both statistics and machine learning, across industries. Although I did not explicitly say so in the post, I suspect that Japanese people may think of a data scientist as a professional in statistical analysis, while an artificial intelligence engineer is one for machine learning or artificial intelligence, which is a misleading distinction. In Japan, it is now an era of "AI", dominated by machine learning engineers rather than data scientists.

While successful applications of machine learning cannot rely solely on cramming ever-increasing amounts of Big Data into algorithms and hoping for the best, the ability to leverage large amounts of data for machine learning tasks is a must-have skill for practitioners at this point. Data scientist Rubens Zimbres outlines a process for applying machine learning to Big Data in his original graphic below. Importantly, the machine learning process is explicitly noted as recursive, which is perhaps especially true when modeling large quantities of data. Likely of greatest importance to newcomers to data science, the subtasks of the machine learning process are presented alongside task-relevant algorithms.

It turned out that putting more weight on close neighbors, and progressively lower weight on far-away neighbors (with weights slowly decaying to zero based on the distance to the neighbor in question), was the solution to the problem. For those interested in the theory, the fact that cases 1, 2 and 3 yield convergence to the Gaussian distribution is a consequence of the Central Limit Theorem under the Liapounov condition. More specifically, because the samples produced here come from uniformly bounded distributions (we use a random number generator to simulate uniform deviates), all that is needed for convergence to the Gaussian distribution is that the sum of the squares of the weights -- and thus Stdev(S) -- tends to infinity as n tends to infinity. More generally, we can work with more complex auto-regressive processes with a covariance matrix as general as possible, then compute S as a weighted sum of the X(k)'s, find a relationship between the weights and the covariance matrix, and eventually identify conditions on the covariance matrix that guarantee convergence to the Gaussian distribution.
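A quick way to see this convergence empirically is to simulate S as a weighted sum of uniform deviates with slowly decaying weights. As an illustrative assumption (not the exact weights from the original experiment), the sketch below uses w_k = 1/sqrt(k), chosen so that the sum of squared weights diverges as n grows, and checks that the standardized S behaves like a standard Gaussian:

```python
import math
import random

def standardized_weighted_sum(n, rng):
    # X_k ~ Uniform(-0.5, 0.5), so Var(X_k) = 1/12.
    # Weights decay slowly: w_k = 1/sqrt(k), hence sum of w_k^2 diverges like log(n).
    weights = [1 / math.sqrt(k) for k in range(1, n + 1)]
    s = sum(w * (rng.random() - 0.5) for w in weights)
    sigma = math.sqrt(sum(w * w for w in weights) / 12)  # Stdev(S)
    return s / sigma

rng = random.Random(42)
samples = [standardized_weighted_sum(200, rng) for _ in range(10_000)]

# Empirical moments of the standardized sum: should be close to N(0, 1).
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
within_1sigma = sum(abs(x) < 1 for x in samples) / len(samples)
print(mean, var, within_1sigma)  # mean ~ 0, variance ~ 1, roughly 68% within one sigma
```

With faster-decaying weights (say w_k = 1/k, where the sum of squared weights stays bounded), the same experiment would show the standardized sum failing to become Gaussian, which is exactly the condition discussed above.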

New statistics or fake data science textbooks are published every week with the exact same technical content: KNN, logistic regression, naive Bayes, decision and boosted trees, SVM, Bayesian statistics, centroid clustering, linear discriminant analysis -- just as in the early eighties, applied to tiny data such as Fisher's iris data set. If you compare traffic statistics (Alexa rank) from top traditional statistics websites with those of data science websites, the contrast is striking. These numbers are based on Alexa rankings, which are notoriously inaccurate, though over time Alexa has improved its statistical methods for measuring and filtering Internet traffic. The numbers that I quote here have been stable recently, showing the same trend for months, and are subject to roughly a 30% error rate (compared to a 100% error rate a few years ago, based on comparing Alexa variances over time for multiple websites that we own and for which we know exact traffic stats after filtering out robots). Modern statistical data science techniques are far more robust than traditional statistics, and are designed for big data.

During a conversation I had with Peter Norvig, we discussed the kinds of projects that we do at Machinalis and how strange it feels to say that "we are a Machine Learning company": in many projects, the effort spent on Machine Learning R&D is a small fraction of the total effort, or it's not even there yet because we plan it for a future phase, after building the application first. As we like to say, "Machine Learning development is like the raisins in a raisin bread." A typical scenario: a large company, at a higher level, decides to move away from standard tools, on the grounds that "our business and our data are different/peculiar", and incorporates Machine Learning or Data Science into its processes. In these situations they call us because they want the whole raisin bread, or perhaps some other kind of pudding that will need raisins at some point in the future.

Let me use this space to discuss some of the main topics of my PhD thesis, "Big Data, Cognitive Extension, Self-organizing Processes and Economic Development". My research was born as an effort to improve my understanding of the emerging phenomenon of Big Data and its potential impact on the economy, in particular on economic development and the fight against poverty. The first part explores how the phenomenon of Big Data may fit within economic theory. A new analytical framework is defined that links Big Data, human cognitive extension, self-organizing processes and economic development.

These articles were controversial in the sense that they highlighted the differences between data science and other disciplines, at a time when many believed that data science was just old stuff being re-branded, or was being practiced by people who knew nothing about statistics. The number of analytics practitioners and users grew by a factor of 5 over the last three years, faster than they can be properly trained, despite the numerous programs available for free, including ours (for self-learners only). Thus many are not equipped with the proper training. Unfortunately, this aspect of data science is considered by many, even today, to not be part of the core data science framework. It has created much of the controversy, mostly around the concepts of automated data science, automated machine learning, and automated statistical science, including the introduction of powerful new algorithms, such as automated indexation (a very fast clustering algorithm for big, unstructured text data, used to create large taxonomies), by companies such as Amazon or Google.

I got a glimpse into the future world of our robot overlords today. I watched two robots go on stage at a tech event to "debate" the future of humanity with each other. The robots in question are Sophia and Han, and they belong to Hanson Robotics, a Hong Kong-based company that is developing and deploying artificial intelligence in humanoids. Topics ranged from an early (and creepy) joke about taking over the world with a drone army, to ethics in robots and humans, robot job potential, and whether it is better to be rich or famous.