If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
We are back with some highlights from the second day of NIPS. A lot of fascinating research was showcased today, and we are excited to share some of our favorites with you. If you missed them, feel free to check our Day 1 and Day 3 Highlights! One of the most memorable sessions of the first two days was today's invited talk by Kate Crawford, about bias in Machine Learning. We recommend taking a look at the feature image of this post, which represents modern Machine Learning datasets as an attempt to create a taxonomy of the world.
This feature continues our series of articles that survey the landscape of HPC and AI. This post focuses on inferencing, platforms, and infrastructure at the convergence of HPC and AI. Inferencing is the operation that makes data-derived models valuable, because they can predict the future and perform recognition tasks better than humans. Inferencing works because once the model is trained (meaning the bumpy surface has been fitted), the ANN can interpolate between known points on the surface to correctly make predictions for data points it has never seen before, meaning points that were not in the original training data. Without getting too technical: during inferencing, ANNs perform this interpolation on a nonlinear (bumpy) surface, which means an ANN can outperform the straight-line interpolation of a conventional linear method.
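The linear-versus-nonlinear interpolation point above can be made concrete with a toy sketch (mine, not the article's; the functions and values are invented for illustration). A straight line between two known points on a curved surface misses the curvature, while a nonlinear fit through the known points, standing in here for a trained model, recovers it:

```python
# Hypothetical illustration: interpolating a nonlinear ("bumpy") function.
# A straight-line interpolation between two known points misses the
# curvature; an exact quadratic fit through three known points, a stand-in
# for a trained nonlinear model, recovers it.

def true_surface(x):
    """The underlying nonlinear function the model must learn."""
    return x * x

def linear_interpolate(x0, y0, x1, y1, x):
    """Straight-line interpolation between two known points."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def quadratic_fit(points):
    """Fit a quadratic exactly through three (x, y) points via Lagrange
    interpolation, returning a callable 'model'."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model

# Known "training" points on the surface
pts = [(0.0, true_surface(0.0)), (1.0, true_surface(1.0)), (2.0, true_surface(2.0))]

x_new = 1.5  # a point not in the training data
linear_pred = linear_interpolate(0.0, 0.0, 2.0, 4.0, x_new)  # 3.0
nonlinear_pred = quadratic_fit(pts)(x_new)                   # 2.25 (the true value)
print(true_surface(x_new), linear_pred, nonlinear_pred)
```

The straight line overshoots the true value at the unseen point; the nonlinear fit lands on it, which is the advantage the passage describes.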
While competing in a Kaggle competition this summer, I came across a simple visualization (created by a fellow competitor) that helped me to gain a better intuitive understanding of ROC curves and Area Under the Curve (AUC). I created a video explaining this visualization to serve as a learning aid for my Data Science students, and decided to share it publicly to help others understand this complex topic. An ROC curve is the most commonly used way to visualize the performance of a binary classifier, and AUC is (arguably) the best way to summarize its performance in a single number. As such, gaining a deep understanding of ROC curves and AUC is beneficial for data scientists, machine learning practitioners, and medical researchers (among others). The 14-minute video is embedded below, followed by the complete transcript (including graphics).
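For readers who prefer code to video, here is a minimal from-scratch sketch (my own, not taken from the video) of the ranking interpretation of AUC: it equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting half.

```python
# From-scratch AUC via pairwise ranking: compare every positive/negative
# score pair; a correctly ranked pair scores 1, a tie scores 0.5.

def roc_auc(labels, scores):
    """Compute AUC for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6]
print(roc_auc(labels, scores))  # 0.75: 3 of the 4 pos/neg pairs are ranked correctly
```

A perfect classifier ranks every positive above every negative (AUC = 1.0); random scoring gives roughly 0.5.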
A data science-based solution needs to address problems at multiple levels. While it addresses a business problem, computationally it comprises a pipeline of algorithms which, in turn, operate on relevant data presented in the proper format. Contrary to popular belief, almost all non-trivial data science solutions need to be built from the ground up, with minute and interrelated attention to the details of the problem at all three levels. In what follows, we shall try to understand this with the help of a running example: aspects of a churn-analysis solution. It is vital to understand that in most real-world cases we are re-purposing the data for building the solution.
Machine learning is not a new topic, nor is it new to supply chain. However, it has been garnering a lot of attention within supply chain due to its transformational business potential. The basic premise of machine learning is to build algorithms that can receive input data and use statistical analysis to predict an output value within an acceptable range. Machine learning is often categorized as supervised or unsupervised. Supervised machine learning algorithms can apply what has been learned in the past to new data, using labeled examples to predict future events.
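The supervised-learning premise above can be sketched in a few lines (a toy of my own; the supply-chain feature values and labels are invented): labeled past examples are used to predict an output for input data the algorithm has not seen.

```python
# Toy supervised learning: predict a label for new input by copying the
# label of the most similar labeled past example (1-nearest-neighbor).
# The shipment data below is invented for illustration.

def nearest_neighbor_predict(examples, query):
    """Return the label of the closest (features, label) example to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(examples, key=lambda ex: sq_dist(ex[0], query))
    return label

# Past shipments labeled on-time (1) or late (0): features = (distance_km, order_size)
history = [((100, 5), 1), ((120, 8), 1), ((900, 40), 0), ((950, 35), 0)]

print(nearest_neighbor_predict(history, (110, 6)))  # 1: resembles the on-time shipments
```

The "learning" here is trivially memorizing the labeled examples, but the flow is the one the excerpt describes: past labeled data in, prediction for future events out.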
I spent much of my time recently in conferences talking to customers and analysts and realized they all were saying many of the same things about the challenges of productized, modern analytics solutions. Digital businesses, mobility, and IoT depend on real-time actionable insights and machine learning. At the same time traditional big data, data warehousing (DW) and business intelligence (BI) solutions have mostly worked on batch and interactive data queries. Streaming solutions and machine learning logic have been added on top of legacy architecture (and are not well integrated), leading to complexity and sub-optimal performance. The time is ripe for re-architecting analytics to maximize the value of machine learning and real-time streaming, drive actionable insights, and enable continuous operations.
These questions can make you think THRICE! Machine learning and data science are seen as the drivers of the next industrial revolution happening in the world today. This also means that there are numerous exciting startups looking for data scientists. What better start could there be for an aspiring career! Still, getting into these roles is not easy.
Employee turnover (attrition) is a major cost to an organization, and predicting turnover is at the forefront of the needs of Human Resources (HR) in many organizations. Until now, the mainstream approach has been to use logistic regression or survival curves to model employee attrition. However, with advancements in machine learning (ML), we can now get both better predictive performance and better explanations of which critical features are linked to employee attrition. In this post, we'll use two cutting-edge techniques. First, we'll use the h2o package's new FREE automatic machine learning algorithm, h2o.automl(), to develop a predictive model that is in the same ballpark as commercial products in terms of ML accuracy.
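What an AutoML routine like h2o.automl() does can be sketched conceptually in pure Python. To be clear, this is not the h2o API; it is a toy of my own showing the underlying idea: train several candidate models, rank them on held-out data, and keep a leaderboard.

```python
# NOT the h2o API: a conceptual toy of AutoML. Each candidate is a (name,
# fit) pair; fit(train) returns a fitted model (a callable). We rank all
# candidates by held-out accuracy, mimicking an AutoML leaderboard.
# The attrition-style data below is invented for illustration.

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def auto_ml(candidates, train, valid):
    """Fit every candidate on `train`, rank on `valid`, best first."""
    leaderboard = []
    for name, fit in candidates:
        model = fit(train)
        leaderboard.append((accuracy(model, valid), name, model))
    leaderboard.sort(reverse=True, key=lambda row: row[0])
    return leaderboard

# (overtime_hours,) -> left the company (1) or stayed (0)
train = [((2,), 0), ((3,), 0), ((9,), 1), ((10,), 1)]
valid = [((1,), 0), ((8,), 1)]

candidates = [
    ("always_stays", lambda tr: lambda x: 0),
    ("overtime_threshold", lambda tr: lambda x: 1 if x[0] > 5 else 0),
]

best_acc, best_name, _ = auto_ml(candidates, train, valid)[0]
print(best_name, best_acc)  # overtime_threshold 1.0
```

Real AutoML systems search far richer model and hyperparameter spaces, but the select-by-held-out-performance loop is the same.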
In this tutorial, you will create your first machine learning model by analyzing the historical customer records and order logs from Haiku T-Shirts. From the Dataiku DSS home page, click on the Tutorials button in the left pane, and select Tutorial: Machine Learning. In the flow, you see the steps used in the previous tutorials to create, prepare, and join the customers and orders datasets. The Confusion Matrix compares the actual values of the target variable with the predicted values (hence values such as false positives and false negatives) and some associated metrics: precision, recall, f1-score.
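The metrics the Confusion Matrix reports can be derived directly from its four counts. Here is a from-scratch sketch (DSS computes these for you; the counts below are invented for illustration):

```python
# Precision, recall, and f1-score from confusion-matrix counts:
#   precision = TP / (TP + FP)   -- of the predicted positives, how many were right
#   recall    = TP / (TP + FN)   -- of the actual positives, how many were found
#   f1        = harmonic mean of precision and recall

def confusion_metrics(tp, fp, fn, tn):
    """Derive the three associated metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 40 true positives, 10 false positives, 20 false negatives, 30 true negatives
p, r, f = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.667 0.727
```

Note that the true-negative count does not enter any of the three metrics, which is why they are preferred over plain accuracy on imbalanced targets.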
In this post, I share an AutoML setup to train and deploy pipelines in the cloud using Python, Flask, and two AutoML frameworks that automate feature engineering and model building. I tested and combined two open-source Python tools: tsfresh, an automated feature engineering tool, and TPOT, an automated feature preprocessing and model optimization tool. After an optimal feature engineering and model building pipeline is determined, the pipeline is persisted within our Flask application in a Python dictionary, with the dictionary key being the pipeline id specified in the parameter file. I have shown how to make use of open-source AutoML tools and operationalize a scalable automated feature engineering and model building pipeline in the cloud.
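The dictionary-keyed registry described above can be sketched minimally with the Flask wiring omitted (names here are hypothetical, not the post's): fitted pipelines live in a plain dict keyed by pipeline id, and a request handler looks them up by id to score new data.

```python
# Minimal sketch of an in-process pipeline registry keyed by pipeline id.
# In the post's setup, a Flask route handler would call predict(); here the
# registry and a stand-in "fitted pipeline" (a callable) are shown alone.

pipelines = {}  # pipeline_id -> fitted pipeline

def register_pipeline(pipeline_id, fitted_pipeline):
    """Persist a fitted pipeline under the id from the parameter file."""
    pipelines[pipeline_id] = fitted_pipeline

def predict(pipeline_id, features):
    """Look up a pipeline by id and score the incoming features."""
    if pipeline_id not in pipelines:
        raise KeyError(f"unknown pipeline id: {pipeline_id}")
    return pipelines[pipeline_id](features)

# Register a stand-in pipeline and serve a prediction through the registry.
register_pipeline("churn-v1", lambda feats: 1 if sum(feats) > 10 else 0)
print(predict("churn-v1", [4, 8]))  # 1
```

In the real setup the registered object would be the tsfresh + TPOT pipeline rather than a lambda, but the id-to-pipeline lookup is the same.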