If you are looking for an answer to the question "What is artificial intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
Artificial intelligence is being implemented in almost all sectors of automation, and deep learning is one of the trickiest architectures used to develop and maximise the efficiency of human-like computing. To help product developers, Google, Facebook and other big tech companies have released various frameworks for the Python environment in which one can learn, build and train diversified neural networks. Google's TensorFlow is an open source framework for deep learning that has gained popularity over the years. A newer framework, PyTorch, is receiving loads of attention from beginners because of its easy-to-write code.
The goal of chemmodlab is to streamline the fitting and assessment pipeline for many machine learning models in R, making it easy for researchers to compare the utility of new models. While it focuses on model fitting and assessment methods that have been accepted by experts in the cheminformatics field, all of the methods in chemmodlab have broad utility for the machine learning community. chemmodlab contains several assessment utilities, including a plotting function that constructs accumulation curves and a function that computes many performance measures. The most novel feature of chemmodlab is the ease with which statistically significant performance differences among many machine learning models are presented by means of the multiple comparisons similarity plot. Differences are assessed using repeated k-fold cross-validation, where blocking increases precision and multiplicity adjustments are applied.
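chemmodlab implements this resampling scheme in R; as a language-neutral illustration of what repeated k-fold cross-validation involves, here is a minimal Python sketch (the function name `repeated_kfold` and its signature are my own, not part of chemmodlab):

```python
import random

def repeated_kfold(n_samples, k=5, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the indices; within a repeat the k test folds
    partition the data, so every sample is tested exactly once per repeat.
    Repeating with fresh shuffles is what lets performance differences be
    assessed with more precision than a single k-fold run.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(repeats):
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for f in range(k):
            test_idx = folds[f]
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train_idx, test_idx

# 2 repeats of 5-fold CV on 10 samples -> 10 train/test splits in total.
splits = list(repeated_kfold(10, k=5, repeats=2))
```

Because the same folds are reused across all models within a repeat (the "blocking" the paragraph mentions), each model is scored on identical data partitions, which removes fold-to-fold variability from the model comparison.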
So I'm sure many of you know Stripe. It's a company that provides a platform for e-commerce. And one of the things that everyone encounters when conducting commerce online is, unsurprisingly, fraud. So before I get into the details of how we address fraud with machine learning, I want to talk a little bit about the fraud life cycle. What typically happens in fraud is that an organized crime ring installs malware on point-of-sale devices. For example, there was this famous breach at Target about five years ago. So you can actually go online, on the deep web, and buy credit card numbers that were taken from personal devices, ATMs and so forth. What's kind of surprising and funny is that these criminals who sell credit card numbers to smaller-time criminals are quite customer-service oriented. You can say, "I want 12 credit card numbers from Wells Fargo or Citibank. I want credit card numbers that were issued in zip codes 94102 to 94105," and so forth. Some of them are in fact so customer-service oriented that they guarantee that if you are unable to commit fraud with the cards you buy, they'll give you your money back. Let's say five years at Stripe was enough for me, and I decided to leave and become a criminal, using all my knowledge.
Databricks, the Silicon Valley-based startup focused on commercializing Apache Spark, has developed MLflow, an open source toolkit that lets data scientists manage the lifecycle of machine learning models. Unlike traditional software development, machine learning relies on a plethora of tools: for each stage involved in building a model, data scientists use at least half a dozen tools, and each stage requires extensive experimentation before settling on the right toolkit and framework. This fragmentation of tools, combined with the need to iterate rapidly, makes machine learning extremely complex.
The skill or prediction error of a model must be estimated, and, as an estimate, it will itself contain error. This is made clear by distinguishing between the true error of a model and the estimated or sample error. The sample error is the error rate of the hypothesis over the sample of data that is available; the true error is the error rate of the hypothesis over the entire unknown distribution D of examples.
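The distinction can be made concrete with a toy simulation. In the sketch below (the functions `concept` and `hypothesis` and the choice of distribution are mine, purely for illustration), the distribution D is uniform on [0, 1), so the true error can be computed exactly, while the sample error is whatever disagreement rate a finite sample happens to produce:

```python
import random

def concept(x):      # the true labeling function over the distribution D
    return x > 0.5

def hypothesis(x):   # the model whose error we are estimating
    return x > 0.6

# True error: the probability, over all of D (x uniform on [0, 1)),
# that the hypothesis disagrees with the concept.  The two functions
# disagree exactly on the interval [0.5, 0.6], which has measure 0.1.
true_error = 0.1

# Sample error: the disagreement rate on one finite sample drawn from D.
rng = random.Random(42)
sample = [rng.random() for _ in range(50)]
sample_error = sum(hypothesis(x) != concept(x) for x in sample) / len(sample)
```

Rerunning with different seeds gives different sample errors scattered around 0.1, which is exactly the estimation error the paragraph warns about.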
In the first blog, we discussed some important metrics used in regression, their pros and cons, and their use cases. This part focuses on commonly used metrics in classification and on why we should prefer some over others, depending on context. Let's first understand the basic terminology used in classification problems before going through the pros and cons of each method; you can skip this section if you are already familiar with the terminology. The probabilistic interpretation of the ROC-AUC score is that if you randomly choose a positive case and a negative case, the probability that the positive case outranks the negative case according to the classifier is given by the AUC.
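That probabilistic interpretation can be computed directly by enumerating all (positive, negative) pairs. A minimal sketch (the function name `pairwise_auc` and the example scores are mine; real libraries compute AUC more efficiently from the ROC curve, but the result is the same):

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC via its probabilistic interpretation: the fraction of
    (positive, negative) pairs in which the classifier scores the
    positive case above the negative case, counting ties as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# 3 positives and 3 negatives -> 9 pairs; the positive case outranks
# the negative one in 8 of them, so AUC = 8/9.
auc = pairwise_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Note that only the ranking of scores matters here, which is why AUC is unchanged by any monotonic rescaling of the classifier's outputs.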
Check out Andrew Burt's talk "Beyond Explainability: Regulating Machine Learning in Practice" at the Strata Data Conference in New York, September 11-13, 2018. In this episode of the Data Show, I spoke with Andrew Burt, chief privacy officer at Immuta, and Steven Touw, co-founder and CTO of Immuta.
Statisticians at the German technical university Technische Universität Dortmund built a model that used machine learning to predict that Spain will win the 2018 World Cup. The prediction is based on 100,000 simulations of the tournament. It should be a good tournament: Spain, with a 17.8 percent chance of winning, is only slightly ahead of Germany at 17.1 percent; Brazil follows with 12.3 percent, then France (11.2 percent) and Belgium (10.4 percent).
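The idea of turning repeated tournament simulations into win probabilities can be sketched in a few lines. This toy version is not the Dortmund model (which fitted team strengths from real data); the four-team bracket, the made-up `strengths` values, and the proportional win rule are all illustrative assumptions:

```python
import random

# Hypothetical team strengths -- illustrative only, not fitted values.
strengths = {"Spain": 3.0, "Germany": 2.9, "Brazil": 2.2, "France": 2.0}

def play(a, b, rng):
    """One knockout match: win probability proportional to strength."""
    sa, sb = strengths[a], strengths[b]
    return a if rng.random() < sa / (sa + sb) else b

def simulate(rng):
    """A toy four-team knockout bracket: two semifinals, then a final."""
    return play(play("Spain", "France", rng),
                play("Germany", "Brazil", rng), rng)

# Run many tournaments and report each team's winning frequency,
# mirroring how the 100,000 simulations yield win probabilities.
rng = random.Random(0)
n = 100_000
wins = {t: 0 for t in strengths}
for _ in range(n):
    wins[simulate(rng)] += 1
probs = {t: wins[t] / n for t in strengths}
```

The frequencies necessarily sum to one, and the strongest team ends up favored without dominating, which is why simulation-based forecasts produce the kind of closely spaced percentages quoted above.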
The traditional machine learning model selection process is largely iterative, with data scientists searching for the best model and the best hyperparameters to fit a given data-set. Going with the philosophy I've learnt from fast.ai, this blog is an introduction to the process; a more comprehensive example can be found here. The intended audience is data analysts learning data science who have a few weeks of Python experience and a basic understanding of numpy and pandas. For new learners, this can serve as a way to learn the process using a top-down approach.
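The iterative search the paragraph describes can be boiled down to a loop: try each candidate setting, score it on held-out data, keep the best. A minimal numpy-only sketch (the synthetic quadratic data and the choice of polynomial degree as the hyperparameter are my own illustration, not from this blog's full example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a noisy quadratic -- a toy stand-in for a data-set.
x = rng.uniform(-3, 3, 200)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, 200)

# Simple hold-out split: train on the first 150 points, validate on the rest.
x_tr, x_val = x[:150], x[150:]
y_tr, y_val = y[:150], y[150:]

# The iterative part of model selection: fit each candidate
# hyperparameter value, score it on the validation set, keep the best.
results = {}
for degree in range(1, 8):  # polynomial degree is the hyperparameter here
    coeffs = np.polyfit(x_tr, y_tr, degree)
    pred = np.polyval(coeffs, x_val)
    results[degree] = float(np.mean((pred - y_val) ** 2))

best_degree = min(results, key=results.get)
```

In practice the same loop shape applies whether the candidates are polynomial degrees, learning rates, or entire model families; only the fit-and-score step changes.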