Statisticians at the German technical university Technische Universität Dortmund built a machine learning model predicting that Spain will win the 2018 World Cup. The prediction is based on 100,000 simulations of the tournament. Spain was followed by Germany, Brazil, France and Belgium in terms of their chances of winning. It should be a close tournament: Spain, at a 17.8 percent chance of winning, is only slightly ahead of Germany at 17.1 percent. Brazil follows with 12.3 percent, then France (11.2 percent) and Belgium (10.4 percent).
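A minimal sketch of the simulation idea, with made-up team strengths and a plain single-elimination bracket — the actual Dortmund model is far more elaborate, and none of the numbers below come from it:

```python
import random
from collections import Counter

# Hypothetical strength ratings (illustrative only, not the Dortmund
# model's estimates); head-to-head win probability is derived from them.
strengths = {"Spain": 92, "Germany": 91, "Brazil": 88, "France": 87,
             "Belgium": 86, "Argentina": 85, "England": 84, "Portugal": 83}

def win_prob(a, b):
    """Probability that team a beats team b, from relative strength."""
    return strengths[a] / (strengths[a] + strengths[b])

def play_knockout(teams, rng):
    """Simulate one single-elimination bracket; return the champion."""
    teams = list(teams)
    while len(teams) > 1:
        nxt = []
        for i in range(0, len(teams), 2):
            a, b = teams[i], teams[i + 1]
            nxt.append(a if rng.random() < win_prob(a, b) else b)
        teams = nxt
    return teams[0]

# Repeat the tournament 100,000 times and count titles per team.
rng = random.Random(0)
titles = Counter(play_knockout(strengths, rng) for _ in range(100_000))
for team, n in titles.most_common(3):
    print(f"{team}: {n / 100_000:.1%}")
```

Each team's share of simulated titles is its estimated championship probability — the same aggregation step that turns the Dortmund group's 100,000 runs into the percentages quoted above.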
The traditional machine learning model selection process is largely iterative, with data scientists searching for the best model and the best hyperparameters to fit a given dataset. Following the philosophy I learnt from the fast.ai course, this blog is an introduction to that process; a more comprehensive example can be found here. The intended audience is data analysts learning data science, with a few weeks of Python experience and a basic understanding of numpy and pandas. For new learners, it can serve as a top-down introduction to the workflow.
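The iterative search can be sketched in a few lines: hold out a validation set, loop over a hyperparameter grid, and keep the setting with the lowest validation error. The example below uses closed-form ridge regression on synthetic data so it is self-contained; the data, the model and the alpha grid are all illustrative:

```python
import numpy as np

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# Split into a training set and a held-out validation set.
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# The iterative search: try each hyperparameter, keep the best one.
best_alpha, best_err = None, float("inf")
for alpha in [0.001, 0.01, 0.1, 1.0, 10.0]:        # the hyperparameter grid
    w = fit_ridge(X_train, y_train, alpha)
    err = np.mean((X_val @ w - y_val) ** 2)        # validation MSE
    if err < best_err:
        best_alpha, best_err = alpha, err

print(f"best alpha: {best_alpha}, validation MSE: {best_err:.4f}")
```

In practice libraries such as scikit-learn wrap this loop (e.g. grid search with cross-validation), but the underlying idea is exactly this: fit, score on held-out data, compare, repeat.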
The workflow for building machine learning models often ends at the evaluation stage: you have achieved an acceptable accuracy, and "ta-da!" Beyond that, it might be sufficient to produce nice-looking graphs for your paper or internal documentation. In fact, going the extra mile to put your model into production is not always needed, and even when it is, the task is often delegated to a system administrator. Nowadays, however, many researchers and engineers find themselves responsible for the complete flow, from conceiving the models to serving them to the outside world.
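At its simplest, "serving" a model means exposing a prediction function behind an HTTP endpoint. The sketch below uses only the Python standard library; the weights are a hypothetical stand-in for a trained model, and a real deployment would use a proper framework (Flask, FastAPI, TensorFlow Serving) and a serialized model artifact:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in "trained model": hypothetical linear coefficients.
WEIGHTS = [0.4, -1.2, 2.0]

def predict(features):
    """Score one feature vector with the stand-in linear model."""
    return sum(w * x for w, x in zip(WEIGHTS, features))

class PredictHandler(BaseHTTPRequestHandler):
    """POST a JSON body like {"features": [1.0, 0.5, 2.0]} to get a score."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        response = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)

# To serve (blocks forever):
# HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

Keeping the prediction logic in a plain function like `predict` is the useful habit here: the same function can be unit-tested, batch-scored, or wrapped by whichever serving layer the system administrator — or you — ends up owning.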
This project was part of a skill test for a recent job interview for a "Machine Learning Engineer" position. I had to complete it in 48 hours, which included writing a 10-page report in LaTeX. The dataset has multiple classes and is highly imbalanced, and the primary objective of the project was to handle this imbalance. In the following subsections, I describe three techniques I used to overcome the class imbalance problem.
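One standard remedy, sketched here on synthetic data, is random oversampling of the minority class until the classes are balanced. (This is illustrative only — it is not necessarily one of the report's three techniques; class weighting and undersampling are other common options.)

```python
import numpy as np

# Synthetic imbalanced binary data: roughly 5% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.05).astype(int)

pos = np.where(y == 1)[0]                  # minority class indices
neg = np.where(y == 0)[0]                  # majority class indices

# Sample minority indices with replacement until they match the majority.
resampled = rng.choice(pos, size=len(neg), replace=True)
idx = np.concatenate([neg, resampled])
rng.shuffle(idx)

X_bal, y_bal = X[idx], y[idx]
print(f"positive rate before: {y.mean():.2f}, after: {y_bal.mean():.2f}")
```

The trade-off is that duplicated minority examples can encourage overfitting, which is why it is usually compared against undersampling and cost-sensitive (class-weighted) training rather than used alone.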
AI and machine learning are susceptible to flawed data and unseen bias everywhere, but start-ups in emerging markets should be especially careful. Check this list of best practices to follow. If you're not seeing the outcomes you want from machine learning, you may have problems in the data sets used to train the algorithms. This blog post explains why so many teams fail to notice such problems and how to fix them. Researchers with IBM created a system that could improve machine learning models by teaching them to detect what is missing from a data set.
In "The Adventure of Silver Blaze," Sherlock Holmes famously solved a case not by discovering a clue but by noting its absence. In that case, it was a dog that didn't bark, and that lack of barking helped identify the culprit. The human ability to make deductions and learn from something that's missing hasn't yet been widely applied to machine learning, but that's something a team of researchers at IBM wants to change. In a paper published earlier this year, the team outlined a means of using missing results to get a better understanding of how machine learning models work. "One of the pitfalls of deep learning is that it's more or less black box," explained Amit Dhurandhar, one of the members of the research team.
Machine learning is a way for a program to analyze previous data (or past experiences) to make decisions or predict the future. Wow, that sounds pretty complex! But aren't you claiming everyone can do it? We use frameworks like TensorFlow that make it easy to build, train, test, and use machine learning models. All you need to know is a little Python, which we will teach you, of course.
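The core idea in miniature: fit a model to past observations, then use it to predict an unseen point. A tiny least-squares line on made-up sales data stands in for what a framework like TensorFlow does at a much larger scale — all the names and numbers below are invented for illustration:

```python
import numpy as np

# "Previous data": ten days of noisy sales that grow roughly linearly.
days = np.arange(10, dtype=float)
sales = 3.0 * days + 5.0 + np.random.default_rng(1).normal(scale=0.5, size=10)

# "Training": fit a straight line (slope and intercept) to the past.
slope, intercept = np.polyfit(days, sales, deg=1)

# "Prediction": extrapolate to a day the model has never seen.
forecast = slope * 10 + intercept
print(f"predicted sales on day 10: {forecast:.1f}")
```

Swap the straight line for a neural network and the ten points for millions, and you have the build/train/test/use loop that TensorFlow packages up — but the shape of the process is the same.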
The push by enterprises for explainable artificial intelligence is shining a light on one of the problematic aspects of machine learning models. That is, if the models operate in so-called black boxes, they don't give a business visibility into why they've arrived at the recommendations they do. But, according to experts, the enterprise demand for explainable artificial intelligence overlooks a number of characteristics about current applications of AI, including the fact that not all machine learning models require the same level of interpretability. "The importance of interpretability really depends on the downstream application," said Zoubin Ghahramani, professor of information engineering at the University of Cambridge and chief scientist at Uber Technologies Inc., during a press conference at the recent Artificial Intelligence Conference hosted by O'Reilly Media and Intel AI. A machine learning model that automatically captions an image would not need to be held to the same standards as machine learning models that determine how loans should be distributed, he contended.
A month after Google announced breakthroughs in Text-to-Speech generation technologies stemming from the Magenta project, the company followed through with a major upgrade of its Speech-to-Text API cloud service. The updated service leverages deep-learning models for speech transcription that are tailored to specific use-cases: short voice commands, phone calls and video, with a default model in all other contexts. The upgraded service now handles 120 languages and variants with different model availability and feature levels. Business applications range from over-the-phone meetings, to call-centers and video transcription. Transcription accuracy is improved in the presence of multiple speakers and significant background noise.
While Google is focused on artificial intelligence and making Google Assistant smarter and more conversational, the company needs processing horsepower to make it happen. Google CEO Sundar Pichai outlined TPU 3.0, the third version of the Tensor Processing Unit. TPUs are Google's custom application-specific processors for accelerating machine learning and model training. They handle TensorFlow workloads, which are used by researchers, developers, and businesses. TPU 3.0 will be consumed primarily through Google Cloud.