# Results

### The Number Games -- How Machine Learning is Changing Sports

Data such as players' vital stats and movements, collected in training and in play on game day, is being analyzed to enhance player performance and match strategy. By studying patterns of play and player movements, coaches can reconfigure play strategy to make use of each player's strengths and offset their weaknesses, improving overall team performance. Another application, in cricket, is WASP (Winning and Score Predictor), which uses machine learning techniques to predict the final score in the first innings and to estimate the chasing team's probability of winning in the second innings. The second-innings model estimates the probability of winning as a function of balls and wickets remaining, runs scored to date, and the target score.
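To make the second-innings idea concrete, here is a minimal sketch of a win-probability function over the same four inputs (balls remaining, wickets remaining, runs scored, target). It is not WASP's actual model: the per-ball outcome probabilities below are made-up constants, and the function names are hypothetical. It simply walks every possible remaining ball and averages the outcomes.

```python
from functools import lru_cache

# Hypothetical per-ball outcome probabilities (illustrative only, not fitted to data).
WICKET_PROB = 0.03
RUN_PROBS = {0: 0.37, 1: 0.40, 2: 0.06, 4: 0.10, 6: 0.04}


@lru_cache(maxsize=None)
def _win_prob(balls_left, wickets_left, runs_needed):
    """Probability that the chasing team still reaches the target."""
    if runs_needed <= 0:
        return 1.0  # target already reached
    if balls_left == 0 or wickets_left == 0:
        return 0.0  # innings over before reaching the target
    # A wicket falls: one fewer ball and one fewer wicket, no runs added.
    p = WICKET_PROB * _win_prob(balls_left - 1, wickets_left - 1, runs_needed)
    # Otherwise some number of runs is scored off this ball.
    for runs, prob in RUN_PROBS.items():
        p += prob * _win_prob(balls_left - 1, wickets_left, runs_needed - runs)
    return p


def win_probability(balls_remaining, wickets_remaining, runs_scored, target):
    """Win probability as a function of the four inputs named in the text."""
    return _win_prob(balls_remaining, wickets_remaining, target - runs_scored)


# Example: 60 balls left, 5 wickets in hand, 200 scored, chasing 260.
example = win_probability(60, 5, 200, 260)
```

A real system would presumably estimate the outcome probabilities from historical matches and let them vary with the match situation; fixed constants keep the sketch short.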

### Data Science Simplified: Key Concepts of Statistical Learning

I want to express data that I can't control (sales) as a function of data that I can control (advertising budget). Statistical learning reveals hidden data relationships. In machine learning, likewise, a model's performance is evaluated once it has been run on the test dataset. Let us examine the multiplication models created by Bob and Raj from a machine learning perspective.
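As a small illustration of expressing sales as a function of advertising budget, the sketch below fits a straight line by ordinary least squares. The numbers are invented for the example, and a straight line is only one possible choice of function; the `fit_line` helper is a hypothetical name.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: advertising budget (controllable) and sales (not controllable).
budget = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]

slope, intercept = fit_line(budget, sales)

# Use the fitted function to predict sales for a budget not seen in the data.
predicted_sales = slope * 60.0 + intercept
```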

### What is the Difference Between Test and Validation Datasets? - Machine Learning Mastery

The validation dataset is different from the test dataset, which is also held back from the training of the model but is instead used to give an unbiased estimate of the skill of the final tuned model when comparing or selecting between final models. Generally, the term "validation set" is used interchangeably with the term "test set" and refers to a sample of the dataset held back from training the model. Importantly, Russell and Norvig comment that the training dataset used to fit the model can be further split into a training set and a validation set, and that it is this subset of the training dataset, called the validation set, that can be used to get an early estimate of the skill of the model. In addition to reiterating Ripley's glossary definitions, it goes on to discuss the common misuse of the terms "test set" and "validation set" in applied machine learning.
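The three-way split described above can be sketched as follows. The 60/20/20 proportions and the function name are assumptions for illustration, not something the article prescribes.

```python
import random


def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows, then carve off a test set and a validation set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                      # held back until the very end
    validation = shuffled[n_test:n_test + n_val]  # used while tuning the model
    train = shuffled[n_test + n_val:]             # used to fit the model
    return train, validation, test


train, validation, test = train_val_test_split(range(100))
```

The validation rows guide tuning and model selection, while the test rows are touched only once, to estimate the skill of the final tuned model.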

### Getting Started with Predictive Maintenance Models - Silicon Valley Data Science

We are also provided with a training set of full run-to-failure data for a number of engines and a test set with truncated engine data and their corresponding remaining useful life (RUL) values. One way of addressing this is to look at the distribution of sensor values in "healthy" engines, and compare it to a similar set of measurements when the engines are close to failure. A figure in the article shows the distribution of the values of a particular sensor (sensor 2) for each engine in the training set, where healthy values (in blue) are those taken from the first 20 cycles of the engine's lifetime and failing values are from the last 20 cycles. Another plot shows, in blue, the values of the same sensor plotted against the true RUL value at each time cycle for the engines in the training set.
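A minimal sketch of the two ideas above, for a single engine's run-to-failure record: deriving the RUL at each cycle, and separating "healthy" from "failing" readings using the first and last 20 cycles. It assumes cycles are numbered from 1 and that the engine fails at its last recorded cycle; the function names are hypothetical.

```python
def remaining_useful_life(num_cycles):
    """RUL at each cycle, assuming failure occurs at the last recorded cycle."""
    return [num_cycles - cycle for cycle in range(1, num_cycles + 1)]


def healthy_and_failing(readings, window=20):
    """Split one engine's sensor readings into early (healthy) and late (failing)."""
    if len(readings) < 2 * window:
        raise ValueError("engine history is too short for non-overlapping windows")
    return readings[:window], readings[-window:]


# Example with a made-up 50-cycle sensor history.
history = [float(i) for i in range(50)]
rul = remaining_useful_life(len(history))
healthy, failing = healthy_and_failing(history)
```

Comparing the distributions of `healthy` and `failing` across many engines is what indicates whether a given sensor carries any signal about approaching failure.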

### How to Solve the New \$1 Million Kaggle Problem - Home Value Estimates

More specifically, I provide high-level advice here rather than advice about selecting specific statistical models or algorithms, though I also discuss algorithm selection in the last section. If this is the case, an easy improvement consists of increasing value differences between adjacent homes, by boosting the importance of lot area and square footage in locations that have very homogeneous Zillow value estimates. Then for each individual home, compute an estimate based on the bin average, and other metrics such as recent sales price for neighboring homes, trend indicator for the bin in question (using time series analysis), and home features such as school rating, square footage, number of bedrooms, 2- or 3-car garage, lot area, view or not, fireplace(s), and when the home was built. With just a few (properly binned) features, a simple predictive algorithm such as HDT (Hidden Decision Trees - a combination of multiple decision trees and special regression) can work well for homes in zipcodes (or buckets of zipcodes) with 200 homes with recent historical sales prices.
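The bin-average step can be sketched as below. This is not the author's HDT algorithm: it only computes a per-zipcode average of recent sale prices and blends it with the average price of recently sold neighboring homes. The blending weight, the zipcodes, and the prices are all made up for illustration.

```python
from collections import defaultdict


def bin_averages(sales):
    """Average recent sale price per bin; here a bin is simply a zipcode."""
    totals = defaultdict(lambda: [0.0, 0])  # zipcode -> [sum of prices, count]
    for zipcode, price in sales:
        totals[zipcode][0] += price
        totals[zipcode][1] += 1
    return {z: total / count for z, (total, count) in totals.items()}


def estimate_value(zipcode, neighbor_prices, averages, neighbor_weight=0.5):
    """Blend the bin average with neighboring sales (the weight is an assumption)."""
    bin_avg = averages[zipcode]
    if not neighbor_prices:
        return bin_avg  # no neighboring sales: fall back to the bin average
    neighbor_avg = sum(neighbor_prices) / len(neighbor_prices)
    return neighbor_weight * neighbor_avg + (1 - neighbor_weight) * bin_avg


# Made-up recent sales as (zipcode, price) pairs.
recent_sales = [("98001", 300000.0), ("98001", 500000.0), ("98002", 200000.0)]
averages = bin_averages(recent_sales)
estimate = estimate_value("98001", [440000.0, 460000.0], averages)
```

Trend indicators and home features such as square footage or lot area would enter as further adjustments on top of this baseline.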

### Lift Analysis – A Data Scientist's Secret Weapon

Whenever I read articles about data science I feel like there is some important aspect missing: evaluating the performance and quality of a machine learning model. Consequently, the first post on this blog will deal with a pretty useful evaluation technique: lift analysis. When evaluating machine learning models there is a plethora of possible metrics to assess performance.

### How to Train a Final Machine Learning Model - Machine Learning Mastery

In this post, you will discover how to finalize your machine learning model in order to make predictions on new data. The goal of your machine learning project is to arrive at a final model that performs the best. In your project, you gather the data, spend the time you have, and discover the data preparation procedures, the algorithm to use, and how to configure it. We gather predictions from the trained model on the inputs from the test dataset and compare them to the withheld output values of the test set.

### How to Train a Final Machine Learning Model

In this post, you will discover how to finalize your machine learning model in order to make predictions on new data. The goal of your machine learning project is to arrive at a final model that performs the best. In your project, you gather the data, spend the time you have, and discover the data preparation procedures, the algorithm to use, and how to configure it. We gather predictions from the trained model on the inputs from the test dataset and compare them to the withheld output values of the test set. This is an estimate of the skill of the algorithm trained on the problem when making predictions on unseen data.
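The evaluation step described above can be sketched as follows. The "model" here is a deliberately trivial mean predictor, and the data and helper names are made up; refitting on all available data at the end is one common way to finalize, shown here as an assumption rather than a quote from the post.

```python
def fit_mean_model(outputs):
    """Trivial stand-in model: always predicts the mean of the training outputs."""
    mean = sum(outputs) / len(outputs)
    return lambda _input: mean


def mean_absolute_error(model, inputs, outputs):
    """Compare predictions on the inputs with the withheld output values."""
    errors = [abs(model(x) - y) for x, y in zip(inputs, outputs)]
    return sum(errors) / len(errors)


# Made-up data, already split into train and test portions.
train_X, train_y = [1, 2, 3, 4], [10.0, 12.0, 14.0, 16.0]
test_X, test_y = [5, 6], [13.0, 15.0]

# 1. Fit on the training set and estimate skill on the held-out test set.
model = fit_mean_model(train_y)
skill = mean_absolute_error(model, test_X, test_y)

# 2. Once the procedure is chosen, fit the final model on all of the data.
final_model = fit_mean_model(train_y + test_y)
```

The `skill` value describes how the chosen procedure is expected to perform on unseen data; `final_model` is what would actually be used to make new predictions.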

### Big data comes to deep-water drilling

An unexpected equipment failure can affect offshore producers much the way an unexpected closure of Interstate 10 affects Houston drivers. For offshore operators, lost time means less money to the bottom line. That's bad news for companies and their shareholders, as unplanned equipment shutdowns cost billions of dollars each year, potentially driving the cost of producing offshore oil well above today's market prices. The safe and economical recovery of future oil and gas resources demands operational efficiency, and this efficiency can be realized only if there are no unplanned downtimes due to equipment failures. I worked with them to develop a curriculum to train engineers to use mathematical modeling, simulation and data processing to capture and use this data for real-time condition and performance monitoring of oil and gas production systems.