Predicting Hard Drive Failure with Machine Learning - Datto Engineering Blog


We've all had a hard drive fail on us, and often it's as sudden as booting your machine and realizing you can't access a bunch of your files. It's especially not fun when you have an entire data center full of drives that are all important to keeping your business running. What if we could predict when one of those drives would fail, and get ahead of it by preemptively replacing the hardware before the data is lost? This is where the history of predictive drive failure at Datto begins.

First and foremost, to make a prediction you need data. Hard drives have a built-in utility called SMART (Self-Monitoring, Analysis and Reporting Technology) that reports an array of statistics about how the drive is functioning. Here's an abbreviated view of what that looks like:

[Image: abbreviated SMART report]

Datto collects a report like this from each hard drive in its storage servers once per day. Each attribute in the report has three important numbers associated with it: value, thresh, and worst. Each attribute also has a feature named raw_value, but this is discarded due to inconsistent reporting standards between drive manufacturers.

Value: Reflects how well the drive is operating with respect to the attribute, with 1 being the worst and 253 being the best. The initial value is arbitrarily determined by the manufacturer, and can vary by drive model.

Thresh: A threshold below which the value should not fall in normal operation.
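To make the report structure described above concrete, here is a minimal Python sketch that parses a SMART attribute table and keeps only value, worst, and thresh while dropping raw_value. It assumes the report is in the tabular layout printed by `smartctl -A` from smartmontools; the post does not say which tool Datto uses to collect reports, and the sample rows, the `SmartAttribute` class, and both function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartAttribute:
    """One row of a SMART report, without the manufacturer-specific raw_value."""
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int


def parse_smart_table(text):
    """Parse rows laid out like `smartctl -A` output (an assumed format).

    Expected columns: ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH,
    TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
    """
    attributes = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; this skips the header
        # line, blank lines, and anything too short to be a full row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attributes.append(
            SmartAttribute(
                attr_id=int(fields[0]),
                name=fields[1],
                value=int(fields[3]),
                worst=int(fields[4]),
                thresh=int(fields[5]),
                # fields[9:] hold raw_value, which is deliberately dropped.
            )
        )
    return attributes


def below_threshold(attributes):
    """Return attributes whose value has fallen below thresh."""
    return [a for a in attributes if a.value < a.thresh]


# Invented sample data, for illustration only.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       148579456
  5 Reallocated_Sector_Ct   0x0033   004   004   010    Pre-fail  Always   FAILING_NOW 3920
194 Temperature_Celsius     0x0022   031   045   000    Old_age   Always       -       31
"""

parsed = parse_smart_table(SAMPLE_REPORT)
flagged = [a.name for a in below_threshold(parsed)]
```

With the invented sample, only `Reallocated_Sector_Ct` ends up in `flagged`, since its value of 4 is below its thresh of 10. A daily collection job could store one such list per drive and use the three numbers per attribute as model inputs.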