With the advent of pervasive cloud computing technologies, service reliability and availability are becoming major concerns, especially as we start to integrate cyber-physical systems with cloud networks. A number of smart and connected community systems, such as emergency response systems, use cloud networks to analyze real-time data streams and provide context-sensitive decision support. Improving overall system reliability requires us to study all aspects of this end-to-end distributed system, including the backend data servers. In this paper, we describe a bi-layered prognostic architecture for predicting the Remaining Useful Life (RUL) of backend server components, especially those that are subject to degradation. We show that our architecture is especially good at predicting the remaining useful life of hard disks. A deep LSTM network serves as the backbone of this fast, data-driven decision framework and dynamically captures the pattern of the incoming data. We discuss the architecture of the neural network and describe the mechanisms for choosing its hyper-parameters. We describe the challenges of extracting effective training sets from highly unorganized and class-imbalanced big data, and we establish methods for online prediction with extensive data pre-processing, feature extraction, and validation on test sets in which the remaining useful lives of the hard disks are unknown. Our algorithm performs especially well in predicting RUL near the critical zone of a device approaching failure. The proposed architecture is able to predict whether a disk is going to fail in the next ten days with an average precision of 0.8435. In the future, we will extend this architecture to learn and predict the RUL of the edge devices in the end-to-end distributed systems of smart communities, taking into consideration context-sensitive external features such as weather.
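As a rough illustration of the quantities named above, the following sketch shows one way RUL labels, the ten-day failure flag, and precision could be computed. It is not the paper's pipeline: the function names (`rul_labels`, `fails_within`, `precision`), the daily-observation layout, and the "RUL of ten days or less" reading of the failure flag are all assumptions made for this example.

```python
from datetime import date

def rul_labels(observation_days, failure_day):
    """Remaining useful life in days for each observation of one disk.

    Assumes the failure date is known, as it would be for a training disk.
    """
    return [(failure_day - day).days for day in observation_days]

def fails_within(rul_days, horizon=10):
    """Binary target: True if the disk fails within `horizon` days (assumed definition)."""
    return rul_days <= horizon

def precision(y_true, y_pred):
    """Fraction of predicted failures that were real failures."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0

# Hypothetical disk observed on three days, failing on 30 January 2020.
days = [date(2020, 1, 1), date(2020, 1, 15), date(2020, 1, 25)]
ruls = rul_labels(days, date(2020, 1, 30))
flags = [fails_within(r) for r in ruls]
```

Under these assumptions, a reported precision of 0.8435 would mean that roughly 84% of the disks flagged as failing within ten days did fail in that window.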
We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Experiments reveal that our multi-layer combination of many models offers better use of allocated training time than seeking out the best. A second contribution is an extensive evaluation of public and commercial AutoML platforms including TPOT, H2O, AutoWEKA, auto-sklearn, AutoGluon, and Google AutoML Tables. Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate. We find that AutoGluon often even outperforms the best-in-hindsight combination of all of its competitors. In two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after merely 4h of training on the raw data.
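The idea of combining many models rather than searching for a single best one can be sketched in a few lines. The example below is not AutoGluon's implementation: it is a deliberately small two-layer stack on a one-feature regression problem, in which the second layer is an inverse-error weighted average fit on out-of-fold predictions, standing in for a learned stacker. All function names here are hypothetical.

```python
def fit_mean(X, y):
    """Base model 1: always predict the training mean."""
    mean = sum(y) / len(y)
    return lambda x: mean

def fit_nearest(X, y):
    """Base model 2: predict the target of the nearest training point."""
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def fit_line(X, y):
    """Base model 3: ordinary least-squares line."""
    n = len(X)
    mx, my = sum(X) / n, sum(y) / n
    var = sum((a - mx) ** 2 for a in X)
    cov = sum((a - mx) * (b - my) for a, b in zip(X, y))
    slope = cov / var if var else 0.0
    return lambda x: my + slope * (x - mx)

def out_of_fold(fit, X, y, k=3):
    """Predict each point with a model that never saw it (k-fold)."""
    preds = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):
            preds[i] = model(X[i])
    return preds

def fit_stack(X, y, base_fits, k=3, eps=1e-9):
    """Layer 1: base models. Layer 2: weights from out-of-fold error."""
    weights = []
    for fit in base_fits:
        oof = out_of_fold(fit, X, y, k)
        mse = sum((p - t) ** 2 for p, t in zip(oof, y)) / len(y)
        weights.append(1.0 / (mse + eps))  # lower error, higher weight
    total = sum(weights)
    weights = [w / total for w in weights]
    models = [fit(X, y) for fit in base_fits]  # refit on all data
    return lambda x: sum(w * m(x) for w, m in zip(weights, models))

# Toy data following y = 2x + 1; the line model should dominate the stack.
X = [float(i) for i in range(9)]
y = [2 * x + 1 for x in X]
stacked = fit_stack(X, y, [fit_mean, fit_nearest, fit_line])
```

Computing the weights from out-of-fold predictions, rather than from training-set fit, keeps a memorizing model such as the nearest-neighbor one from being rewarded for simply reproducing its own training targets.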
Google has one of the largest machine learning stacks in the industry, currently centering on its Google Cloud AI and Machine Learning Platform. Google spun out TensorFlow as open source years ago, but TensorFlow is still the most mature and widely cited deep learning framework. Similarly, Google spun out Kubernetes as open source years ago, but it is still the dominant container management system. Google is one of the top sources of tools and infrastructure for developers, data scientists, and machine learning experts, but historically Google AI hasn't been all that attractive to business analysts who lack serious data science or programming backgrounds. The Google Cloud AI and Machine Learning Platform includes AI building blocks, the AI platform and accelerators, and AI solutions.
Using machine learning to process data and workloads has proved highly beneficial across diverse enterprise industries in recent years. Whether in healthcare, BFSI (banking, financial services, and insurance), or retail, machine learning systems have shown great promise for processing millions of data points and building complex models. That said, the traditional machine learning process requires humans to oversee operations, write code, and build the models. With the current crisis at hand, however, businesses are looking to reduce their workforce, and some lack the resources to employ an experienced data science team. That is where AutoML can come to the rescue for many.