Building a Predictive Model using Python Framework: A Step-by-Step Guide


As part of the agreement, we broke the project down into three stages, each with a distinct set of responsibilities. The first stage involved building a comprehensive understanding of our client's business values and data points. Next, our data scientists refined and organized the dataset to extract patterns and insights. In the final stage, our data engineers used the refined data to develop a machine learning model that predicts upcoming sales cycles. This helped our client prepare for upcoming market trends and, as a result, outperform their competitors.

Optimal sizing of a holdout set for safe predictive model updating (Machine Learning)

Risk models in medical statistics and healthcare machine learning are increasingly used to guide clinical or other interventions. Should a model be updated after a guided intervention, it may lead to its own failure at making accurate predictions. The use of a 'holdout set' (a subset of the population that does not receive interventions guided by the model) has been proposed to prevent this. Since patients in the holdout set do not benefit from risk predictions, the chosen size must trade off maximising model performance whilst minimising the number of held out patients. By defining a general loss function, we prove the existence and uniqueness of an optimal holdout set size, and introduce parametric and semi-parametric algorithms for its estimation. We demonstrate their use on a recent risk score for pre-eclampsia. Based on these results, we argue that a holdout set is a safe, viable and easily implemented solution to the model update problem.
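The trade-off behind the holdout set size can be illustrated with a toy loss function. The sketch below is not the paper's estimator. It assumes a population of N patients, a fixed per-patient cost k1 for those held out, and a per-patient cost k2(n) for everyone else that falls as a power law in the holdout size n. These functional forms, the function names and all numbers are illustrative assumptions.

```python
def total_cost(n, N, k1, a, b, c):
    """Total expected cost when n of N patients are held out.

    Assumed form (hypothetical): each held-out patient costs k1, and each
    of the remaining N - n patients costs k2(n) = a * n**(-b) + c, a
    power-law learning curve for a model trained on the holdout set.
    """
    k2 = a * n ** (-b) + c
    return k1 * n + k2 * (N - n)


def optimal_holdout_size(N, k1, a, b, c):
    """Brute-force search over every integer holdout size from 1 to N - 1."""
    return min(range(1, N), key=lambda n: total_cost(n, N, k1, a, b, c))


# Illustrative numbers only: 10,000 patients, and a held-out patient costs
# more (0.5) than a patient scored by a well-trained model (about 0.1).
N = 10_000
params = dict(k1=0.5, a=5.0, b=0.5, c=0.1)
n_star = optimal_holdout_size(N, **params)
print(n_star, total_cost(n_star, N, **params))
```

A holdout set that is too small leaves the model poorly trained for everyone else, while one that is too large withholds predictions from too many patients. Under these assumptions the minimum lies strictly between the two extremes.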

VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering (Machine Learning)

Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g. there are highly correlated groups of variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. But none of these methods achieve satisfactory performance for prediction, variable selection and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.

Smooth Nested Simulation: Bridging Cubic and Square Root Convergence Rates in High Dimensions (Machine Learning)

Nested simulation concerns estimating functionals of a conditional expectation via simulation. In this paper, we propose a new method based on kernel ridge regression to exploit the smoothness of the conditional expectation as a function of the multidimensional conditioning variable. Asymptotic analysis shows that the proposed method can effectively alleviate the curse of dimensionality on the convergence rate as the simulation budget increases, provided that the conditional expectation is sufficiently smooth. The smoothness bridges the gap between the cubic root convergence rate (that is, the optimal rate for the standard nested simulation) and the square root convergence rate (that is, the canonical rate for the standard Monte Carlo simulation). We demonstrate the performance of the proposed method via numerical examples from portfolio risk management and input uncertainty quantification.
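The idea of smoothing across outer scenarios can be sketched on a one-dimensional toy problem. The example below is an illustration under stated assumptions, not the paper's method or experiments. It draws outer scenarios X from Uniform(-1, 1), takes a single noisy inner sample Y = X + noise per scenario, fits a Gaussian-kernel ridge regression to approximate E[Y|X], and averages max(E[Y|X], 0). The kernel, its length scale, the ridge value and the target functional are all hypothetical choices.

```python
import math
import random


def gaussian_kernel(x, z, length_scale):
    return math.exp(-((x - z) ** 2) / (2.0 * length_scale ** 2))


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def krr_nested_estimate(n_outer=200, noise_sd=0.3, length_scale=0.5,
                        ridge=1.0, seed=0):
    """Estimate E[max(E[Y|X], 0)] with one inner sample per outer scenario."""
    rng = random.Random(seed)
    # Outer scenarios and a single noisy inner sample for each of them.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_outer)]
    ys = [x + rng.gauss(0.0, noise_sd) for x in xs]

    # Kernel ridge regression: solve (K + ridge * I) alpha = y.
    K = [[gaussian_kernel(xi, xj, length_scale) for xj in xs] for xi in xs]
    A = [[K[i][j] + (ridge if i == j else 0.0) for j in range(n_outer)]
         for i in range(n_outer)]
    alpha = solve_linear_system(A, ys)

    # Smoothed conditional expectation at each outer scenario.
    fitted = [sum(alpha[j] * K[i][j] for j in range(n_outer))
              for i in range(n_outer)]
    return sum(max(f, 0.0) for f in fitted) / n_outer


estimate = krr_nested_estimate()
print(estimate)  # the exact value for this toy problem is 0.25
```

Because the regression pools information from neighbouring scenarios, a single inner sample per scenario is enough here, whereas standard nested simulation would spend part of the budget on many inner samples for each outer scenario.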

Forecasting: theory and practice (Machine Learning)

Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we hope that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

Forecast Future Demand of Phone Using Predictive Analytics


This is a step-by-step course on how to create a predictive model using machine learning. It covers NumPy, Pandas, Matplotlib, scikit-learn and Django, and at the end the predictive model is deployed with Django. What most machine learning beginners do not know is how to deploy a model once it has been created: how do you put a trained model into an application? Training a model and reaching 80%, 85% or 90% accuracy is not enough on its own. As an Artificial Intelligence Engineer, you should be able to put the model you created into an application.
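The deployment step can be sketched without any framework. The example below is a minimal illustration, not the course's code. It fits a one-feature least-squares line in plain Python, saves the fitted parameters to a JSON file, and reloads them the way a web application (for example a Django view) might at request time. The function names, the file name and the monthly sales figures are hypothetical.

```python
import json
import os
import tempfile


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns the fitted parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(params, path):
    """Persist the trained parameters so another process can use them."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle)


def load_model(path):
    """Read the stored parameters back, as the serving application would."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def predict(params, x):
    return params["slope"] * x + params["intercept"]


# Training side: hypothetical monthly phone sales (month index, units sold).
months = [1, 2, 3, 4, 5]
units = [12, 14, 16, 18, 20]
model_path = os.path.join(tempfile.mkdtemp(), "demand_model.json")
save_model(fit_line(months, units), model_path)

# Application side: load the stored parameters and serve a forecast.
deployed = load_model(model_path)
forecast_month_6 = predict(deployed, 6)
print(forecast_month_6)
```

The point of the split is that training and serving are separate processes: the application only needs the stored artifact and a prediction function, not the training data or the training code.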

Tipping Point for Legislative Polarization


A predictive model of a polarized group, similar to the current U.S. Senate, demonstrates that when an outside threat – like war or a pandemic – fails to unite the group, the divide may be irreversible through democratic means. Published today in the Proceedings of the National Academy of Sciences as part of a Dynamics of Political Polarization Special Feature, the model identifies such atypical behavior among the political elite as a powerful symptom of dangerously high levels of polarization. "We see this very disturbing pattern in which a shock brings people a little bit closer initially, but if polarization is too extreme, eventually the effects of a shared fate are swamped by the existing divisions and people become divided even on the shock issue," said network scientist Boleslaw Szymanski, a professor of computer science and director of the Army Research Laboratory Network Science and Technology Center (NeST) at Rensselaer Polytechnic Institute. "If we reach that point, we cannot unite even in the face of war, climate change, pandemics, or other challenges to the survival of our society." The model – essentially a game that simulates the views of 100 theoretical legislators over time – allowed researchers to dial up party identity, intolerance for disagreement, and extremism to levels such that almost no degree of shock could unite the legislative group. In some situations, the simulation revealed that even the strongest shock fails to reverse the self-reinforcing dynamics of political polarization.

3 Best Practices For Predictive Data Modeling


Predictive modeling develops models that use past occurrences as reference points, so that organizations can forecast future business-related events and make informed decisions. It is heavily involved in the strategy-making processes of companies in industries such as healthcare, law enforcement, pharmaceuticals and many more. Practices that reduce errors in predictive data modeling are therefore valuable to all of them. Predictive data modeling involves the creation, testing and validation of data models that will be used for predictive analysis in businesses. The lifecycle management of such models is also a part of predictive data modeling.

Combining Embeddings and Fuzzy Time Series for High-Dimensional Time Series Forecasting in Internet of Energy Applications (Artificial Intelligence)

The prediction of residential power usage is essential in assisting a smart grid to manage and preserve energy to ensure efficient use. Accurate energy forecasting at the customer level translates directly into efficiency improvements across the power grid system. However, forecasting building energy use is a complex task due to many influencing factors, such as meteorological and occupancy patterns. In addition, high-dimensional time series increasingly arise in the Internet of Energy (IoE), given the emergence of multi-sensor environments and the two-way communication between energy consumers and the smart grid. Therefore, methods that are capable of handling high-dimensional time series are of great value in smart building and IoE applications. Fuzzy Time Series (FTS) models stand out as data-driven non-parametric models of easy implementation and high accuracy. Unfortunately, the existing FTS models can become infeasible if all features are used to train the model. We present a new methodology for handling high-dimensional time series, by projecting the original high-dimensional data into a low-dimensional embedding space and using a multivariate FTS approach in this low-dimensional representation. Combining these techniques enables a better representation of the complex content of multivariate time series and more accurate forecasts.
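To make the fuzzy time series part concrete, the sketch below shows a simple univariate, first-order scheme on a series assumed to be already projected to one dimension. It is not the paper's multivariate method and the embedding step is omitted. Values are mapped to overlapping triangular fuzzy sets, transitions between sets are recorded as rules, and a forecast is the average of the peaks of the sets that followed the current one. All names and data are hypothetical.

```python
def triangular_partitions(lower, upper, n_sets):
    """Evenly spaced, overlapping triangles as (left, peak, right) triples."""
    step = (upper - lower) / (n_sets - 1)
    return [(lower + (i - 1) * step, lower + i * step, lower + (i + 1) * step)
            for i in range(n_sets)]


def membership(value, fuzzy_set):
    """Degree (0 to 1) to which value belongs to a triangular fuzzy set."""
    left, peak, right = fuzzy_set
    if value <= left or value >= right:
        return 0.0
    if value <= peak:
        return (value - left) / (peak - left)
    return (right - value) / (right - peak)


def fuzzify(value, partitions):
    """Index of the fuzzy set with the highest membership for value."""
    return max(range(len(partitions)),
               key=lambda i: membership(value, partitions[i]))


def learn_rules(series, partitions):
    """Record which fuzzy sets were observed to follow each fuzzy set."""
    labels = [fuzzify(v, partitions) for v in series]
    rules = {}
    for current, following in zip(labels, labels[1:]):
        rules.setdefault(current, set()).add(following)
    return rules


def forecast(value, rules, partitions):
    """Average the peaks of the sets that followed the current set."""
    label = fuzzify(value, partitions)
    successors = rules.get(label, {label})
    peaks = [partitions[i][1] for i in successors]
    return sum(peaks) / len(peaks)


# Hypothetical one-dimensional (already embedded) series.
series = [0.1, 1.1, 1.9, 1.8, 0.9, 0.2, 1.0, 2.0]
partitions = triangular_partitions(0.0, 2.0, n_sets=3)
rules = learn_rules(series, partitions)
next_value = forecast(1.9, rules, partitions)
print(rules, next_value)
```

The rule table grows with the number of fuzzy sets per variable, which gives a rough sense of why training on every raw feature becomes impractical and why a low-dimensional representation helps.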

Learning Non-Stationary Time-Series with Dynamic Pattern Extractions (Artificial Intelligence)

The era of information explosion has prompted the accumulation of a tremendous amount of time-series data, both stationary and non-stationary. State-of-the-art algorithms have achieved a decent performance in dealing with stationary temporal data. However, traditional algorithms that tackle stationary time-series do not apply to non-stationary series such as those in Forex trading. This paper investigates applicable models that can improve the accuracy of forecasting future trends of non-stationary time-series sequences. In particular, we focus on identifying potential models and investigate the effects of recognizing patterns from historical data. We propose a combination of the seq2seq model based on RNN, along with an attention mechanism and an enriched set of features extracted via dynamic time warping and zigzag peak valley indicators. Customized loss functions and evaluation metrics have been designed to focus more on the peaks and valley points of the predicted sequence. Our results show that our model can predict 4-hour future trends with high accuracy in the Forex dataset, which is crucial in realistic scenarios to assist foreign exchange trading decision making. We further provide evaluations of the effects of various loss functions, evaluation metrics, model variants, and components on model performance.
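Dynamic time warping, one of the feature extractors named above, can be sketched with the textbook dynamic-programming recurrence. This is a generic illustration with an absolute-difference cost, not the paper's feature pipeline, and the sample sequences are made up.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step
    may advance in a, in b, or in both, so sequences can stretch in time.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # advance in a only
                                    cost[i][j - 1],      # advance in b only
                                    cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


pattern = [1, 2, 3]
stretched = [1, 2, 2, 3]   # same shape, one point repeated
shifted = [2, 3, 4]        # same shape, one unit higher
print(dtw_distance(pattern, stretched), dtw_distance(pattern, shifted))
```

A time-stretched copy of a pattern has distance zero, which is what makes the measure useful for matching a recent price window against historical patterns that unfold at a different speed.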