Regression
Model Compression for Dynamic Forecast Combination
Cerqueira, Vitor, Torgo, Luis, Soares, Carlos, Bifet, Albert
The predictive advantage of combining several different predictive models is widely accepted. Particularly in time series forecasting problems, this combination is often dynamic to cope with potential non-stationary sources of variation present in the data. Despite their superior predictive performance, ensemble methods entail two main limitations: high computational costs and lack of transparency. These issues often preclude the deployment of such approaches, in favour of simpler yet more efficient and reliable ones. In this paper, we leverage the idea of model compression to address this problem in time series forecasting tasks. Model compression approaches have been mostly unexplored for forecasting. Their application in time series is challenging due to the evolving nature of the data. Further, while the literature focuses on neural networks, we apply model compression to distinct types of methods. In an extensive set of experiments, we show that compressing dynamic forecasting ensembles into an individual model leads to a comparable predictive performance and a drastic reduction in computational costs. Further, the compressed individual model with the best average rank is a rule-based regression model. Thus, model compression also leads to benefits in terms of model interpretability. The experiments carried out in this paper are fully reproducible.
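The idea of compressing a dynamic forecast combination into a single model can be illustrated with a toy sketch. This is not the authors' method or code: the two base forecasters, the inverse-error weighting, the one-lag linear "student", and all function names are assumptions chosen only to show the general pattern (build a dynamically weighted ensemble, then fit one simple model to the ensemble's forecasts).

```python
import math

def naive(history):
    """Forecast the next value as the last observed value."""
    return history[-1]

def moving_average(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def dynamic_ensemble(series, window=5, start=3):
    """One-step-ahead forecasts that combine both base models with weights
    proportional to the inverse of each model's recent mean absolute error."""
    models = [naive, moving_average]
    errors = [[] for _ in models]
    combined = []
    for t in range(start, len(series)):
        history = series[:t]
        preds = [m(history) for m in models]
        # Equal weights until past errors exist, then inverse recent error.
        inv = [
            1.0 / (1e-8 + sum(e[-window:]) / len(e[-window:])) if e else 1.0
            for e in errors
        ]
        total = sum(inv)
        combined.append(sum(w / total * p for w, p in zip(inv, preds)))
        # Record each model's error once the true value is known.
        for e, p in zip(errors, preds):
            e.append(abs(series[t] - p))
    return combined

def fit_student(series, targets, start=3):
    """Compression step (assumed form): least-squares fit of the one-lag
    linear model y_hat = a + b * last_value to the ensemble's forecasts."""
    xs = [series[t - 1] for t in range(start, len(series))]
    n = len(xs)
    mx, my = sum(xs) / n, sum(targets) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, targets))
    b = sxy / sxx
    return my - b * mx, b

# Synthetic series (illustrative only): a sine wave with a linear trend.
series = [math.sin(0.3 * t) + 0.05 * t for t in range(60)]
ensemble_preds = dynamic_ensemble(series)
a, b = fit_student(series, ensemble_preds)
student_preds = [a + b * series[t - 1] for t in range(3, len(series))]
```

At prediction time only the student's two coefficients are evaluated, which is where the reduction in computational cost would come from in a setting like this.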
Machine Learning in R: Regression & Classification in 2021
Regression Analysis and Classification for Machine Learning & Data Science in R. My course will be your hands-on guide to the theory and applications of supervised machine learning with a focus on regression analysis and classification using the R programming language. Unlike other courses, it offers NOT ONLY guided demonstrations of the R scripts but also the theoretical background that will allow you to apply and understand REGRESSION ANALYSIS and CLASSIFICATION (Linear Regression, Random Forest, KNN, etc.) in R. We will cover many R packages incl. This course also covers all the main aspects of practical and highly applied data science related to Machine Learning (i.e. Thus, if you take this course, you will save lots of time & money on other expensive materials in the R-based Data Science and Machine Learning domain. NO PRIOR R OR STATISTICS / MACHINE LEARNING KNOWLEDGE REQUIRED: You'll start by absorbing the most valuable Machine Learning & R programming basics and techniques.
Complete 2-in-1 Python for Business and Finance Bootcamp
Created by Alexander Hagmann. This is the first ever comprehensive Python Course for Business & Finance Professionals. You will learn and master Python from Zero and the full Python Data Science Stack with real Examples and Projects taken from the Business & Finance world. You will understand and master all required theoretical concepts behind the projects and the code from scratch. Learning Python is more effective when having the right context and the right examples (avoid toy examples!). Learning and mastering essential theories and concepts in Business, Finance, Statistics and Regression is way easier and more effective with Python as you can simulate, visualize and dynamically explain the intuition behind theories, math and formulas.
Model Selection for Time Series Forecasting: Empirical Analysis of Different Estimators
Cerqueira, Vitor, Torgo, Luis, Soares, Carlos
Evaluating predictive models is a crucial task in predictive analytics. This process is especially challenging with time series data where the observations show temporal dependencies. Several studies have analysed how different performance estimation methods compare with each other for approximating the true loss incurred by a given forecasting model. However, these studies do not address how the estimators behave for model selection: the ability to select the best solution among a set of alternatives. We address this issue and compare a set of estimation methods for model selection in time series forecasting tasks. We attempt to answer two main questions: (i) how often is the best possible model selected by the estimators; and (ii) what is the performance loss when it is not. We empirically found that the accuracy of the estimators for selecting the best solution is low, and the overall forecasting performance loss associated with the model selection process ranges from 1.2% to 2.3%. We also discovered that some factors, such as the sample size, are important in the relative performance of the estimators.
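The two questions in this abstract can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's experimental setup: the three candidate forecasters, the synthetic series, the single holdout estimator, and the split points are all hypothetical. It estimates each candidate's loss on a validation period, selects the apparent best, and then measures how much worse that choice is than the best candidate on a later test period.

```python
def mean_abs_error(series, forecaster, start, end):
    """Mean absolute one-step-ahead error of `forecaster` on series[start:end]."""
    errs = [abs(series[t] - forecaster(series[:t])) for t in range(start, end)]
    return sum(errs) / len(errs)

# Hypothetical candidate models; each maps a history to a one-step forecast.
candidates = {
    "naive": lambda h: h[-1],
    "mean_3": lambda h: sum(h[-3:]) / 3,
    "drift": lambda h: h[-1] + (h[-1] - h[0]) / (len(h) - 1),
}

# Synthetic series (illustrative only): linear trend plus a periodic bump.
series = [0.5 * t + (1.0 if t % 4 == 0 else -0.3) for t in range(80)]

train_end, valid_end = 50, 65

# Holdout estimate of each model's loss (the "estimator" under study).
estimated = {
    name: mean_abs_error(series, f, train_end, valid_end)
    for name, f in candidates.items()
}
# Loss on the later, unseen period, used here as a proxy for the true loss.
true_loss = {
    name: mean_abs_error(series, f, valid_end, len(series))
    for name, f in candidates.items()
}

selected = min(estimated, key=estimated.get)   # model the estimator picks
best = min(true_loss, key=true_loss.get)       # model that was actually best

# Question (i): did the estimator pick the best model?
picked_best = selected == best
# Question (ii): relative performance loss (in percent) when it did not.
regret_pct = 100 * (true_loss[selected] - true_loss[best]) / true_loss[best]
```

Repeating this over many series and several estimators (for example, different cross-validation schemes) would give the selection accuracy and average loss figures of the kind the abstract reports.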
Logistic Regression Explained from Scratch (Visually, Mathematically and Programmatically)
A plethora of results appears from a quick Google search for "Logistic Regression". It can be very confusing for beginners in data science to grasp the main idea behind logistic regression. And why wouldn't they be confused? Every tutorial, article, or forum has a different narrative on logistic regression (not including the legitimately verbose textbooks, because that would defeat the entire purpose of these "quick sources" of mastery). Some sources call it a "classification algorithm" and some more sophisticated ones call it a "regressor"; however, the idea and its utility remain unrevealed. Remember that logistic regression is the basic building block of artificial neural networks, and a missing or flawed understanding of it could make it really difficult to understand the advanced formalisms of data science.
High-Dimensional Uncertainty Quantification via Rank- and Sample-Adaptive Tensor Regression
Fabrication process variations can significantly influence the performance and yield of nano-scale electronic and photonic circuits. Stochastic spectral methods have achieved great success in quantifying the impact of process variations, but they suffer from the curse of dimensionality. Recently, low-rank tensor methods have been developed to mitigate this issue, but two fundamental challenges remain open: how to automatically determine the tensor rank and how to adaptively pick the informative simulation samples. This paper proposes a novel tensor regression method to address these two challenges. The resulting optimization problem can be efficiently solved via an alternating minimization solver. We also propose a two-stage adaptive sampling method to reduce the simulation cost. Our method considers both exploration and exploitation via the estimated Voronoi cell volume and nonlinearity measurement, respectively. The proposed model is verified with synthetic and some realistic circuit benchmarks, on which our method can capture the uncertainty caused by 19 to 100 random variables with only 100 to 600 simulation samples. Fabrication process variations (e.g., surface roughness of interconnects and photonic waveguides, and random doping effects of transistors) have been a major concern in nano-scale chip design. They can significantly influence chip performance and decrease product yield [2]. Monte Carlo (MC) is one of the most popular methods to quantify the chip performance under uncertainty, but it incurs a huge computational cost [3]. Instead, stochastic spectral methods based on generalized polynomial chaos (gPC) [4] offer efficient solutions for fast uncertainty quantification by approximating a real uncertain circuit variable as a linear combination of some stochastic basis functions [5-7].
Logistic Regression - A Complete Tutorial with Examples in R
Logistic regression is a predictive modelling algorithm that is used when the Y variable is binary categorical. That is, it can take only two values, like 1 or 0. The goal is to determine a mathematical equation that can be used to predict the probability of event 1. Once the equation is established, it can be used to predict Y when only the X's are known. Earlier you saw what linear regression is and how to use it to predict continuous Y variables. In linear regression the Y variable is always a continuous variable.
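The "mathematical equation" mentioned above is the logistic function applied to a linear combination of the predictors. The short sketch below shows it for a single predictor. It is written in Python rather than R, and the coefficient values and the 0.5 cutoff are hypothetical, chosen only for illustration; in practice the coefficients would be estimated from data.

```python
import math

def predict_probability(x, intercept, slope):
    """P(Y = 1 | x) under a one-predictor logistic model:
    1 / (1 + exp(-(intercept + slope * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical coefficients (not fitted to any data).
intercept, slope = -3.0, 1.5

for x in (0.0, 2.0, 4.0):
    p = predict_probability(x, intercept, slope)
    # Turn the probability into a 0/1 prediction with an assumed 0.5 cutoff.
    label = 1 if p >= 0.5 else 0
    print(f"x={x:.1f}  P(Y=1)={p:.3f}  predicted Y={label}")
```

With these coefficients the linear term is zero at x = 2, so the predicted probability there is exactly 0.5; smaller x values fall below the cutoff and larger ones above it.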
Individually Fair Gradient Boosting
Vargo, Alexander, Zhang, Fan, Yurochkin, Mikhail, Sun, Yuekai
We consider the task of enforcing individual fairness in gradient boosting. Gradient boosting is a popular method for machine learning from tabular data, which arise often in applications where algorithmic fairness is a concern. At a high level, our approach is a functional gradient descent on a (distributionally) robust loss function that encodes our intuition of algorithmic fairness for the ML task at hand. Unlike prior approaches to individual fairness that only work with smooth ML models, our approach also works with non-smooth models such as decision trees. We show that our algorithm converges globally and generalizes. We also demonstrate the efficacy of our algorithm on three ML problems susceptible to algorithmic bias.
Modelling Heterogeneity Using Bayesian Structured Sparsity
How to estimate heterogeneity, e.g. the effect of some variable differing across observations, is a key question in political science. Methods for doing so make simplifying assumptions about the underlying nature of the heterogeneity to draw reliable inferences. This paper allows a common way of simplifying complex phenomena (placing observations with similar effects into discrete groups) to be integrated into regression analysis. The framework allows researchers to (i) use their prior knowledge to guide which groups are permissible and (ii) appropriately quantify uncertainty. The paper does this by extending work on "structured sparsity" from a traditional penalized likelihood approach to a Bayesian one by deriving new theoretical results and inferential techniques. It shows that this method outperforms state-of-the-art methods for estimating heterogeneous effects when the underlying heterogeneity is grouped and more effectively identifies groups of observations with different effects in observational data.