 Regression


The Maths Behind Linear Regression

#artificialintelligence

Let us discuss Linear Regression, a Supervised Learning algorithm often used in Data Science and other ML-related predictive models, and the maths behind it. Feature data values are also called independent variables because they are not influenced by anything; they are just properties of that particular dataset. Similarly, target data values are also called dependent variables because they are in some way related to the feature, or independent, variables. We know that our data points will not all follow the same linear relationship exactly. Based on this, our task in Linear Regression is to find the best possible relationship: the one for which the error, or deviation, of the actual target from the target predicted by our relationship is as small as possible.
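To make "as small as possible" concrete, here is a minimal sketch that fits a line to one feature by least squares, assuming the error is measured as the mean squared deviation (the excerpt does not name a specific error measure). The helper names `fit_line` and `mean_squared_error` and the toy numbers are illustrative only.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = m*x + c for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # slope
    c = mean_y - m * mean_x  # intercept
    return m, c


def mean_squared_error(xs, ys, m, c):
    """Average squared deviation of the actual targets from the predictions."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data, made up for illustration: roughly y = 2x with some noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.1, 7.9, 10.2]

m, c = fit_line(xs, ys)
print(f"slope={m:.2f}, intercept={c:.2f}")
print("MSE of fitted line:", mean_squared_error(xs, ys, m, c))
print("MSE of the guess y = 2x:", mean_squared_error(xs, ys, 2.0, 0.0))
```

The fitted line should never have a larger mean squared error than any hand-picked line on the same data, which is the sense in which it is the "best possible relationship".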


Artificial Intelligence and Design of Experiments for Assessing Security of Electricity Supply: A Review and Strategic Outlook

arXiv.org Artificial Intelligence

Assessing the effects of the energy transition and liberalization of energy markets on resource adequacy is an increasingly important and demanding task. The rising complexity in energy systems requires adequate methods for energy system modeling leading to increased computational requirements. Furthermore, with complexity, uncertainty increases likewise calling for probabilistic assessments and scenario analyses. To adequately and efficiently address these various requirements, new methods from the field of data science are needed to accelerate current methods. With our systematic literature review, we want to close the gap between the three disciplines (1) assessment of security of electricity supply, (2) artificial intelligence, and (3) design of experiments. For this, we conduct a large-scale quantitative review on selected fields of application and methods and make a synthesis that relates the different disciplines to each other. Among other findings, we identify metamodeling of complex security of electricity supply models using AI methods and applications of AI-based methods for forecasts of storage dispatch and (non-)availabilities as promising fields of application that have not sufficiently been covered, yet. We end with deriving a new methodological pipeline for adequately and efficiently addressing the present and upcoming challenges in the assessment of security of electricity supply.


Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models

arXiv.org Artificial Intelligence

With the increasing adoption of machine learning (ML) models and systems in high-stakes settings across different industries, guaranteeing a model's performance after deployment has become crucial. Monitoring models in production is a critical aspect of ensuring their continued performance and reliability. We present Amazon SageMaker Model Monitor, a fully managed service that continuously monitors the quality of machine learning models hosted on Amazon SageMaker. Our system automatically detects data, concept, bias, and feature attribution drift in models in real-time and provides alerts so that model owners can take corrective actions and thereby maintain high quality models. We describe the key requirements obtained from customers, system design and architecture, and methodology for detecting different types of drift. Further, we provide quantitative evaluations followed by use cases, insights, and lessons learned from more than 1.5 years of production deployment.


Understanding Gradient Descent with simple mathematical intuition

#artificialintelligence

In simple language, gradient descent is an ML optimization strategy that helps the ML model find the minimum of the loss (cost) function, which corresponds to the optimal parameter values. Let's understand the concept in detail by applying it to one of the most common regression algorithms that ML engineers and Data Scientists use: Linear Regression. For ease of understanding gradient descent, we will use simple, or univariate, linear regression. Here, we focus on determining the relationship between one independent variable and one dependent variable (the target variable). A univariate linear regression is mathematically represented by y = mx + c, where 'y' is the dependent (target) variable and 'x' is the independent variable.
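As a rough sketch of the idea for the univariate model y = mx + c, the loop below repeatedly moves m and c against the gradient of the mean squared error. The choice of loss, the learning rate, the step count, and the function name `gradient_descent` are assumptions made for illustration, not details taken from the article.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = m*x + c by gradient descent on the mean squared error."""
    m, c = 0.0, 0.0  # start from an arbitrary initial guess
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point under the current m and c.
        errors = [(m * x + c) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2).
        grad_m = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_c = (2.0 / n) * sum(errors)
        # Step downhill, scaled by the learning rate.
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


# Toy data generated exactly from y = 2x + 1, so the minimum is known.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, c = gradient_descent(xs, ys)
print(f"m={m:.4f}, c={c:.4f}")
```

A learning rate that is too large makes the updates diverge, while one that is too small needs many more steps; the values here were picked to suit this small dataset.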


Test Set Sizing Via Random Matrix Theory

arXiv.org Machine Learning

This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression with m data points, each an independent n-dimensional multivariate Gaussian. It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise, and thus fairly reflects the value or lack of same of the model. This paper is the first to solve for the training and test size for any model in a way that is truly optimal. The number of data points in the training set is the root of a quartic polynomial Theorem 1 derives which depends only on m and n; the covariance matrix of the multivariate Gaussian, the true model parameters, and the true measurement noise drop out of the calculations. The critical mathematical difficulties were realizing that the problems herein were discussed in the context of the Jacobi Ensemble, a probability distribution describing the eigenvalues of a known random matrix model, and evaluating a new integral in the style of Selberg and Aomoto. Mathematical results are supported with thorough computational evidence. This paper is a step towards automatic choices of training/test set sizes in machine learning.


Markov subsampling based Huber Criterion

arXiv.org Machine Learning

Subsampling is an important technique to tackle the computational challenges brought by big data. Many subsampling procedures fall within the framework of importance sampling, which assigns high sampling probabilities to the samples appearing to have big impacts. When the noise level is high, those sampling procedures tend to pick many outliers and thus often do not perform satisfactorily in practice. To tackle this issue, we design a new Markov subsampling strategy based on Huber criterion (HMS) to construct an informative subset from the noisy full data; the constructed subset then serves as refined working data for efficient processing. HMS is built upon a Metropolis-Hastings procedure, where the inclusion probability of each sampling unit is determined using the Huber criterion to prevent over-scoring the outliers. Under mild conditions, we show that the estimator based on the subsamples selected by HMS is statistically consistent with a sub-Gaussian deviation bound. The promising performance of HMS is demonstrated by extensive studies on large-scale simulations and real data examples.


Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity

arXiv.org Machine Learning

Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training. Despite its desirable simplicity, recent work shows inferior convergence rates of GDA in theory even assuming strong concavity of the objective on one side. This paper establishes new convergence results for two alternative single-loop algorithms -- alternating GDA and smoothed GDA -- under the mild assumption that the objective satisfies the Polyak-Lojasiewicz (PL) condition about one variable. We prove that, to find an $\epsilon$-stationary point, (i) alternating GDA and its stochastic variant (without mini batch) respectively require $O(\kappa^{2} \epsilon^{-2})$ and $O(\kappa^{4} \epsilon^{-4})$ iterations, while (ii) smoothed GDA and its stochastic variant (without mini batch) respectively require $O(\kappa \epsilon^{-2})$ and $O(\kappa^{2} \epsilon^{-4})$ iterations. The latter greatly improves over the vanilla GDA and gives the hitherto best known complexity results among single-loop algorithms under similar settings. We further showcase the empirical efficiency of these algorithms in training GANs and robust nonlinear regression.


Conference: Scoring Systems: At the Extreme of Interpretable Machine Learning

#artificialintelligence

This conference is presented as part of the Montreal Speaker Series in the Ethics of AI. Speaker: Cynthia Rudin, Professor of computer science, electrical and computer engineering, statistical science, and biostatistics & bioinformatics at Duke University. With widespread use of machine learning, there have been serious societal consequences from using black box models for high-stakes decisions, including flawed bail and parole decisions in criminal justice, flawed models in healthcare, and black box loan decisions in finance. Interpretability of machine learning models is critical in high-stakes decisions. In this talk, I will focus on one of the most fundamental and important problems in the field of interpretable machine learning: optimal scoring systems. Scoring systems are sparse linear models with integer coefficients. Such models first started to be used ~100 years ago. Generally, such models are created without data, or are constructed by manual feature selection and rounding logistic regression coefficients, but these manual techniques sacrifice performance; humans are not naturally adept at high-dimensional optimization. I will present the first practical algorithm for building optimal scoring systems from data. This method has been used for several important applications to healthcare and criminal justice. More information: https://sites.google.com/view/dmartin/ai-ethics/speakers?#h.nihlg6vib2nz


On the Relation between Prediction and Imputation Accuracy under Missing Covariates

arXiv.org Machine Learning

Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research has seen an increasing trend towards the usage of modern Machine Learning algorithms for imputation. This trend originates from their capability of showing favourable prediction accuracy in different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine Learning based methods are used for both imputation and prediction. In addition, we explore imputation performance when using statistical inference procedures in prediction settings, such as coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets provided by the UCI Machine Learning repository and an extensive simulation study.


Implementing Logistic Regression for Stock Trading

#artificialintelligence

Most stock trading algorithms that incorporate machine learning are based upon some form of linear regression. There are benefits and drawbacks to this method. The benefit is that the prices predicted by linear regression can be integrated into more complex values that need the actual price values to function. The drawback is that, for the basic "buy low, sell high" strategy, a predicted price is not directly related to predicting the direction of the price. What would happen if we used logistic regression, or more specifically binary classification, to predict whether the price will increase or decrease? Theoretically, it would home in on direction itself, and become more accurate than the signals generated by linear regression.
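A minimal sketch of what treating direction as binary classification could look like is shown below. It is not the article's implementation. The single feature (the previous day's return), the made-up toy data, and the helper names `train_logistic` and `predict_up` are all assumptions for illustration; real price data would need far more care.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, learning_rate=0.1, steps=2000):
    """Fit P(up) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x  # derivative of log loss w.r.t. w
            grad_b += (p - y)      # derivative of log loss w.r.t. b
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical toy data, NOT real market data:
# feature = previous day's return in percent, label = 1 if the price then rose.
features = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(features, labels)


def predict_up(previous_return):
    """Return True when the model puts at least 50% probability on a rise."""
    return sigmoid(w * previous_return + b) >= 0.5


print("w =", w, "b =", b)
print("up after a +2% day?", predict_up(2.0))
print("up after a -2% day?", predict_up(-2.0))
```

The output is a probability of an upward move rather than a price, so it maps directly onto a buy or sell signal, but it cannot feed calculations that need an actual price value.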