Regression


Zoetrope Genetic Programming for Regression

arXiv.org Artificial Intelligence

The Zoetrope Genetic Programming (ZGP) algorithm is based on an original representation for mathematical expressions, targeting evolutionary symbolic regression. The zoetropic representation uses repeated fusion operations between partial expressions, starting from the terminal set. Repeated fusions within an individual gradually generate more complex expressions, ending up in what can be viewed as new features. These features are then linearly combined to best fit the training data. ZGP individuals then undergo specific crossover and mutation operators, and selection takes place between parents and offspring. ZGP is validated using a large number of public domain regression datasets, and compared to other symbolic regression algorithms, as well as to traditional machine learning algorithms. ZGP reaches state-of-the-art performance with respect to both types of algorithms, and demonstrates a low computational time compared to other symbolic regression approaches.
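
The last step of the representation, linearly combining the evolved features to best fit the training data, can be illustrated with ordinary least squares. This is a minimal, dependency-free sketch under stated assumptions, not the ZGP algorithm: the `fuse` helper and its two operators are hypothetical stand-ins for the zoetropic fusion operations, the features are fixed by hand rather than evolved, and the weights come from solving the normal equations.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Feature = Callable[[Row], float]

def fuse(f: Feature, g: Feature, op: str) -> Feature:
    """Hypothetical fusion: combine two partial expressions with a binary operator."""
    if op == "add":
        return lambda x: f(x) + g(x)
    if op == "mul":
        return lambda x: f(x) * g(x)
    raise ValueError(f"unknown operator {op!r}")

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def fit_linear_combination(features: List[Feature], X: List[Row], y: List[float]) -> List[float]:
    """Least-squares weights [intercept, w_1, ..., w_k] for features evaluated on the rows of X."""
    design = [[1.0] + [f(x) for f in features] for x in X]
    k = len(design[0])
    # Normal equations: (D^T D) w = D^T y
    dtd = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    dty = [sum(row[i] * t for row, t in zip(design, y)) for i in range(k)]
    return solve(dtd, dty)

# Terminals x0 and x1, fused twice into two increasingly complex features
x0: Feature = lambda x: x[0]
x1: Feature = lambda x: x[1]
feat_a = fuse(x0, x1, "mul")      # x0 * x1
feat_b = fuse(feat_a, x0, "add")  # x0 * x1 + x0

X = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (0.5, 4.0), (2.5, 0.5)]
y = [3.0 + 2.0 * (a * b) - 1.0 * (a * b + a) for a, b in X]
w = fit_linear_combination([feat_a, feat_b], X, y)
```

In a full symbolic-regression loop the residual error of each individual's fit would typically serve as its fitness; a library solver such as NumPy's `numpy.linalg.lstsq` is numerically safer than the normal equations used here.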


ML for Business Managers: Build Regression model in R Studio

#artificialintelligence

In this section we will learn what machine learning means and what the key terms associated with it are. You will see some examples so that you understand what machine learning actually is. The section also covers the steps involved in building a machine learning model: not just linear models, but any machine learning model.


Logistic Regression

#artificialintelligence

In our day-to-day lives we come across many problems that revolve around choosing a category, such as pass/fail, win/lose, alive/dead, healthy/sick, yes/no, etc. Decision making plays an important role in our lives, and every choice has its own consequences. Having read this far, you may be wondering whether to continue with this blog or skip it. Let's dive in, assuming you have chosen the YES category. It was a good choice, and an easy one for you. But what if I asked you whether a random person of your age is likely to read my blog?


House Price Predictions Using Keras

#artificialintelligence

This is a starter tutorial on modeling with Keras, including hyper-parameter tuning and callbacks. The goal is to create a Keras regression model that can analyse the features of a given house and accurately predict its price. We will use numpy and pandas to process the dataset, matplotlib and seaborn for data visualization, and Keras to implement the neural network. We will also use Sklearn for outlier detection and for scaling the dataset. First, we will look at all the features that have missing values, in both the training and the testing data.
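
The first step, finding every feature with missing values across the training and testing data together, is usually done in pandas with `DataFrame.isnull().sum()`. As a dependency-free sketch of the same idea, assuming each house is a plain dict and that `None` or NaN marks a missing value (the feature names below are hypothetical):

```python
import math
from typing import Dict, List, Optional

# Hypothetical record format: one dict per house; None or NaN marks a missing value
Record = Dict[str, Optional[float]]

def is_missing(value: Optional[float]) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))

def missing_value_report(*datasets: List[Record]) -> Dict[str, int]:
    """Count missing values per feature across all datasets combined (e.g. train + test)."""
    counts: Dict[str, int] = {}
    for dataset in datasets:
        for record in dataset:
            for feature, value in record.items():
                counts[feature] = counts.get(feature, 0) + int(is_missing(value))
    # Keep only features with at least one missing value, most-missing first
    with_gaps = [(f, c) for f, c in counts.items() if c > 0]
    return dict(sorted(with_gaps, key=lambda fc: fc[1], reverse=True))

train_rows = [
    {"lot_area": 8450.0, "lot_frontage": 65.0, "garage_year": None},
    {"lot_area": 9600.0, "lot_frontage": None, "garage_year": float("nan")},
]
test_rows = [{"lot_area": 11250.0, "lot_frontage": None, "garage_year": 2001.0}]
report = missing_value_report(train_rows, test_rows)
```

Combining both splits before counting matches the tutorial's intent of checking training and testing data together, so that any imputation decision covers features that are only missing in one split.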


Prediction Intervals for Deep Learning Neural Networks

#artificialintelligence

Prediction intervals provide a measure of uncertainty for predictions on regression problems. For example, a 95% prediction interval indicates that 95 out of 100 times, the true value will fall between the lower and upper values of the range. This is different from a simple point prediction that might represent the center of the uncertainty interval. There are no standard techniques for calculating a prediction interval for deep learning neural networks on regression predictive modeling problems. Nevertheless, a quick and dirty prediction interval can be estimated using an ensemble of models that, in turn, provide a distribution of point predictions from which an interval can be calculated.
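
A minimal sketch of the ensemble approach, assuming the point predictions from the ensemble members are roughly Gaussian so that the mean plus or minus 1.96 standard deviations approximates a 95% interval. The function name and the example predictions are hypothetical; training the ensemble itself is omitted.

```python
import statistics
from typing import Sequence, Tuple

def ensemble_prediction_interval(
    predictions: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (lower, mean, upper) from the point predictions of an ensemble.

    Assumes the predictions are roughly Gaussian, so mean +/- z * stdev
    approximates a 95% interval when z = 1.96.
    """
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    mean = statistics.fmean(predictions)
    spread = statistics.stdev(predictions)  # sample standard deviation
    return mean - z * spread, mean, mean + z * spread

# Hypothetical: predictions for one input from five independently trained models
member_predictions = [101.2, 98.7, 100.4, 99.9, 102.1]
lower, point, upper = ensemble_prediction_interval(member_predictions)
```

This is the "quick and dirty" estimate the text describes: the spread mostly reflects disagreement between the models, so it can understate the uncertainty caused by noise in the target itself.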


Learning High-Order Interactions via Targeted Pattern Search

arXiv.org Artificial Intelligence

Logistic Regression (LR) is a widely used statistical method in empirical binary classification studies. However, real-life scenarios often involve complexities that prevent the use of the plain LR model and instead highlight the need to include high-order interactions to capture data variability. This becomes even more challenging because of: (i) datasets growing wider, with more and more variables; (ii) studies typically being conducted in strongly imbalanced settings; (iii) sample sizes ranging from very large to extremely small; (iv) the need to provide both predictive models and interpretable results. In this paper we present a novel algorithm, Learning high-order Interactions via targeted Pattern Search (LIPS), to select interaction terms of varying order to include in an LR model for an imbalanced binary classification task when input data are categorical. LIPS's rationale stems from the duality between item sets and categorical interactions. The algorithm relies on an interaction learning step based on a well-known frequent item set mining algorithm, and a novel dissimilarity-based interaction selection step that allows the user to specify the number of interactions to be included in the LR model. In addition, we present two specialized variants (Scores LIPS and Clusters LIPS) that can address even more specific needs. Through a set of experiments we validate our algorithm and demonstrate its wide applicability to real-life research scenarios, showing that it outperforms a benchmark state-of-the-art algorithm.
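
The duality between item sets and categorical interactions can be sketched as follows: encode each record as a set of `variable=value` items, mine frequent item sets, and read every frequent set of order two or more as a candidate interaction term. This is not LIPS itself. The abstract does not name its mining algorithm, so an Apriori-style level-wise search is assumed here; mining only the minority-class records is also an assumption, and the dissimilarity-based selection step is omitted. All data and helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Mapping

Itemset = FrozenSet[str]  # items are "variable=value" strings

def to_items(record: Mapping[str, str]) -> Itemset:
    """Encode one categorical record as a set of 'variable=value' items."""
    return frozenset(f"{var}={val}" for var, val in record.items())

def frequent_itemsets(
    transactions: List[Itemset], min_support: int, max_order: int
) -> Dict[Itemset, int]:
    """Apriori-style level-wise search for itemsets found in >= min_support transactions."""
    frequent: Dict[Itemset, int] = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates and k <= max_order:
        # Count how many transactions contain each candidate itemset
        supported = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in supported.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets whose (k-1)-subsets are
        # all frequent (the Apriori pruning rule)
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(sub) in level for sub in combinations(a | b, k - 1))
        }
    return frequent

def interaction_feature(itemset: Itemset, record: Mapping[str, str]) -> int:
    """Binary interaction term: 1 if the record matches every variable=value in the itemset."""
    return int(itemset <= to_items(record))

# Hypothetical minority-class (positive) records with categorical variables
positives = [
    {"smoker": "yes", "age": "old", "sex": "m"},
    {"smoker": "yes", "age": "old", "sex": "f"},
    {"smoker": "yes", "age": "young", "sex": "m"},
    {"smoker": "no", "age": "old", "sex": "f"},
]
mined = frequent_itemsets([to_items(r) for r in positives], min_support=2, max_order=3)
# Itemsets of order >= 2 are candidate interaction terms for the LR model
interactions = [s for s in mined if len(s) >= 2]
```

Each selected itemset becomes one binary column (via `interaction_feature`) appended to the LR design matrix; a selection step such as the dissimilarity-based one described above would then prune this candidate list to the number of interactions the user requests.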


A Review of Generalizability and Transportability

arXiv.org Machine Learning

When assessing causal effects, determining the target population to which the results are intended to generalize is a critical decision. Randomized and observational studies each have strengths and limitations for estimating causal effects in a target population. Estimates from randomized data may have internal validity but are often not representative of the target population. Observational data may better reflect the target population, and hence be more likely to have external validity, but are subject to potential bias due to unmeasured confounding. While much of the causal inference literature has focused on addressing internal validity bias, both internal and external validity are necessary for unbiased estimates in a target population. This paper presents a framework for addressing external validity bias, including a synthesis of approaches for generalizability and transportability, the assumptions they require, as well as tests for the heterogeneity of treatment effects and differences between study and target populations.
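
One of the simplest generalizability approaches is post-stratification: estimate the treatment effect within strata of an effect modifier in the study, then reweight those stratum effects by the target population's stratum shares. The sketch below assumes that the strata capture all effect modification and that every target stratum is represented in the study (positivity); it illustrates the general idea and is not necessarily a method from the paper's synthesis. Stratum names and numbers are hypothetical.

```python
from typing import Mapping

def transport_effect(
    stratum_effects: Mapping[str, float], target_shares: Mapping[str, float]
) -> float:
    """Reweight stratum-specific effects estimated in a study to a target population."""
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    # Positivity: every target stratum needs a study estimate
    missing = set(target_shares) - set(stratum_effects)
    if missing:
        raise ValueError(f"no study estimate for strata: {sorted(missing)}")
    return sum(share * stratum_effects[s] for s, share in target_shares.items())

# Hypothetical stratum-specific effects estimated in a randomized trial
effects = {"under_50": 2.0, "50_plus": 4.0}
trial_average = transport_effect(effects, {"under_50": 0.75, "50_plus": 0.25})
target_average = transport_effect(effects, {"under_50": 0.25, "50_plus": 0.75})
```

Because the trial over-represents the stratum with the smaller effect, its average differs from the average transported to the older target population; the gap is an example of the external validity bias discussed above.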


Comparative Fault Location Estimation by Using Image Processing in Mixed Transmission Lines

arXiv.org Artificial Intelligence

Distance protection relays determine an impedance-based fault location from the current and voltage magnitudes in the transmission line. However, the fault location cannot be detected correctly in mixed transmission lines, because the characteristic impedance and the impedance per unit length of a high-voltage cable line differ significantly from those of an overhead line. Determining the fault section and location with distance protection relays is therefore difficult in mixed transmission lines. In this study, a 154 kV overhead transmission line and an underground cable line are examined as the mixed transmission line for the distance protection relays. Phase-to-ground faults are created in the mixed transmission line, and the overhead line section and the underground cable section are simulated using PSCAD-EMTDC. Short-circuit fault images are generated in the distance protection relay for faults on the overhead transmission line and on the underground cable line. The images contain the R-X impedance diagram of the fault, which is detected by applying image processing steps. An artificial neural network (ANN) and regression methods are used to predict the fault location, with the results of image processing serving as the input parameters for training the ANN and the regression methods. Finally, the results of the ANN and regression methods are compared to select the most suitable method for forecasting the fault location in transmission lines.
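
Why a single per-kilometre setting fails on a mixed line can be shown with a simplified impedance model: a bolted fault, no fault resistance, single-ended measurement, and an overhead section followed by a cable section. The per-km impedances below are hypothetical, and this sketch is not the study's image-processing, ANN, or regression method; it only shows the piecewise relation between fault distance and apparent impedance.

```python
def apparent_impedance(d_km: float, oh_len_km: float, z_oh: complex, z_cable: complex) -> complex:
    """Impedance seen by the relay for a bolted fault d_km away on a mixed line
    (overhead section first, then cable). Simplified model, no fault resistance."""
    if d_km <= oh_len_km:
        return z_oh * d_km
    return z_oh * oh_len_km + z_cable * (d_km - oh_len_km)

def naive_distance(z_seen: complex, z_oh: complex) -> float:
    """Distance reported if the relay assumed the whole line were overhead."""
    return abs(z_seen) / abs(z_oh)

def mixed_line_distance(z_seen: complex, oh_len_km: float, z_oh: complex, z_cable: complex) -> float:
    """Invert the piecewise model: pick the section, then scale by its per-km impedance.
    The section test relies on both per-km impedances having positive R and X."""
    z_oh_total = z_oh * oh_len_km
    if abs(z_seen) <= abs(z_oh_total):
        return abs(z_seen) / abs(z_oh)
    return oh_len_km + abs(z_seen - z_oh_total) / abs(z_cable)

# Hypothetical per-km series impedances (ohm/km) and section length
Z_OH = 0.08 + 0.40j
Z_CABLE = 0.03 + 0.12j
OH_LEN_KM = 20.0

z_fault = apparent_impedance(26.0, OH_LEN_KM, Z_OH, Z_CABLE)  # fault 6 km into the cable
```

With these values the overhead-only assumption places the fault at roughly 21.8 km instead of 26 km, because the cable contributes much less impedance per kilometre; this mismatch is the difficulty the study sets out to address.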


Logistic Regression in SPSS for Social Science Research

#artificialintelligence

A complete step-by-step guide to logistic regression in SPSS, including interpretation and visualization. What you'll learn: social research with logistic regression in SPSS, a complete guide for the social sciences. The only course on Udemy that shows you how to perform, interpret and visualize logistic regression in SPSS using a real-world example and the quantitative research process. Follow along as I talk you through everything you need to know to become confident using regression analysis in your quantitative research report, dissertation or thesis. The course is ideal for those studying social science subjects or wanting to increase their statistical confidence and literacy. Don't fall for other courses that are over-technical, math-based and heavy on statistics! This course cuts all that out and explains everything in a way that is easy to understand. Course outcomes: on completion of the course you will fully understand that logistic regression is a statistical model used to predict the probability of a certain outcome or event occurring when that outcome or event is binary (such as pass/fail, true/false, healthy/sick).
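
The definition above can be made concrete with a one-predictor model, P(pass) = sigmoid(b0 + b1 * x). The sketch below uses hypothetical "hours studied versus pass/fail" data. Statistics packages typically fit this model by maximum likelihood using Newton-type iterations; plain batch gradient descent on the same log-likelihood is used here only as a simple, dependency-free stand-in.

```python
import math
from typing import Sequence, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(
    x: Sequence[float], y: Sequence[int], lr: float = 0.1, epochs: int = 5000
) -> Tuple[float, float]:
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # log-loss gradient w.r.t. the linear predictor
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(hours, passed)
p_low, p_high = sigmoid(b0 + b1 * 1.0), sigmoid(b0 + b1 * 4.5)
```

The classes overlap (one student passes at 2 hours, another fails at 3), so the maximum-likelihood coefficients are finite and gradient descent settles on them; a positive `b1` means the predicted probability of passing rises with study time.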


Slowly Varying Regression under Sparsity

arXiv.org Machine Learning

We consider the problem of parameter estimation in slowly varying regression models with sparsity constraints. We formulate the problem as a mixed integer optimization problem and demonstrate that it can be reformulated exactly as a binary convex optimization problem through a novel exact relaxation. The relaxation uses a new equality on Moore-Penrose inverses that convexifies the non-convex objective function while coinciding with the original objective on all feasible binary points. This allows us to solve the problem significantly more efficiently, and to provable optimality, using a cutting plane-type algorithm. We develop a highly optimized implementation of such an algorithm, which substantially improves upon the asymptotic computational complexity of a straightforward implementation. We further develop a heuristic method that is guaranteed to produce a feasible solution and, as we empirically illustrate, generates high-quality warm-start solutions for the binary optimization problem. We show, on both synthetic and real-world datasets, that the resulting algorithm outperforms competing formulations in comparable times across a variety of metrics, including out-of-sample predictive performance, support recovery accuracy, and false positive rate. The algorithm enables us to train models with tens of thousands of parameters, is robust to noise, and is able to effectively capture the underlying slowly changing support of the data-generating process.
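
For orientation, one generic way to write a slowly varying sparse regression as a mixed integer problem is shown below. It assumes T time periods linked in a chain, per-period data (X_t, y_t), binary support indicators s_t, a smoothness weight λ, a per-period sparsity budget K, and a budget K_c on support changes between consecutive periods. These symbols are illustrative assumptions; the paper's exact formulation (for example its regularization terms, or the graph linking the periods) may differ.

```latex
\min_{\substack{\beta_1,\dots,\beta_T \in \mathbb{R}^p \\ s_1,\dots,s_T \in \{0,1\}^p}}
  \sum_{t=1}^{T} \lVert y_t - X_t \beta_t \rVert_2^2
  + \lambda \sum_{t=2}^{T} \lVert \beta_t - \beta_{t-1} \rVert_2^2
\quad \text{s.t.} \quad
  \beta_{t,j} = 0 \ \text{if}\ s_{t,j} = 0, \qquad
  \sum_{j=1}^{p} s_{t,j} \le K \ \ \forall t, \qquad
  \sum_{t=2}^{T} \lVert s_t - s_{t-1} \rVert_1 \le K_c
```

For a fixed support pattern s, the remaining problem in the β variables is a convex quadratic, which is why a problem of this kind can be viewed as an optimization over the binary variables alone.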