 Regression


Predicting erectile dysfunction after treatment for localized prostate cancer

arXiv.org Artificial Intelligence

While the 10-year survival rate for localized prostate cancer patients is very good (>98%), side effects of treatment may significantly limit quality of life. Erectile dysfunction (ED) is a common burden associated with increasing age as well as prostate cancer treatment. Although many studies have investigated the factors affecting ED after prostate cancer treatment, few have investigated whether ED can be predicted before the start of treatment. The advent of machine learning (ML) based prediction tools in oncology offers a promising approach to improve accuracy of prediction and quality of care. Predicting ED may aid shared decision making by making the advantages and disadvantages of certain treatments clear, so that a tailored treatment for an individual patient can be chosen. This study aimed to predict ED at 1 year and 2 years post-diagnosis based on patient demographics, clinical data and patient-reported outcome measures (PROMs) measured at diagnosis.


Predicting Consumer Purchasing Decision in The Online Food Delivery Industry

arXiv.org Machine Learning

The transformation of food delivery businesses to online platforms has gained considerable attention in recent years. This is due to customizable ordering experiences, easy payment methods, fast delivery, and other conveniences. Competition between online food delivery providers has intensified as they try to reach a wider range of customers. Hence, providers need a better understanding of their customers' needs and must be able to predict their purchasing decisions. Machine learning has a significant impact on companies' bottom line. It is used to construct models and strategies in industries that rely on big data and need a system to evaluate it quickly and effectively. Predictive modeling is a type of machine learning that uses various regression algorithms, analytics, and statistics to estimate the probability of an occurrence. The incorporation of predictive models helps online food delivery providers to understand their customers. In this study, a dataset collected from 388 consumers in Bangalore, India was used to predict their purchasing decisions. Four prediction models are considered: CART and C4.5 decision trees, random forest, and rule-based classifiers, and their accuracies in providing the correct class label are evaluated. The findings show that all models perform similarly, but C4.5 outperforms the others with an accuracy of 91.67%.


Dimension Reduction and Data Visualization for Fréchet Regression

arXiv.org Machine Learning

With the rapid development of data collection techniques, complex data objects that are not in the Euclidean space are frequently encountered in new statistical applications. The Fréchet regression model (Petersen & Müller 2019) provides a promising framework for regression analysis with metric space-valued responses. In this paper, we introduce a flexible sufficient dimension reduction (SDR) method for Fréchet regression to achieve two purposes: to mitigate the curse of dimensionality caused by high-dimensional predictors, and to provide a tool for data visualization for Fréchet regression. Our approach is flexible enough to turn any existing SDR method for Euclidean $(X, Y)$ into one for Euclidean $X$ and metric space-valued $Y$. The basic idea is to first map the metric space-valued random object $Y$ to a real-valued random variable $f(Y)$ using a class of functions, and then apply classical SDR to the transformed data. If the class of functions is sufficiently rich, then we are guaranteed to uncover the Fréchet SDR space. We show that such a class, which we call an ensemble, can be generated by a universal kernel. We establish the consistency and asymptotic convergence rate of the proposed methods. The finite-sample performance of the proposed methods is illustrated through simulation studies for several commonly encountered metric spaces, including the Wasserstein space, the space of symmetric positive definite matrices, and the sphere. We illustrate the data visualization aspect of our method by exploring human mortality distribution data across countries and by studying the distribution of hematoma density.


Logistic Regression Using Python

#artificialintelligence

In the supervised machine learning world, there are two types of algorithmic tasks often performed. One is called regression (predicting continuous values) and the other is called classification (predicting discrete values). In this blog, I have presented an example of a binary classification algorithm called "Binary Logistic Regression" which comes under the Binomial family with a…
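To make the idea of binary classification concrete, here is a minimal sketch of binary logistic regression fitted by plain gradient descent, using only the Python standard library. It is not the code from the blog post: the toy data (hours studied versus pass/fail), the function names, the learning rate and the number of epochs are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on the log-loss.

    lr and epochs are illustrative defaults, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return the discrete class label (0 or 1) for a single input."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical toy data: hours studied and whether the exam was passed.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0, 5.5]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print("weight:", w, "intercept:", b)
print("label for 0.5 hours:", predict(w, b, 0.5))
print("label for 5.5 hours:", predict(w, b, 5.5))
```

The model outputs a probability, and the threshold turns it into a discrete label, which is what separates this classification setup from a regression that predicts a continuous value. In practice a library implementation would normally be preferred over a hand-written loop.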


Logistic Regression for Text Classification

#artificialintelligence

Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable, although many more complex extensions exist. In regression analysis, logistic regression is estimating the parameters of a logistic model, which is a form of binary regression. To introduce logistic regression to students, this course on logistic regression for text classification was created for all graduate and postgraduate students who wish to begin with data science and machine learning for natural language processing. The course content includes video lectures that give you a basic understanding of the theoretical concepts of logistic regression along with an overview of the practical implementation. The course uses movie reviews as its application domain, performing sentiment analysis on textual data.
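For reference, the logistic function mentioned in the description can be written out explicitly. This is the standard textbook form; the symbols $x$ (a feature vector, for example word counts for one review), $w$ (weights) and $b$ (intercept) are notation assumed here, not taken from the course.

$$
P(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}},
\qquad
\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = w^\top x + b .
$$

Under this form, a review would typically be labeled positive when $P(y = 1 \mid x) \ge 0.5$, which is equivalent to $w^\top x + b \ge 0$.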


Statistics With R - Intermediate Level

#artificialintelligence

If you want to learn how to perform the most useful statistical analyses in the R program, you have come to the right place. Now you don't have to scour the web endlessly to find out how to do a Pearson or Spearman correlation, an independent t test or a factorial ANOVA, how to perform a sequential regression analysis or how to compute Cronbach's alpha. Everything is here, in this course, explained visually, step by step. So, what will you learn in this course? First of all, you will learn how to perform association tests in R, both parametric and non-parametric: the Pearson correlation, the Spearman and Kendall correlations, the partial correlation and the chi-square test for independence.


DICoE@FinSim-3: Financial Hypernym Detection using Augmented Terms and Distance-based Features

arXiv.org Artificial Intelligence

We present the submission of team DICoE for FinSim-3, the 3rd Shared Task on Learning Semantic Similarities for the Financial Domain. The task provides a set of terms in the financial domain and requires classifying each of them into its most relevant hypernym from a financial ontology. After augmenting the terms with their Investopedia definitions, our system employs a Logistic Regression classifier over financial word embeddings and a mix of hand-crafted and distance-based features. Also, for the first time in this task, we employ different replacement methods for out-of-vocabulary terms, leading to improved performance. Finally, we have also experimented with word representations generated from various financial corpora. Our best-performing submission ranked 4th on the task's leaderboard.


Adversarial Regression with Doubly Non-negative Weighting Matrices

arXiv.org Machine Learning

Many machine learning tasks that involve predicting an output response can be solved by training a weighted regression model. Unfortunately, the predictive power of this type of model may severely deteriorate under low sample sizes or under covariate perturbations. Reweighting the training samples has arisen as an effective mitigation strategy for these problems. In this paper, we propose a novel and coherent scheme for kernel-reweighted regression by reparametrizing the sample weights using a doubly non-negative matrix. When the weighting matrix is confined to an uncertainty set using either the log-determinant divergence or the Bures-Wasserstein distance, we show that the adversarially reweighted estimate can be solved efficiently using first-order methods. Numerical experiments show that our reweighting strategy delivers promising results on numerous datasets.


Robust High-Dimensional Regression with Coefficient Thresholding and its Application to Imaging Data Analysis

arXiv.org Machine Learning

It is important to develop statistical techniques to analyze high-dimensional data in the presence of both complex dependence and possible outliers in real-world applications such as imaging data analyses. We propose a new robust high-dimensional regression with coefficient thresholding, in which an efficient nonconvex estimation procedure is proposed through a thresholding function and the robust Huber loss. The proposed regularization method accounts for complex dependence structures in predictors and is robust against outliers in outcomes. Theoretically, we analyze rigorously the landscape of the population and empirical risk functions for the proposed method. The fine landscape enables us to establish both statistical consistency and computational convergence under the high-dimensional setting. The finite-sample properties of the proposed method are examined by extensive simulation studies. A real-world application illustrates the method with a scalar-on-image regression analysis of the association between psychiatric disorder, measured by the general factor of psychopathology, and features extracted from task functional magnetic resonance imaging data in the Adolescent Brain Cognitive Development study.
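As a reminder of why the Huber loss is robust to outliers in the outcome, here is a small sketch of the standard textbook definition. The function name and the default `delta=1.0` are assumptions for illustration; the abstract does not state the exact parametrization the authors use, and the thresholding function is not shown here.

```python
def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic for small residuals, linear for large ones.

    delta is the (assumed) switch point between the two regimes.
    """
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual ** 2
    return delta * (magnitude - 0.5 * delta)


# A large residual is penalized far less than under squared error,
# so a single outlier pulls the fit less strongly.
for r in (0.5, 3.0, 10.0):
    print(r, "huber:", huber_loss(r), "squared:", 0.5 * r ** 2)
```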


Data Science Techniques: How to Predict the Sales With Multiple Linear Regression

#artificialintelligence

Linear regression is one of the most popular techniques in data science. It can help you predict many different scenarios. Although it is a widespread technique, it is not a one-size-fits-all model because not all relationships in life are linear. As the statistician George Box put it, "All models are wrong, but some are useful." Suppose you are interested in predicting physical and downloaded album sales from money spent on advertising. Your boss comes into the office and asks how many albums you would sell if you spent $100,000 on advertising.
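The kind of calculation behind that question can be sketched with ordinary least squares on two predictors. Everything below is hypothetical: the data are synthetic and noise-free (generated from known coefficients so the fit can be checked), the second predictor (radio plays) is an assumption added to make the regression "multiple", and the helper names are not from the article. Only the Python standard library is used.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Least squares via the normal equations (X^T X) beta = X^T y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical predictors: (advertising spend in $1000s, radio plays per week).
data = [(10, 5), (20, 15), (40, 10), (60, 30), (80, 20), (120, 40)]
# Synthetic sales in 1000s of albums, generated without noise from known coefficients.
sales = [50 + 0.8 * adv + 2.0 * plays for adv, plays in data]

beta = fit_ols(data, sales)
# Prediction for $100,000 of advertising and an assumed 25 radio plays per week.
predicted = beta[0] + beta[1] * 100 + beta[2] * 25
print("coefficients:", beta)
print("predicted sales (thousands):", predicted)
```

With real data the fit would not be exact, and it would be worth checking residuals before trusting a prediction, which is the point of the "not all relationships are linear" caveat.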