AITopics | Regression

Regression

What is Gradient Descent in Machine Learning?

#artificialintelligence | Oct-2-2020, 22:05:54 GMT

Gradient descent comes up in nearly every machine learning problem that involves regression. Linear regression, logistic regression, SVMs and similar models are all fit by searching for a best-fit line (or, more generally, a decision function) whose slope and bias are chosen to match the points in the dataset as closely as possible. A line that passes exactly through every point is rarely attainable, and forcing one usually leads to overfitting. The gap between the target output and the predicted output is measured by a loss function, also called a cost function; a common choice is the squared error, the difference between the predicted and actual values raised to the power of 2. When this cost function is at its minimum, we say the model has reached the point of least error and can be used as a benchmark model. Gradient descent is the iterative procedure that finds this minimum: it repeatedly adjusts the parameters in the direction that decreases the cost. In statistics, a great deal of tuning and tweaking goes into reaching this point of least error.
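
As an illustration, here is a minimal sketch of batch gradient descent fitting a slope `w` and bias `b` by minimizing the mean squared error. The synthetic data, learning rate and iteration count are assumptions chosen for the example, not values from the article.

```python
import numpy as np

def gradient_descent_linear(x, y, lr=0.05, n_iters=2000):
    """Fit y ~ w * x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = (w * x + b) - y
        # Gradients of MSE = mean((w*x + b - y)**2) with respect to w and b
        grad_w = (2.0 / n) * np.dot(error, x)
        grad_b = (2.0 / n) * np.sum(error)
        # Step against the gradient to decrease the cost
        w -= lr * grad_w
        b -= lr * grad_b
    mse = np.mean((w * x + b - y) ** 2)
    return w, b, mse

# Hypothetical noisy data generated from y = 3x + 2
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = 3 * x + 2 + rng.normal(0, 0.1, size=100)
w, b, mse = gradient_descent_linear(x, y)
print(f"w={w:.2f}, b={b:.2f}, mse={mse:.4f}")
```

Because the squared-error cost for a straight line is convex, gradient descent with a small enough learning rate ends up at the same slope and bias as a closed-form least-squares fit such as `np.polyfit(x, y, 1)`; the iterative approach matters for models and losses where no closed form exists.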

artificial intelligence, gradient descent, machine learning

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)

Evaluating Progress on Machine Learning for Longitudinal Electronic Healthcare Data

Bellamy, David, Celi, Leo, Beam, Andrew L.

arXiv.org Machine Learning | Oct-2-2020

The Large Scale Visual Recognition Challenge based on the well-known ImageNet dataset catalyzed an intense flurry of progress in computer vision. Benchmark tasks have propelled other sub-fields of machine learning forward at an equally impressive pace, but in healthcare it has primarily been image processing tasks, such as in dermatology and radiology, that have experienced similar benchmark-driven progress. In the present study, we performed a comprehensive review of benchmarks in medical machine learning for structured data, identifying one based on the Medical Information Mart for Intensive Care (MIMIC-III) that allows the first direct comparison of predictive performance and thus the evaluation of progress on four clinical prediction tasks: mortality, length of stay, phenotyping, and patient decompensation. We find that little meaningful progress has been made over a 3-year period on these tasks, despite significant community engagement. Through our meta-analysis, we find that deep recurrent models outperform logistic regression only on certain tasks. We conclude with a synthesis of these results, possible explanations, and a list of desirable qualities for future benchmarks in medical machine learning.

benchmark, dataset, prediction

arXiv.org Machine Learning

2010.01149

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts (0.04)
Asia > Middle East > Israel (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

The Efficacy of $L_1$ Regularization in Two-Layer Neural Networks

Li, Gen, Gu, Yuantao, Ding, Jie

arXiv.org Machine Learning | Oct-2-2020

A crucial problem in neural networks is to select the most appropriate number of hidden neurons and obtain tight statistical risk bounds. In this work, we present a new perspective on the bias-variance tradeoff in neural networks. As an alternative to selecting the number of neurons, we theoretically show that $L_1$ regularization can control the generalization error and sparsify the input dimension. In particular, with an appropriate $L_1$ regularization on the output layer, the network can produce a statistical risk that is near minimax optimal. Moreover, an appropriate $L_1$ regularization on the input layer leads to a risk bound that does not involve the input data dimension. Our analysis is based on a new amalgamation of dimension-based and norm-based complexity analysis to bound the generalization error. A consequence of our results is that an excessively large number of neurons does not necessarily inflate generalization error under suitable regularization.
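
The paper's results are theoretical, but the objects involved are easy to write down. The sketch below assumes a two-layer ReLU network f(x) = v · relu(Wx) trained on squared error: the objective adds an $L_1$ penalty on the output weights `v` and, optionally, on the input weights `W`. It also shows the soft-thresholding (proximal) step commonly used to optimize an $L_1$ penalty, which is where sparsity comes from: any weight whose magnitude falls below the threshold is set exactly to zero. The function names, the ReLU activation and the proximal step are illustrative assumptions, not the authors' training procedure.

```python
import numpy as np

def two_layer_forward(X, W, v):
    """Two-layer ReLU network f(x) = v @ relu(W @ x), applied to each row of X."""
    return np.maximum(X @ W.T, 0.0) @ v

def l1_objective(X, y, W, v, lam_out, lam_in=0.0):
    """Mean squared error plus L1 penalties on output weights v and (optionally) input weights W."""
    residual = two_layer_forward(X, W, v) - y
    return np.mean(residual ** 2) + lam_out * np.sum(np.abs(v)) + lam_in * np.sum(np.abs(W))

def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1: shrinks every entry toward zero by t,
    and sets entries with |z| <= t exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Hypothetical output-layer weights after a gradient step
v = np.array([0.3, -0.02, 0.01, -1.5])
print(soft_threshold(v, 0.05))  # the two small weights become exactly zero
```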

artificial intelligence, machine learning, neural network

arXiv.org Machine Learning

2010.01048

Country:

North America > United States > Minnesota (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Linear Classifier Combination via Multiple Potential Functions

Trajdos, Pawel, Burduk, Robert

arXiv.org Machine Learning | Oct-2-2020

A vital aspect of the classification-based model construction process is the calibration of the scoring function. One weakness of the calibration process is that it does not take into account information about the relative positions of the recognized objects in the feature space. To alleviate this limitation, in this paper we propose a novel way of calculating a scoring function based on the distance of the object from the decision boundary and its distance to the class centroid. An important property is that the proposed score function has the same nature for all linear base classifiers, which means that the outputs of these classifiers are equally represented and have the same meaning. The proposed approach is compared with other ensemble algorithms, and experiments on multiple KEEL datasets demonstrate the effectiveness of our method. To analyze the results of our experiments, we use multiple classification performance measures and statistical analysis.
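
The two geometric quantities the proposed score is built from can be computed as follows for a linear classifier with decision function w · x + b. The abstract does not say how the potential functions combine them, so this sketch stops at the raw distances; the function names and the 2-D data are hypothetical.

```python
import numpy as np

def signed_distance_to_boundary(x, w, b):
    """Signed Euclidean distance from x to the hyperplane w @ x + b = 0
    (positive on the side the normal vector w points to)."""
    return (np.dot(w, x) + b) / np.linalg.norm(w)

def distances_to_centroids(x, X_train, y_train):
    """Euclidean distance from x to the centroid (mean) of each class in the training data."""
    return {
        int(c): float(np.linalg.norm(x - X_train[y_train == c].mean(axis=0)))
        for c in np.unique(y_train)
    }

# Hypothetical 2-D example: class 0 on the left, class 1 on the right
X_train = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
y_train = np.array([0, 0, 1, 1])
w, b = np.array([1.0, 0.0]), -2.0   # decision boundary x1 = 2
x = np.array([3.0, 1.0])
print(signed_distance_to_boundary(x, w, b))
print(distances_to_centroids(x, X_train, y_train))
```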

artificial intelligence, classifier, machine learning

arXiv.org Machine Learning

doi: 10.1016/j.patcog.2020.107681

2010.00844

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Poland > Lower Silesia Province > Wroclaw (0.04)
North America > United States > Wisconsin (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Estimation of causal effects of multiple treatments in healthcare database studies with rare outcomes

Hu, Liangyuan, Gu, Chenyang

arXiv.org Machine Learning | Oct-2-2020

The preponderance of large-scale healthcare databases provides abundant opportunities for comparative effectiveness research. Evidence needed to make informed treatment decisions often relies on comparing the effectiveness of multiple treatment options on outcomes of interest observed in a small number of individuals. Causal inference with multiple treatments and rare outcomes is a subject that has been treated sparingly in the literature. This paper designs three sets of simulations, representative of the structure of our healthcare database study, and proposes causal analysis strategies for such settings. We investigate and compare the operating characteristics of three types of methods and their variants: Bayesian Additive Regression Trees (BART), regression adjustment on multivariate spline of generalized propensity scores (RAMS) and inverse probability of treatment weighting (IPTW) with multinomial logistic regression or generalized boosted models. Our results suggest that BART and RAMS provide lower bias and mean squared error, while the widely used IPTW methods deliver unfavorable operating characteristics. We illustrate the methods using a case study evaluating the comparative effectiveness of robotic-assisted surgery, video-assisted thoracoscopic surgery and open thoracotomy for treating non-small cell lung cancer.
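
To make the IPTW idea concrete, here is a minimal sketch that assumes the generalized propensity scores P(T = k | X) have already been estimated (for example by a multinomial logistic regression) and computes a weighted mean outcome per treatment. The function name and the toy numbers are hypothetical; this is not the paper's implementation.

```python
import numpy as np

def iptw_treatment_means(treatment, outcome, propensity):
    """Inverse probability of treatment weighting for K treatments.

    treatment:  int array (n,), observed treatment index 0..K-1
    outcome:    array (n,), observed outcome (e.g. 1 = event, 0 = no event)
    propensity: array (n, K), generalized propensity scores P(T = k | X = x_i)
    Returns the normalized (Hajek-style) weighted mean outcome under each treatment.
    """
    n, K = propensity.shape
    # Weight each subject by 1 / P(receiving the treatment they actually received)
    weights = 1.0 / propensity[np.arange(n), treatment]
    means = np.empty(K)
    for k in range(K):
        mask = treatment == k
        means[k] = np.sum(weights[mask] * outcome[mask]) / np.sum(weights[mask])
    return means

# Hypothetical data: 6 subjects, 3 treatments
treatment = np.array([0, 0, 1, 1, 2, 2])
outcome = np.array([1, 0, 0, 0, 1, 1])
propensity = np.array([
    [0.50, 0.30, 0.20],
    [0.25, 0.50, 0.25],
    [0.20, 0.40, 0.40],
    [0.30, 0.50, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
])
print(iptw_treatment_means(treatment, outcome, propensity))
```

A subject who received a treatment they were unlikely to receive (a small propensity score) gets a large weight; such extreme weights are a commonly cited source of instability for IPTW.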

artificial intelligence, machine learning, multiple treatment

arXiv.org Machine Learning

2008.07687

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)

It Is Likely That Your Loss Should be a Likelihood

Hamilton, Mark, Shelhamer, Evan, Freeman, William T.

arXiv.org Machine Learning | Oct-2-2020

Many common loss functions such as mean-squared-error, cross-entropy, and reconstruction loss are unnecessarily rigid. Under a probabilistic interpretation, these common losses correspond to distributions with fixed shapes and scales. We instead argue for optimizing full likelihoods that include parameters like the normal variance and softmax temperature. Joint optimization of these "likelihood parameters" with model parameters can adaptively tune the scales and shapes of losses in addition to the strength of regularization. We explore and systematically evaluate how to parameterize and apply likelihood parameters for robust modeling, outlier-detection, and re-calibration. Additionally, we propose adaptively tuning $L_2$ and $L_1$ weights by fitting the scale parameters of normal and Laplace priors and introduce more flexible element-wise regularizers.
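
For the normal-variance case, the idea can be sketched as follows, assuming a single shared variance parameterized by its logarithm (a common choice, not necessarily the authors'). With the variance fixed at 1, the Gaussian negative log-likelihood is half the mean squared error plus a constant. Once the variance is treated as a parameter, the loss scale adapts, and for a shared variance the optimum equals the mean squared residual. The function names and data are illustrative.

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Mean negative log-likelihood of y under N(mu, exp(log_var)).
    With log_var fixed at 0 this is 0.5 * MSE plus a constant."""
    var = np.exp(log_var)
    return np.mean(0.5 * (np.log(2 * np.pi) + log_var + (y - mu) ** 2 / var))

def optimal_log_var(y, mu):
    """For one shared variance, the NLL is minimized at var = mean squared residual."""
    return np.log(np.mean((y - mu) ** 2))

y = np.array([1.0, 2.0, 3.0, 4.0])
mu = np.array([1.5, 1.5, 3.5, 3.5])   # hypothetical predictions; every residual is +/-0.5
best = optimal_log_var(y, mu)
print(np.exp(best))                               # fitted variance
print(gaussian_nll(y, mu, 0.0), gaussian_nll(y, mu, best))
```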

artificial intelligence, data mining, machine learning

arXiv.org Machine Learning

2007.06059

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Logistic Regression Clearly Explained

#artificialintelligence | Oct-1-2020, 01:25:33 GMT

Logistic Regression is one of the most widely used classification algorithms in machine learning. It is used in many real-world scenarios such as spam detection, cancer detection and classifying the Iris dataset. It is mostly used for binary classification problems, but it can also be used for multiclass classification. Logistic Regression predicts the probability that a given data point belongs to a certain class. In this article, I will be using the famous heart disease dataset from Kaggle, where the main goal is to predict whether a given person has heart disease.
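
The core of binary logistic regression fits in a few lines: a linear score is passed through the sigmoid to give a probability, the weights are fit by gradient descent on the binary cross-entropy, and a 0.5 threshold turns probabilities into class labels. This is a minimal sketch on hypothetical one-feature toy data, not the Kaggle heart disease data, and the learning rate and iteration count are assumptions. Multiclass problems typically use a softmax or one-vs-rest extension, which is not shown here.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Fit weights w and bias b by gradient descent on the mean binary cross-entropy."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)            # predicted P(y = 1 | x)
        w -= lr * (X.T @ (p - y)) / n     # gradient of cross-entropy w.r.t. w
        b -= lr * np.mean(p - y)          # gradient w.r.t. b
    return w, b

def predict_proba(X, w, b):
    return sigmoid(X @ w + b)

def predict(X, w, b, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return (predict_proba(X, w, b) >= threshold).astype(int)

# Hypothetical toy data with a single feature
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
w, b = fit_logistic_regression(X, y)
print(predict_proba(X, w, b))
print(predict(X, w, b))
```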

artificial intelligence, machine learning, regression

#artificialintelligence

Genre:

Research Report > New Finding (0.90)
Research Report > Experimental Study (0.90)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

First-order Optimization for Superquantile-based Supervised Learning

Laguel, Yassine, Malick, Jérôme, Harchaoui, Zaid

arXiv.org Machine Learning | Oct-1-2020

Classical supervised learning via empirical risk (or negative log-likelihood) minimization hinges upon the assumption that the testing distribution coincides with the training distribution. This assumption can be challenged in modern applications of machine learning, in which learning machines may operate at prediction time on testing data whose distribution departs from that of the training data. We revisit the superquantile regression method by proposing a first-order optimization algorithm to minimize a superquantile-based learning objective. The proposed algorithm is based on smoothing the superquantile function by infimal convolution. Promising numerical results illustrate the value of this approach for safer supervised learning.
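
A superquantile (also called conditional value-at-risk) at level p averages the worst (1 - p) fraction of losses, so a learning objective built on it focuses on the hardest cases rather than the average one. The sketch below computes the empirical superquantile with the Rockafellar-Uryasev formula, min over eta of eta + E[(loss - eta)_+] / (1 - p). The function name and the example losses are illustrative assumptions.

```python
import numpy as np

def superquantile(losses, p):
    """Empirical superquantile (CVaR) at level p in [0, 1) via the Rockafellar-Uryasev formula.

    The objective eta + mean(max(losses - eta, 0)) / (1 - p) is convex and piecewise linear
    in eta with breakpoints at the data points, so its minimum is attained at one of them.
    """
    losses = np.asarray(losses, dtype=float)
    candidates = losses
    excess = np.maximum(losses[None, :] - candidates[:, None], 0.0)  # row i: losses - eta_i
    objective = candidates + excess.mean(axis=1) / (1.0 - p)
    return objective.min()

losses = np.arange(1, 11)              # hypothetical per-sample losses 1..10
print(np.mean(losses))                 # average loss used by ordinary empirical risk
print(superquantile(losses, 0.8))      # mean of the worst 20% of losses
```

The paper's algorithm smooths this function by infimal convolution so that first-order methods apply; the sketch evaluates the non-smooth version directly and does not implement that smoothing.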

artificial intelligence, inductive learning, machine learning

arXiv.org Machine Learning

2009.14575

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Arizona (0.04)
Europe > Finland (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Applied Machine Learning Models For Improved Startup Valuation.

#artificialintelligence | Sep-30-2020, 20:12:11 GMT

Determining the valuation of an early-stage startup is in most cases very challenging due to limited historical data, little to no existing revenue, market uncertainty and other factors. Traditional valuation techniques, such as Discounted Cash Flow (DCF) or Multiples (CCA), therefore often lead to inappropriate results. On the other hand, alternative valuation methods remain subject to an individual's subjective assessment and are a black box to others. The underlying study therefore leverages machine learning algorithms to predict fair, data-driven and comprehensible startup valuations. Three different data sources are merged and applied to three regression models.

artificial intelligence, machine learning, regression

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.41)

Regress Consistently when Oblivious Outliers Overwhelm

d'Orsi, Tommaso, Novikov, Gleb, Steurer, David

arXiv.org Machine Learning | Sep-30-2020

We give a novel analysis of the Huber loss estimator for consistent robust linear regression, proving that it simultaneously achieves an optimal dependence on the fraction of outliers and on the dimension. We consider a linear regression model with an oblivious adversary, who may corrupt the observations in an arbitrary way but without knowing the data. (This adversary model also captures heavy-tailed noise distributions.) Given observations $y_1,\ldots,y_n$ with an $\alpha$ uncorrupted fraction, we obtain error guarantees $\tilde{O}\big(\sqrt{d/(\alpha^2 n)}\big)$, optimal up to logarithmic terms. Our algorithm works with a nearly optimal fraction of inliers $\alpha\geq \tilde{O}(\sqrt{d/n})$ and under mild restricted isometry assumptions (RIP) on the (transposed) design matrix. Prior to this work, even in the simple case of spherical Gaussian design, no estimator was known to achieve vanishing error guarantees in the high-dimensional setting $d\gtrsim \sqrt{n}$ whenever the fraction of uncorrupted observations is smaller than $1/\log n$. Our analysis of the Huber loss estimator exploits only the first-order optimality conditions. Furthermore, in the special case of Gaussian design $X\sim N(0,1)^{n \times d}$, we show that a strikingly simple algorithm based on computing coordinate-wise medians achieves similar guarantees in linear time. The algorithm also extends to the setting where the parameter vector $\beta^*$ is sparse.
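
The Huber loss itself is standard: quadratic for residuals up to a threshold delta and linear beyond it, so its derivative psi is the residual clipped to [-delta, delta]. The sketch below assumes delta = 1 and, for readability, a one-parameter (intercept-only) regression, showing how the clipped derivative caps the influence of a single gross outlier. For the linear model the analogous first-order condition is X^T psi(y - X beta) = 0. The names and data are illustrative; this is neither the paper's analysis nor its coordinate-wise median algorithm.

```python
import numpy as np

def huber_loss(r, delta=1.0):
    """Huber loss: 0.5 * r**2 for |r| <= delta, delta * (|r| - 0.5 * delta) otherwise."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def huber_psi(r, delta=1.0):
    """Derivative of the Huber loss: the residual clipped to [-delta, delta]."""
    return np.clip(r, -delta, delta)

def huber_location(y, delta=1.0, lr=0.5, n_iters=500):
    """Minimize mean(huber_loss(y - beta)) over a scalar beta by gradient descent.
    At the optimum the first-order condition sum(huber_psi(y - beta)) == 0 holds."""
    beta = 0.0
    for _ in range(n_iters):
        grad = -np.mean(huber_psi(y - beta, delta))
        beta -= lr * grad
    return beta

y = np.array([0.1, -0.2, 0.05, 0.0, 100.0])   # hypothetical data with one gross outlier
print(np.mean(y))           # pulled far from the inliers by the outlier
print(huber_location(y))    # stays close to the inliers
```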

artificial intelligence, machine learning, probability

arXiv.org Machine Learning

2009.14774

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)
