AITopics | Regression

Regression

Cross-validation Confidence Intervals for Test Error

Bayle, Pierre, Bayle, Alexandre, Janson, Lucas, Mackey, Lester

arXiv.org Machine Learning, Oct-31-2020

This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for $k$-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller $k$-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our real-data experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.
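A CLT-based confidence interval for k-fold test error has the familiar shape estimate ± z·σ̂/√n, where the estimate averages the held-out losses over all n points. The sketch below shows only that shape, under stated assumptions: ordinary least squares as the learning algorithm, squared-error loss, a 95% normal quantile (z = 1.96), and a within-fold variance estimate (the average of per-fold sample variances of the held-out losses). It is not the authors' code, and the function name kfold_ci is hypothetical.

```python
import numpy as np

def kfold_ci(X, y, k=5, z=1.96, seed=0):
    """Return (estimate, lower, upper) for the k-fold squared-error test error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    losses = np.empty(n)
    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)
        # Ordinary least squares with an intercept, fitted on the other k-1 folds.
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(len(idx)), X[idx]])
        losses[idx] = (y[idx] - B @ coef) ** 2  # held-out squared errors
    est = losses.mean()
    # Within-fold variance: average of per-fold sample variances of held-out losses.
    sigma2 = np.mean([losses[idx].var(ddof=1) for idx in folds])
    half = z * np.sqrt(sigma2 / n)
    return est, est - half, est + half

# Synthetic regression data with unit noise variance (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
est, lo, hi = kfold_ci(X, y, k=10)
print(f"test error {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The stability conditions under which such intervals are asymptotically exact are the subject of the paper and are not checked by this sketch.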

artificial intelligence, machine learning, test error

2007.12671

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Aspectuality Across Genre: A Distributional Semantics Approach

Kober, Thomas, Alikhani, Malihe, Stone, Matthew, Steedman, Mark

arXiv.org Artificial Intelligence, Oct-31-2020

The interpretation of the lexical aspect of verbs in English plays a crucial role in recognizing textual entailment and learning discourse-level inferences. We show that two elementary dimensions of aspectual class, states vs. events and telic vs. atelic events, can be modelled effectively with distributional semantics. We find that a verb's local context is most indicative of its aspectual class, and demonstrate that closed-class words tend to be stronger discriminating contexts than content words. Our approach outperforms previous work on three datasets. Lastly, we contribute a dataset of human-human conversations annotated with lexical aspect and present experiments that show the correlation of telicity with genre and discourse goals.
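A toy sketch of count-based distributional context vectors, assuming a symmetric window of one word and raw co-occurrence counts; the abstract does not specify the paper's models, window sizes or weighting. The invented sentences pair two stative verbs with two event verbs, and cosine similarity compares their local contexts. The only context the two statives share is the closed-class word "that". The function names are hypothetical.

```python
from collections import Counter
import math

def context_vectors(sentences, targets, window=1):
    """Count the words within `window` positions of each target word."""
    vecs = {t: Counter() for t in targets}
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok in vecs:
                start, end = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(start, end):
                    if j != i:
                        vecs[tok][tokens[j]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[w] for w, count in a.items())  # missing keys count as 0
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented toy sentences: two stative verbs and two event verbs.
sentences = [
    "I know that it rains",
    "you believe that it rains",
    "she kicked the ball",
    "he hit the ball",
]
vecs = context_vectors(sentences, ["know", "believe", "kicked", "hit"])
print(cosine(vecs["know"], vecs["believe"]))  # statives share the context "that"
print(cosine(vecs["know"], vecs["kicked"]))   # no shared context
```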

artificial intelligence, machine learning, natural language

2011.00345

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Linear Regression using PyTorch

#artificialintelligence, Oct-30-2020, 22:00:10 GMT

As the saying goes, 'Data is the new oil.' Like oil, data becomes valuable once we know how to extract and use it, and then it can solve many problems. Data can be explained by two components: a model and an error. In this article, we are going to dive into the linear model.
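The excerpt stops before any code. A minimal sketch of the idea, assuming synthetic data generated as y = 2x + 1 plus Gaussian noise, so that the "model" is the line and the "error" is the noise: torch.nn.Linear holds the weight and bias, torch.nn.MSELoss measures the squared error, and full-batch SGD fits the parameters. The data, learning rate and epoch count are illustrative choices, not taken from the article.

```python
import torch

torch.manual_seed(0)
# Synthetic data: y = 2x + 1 plus Gaussian noise (the "error" term).
x = torch.linspace(-1, 1, 100).unsqueeze(1)   # shape (100, 1)
y = 2 * x + 1 + 0.1 * torch.randn_like(x)

model = torch.nn.Linear(1, 1)                 # learns w and b in y = w*x + b
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):
    optimizer.zero_grad()                     # clear gradients from the last step
    loss = loss_fn(model(x), y)
    loss.backward()                           # compute d(loss)/d(w, b)
    optimizer.step()                          # gradient-descent update

w = model.weight.item()
b = model.bias.item()
print(f"w={w:.2f}, b={b:.2f}, mse={loss.item():.4f}")
```

The fitted w and b should land close to the generating values 2 and 1; the remaining mean squared error reflects the noise that the linear model cannot explain.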

artificial intelligence, machine learning, variance

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.43)

Measure Inducing Classification and Regression Trees for Functional Data

Belli, Edoardo, Vantini, Simone

arXiv.org Machine Learning, Oct-30-2020

We propose a tree-based algorithm for classification and regression problems in the context of functional data analysis, which makes it possible to leverage representation learning and multiple splitting rules at the node level, reducing generalization error while retaining the interpretability of a tree. This is achieved by learning a weighted functional $L^{2}$ space by means of constrained convex optimization, which is then used to extract multiple weighted integral features from the input functions in order to determine the binary split for each internal node of the tree. The approach is designed to manage multiple functional inputs and/or outputs by defining suitable splitting rules and loss functions that can depend on the specific problem and can also be combined with scalar and categorical data, as the tree is grown with the original greedy CART algorithm. We focus on the case of scalar-valued functional inputs defined on unidimensional domains and illustrate the effectiveness of our method in both classification and regression tasks through a simulation study and four real-world applications.
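One node split of this kind can be sketched as follows: reduce each discretized input curve x(t) to a weighted integral feature, the integral of w(t)x(t) dt, then pick the threshold on that feature that minimizes the summed squared error, as in CART regression. In the paper the weight function is learned by constrained convex optimization and several such features may be used; here, as a simplifying assumption, w(t) is fixed to 1 and a single feature is used. The function names and data are hypothetical.

```python
import numpy as np

def weighted_integral(curves, t, w):
    """Trapezoidal approximation of the integral of w(t) * x(t) for each curve (row)."""
    vals = curves * w
    return np.sum((vals[:, 1:] + vals[:, :-1]) * np.diff(t) / 2, axis=1)

def best_split(feature, y):
    """CART-style regression split: threshold minimizing the summed squared error."""
    order = np.argsort(feature)
    f, ys = feature[order], y[order]
    best_sse, best_thr = np.inf, None
    for i in range(1, len(f)):
        if f[i] == f[i - 1]:
            continue  # no threshold separates tied values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, (f[i - 1] + f[i]) / 2
    return best_thr, best_sse

t = np.linspace(0, 1, 50)
a = np.array([0.1, 0.2, 0.3, 1.1, 1.2, 1.3])
curves = a[:, None] * t[None, :]       # x_i(t) = a_i * t, one curve per row
y = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
w = np.ones_like(t)                    # fixed uniform weight (assumption)
feature = weighted_integral(curves, t, w)
thr, sse = best_split(feature, y)
print(thr, sse)
```

With w(t) = 1 each feature equals a_i / 2, so the chosen threshold falls between the two groups of curves.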

artificial intelligence, machine learning, node

2011.00046

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Gastroenterology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Post-selection inference with HSIC-Lasso

Freidling, Tobias, Poignard, Benjamin, Climente-González, Héctor, Yamada, Makoto

arXiv.org Machine Learning, Oct-29-2020

Detecting influential features in complex (non-linear and/or high-dimensional) datasets is key to extracting the relevant information. Most popular selection procedures, however, require assumptions on the underlying data, such as distributional ones, which rarely agree with empirical observations. Therefore, feature selection based on nonlinear methods, such as the model-free HSIC-Lasso, is a more relevant approach. To ensure valid inference on the chosen features, the selection procedure must be accounted for. In this paper, we propose selective inference with HSIC-Lasso using the framework of truncated Gaussians together with the polyhedral lemma. Based on these theoretical foundations, we develop an algorithm with low computational cost that also addresses the hyper-parameter selection issue. The relevance of our method is illustrated using artificial and real-world datasets. In particular, our empirical findings emphasise that type-I error control at the considered level can be achieved.
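HSIC-Lasso builds on the Hilbert-Schmidt Independence Criterion (HSIC). A sketch of the standard biased empirical estimator, HSIC = tr(K H L H) / (n - 1)^2, where K and L are Gaussian-kernel Gram matrices of the two variables and H = I - (1/n) 1 1^T is the centering matrix. The kernel width of 1.0 and the toy data are assumptions, the function names are hypothetical, and the paper's selective-inference machinery (truncated Gaussians, polyhedral lemma) is not shown.

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gaussian-kernel Gram matrix of the rows of x."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)  # squared distances
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_dep = x ** 2 + 0.1 * rng.normal(size=200)  # nonlinear dependence on x
y_ind = rng.normal(size=200)                 # independent of x
print(hsic(x, y_dep), hsic(x, y_ind))
```

A linear correlation between x and x² is close to zero here, which is why a kernel dependence measure such as HSIC is useful for model-free feature selection.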

artificial intelligence, estimator, machine learning

2010.15659

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.68)
Leisure & Entertainment > Sports > Golf (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

DeSMOG: Detecting Stance in Media On Global Warming

Luo, Yiwei, Card, Dallas, Jurafsky, Dan

arXiv.org Artificial Intelligence, Oct-28-2020

Citing opinions is a powerful yet understudied strategy in argumentation. For example, an environmental activist might say, "Leading scientists agree that global warming is a serious concern," framing a clause which affirms their own stance ("that global warming is serious") as an opinion endorsed ("[scientists] agree") by a reputable source ("leading"). In contrast, a global warming denier might frame the same clause as the opinion of an untrustworthy source with a predicate connoting doubt: "Mistaken scientists claim [...]." Our work studies opinion-framing in the global warming (GW) debate, an increasingly partisan issue that has received little attention in NLP. We introduce DeSMOG, a dataset of stance-labeled GW sentences, and train a BERT classifier to study novel aspects of argumentation in how different sides of a debate represent their own and each other's opinions. From 56K news articles, we find that similar linguistic devices for self-affirming and opponent-doubting discourse are used across GW-accepting and skeptic media, though GW-skeptical media shows more opponent-doubt. We also find that authors often characterize sources as hypocritical, by ascribing opinions expressing the author's own view to source entities known to publicly endorse the opposing view. We release our stance dataset, model, and lexicons of framing devices for future work on opinion-framing and the automatic detection of GW stance.

artificial intelligence, environmental law, health & medicine

2010.15149

Country:

Asia > China (0.28)
North America > United States > California (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Energy (1.00)
Media > News (0.93)
Law > Environmental Law (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Introduction to logistic regression

Chung, Moo K.

arXiv.org Machine Learning, Oct-28-2020

July 29, 2020. In brain imaging, random field theory based multiple comparison corrections often require computing the distribution of the supremum of a random field. Unfortunately, this distribution is not easy to compute and requires many distributional assumptions that may not hold in real data. Thus, there is a need for a different framework that does not rely on the traditional statistical hypothesis testing paradigm of computing p-values. With this as a motivation, we can use logistic regression, which does not require computing p-values yet can still localize the regions of brain network differences (Flury 1997, Hastie et al. 2003, Chung et al. 2008). Unlike other discriminant and classification techniques that classify preselected feature vectors, the method here does not require any preselected feature vectors and performs the classification at the level of each edge (Higdon et al. 2004, Shen et al. 2004, Thomaz et al. 2006).
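Classification at each edge can be sketched as a mass-univariate loop: for every edge of a connectivity matrix, fit a one-variable logistic regression of group label on edge weight and record how well it separates the groups. Everything here is an assumption for illustration: synthetic data, gradient-descent fitting, and training accuracy as the per-edge score. The paper's exact model and scoring may differ, and the function names are hypothetical.

```python
import numpy as np

def fit_logistic_1d(x, y, lr=0.5, steps=2000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
        b0 -= lr * np.mean(p - y)
        b1 -= lr * np.mean((p - y) * x)
    return b0, b1

def edgewise_accuracy(edges, labels):
    """edges: (n_subjects, n_edges) array; returns the training accuracy per edge."""
    acc = np.empty(edges.shape[1])
    for j in range(edges.shape[1]):
        x = edges[:, j]
        b0, b1 = fit_logistic_1d(x, labels)
        pred = (b0 + b1 * x) > 0          # predicted probability above 0.5
        acc[j] = np.mean(pred == labels)
    return acc

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)            # 50 controls, 50 patients (synthetic)
edges = rng.normal(size=(100, 5))         # 5 edge weights per subject
edges[labels == 1, 2] += 2.0              # only edge 2 differs between groups
acc = edgewise_accuracy(edges, labels)
print(np.round(acc, 2))
```

Edges whose scores stand out localize where the two groups differ, without computing a p-value for each edge.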

artificial intelligence, discriminant power, machine learning

2008.13567

Country:

North America > United States > Wisconsin > Dane County > Madison (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.74)

Manifold structure in graph embeddings

Rubin-Delanchy, Patrick

arXiv.org Machine Learning, Oct-28-2020

The hypothesis that high-dimensional data tend to live near a manifold of low dimension is an important theme of modern statistics and machine learning, often held to explain why high-dimensional learning is realistically possible [61, 5, 15, 13, 18, 44]. The object of this paper is to show that, for a theoretically tractable but rich class of random graph models, such a phenomenon occurs in the spectral embedding of a graph. Manifold structure is shown to arise when the graph follows a latent position model [28], wherein connections are posited to occur as a function of the nodes' underlying positions in space. Because of their intuitive appeal, such models have been employed in a great diversity of disciplines, including social science [35, 42, 21], neuroscience [17, 52], statistical mechanics [34], information technology [69], biology [53] and ecology [19]. In many more endeavours, latent position models are used (at least according to Definition 1, to follow) but are known by a different name; examples include the standard [29], mixed [2] and degree-corrected [32] stochastic block models, random geometric graphs [50], and the graphon model [39], which encompasses them.
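The setting can be sketched in a few lines, assuming a random dot product graph (a latent position model in which the edge probability between nodes i and j is the inner product of their latent positions) with latent positions on a quarter-circle arc of radius 0.7, i.e. a one-dimensional curve in R^2. The embedding takes the d eigenvectors of the adjacency matrix with largest absolute eigenvalue, scaled by the square roots of those eigenvalues. This is an illustration, not the paper's code; the function name and all parameters are hypothetical.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed nodes using the d eigenpairs of A with largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)               # A is symmetric
    top = np.argsort(np.abs(vals))[::-1][:d]     # indices of the d largest |eigenvalues|
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

rng = np.random.default_rng(0)
n = 1000
theta = rng.uniform(0, np.pi / 2, n)
# Latent positions on a quarter-circle arc of radius 0.7: a 1-d curve in R^2.
Z = 0.7 * np.column_stack([np.cos(theta), np.sin(theta)])
P = Z @ Z.T                                      # edge probabilities <z_i, z_j>
upper = np.triu(rng.uniform(size=(n, n)) < P, k=1)
A = (upper | upper.T).astype(float)              # symmetric, no self-loops

X = adjacency_spectral_embedding(A, d=2)
# X recovers Z only up to an orthogonal transform, so compare Gram matrices.
rel_err = np.linalg.norm(X @ X.T - P) / np.linalg.norm(P)
print(f"relative Gram error: {rel_err:.3f}")
```

Because the latent positions are recovered only up to an orthogonal transformation, plotting the rows of X should show points scattered around a rotated or reflected copy of the arc rather than the arc itself.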

artificial intelligence, graph, machine learning

2006.05168

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology (0.69)
Government (0.46)
Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Mastering Python for Data Science - Programmer Books

#artificialintelligence, Oct-27-2020, 21:00:28 GMT

Data science is a relatively new knowledge domain that various organizations use to make data-driven decisions. Data scientists have to wear various hats to work with data and to derive value from it. The Python programming language, beyond having conquered the scientific community in the last decade, is now an indispensable tool for the data science practitioner and a must-know tool for every aspiring data scientist. Using Python will offer you a fast, reliable, cross-platform, and mature environment for data analysis, machine learning, and algorithmic problem solving. This comprehensive guide helps you move beyond the hype and transcend the theory by providing you with a hands-on, advanced study of data science.

artificial intelligence, machine learning, python

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.43)

Logistic Regression in Python

#artificialintelligence, Oct-27-2020, 14:05:27 GMT

Logistic Regression is used for classification problems in machine learning, both binary and multiclass. In logistic regression, the target (dependent) variable should be discrete or categorical. Binary classification problems have two class values, such as male/female, yes/no, True/False, 0/1, or pass/fail. Let's learn about logistic regression for binary classification in this story.
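A minimal from-scratch sketch of binary logistic regression with NumPy: it models P(y = 1 | x) = sigmoid(w·x + b), fits the weights by batch gradient descent on the mean cross-entropy, and predicts class 1 when the probability is at least 0.5. The hours-studied/pass-fail data are invented for illustration and the function names are hypothetical; in practice one would usually use a library implementation such as scikit-learn's LogisticRegression.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights (intercept first) by batch gradient descent on the mean cross-entropy."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)       # gradient of the mean log-loss
    return w

def predict(w, X, threshold=0.5):
    """Return 0/1 labels by thresholding the predicted probability."""
    p = sigmoid(np.column_stack([np.ones(len(X)), X]) @ w)
    return (p >= threshold).astype(int)

# Invented data: hours studied vs. pass (1) / fail (0).
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
w = fit_logistic(hours, passed)
print(w, predict(w, np.array([1.0, 5.0])))
```

The two classes overlap slightly (a pass at 2.5 hours, a fail at 3.0), so the maximum-likelihood weights stay finite and the decision boundary falls between those points.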

classification problem, equation, logistic regression

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)