AITopics | Regression

Regression

Deconfounding and Causal Regularization for Stability and External Validity

arXiv.org Machine Learning | Aug-14-2020

Brad Efron, in his lecture on the occasion of receiving the International Prize in Statistics, brought up some fascinating thoughts on "prediction, estimation and attribution", with particular attention to the new "wide data era" which has entered statistics and data science more generally (Efron, 2019, 2020). Looking back almost 20 years, statistics has developed enormously since Leo Breiman's article "Statistical Modeling: The Two Cultures" (Breiman, 2001). Even more broadly, data science has emerged as a new field and profession. It deals with information extraction from data, often in close proximity with other sciences. Its historical roots are in statistics, and statistical "critical" thinking plays an ever more important role in inference from data to models and prediction. There are many interesting facets of this broad topic; see, for example, David Donoho's "50 years of Data Science" (Donoho, 2017) or Bin Yu's "Veridical Data Science" (Yu and Kumbier, 2020). Efron (2019, 2020) has formulated intriguing ideas on "prediction, estimation and attribution". We present here a few additional considerations on the topic, as outlined in the following Sections 1.1 and 1.2.

anchor regression, lasso, regression, (16 more...)

2008.06234

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report (0.83)
Instructional Material > Course Syllabus & Notes (0.34)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Privacy-Preserving Asynchronous Federated Learning Algorithms for Multi-Party Vertically Collaborative Learning

Gu, Bin, Xu, An, Huo, Zhouyuan, Deng, Cheng, Huang, Heng

arXiv.org Machine Learning | Aug-14-2020

Privacy-preserving federated learning for vertically partitioned data has shown promising results as a solution for emerging multi-party joint modeling applications, in which the data holders (such as government branches, private finance and e-business companies) collaborate throughout the learning process rather than relying on a trusted third party to hold data. However, existing federated learning algorithms for vertically partitioned data are limited to synchronous computation. To improve efficiency when unbalanced computation/communication resources are common among the parties in the federated learning system, it is essential to develop asynchronous training algorithms for vertically partitioned data while preserving data privacy. In this paper, we propose an asynchronous federated SGD (AFSGD-VP) algorithm and its SVRG and SAGA variants on vertically partitioned data. Moreover, we provide the convergence analyses of AFSGD-VP and its SVRG and SAGA variants under the condition of strong convexity. We also discuss their model privacy, data privacy, computational complexities and communication costs. To the best of our knowledge, AFSGD-VP and its SVRG and SAGA variants are the first asynchronous federated learning algorithms for vertically partitioned data. Extensive experimental results on a variety of vertically partitioned datasets not only verify the theoretical results of AFSGD-VP and its SVRG and SAGA variants, but also show that our algorithms have much higher efficiency than the corresponding synchronous algorithms.

algorithm, inequality, partitioned data, (12 more...)

2008.06233

Country:

North America > United States (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
Asia > Nepal (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Services > e-Commerce Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data

Gu, Bin, Dang, Zhiyuan, Li, Xiang, Huang, Heng

arXiv.org Machine Learning | Aug-14-2020

In many real-world data mining and machine learning applications, data are provided by multiple providers, and each maintains private records of different feature sets about common entities. It is challenging for traditional data mining and machine learning algorithms to train on these vertically partitioned data effectively and efficiently while preserving data privacy. In this paper, we focus on nonlinear learning with kernels, and propose a federated doubly stochastic kernel learning (FDSKL) algorithm for vertically partitioned data. Specifically, we use random features to approximate the kernel mapping function and use doubly stochastic gradients to update the solutions, which are all computed federatedly without the disclosure of data. Importantly, we prove that FDSKL has a sublinear convergence rate, and can guarantee data security under the semi-honest assumption. Extensive experimental results on a variety of benchmark datasets show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels, while retaining similar generalization performance.
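The idea of approximating a kernel mapping with random features can be illustrated with a minimal, non-federated sketch. It assumes a Gaussian (RBF) kernel and the standard random Fourier feature construction; the function names, the kernel choice, and the parameter values are illustrative assumptions and are not taken from the FDSKL paper.

```python
import math
import random


def rbf_kernel(x, y, sigma=1.0):
    """Exact Gaussian (RBF) kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_random_features(dim, num_features, sigma=1.0, seed=0):
    """Build a random Fourier feature map for the RBF kernel (assumed kernel).

    Frequencies are drawn from N(0, 1 / sigma^2) and phase offsets from
    Uniform[0, 2 * pi]; the inner product of two feature vectors then
    approximates the exact kernel value.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


# Compare the random-feature approximation with the exact kernel value.
x = [0.5, -1.0]
y = [1.0, 0.0]
phi = make_random_features(dim=2, num_features=2000)
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = rbf_kernel(x, y)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The approximation error shrinks roughly as one over the square root of the number of random features, so more features give a closer match at higher cost.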

artificial intelligence, fdskl, machine learning, (14 more...)

2008.06197

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

An information criterion for automatic gradient tree boosting

Lunde, Berent Ånund Strømnes, Kleppe, Tore Selland, Skaug, Hans Julius

arXiv.org Machine Learning | Aug-13-2020

This article is motivated by the problem of selecting the functional form of trees and the ensemble size in gradient tree boosting (Friedman, 2001; Mason et al., 2000). Gradient tree boosting (GTB) has become extremely popular in recent years, both in academia and industry. Datasets are currently growing in size, both in the number of observations and in the richness of the data, that is, the number of features. This, coupled with an exponential increase in computational power and a growing awareness and acceptance of data-driven decisions in industry, is driving increasing interest in statistical learning (Hastie et al., 2001). For these new datasets, standard statistical methods such as generalized linear models (McCullagh and Nelder, 1989), which have a fixed learning rate due to their constrained functional form with bounded complexity, struggle in terms of predictive power, as they stop learning at certain information thresholds. Interest is therefore geared towards more flexible approaches such as ensembles of learners.

artificial intelligence, machine learning, reduction, (18 more...)

2008.05926

Country:

Europe > Norway > Western Norway > Rogaland > Stavanger (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

(Almost) All of Entity Resolution

Binette, Olivier, Steorts, Rebecca C.

arXiv.org Machine Learning | Aug-10-2020

Whether the goal is to estimate the number of people that live in a congressional district, to estimate the number of individuals that have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications have a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, commonly known as record linkage, de-duplication, or entity resolution. In this article, we review motivational applications and seminal papers that have led to the growth of this area. Specifically, we review the foundational work that began in the 1940s and 50s and that has led to modern probabilistic record linkage. We review clustering approaches to entity resolution, semi- and fully supervised methods, and canonicalization, which are being used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Finally, we discuss current research topics of practical importance.

entity resolution, information retrieval, machine learning, (14 more...)

2008.04443

Country:

Asia > Middle East > Syria (0.28)
North America > United States > North Carolina > Durham County > Durham (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
(23 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Voting & Elections (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(4 more...)

Robust Validation: Confident Predictions Even When Distributions Shift

Cauchois, Maxime, Gupta, Suyash, Ali, Alnur, Duchi, John C.

arXiv.org Machine Learning | Aug-10-2020

While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy, coming from robust statistics and optimization, is thus to build a model robust to distributional perturbations. In this paper, we take a different approach to describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population. The method, based on conformal inference, achieves (nearly) valid coverage in finite samples, under only the condition that the training data be exchangeable. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it; we develop estimators and prove their consistency for protection and validity of uncertainty estimates under shifts. By experimenting on several large-scale benchmark datasets, including Recht et al.'s CIFAR-v4 and ImageNet-V2 datasets, we provide complementary empirical results that highlight the importance of robust predictive validity.
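For orientation, standard split conformal prediction, the baseline that conformal inference methods build on, can be sketched as follows. This is not the robust, shift-aware procedure of the paper: it covers only the exchangeable case with absolute-residual scores, and the function names and toy numbers are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, no finite threshold guarantees coverage,
    so infinity is returned.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def prediction_interval(point_prediction, scores, alpha):
    """Symmetric interval around a point prediction.

    `scores` are absolute residuals |y - prediction| computed on a
    held-out calibration set (an assumption of this sketch).
    """
    q = conformal_quantile(scores, alpha)
    return (point_prediction - q, point_prediction + q)


# Toy calibration residuals (made-up numbers for illustration only).
calibration_scores = [0.3, 0.1, 0.9, 0.5, 0.2, 0.7, 1.0, 0.4, 0.8, 0.6]
low, high = prediction_interval(5.0, calibration_scores, alpha=0.2)
print(low, high)
```

Under exchangeability of calibration and test points, an interval built this way covers the true response with probability at least 1 - alpha; the paper's contribution concerns what to do when that assumption fails.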

artificial intelligence, machine learning, prediction, (17 more...)

2008.04267

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

What are Classification and Regression in ML?

#artificialintelligence | Aug-9-2020, 07:10:52 GMT

ML is extracting knowledge from data. Machine learning is the study of algorithms that give computers the ability to learn from data and predict outcomes accurately, without being explicitly programmed. Machine learning is divided into three categories: supervised learning, unsupervised learning, and reinforcement learning. As the name "supervised learning" suggests, here learning is based on examples. We have a known set of inputs (called features, x) and outputs (called labels, y).
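As a small illustration of learning from known inputs x and outputs y, the sketch below fits a straight line by ordinary least squares and uses it to predict a new value. The data and function names are made up for this example and do not come from the article.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the label for a new feature value."""
    return slope * x + intercept


# Known examples: features x and labels y (here y = 2x + 1 exactly).
features = [1.0, 2.0, 3.0, 4.0]
labels = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(features, labels)
print(slope, intercept, predict(5.0, slope, intercept))
```

Regression predicts a continuous value as above; classification would instead predict a discrete label from the same kind of labeled examples.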

artificial intelligence, machine learning, regression, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.76)

15 Machine Learning and Data Science Project Ideas with Datasets

#artificialintelligence | Aug-9-2020, 07:10:48 GMT

In this article, we'll discuss 15 machine learning and data science projects for beginners as well as for the intermediate level. Projects are some of the best investments of your time. You'll enjoy learning, stay motivated, and make faster progress. For machine learning or data science projects, finding a dataset is quite a difficult task. And, to build accurate models, you need a huge amount of data.

artificial intelligence, dataset link, machine learning, (10 more...)

Country:

Asia > India (0.15)
Asia > South Korea > Seoul > Seoul (0.05)

Industry:

Media (0.51)
Leisure & Entertainment (0.49)
Government (0.49)
Health & Medicine > Epidemiology (0.32)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Individualized Prediction of COVID-19 Adverse outcomes with MLHO

Estiri, Hossein, Strasser, Zachary H., Murphy, Shawn N.

arXiv.org Machine Learning | Aug-9-2020

The COVID-19 pandemic has devastated the world with health and economic wreckage. Precise estimates of the COVID-19 adverse outcomes on individual patients could have led to better allocation of healthcare resources and more efficient targeted preventive measures. We developed MLHO (pronounced as melo) for predicting patient-level risk of hospitalization, ICU admission, need for mechanical ventilation, and death from patients' past (before COVID-19 infection) medical records. MLHO is an end-to-end Machine Learning pipeline that implements iterative sequential representation mining and feature and model selection to predict health outcomes. MLHO's architecture enables a parallel and outcome-oriented calibration, in which different statistical learning algorithms and vectors of features are simultaneously tested and leveraged to improve prediction of health outcomes. Using clinical data from a large cohort of over 14,000 patients, we modeled the four adverse outcomes utilizing about 600 features representing patients' before-COVID health records. Overall, the best predictions were obtained from extreme and gradient boosting models. The median AUC ROC for mortality prediction was 0.91, while the prediction performance ranged between 0.79 and 0.83 for ICU, hospitalization, and ventilation. We broadly describe the clusters of features that were utilized in modeling and their relative influence on predicting each outcome. As COVID-19 cases are re-surging in the U.S. and around the world, a Machine Learning pipeline like MLHO is crucial to improve our readiness for confronting the potential future waves of COVID-19, as well as other novel infectious diseases that may emerge in the near future.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2008.03869

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.05)
Asia > China > Hubei Province > Wuhan (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.94)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

What is AI - specifically what is machine learning?

#artificialintelligence | Aug-8-2020, 14:00:06 GMT

This entry is part 2 of 3 in the series What is AI once and for all? Artificial intelligence is science fiction. Artificial intelligence is already part of our everyday lives. All those statements are true; it just depends on which flavor of AI you are referring to. Most of us are familiar with the term "Artificial Intelligence." After all, it's been a popular focus in movies such as The Terminator, The Matrix, and Ex Machina, but you may recently have been hearing other terms like "Machine Learning" and "Deep Learning," sometimes used interchangeably with artificial intelligence.

algorithm, artificial intelligence, machine learning, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
