AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

#artificialintelligence, Jul-11-2020, 08:35:24 GMT

Regression with PyCaret: A better machine learning library

I assume you already know what regression is. "Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables)." In the simplest terms, we want to fit a line (or hyperplane) through the data points to obtain a line of best fit. The underlying algorithm aims to find the line that minimizes the cost function, typically MSE or RMSE. That's linear regression, but there are other types, such as polynomial regression.
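As a minimal sketch of the line-of-best-fit idea described above, the closed-form least-squares solution for a single feature can be written in plain Python. This is not PyCaret's API; the function names `fit_line` and `mse` and the toy data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing mean squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Mean squared error of the line on the given points."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data lying exactly on y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
error = mse(xs, ys, slope, intercept)
```

RMSE is simply the square root of `mse`, so both cost functions are minimized by the same line.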

artificial intelligence, machine learning, regression, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

arXiv.org Machine Learning, Jul-11-2020

Federated Learning's Blessing: FedAvg has Linear Speedup

Qu, Zhaonan, Lin, Kaixiang, Kalagnanam, Jayant, Li, Zhaojian, Zhou, Jiayu, Zhou, Zhengyuan

Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data. The characteristics of non-iid data across the network, low device participation, and the mandate that data remain private bring challenges in understanding the convergence of FL algorithms, particularly with regard to how convergence scales with the number of participating devices. In this paper, we focus on Federated Averaging (FedAvg)--the most widely used and effective FL algorithm today--and provide a comprehensive study of its convergence rate. Although FedAvg has recently been studied by an emerging line of literature, it remains open as to how FedAvg's convergence scales with the number of participating devices in the FL setting--a crucial question whose answer would shed light on the performance of FedAvg in large FL systems. We fill this gap by establishing convergence guarantees for FedAvg under three classes of problems: strongly convex smooth, convex smooth, and overparameterized strongly convex smooth problems. We show that FedAvg enjoys linear speedup in each case, although with different convergence rates. For each class, we also characterize the corresponding convergence rates for the Nesterov accelerated FedAvg algorithm in the FL setting: to the best of our knowledge, these are the first linear speedup guarantees for FedAvg when Nesterov acceleration is used. To accelerate FedAvg, we also design a new momentum-based FL algorithm that further improves the convergence rate in overparameterized linear regression problems. Empirical studies of the algorithms in various settings have supported our theoretical results.
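To make the FedAvg idea concrete, here is a minimal sketch under simplifying assumptions: a one-parameter model y ≈ w·x, full-batch local gradient steps in place of stochastic ones, every device participating in every round, and no Nesterov acceleration. It is not the paper's algorithm or experimental setup; the names `local_gd`, `fedavg`, and the toy device data are hypothetical.

```python
def local_gd(w, data, lr, steps):
    """Run a few gradient steps on one device's local mean squared error."""
    for _ in range(steps):
        grad = sum(2.0 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg(devices, rounds=50, lr=0.05, local_steps=5, w0=0.0):
    """Alternate local training with a data-size-weighted average of the results."""
    w = w0
    total = sum(len(d) for d in devices)
    for _ in range(rounds):
        # Each device starts from the current global model and trains locally.
        local_models = [local_gd(w, d, lr, local_steps) for d in devices]
        # The server only sees the local models, never the raw data.
        w = sum(len(d) * wk for d, wk in zip(devices, local_models)) / total
    return w


# Three devices with different amounts of data, all generated from y = 3x.
devices = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(3.0, 9.0), (1.0, 3.0), (2.0, 6.0)],
]
w_hat = fedavg(devices)
```

Because every device's data here is consistent with the same slope, the averaged model approaches w = 3; with genuinely non-iid data the local steps pull in different directions, which is the regime the convergence analysis addresses.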

artificial intelligence, fedavg, machine learning, (14 more...)

2007.0569

Country:

North America > United States > Michigan (0.04)
North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Law (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

#artificialintelligence, Jul-10-2020, 03:09:12 GMT

Multiple Linear Regression with scikit-learn

In this article, we studied one of the most fundamental machine learning algorithms, i.e., linear regression. We implemented both simple linear regression and multiple linear regression with the help of the Scikit-Learn machine learning library. I hope you enjoyed reading it. Let me know your doubts or suggestions in the comment section. In this 2-hour-long project-based course, you will build and evaluate multiple linear regression models using Python.
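For readers who want a starting point, the following is a minimal sketch of multiple linear regression with scikit-learn's `LinearRegression`. The two-feature toy data and its coefficients are made up for illustration and are not taken from the article or the course.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Five samples with two features each (illustrative values).
X = np.array([
    [1.0, 2.0],
    [2.0, 0.0],
    [3.0, 1.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
# Noise-free target: y = 1 + 2*x1 - 3*x2.
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)

# Fitted parameters and a prediction for an unseen point.
coefficients = model.coef_
intercept = model.intercept_
prediction = model.predict(np.array([[6.0, 2.0]]))
```

With noise-free data the fitted coefficients recover the generating ones; on real data, evaluation on a held-out split would normally follow.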

artificial intelligence, machine learning, multiple linear regression, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

arXiv.org Machine Learning, Jul-10-2020

Differentially Private Simple Linear Regression

Alabi, Daniel, McMillan, Audra, Sarathy, Jayshree, Smith, Adam, Vadhan, Salil

Economics and social science research often require analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study algorithms for simple linear regression that satisfy differential privacy, a constraint which guarantees that an algorithm's output reveals little about any individual input data record, even to an attacker with arbitrary side information about the dataset. We consider the design of differentially private algorithms for simple linear regression for small datasets, with tens to hundreds of datapoints, which is a particularly challenging regime for differential privacy. Focusing on a particular application to small-area analysis in economics research, we study the performance of a spectrum of algorithms we adapt to the setting. We identify key factors that affect their performance, showing through a range of experiments that algorithms based on robust estimators (in particular, the Theil-Sen estimator) perform well on the smallest datasets, but that other more standard algorithms do better as the dataset size increases.
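The Theil-Sen estimator mentioned above can be sketched in its standard, non-private form: the slope is the median of all pairwise slopes. The code below adds no privacy noise and is not the paper's differentially private variant; `theil_sen` and the toy data are illustrative assumptions.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Non-private Theil-Sen fit: median pairwise slope, then median intercept."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i, j in combinations(range(len(xs)), 2)
        if xs[j] != xs[i]  # skip pairs with identical x to avoid division by zero
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Seven points on y = 2x + 1, with the last one replaced by a gross outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 100.0]
ts_slope, ts_intercept = theil_sen(xs, ys)
```

Because medians ignore a minority of extreme values, the single outlier does not move the fit, which hints at why robust estimators are attractive on very small datasets.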

algorithm, dataset, nvar, (16 more...)

2007.05157

Country:

North America > United States > District of Columbia > Washington (0.14)
North America > United States > Illinois (0.04)
North America > United States > North Carolina (0.04)
(6 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

arXiv.org Machine Learning, Jul-10-2020

Robust Geodesic Regression

Shin, Ha-Young, Oh, Hee-Seok

This paper studies robust regression for data on Riemannian manifolds. Geodesic regression is the generalization of linear regression to a setting with a manifold-valued dependent variable and one or more real-valued independent variables. The existing work on geodesic regression uses the sum-of-squared errors to find the solution, but as in the classical Euclidean case, the least-squares method is highly sensitive to outliers. In this paper, we use M-type estimators, including the $L_1$, Huber and Tukey biweight estimators, to perform robust geodesic regression, and describe how to calculate the tuning parameters for the latter two. We also show that, on compact symmetric spaces, all M-type estimators are maximum likelihood estimators, and argue for the overall superiority of the $L_1$ estimator over the $L_2$ and Huber estimators on high-dimensional manifolds and over the Tukey biweight estimator on compact high-dimensional manifolds. Results from numerical examples, including analysis of real neuroimaging data, demonstrate the promising empirical properties of the proposed approach.
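For intuition, the M-type losses named in the abstract can be sketched in their classical Euclidean form, applied to a scalar residual `r`. This does not cover the manifold setting or the paper's procedure for choosing tuning parameters; the threshold `c` is left as a plain argument and the function names are illustrative.

```python
def l2_loss(r):
    """Least-squares loss: grows quadratically, so outliers dominate."""
    return 0.5 * r * r


def l1_loss(r):
    """Absolute-error loss: grows linearly."""
    return abs(r)


def huber_loss(r, c):
    """Quadratic for |r| <= c, linear beyond the threshold c."""
    a = abs(r)
    return 0.5 * r * r if a <= c else c * (a - 0.5 * c)


def tukey_biweight_loss(r, c):
    """Bounded loss: constant (c^2 / 6) once |r| reaches c."""
    if abs(r) >= c:
        return c * c / 6.0
    u = 1.0 - (r / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)


# Residuals with one gross outlier (illustrative values).
residuals = [0.1, -0.2, 0.3, 25.0]
total_l2 = sum(l2_loss(r) for r in residuals)
total_huber = sum(huber_loss(r, 1.0) for r in residuals)
total_tukey = sum(tukey_biweight_loss(r, 4.0) for r in residuals)
```

The outlier contributes most of the least-squares total, a much smaller share under Huber, and only a capped constant under the Tukey biweight.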

estimator, manifold, regression, (17 more...)

2007.04518

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Health Care Technology (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

#artificialintelligence, Jul-9-2020, 06:35:50 GMT

Statistical controversy on estimating racial bias in the criminal justice system « Statistical Modeling, Causal Inference, and Social Science

Researchers often lack the necessary data to credibly estimate racial discrimination in policing. In particular, police administrative records lack information on civilians police observe but do not investigate. In this article, we show that if police racially discriminate when choosing whom to investigate, analyses using administrative records to estimate racial discrimination in police behavior are statistically biased, and many quantities of interest are unidentified--even among investigated individuals--absent strong and untestable assumptions. Using principal stratification in a causal mediation framework, we derive the exact form of the statistical bias that results from traditional estimation. We develop a bias-correction procedure and nonparametric sharp bounds for race effects, replicate published findings, and show the traditional estimator can severely underestimate levels of racially biased policing or mask discrimination entirely.

artificial intelligence, discrimination, machine learning, (17 more...)

Country: North America > United States (0.04)

Genre:

Research Report > Experimental Study (0.70)
Research Report > Strength High (0.48)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.89)
Law > Criminal Law (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

#artificialintelligence, Jul-9-2020, 04:54:09 GMT

MachineLearningWithHuman/Projects

On prem data science solution based on AdventureWorks - MachineLearningWithHuman/Projects

deep learning, machine learning, machinelearningwithhuman project, (2 more...)

Country: Asia > India > Uttarakhand > Roorkee (0.11)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

arXiv.org Machine Learning, Jul-9-2020

Multi-view Orthonormalized Partial Least Squares: Regularizations and Deep Extensions

Wang, Li, Li, Ren-Cang, Wen-Wei

We establish a family of subspace-based learning methods for multi-view learning using the least squares as the fundamental basis. Specifically, we investigate orthonormalized partial least squares (OPLS) and study its important properties for both multivariate regression and classification. Building on the least squares reformulation of OPLS, we propose a unified multi-view learning framework to learn a classifier over a common latent space shared by all views. The regularization technique is further leveraged to unleash the power of the proposed framework by providing three generic types of regularizers on its inherent ingredients including model parameters, decision values and latent projected points. We instantiate a set of regularizers in terms of various priors. The proposed framework with proper choices of regularizers can not only recast existing methods but also inspire new models. To further improve the performance of the proposed framework on complex real problems, we propose to learn nonlinear transformations parameterized by deep networks. Extensive experiments are conducted to compare various methods on nine data sets with different numbers of views in terms of both feature extraction and cross-modal retrieval.

artificial intelligence, classification, machine learning, (15 more...)

2007.05028

Country:

North America > United States > Texas > Tarrant County > Arlington (0.04)
Asia > Taiwan (0.04)
North America > United States > Maryland > Baltimore (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, Jul-9-2020

Active Learning on Attributed Graphs via Graph Cognizant Logistic Regression and Preemptive Query Generation

Regol, Florence, Pal, Soumyasundar, Zhang, Yingxue, Coates, Mark

Node classification in attributed graphs is an important task in multiple practical settings, but it can often be difficult or expensive to obtain labels. Active learning can improve the achieved classification performance for a given budget on the number of queried labels. The best existing methods are based on graph neural networks, but they often perform poorly unless a sizeable validation set of labelled nodes is available in order to choose good hyperparameters. We propose a novel graph-based active learning algorithm for the task of node classification in attributed graphs; our algorithm uses graph cognizant logistic regression, equivalent to a linearized graph convolutional neural network (GCN), for the prediction phase and maximizes the expected error reduction in the query phase. To reduce the delay experienced by a labeller interacting with the system, we derive a preemptive querying system that calculates a new query during the labelling process, and to address the setting where learning starts with almost no labelled data, we also develop a hybrid algorithm that performs adaptive model averaging of label propagation and linearized GCN inference. We conduct experiments on five public benchmark datasets, demonstrating a significant improvement over state-of-the-art approaches and illustrate the practical value of the method by applying it to a private microwave link network dataset.
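One common way to linearize a GCN is to smooth node features with a normalized adjacency matrix and then fit an ordinary logistic regression on the result. The sketch below shows only that feature-propagation step, and it assumes this particular form of linearization; the abstract does not spell out the paper's exact construction, and `propagate_features` and the tiny graph are hypothetical.

```python
import numpy as np


def propagate_features(adj, features, hops=2):
    """Smooth node features over the graph with a symmetric normalized adjacency."""
    # Add self-loops so each node keeps part of its own features.
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # S = D^{-1/2} (A + I) D^{-1/2}
    s = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = features
    for _ in range(hops):
        out = s @ out
    return out


# Two nodes joined by one edge, each with a single feature (illustrative).
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
features = np.array([[1.0], [3.0]])
smoothed = propagate_features(adj, features, hops=1)
```

The smoothed features could then be passed to any standard logistic regression; since the propagation has no trainable parameters, refitting after each newly queried label stays cheap.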

algorithm, artificial intelligence, machine learning, (10 more...)

2007.05003

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
(10 more...)

Genre:

Research Report > New Finding (0.37)
Research Report > Experimental Study (0.37)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)