AITopics

#artificialintelligenceApr-2-2020, 03:10:46 GMT

Machine Learning in Python: Building a Linear Regression Model

Machine Learning in Python: Building a Linear Regression Model In this video, I will be showing you how to build a linear regression model in Python using the scikit-learn package. We will be using the Diabetes dataset (built-in data from scikit-learn) and the Boston Housing (download from GitHub) dataset. This video is part of the [Python Data Science Project] series. If you're new here, it would mean the world to me if you would consider subscribing to this channel. Disclaimer: Chanin is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to http://www.amazon.com.

amzn, linear regression model, machine learning, (7 more...)

#artificialintelligence

Industry:

Retail > Online (0.63)
Information Technology > Services (0.63)
Health & Medicine > Therapeutic Area (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

arXiv.org Machine LearningApr-1-2020

Assisted Learning and Imitation Privacy

Xian, Xun, Wang, Xinran, Ding, Jie, Ghanadan, Reza

Motivated by the emerging needs of decentralized learners with personalized learning objectives, we present an Assisted Learning framework where a service provider Bob assists a learner Alice with supervised learning tasks without transmitting Bob's private algorithm or data. Bob assists Alice either by building a predictive model using Alice's labels, or by improving Alice's private learning through iterative communications where only relevant statistics are transmitted. The proposed learning framework is naturally suitable for distributed, personalized, and privacy-aware scenarios. For example, it is shown in some scenarios that two suboptimal learners could achieve much better performance through Assisted Learning. Moreover, motivated by privacy concerns in Assisted Learning, we present a new notion of privacy to quantify the privacy leakage at learning level instead of data level. This new privacy, named imitation privacy, is particularly suitable for a market of statistical learners each holding private learning algorithms as well as data.

bob, learner, privacy, (15 more...)

2004.00566

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Minnesota (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Education (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
(2 more...)

Tsagris, Michail, Papadovasilakis, Zacharias, Lakiotaki, Kleanthi, Tsamardinos, Ioannis

A generalised OMP algorithm for feature selection with application to gene expression data

arXiv.org Machine LearningApr-1-2020

Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of available features. In this paper, we propose gOMP, a highly-scalable generalisation of the Orthogonal Matching Pursuit feature selection algorithm to several directions: (a) different types of outcomes, such as continuous, binary, nominal, and time-to-event, (b) different types of predictive models (e.g., linear least squares, logistic regression), (c) different types of predictive features (continuous, categorical), and (d) different, statistical-based stopping criteria. We compare the proposed algorithm against LASSO, a prototypical, widely used algorithm for high-dimensional data. On dozens of simulated datasets, as well as, real gene expression datasets, gOMP is on par, or outperforms LASSO for case-control binary classification, quantified outcomes (regression), and (censored) survival times (time-to-event) analysis. gOMP has also several theoretical advantages that are discussed. While gOMP is based on quite simple and basic statistical ideas, easy to implement and to generalize, we also show in an extensive evaluation that it is also quite effective in bioinformatics analysis settings.

algorithm, lasso, predictive performance, (17 more...)

2004.00281

Country:

Europe > Greece (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Machine LearningMar-31-2020

Pool-Based Unsupervised Active Learning for Regression Using Iterative Representativeness-Diversity Maximization (iRDM)

Liu, Ziang, Jiang, Xue, Luo, Hanbin, Fang, Weili, Liu, Jiajing, Wu, Dongrui

Active learning (AL) selects the most beneficial unlabeled samples to label, and hence a better machine learning model can be trained from the same number of labeled samples. Most existing active learning for regression (ALR) approaches are supervised, which means the sampling process must use some label information, or an existing regression model. This paper considers completely unsupervised ALR, i.e., how to select the samples to label without knowing any true label information. We propose a novel unsupervised ALR approach, iterative representativeness-diversity maximization (iRDM), to optimally balance the representativeness and the diversity of the selected samples. Experiments on 12 datasets from various domains demonstrated its effectiveness. Our iRDM can be applied to both linear regression and kernel regression, and it even significantly outperforms supervised ALR when the number of labeled samples is small.

irdm, regression, regression model, (13 more...)

2003.07658

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
Europe > Austria > Vienna (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
(7 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)

#artificialintelligenceMar-30-2020, 02:06:10 GMT

Machine Learning Basics: Building a Regression model in R

You're looking for a complete Linear Regression course that teaches you everything you need to create a Linear Regression model in R, right? You've found the right Linear Regression course! How this course will help you? A Verifiable Certificate of Completion is presented to all students who undertake this Machine learning basics course. Why should you choose this course?

linear regression, machine learning basic, regression model, (3 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Detection of FLOSS version release events from Stack Overflow message data

Sokolovsky, A., Gross, T., Bacardit, J.

Topic Detection and Tracking (TDT) is a very active research question within the area of text mining, generally applied to news feeds and Twitter datasets, where topics and events are detected. The notion of "event" is broad, but typically it applies to occurrences that can be detected from a single post or a message. Little attention has been drawn to what we call "micro-events", which, due to their nature, cannot be detected from a single piece of textual information. The study investigates micro-event detection on textual data using a sample of messages from the Stack Overflow Q&A platform in order to detect Free/Libre Open Source Software (FLOSS) version releases. Micro-events are detected using logistic regression models with step-wise forward regression feature selection from a set of LDA topics and sentiment analysis features. We perform a detailed statistical analysis of the models, including influential cases, variance inflation factors, validation of the linearity assumption, pseudo R squared measures and no-information rate. Finally, in order to understand the detection limits and improve the performance of the estimators, we suggest a method for generating micro-event synthetic datasets and use them identify the micro-event detectability thresholds.

dataset, preprint, time step, (14 more...)

2003.14257

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.04)
Oceania > New Zealand (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.89)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CPFed: Communication-Efficient and Privacy-Preserving Federated Learning

Hu, Rui, Gong, Yanmin, Guo, Yuanxiong

Federated learning is a machine learning setting where a set of edge devices iteratively train a model under the orchestration of a central server, while keeping all data locally on edge devices. In each iteration of federated learning, edge devices perform computation with their local data, and the local computation results are then uploaded to the server for model update. During this process, the challenges of privacy leakage and communication overhead arise due to the extensive information exchange between edge devices and the server. In this paper, we develop CPFed, a communication-efficient and privacy-preserving federated learning method, to solve the above challenges. CPFed integrates three key components: (1) periodic averaging where local computation results at edge devices are only periodically averaged at the server; (2) Gaussian mechanism where edge devices randomly perturb their local computation results before sending the results to the server; and (3) secure aggregation where the perturbed local computation results are homomorphically encrypted before being sent to the server. CPFed can address both the communication efficiency and privacy leakage challenges in federated learning while achieving high model accuracy. We provide an end-to-end privacy guarantee of CPFed and analyze its theoretical convergence rates for both convex and non-convex models. Through extensive numerical experiments on real-world datasets, we demonstrate the effectiveness and efficiency of our proposed method.

iteration, local model, server, (16 more...)

2003.13761

Country:

North America > United States > Texas > Bexar County > San Antonio (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Hain, Daniel, Jurowetzki, Roman

Introduction to Rare-Event Predictive Modeling for Inferential Statisticians -- A Hands-On Application in the Prediction of Breakthrough Patents

Recent years have seen a substantial development of quantitative methods, mostly led by the computer science community with the goal to develop better machine learning application, mainly focused on predictive modeling. However, economic, management, and technology forecasting research has up to now been hesitant to apply predictive modeling techniques and workflows. In this paper, we introduce to a machine learning (ML) approach to quantitative analysis geared towards optimizing the predictive performance, contrasting it with standard practices inferential statistics which focus on producing good parameter estimates. We discuss the potential synergies between the two fields against the backdrop of this at first glance, \enquote{target-incompatibility}. We discuss fundamental concepts in predictive modeling, such as out-of-sample model validation, variable and model selection, generalization and hyperparameter tuning procedures. Providing a hands-on predictive modelling for an quantitative social science audience, while aiming at demystifying computer science jargon. We use the example of \enquote{high-quality} patent identification guiding the reader through various model classes and procedures for data pre-processing, modelling and validation. We start of with more familiar easy to interpret model classes (Logit and Elastic Nets), continues with less familiar non-parametric approaches (Classification Trees and Random Forest) and finally presents artificial neural network architectures, first a simple feed-forward and then a deep autoencoder geared towards anomaly detection. Instead of limiting ourselves to the introduction of standard ML techniques, we also present state-of-the-art yet approachable techniques from artificial neural networks and deep learning to predict rare phenomena of interest.

hyperparameter, patent, prediction, (15 more...)

2003.13441

Country:

Europe > Denmark > North Jutland > Aalborg (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Florida > Miami-Dade County > Coral Gables (0.04)
North America > United States > California (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)

Wu, Shengyi, Shimamura, Kaito, Yoshikawa, Kohei, Murayama, Kazuaki, Kawano, Shuichi

Variable fusion for Bayesian linear regression via spike-and-slab priors

In linear regression models, a fusion of the coefficients is used to identify the predictors having similar relationships with the response. This is called variable fusion. This paper presents a novel variable fusion method in terms of Bayesian linear regression models. We focus on hierarchical Bayesian models based on a spike-and-slab prior approach. A spike-and-slab prior is designed to perform variable fusion. To obtain estimates of parameters, we develop a Gibbs sampler for the parameters. Simulation studies and a real data analysis show that our proposed method has better performances than previous methods.

coefficient, full conditional posterior, variable selection, (8 more...)

2003.13299

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)