Regression


Logistic Regression using SAS - Indepth Predictive Modeling

#artificialintelligence

What is this course all about? This course is all about credit scoring / logistic regression model building using SAS. The course promises to explain concepts in a crystal-clear manner. It goes through the practical issues faced by analysts, such as how to clarify the objective and ensure data sufficiency.


Adiabatic Quantum Linear Regression

arXiv.org Machine Learning

A major challenge in machine learning is the computational expense of training these models. Model training can be viewed as a form of optimization used to fit a machine learning model to a set of data, which can take a significant amount of time on classical computers. Adiabatic quantum computers have been shown to excel at solving optimization problems, and therefore, we believe, present a promising alternative to improve machine learning training times. In this paper, we present an adiabatic quantum computing approach for training a linear regression model. In order to do this, we formulate the regression problem as a quadratic unconstrained binary optimization (QUBO) problem. We analyze our quantum approach theoretically, test it on the D-Wave 2000Q adiabatic quantum computer and compare its performance to a classical approach that uses the Scikit-learn library in Python. Our analysis shows that the quantum approach attains up to 2.8x speedup over the classical approach on larger datasets and performs on par with the classical approach on the regression error metric.
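The key step the abstract describes is casting least-squares regression as a QUBO. Below is a minimal sketch of one such encoding, assuming each real weight is represented as a fixed-point sum of binary variables weighted by a hypothetical `precision` vector (the paper's exact encoding may differ). Expanding ||XPb - y||^2 and using b_i^2 = b_i for binary b_i lets the linear terms sit on the diagonal of Q. The exhaustive `brute_force_qubo` helper is only a stand-in for an annealer such as the D-Wave 2000Q and is feasible for just a handful of bits.

```python
import itertools

import numpy as np


def regression_qubo(X, y, precision):
    """Build Q so that b @ Q @ b == ||X @ P @ b - y||**2 - y @ y.

    Assumed encoding: weight w_j = sum_k precision[k] * b[j*K + k],
    i.e. w = P @ b for a binary vector b of length d*K.
    """
    d = X.shape[1]
    P = np.kron(np.eye(d), precision.reshape(1, -1))  # shape (d, d*K)
    Q = P.T @ X.T @ X @ P                              # quadratic term
    linear = -2.0 * (P.T @ X.T @ y)                    # linear term
    # For binary b_i, b_i**2 == b_i, so linear terms can live on the diagonal.
    Q[np.diag_indices_from(Q)] += linear
    return Q, P


def brute_force_qubo(Q):
    """Exhaustively minimise b @ Q @ b (stand-in for an annealer; few bits only)."""
    best_b, best_val = None, np.inf
    for bits in itertools.product([0.0, 1.0], repeat=Q.shape[0]):
        b = np.array(bits)
        val = b @ Q @ b
        if val < best_val:
            best_b, best_val = b, val
    return best_b, best_val


# Toy problem: targets generated exactly by w_true = [1.0, -0.5].
X = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 4.0], [2.0, 1.0]])
y = X @ np.array([1.0, -0.5])
precision = np.array([-2.0, 1.0, 0.5])  # hypothetical: 3 bits per weight

Q, P = regression_qubo(X, y, precision)
b, val = brute_force_qubo(Q)
print("recovered weights:", P @ b)
```

Because the toy targets come from weights that are exactly representable on the chosen precision grid, the minimiser recovers them. A coarser precision vector uses fewer qubits but can only represent a coarser grid of weights.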


QUBO Formulations for Training Machine Learning Models

arXiv.org Machine Learning

Training machine learning models on classical computers is usually a time- and compute-intensive process. With Moore's law coming to an end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers like the D-Wave 2000Q can approximately solve NP-hard optimization problems, such as quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post-Moore's-law era. To solve a problem on an adiabatic quantum computer, it must be formulated as a QUBO problem, which is a challenging task in itself. In this paper, we formulate the training problems of three machine learning models, namely linear regression, support vector machines (SVM) and equal-sized k-means clustering, as QUBO problems so that they can be trained on adiabatic quantum computers efficiently. We also analyze the time and space complexities of our formulations and compare them to the state-of-the-art classical algorithms for training these machine learning models. We show that the time and space complexities of our formulations are better than (in the case of SVM and equal-sized k-means clustering) or equivalent to (in the case of linear regression) those of their classical counterparts.


Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge

arXiv.org Machine Learning

The ADReSS Challenge at INTERSPEECH 2020 defines a shared task through which different approaches to the automated recognition of Alzheimer's dementia based on spontaneous speech can be compared. ADReSS provides researchers with a benchmark speech dataset which has been acoustically pre-processed and balanced in terms of age and gender, defining two cognitive assessment tasks, namely the Alzheimer's speech classification task and the neuropsychological score regression task. In the Alzheimer's speech classification task, ADReSS challenge participants create models for classifying speech as dementia or healthy control speech. In the neuropsychological score regression task, participants create models to predict mini-mental state examination scores. This paper describes the ADReSS Challenge in detail and presents a baseline for both tasks, including feature extraction procedures and results for classification and regression models. ADReSS aims to provide the speech and language Alzheimer's research community with a platform for comprehensive methodological comparisons. This will hopefully contribute to addressing the lack of standardisation that currently affects the field and shed light on avenues for future research and clinical applicability.


Introduction to Data Science with Python

#artificialintelligence

If you want to learn more about exploratory analysis using Pandas, Simplilearn's Data Science with Python video can help. We can see that columns like LoanAmount and ApplicantIncome contain some extreme values. We need to process these columns with data wrangling techniques to normalize and standardize them. Next, we take a look at data wrangling using Pandas as part of learning Data Science with Python. Data wrangling refers to the process of cleaning and unifying messy and complicated data sets.
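A minimal pandas sketch of this wrangling step, using invented values for the `ApplicantIncome` and `LoanAmount` columns (the real dataset is not shown here). It assumes a log transform is an acceptable way to tame the extreme values, then standardizes (z-scores) and min-max normalizes the transformed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical loan records: column names follow the text, values are invented.
df = pd.DataFrame({
    "ApplicantIncome": [2500, 3000, 4100, 5200, 81000],
    "LoanAmount": [66.0, 120.0, 128.0, 141.0, 700.0],
})

# Compress extreme values with a log transform (log1p is safe for zeros).
for col in ["ApplicantIncome", "LoanAmount"]:
    df[col + "_log"] = np.log1p(df[col])

# Standardize (zero mean, unit variance) and normalize (rescale to [0, 1]).
for col in ["ApplicantIncome_log", "LoanAmount_log"]:
    s = df[col]
    df[col + "_z"] = (s - s.mean()) / s.std(ddof=0)
    df[col + "_minmax"] = (s - s.min()) / (s.max() - s.min())

print(df.round(3))
```

Standardization is usually preferred for models sensitive to feature variance, while min-max scaling suits methods that expect bounded inputs; which one to use depends on the downstream model.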


Generative Ensemble-Regression: Learning Stochastic Dynamics from Discrete Particle Ensemble Observations

arXiv.org Machine Learning

We propose a new method for inferring the governing stochastic ordinary differential equations by observing particle ensembles at discrete and sparse time instants, i.e., multiple "snapshots". Particle coordinates at a single time instant, possibly noisy or truncated, are recorded in each snapshot but are unpaired across the snapshots. By training a generative model that generates "fake" sample paths, we aim to fit the observed particle ensemble distributions with a curve in the probability measure space, which is induced from the inferred particle dynamics. We employ different metrics to quantify the differences between distributions, like the sliced Wasserstein distances and the adversarial losses in generative adversarial networks. We refer to this approach as generative "ensemble-regression", in analogy to the classic "point-regression", where we infer the dynamics by performing regression in the Euclidean space, e.g. linear/logistic regression. We illustrate the ensemble-regression by learning the drift and diffusion terms of particle ensembles governed by stochastic ordinary differential equations with Brownian motions and Lévy processes up to 20 dimensions. We also discuss how to treat cases with noisy or truncated observations, as well as the scenario of paired observations, and we prove a theorem for the convergence in Wasserstein distance for continuous sample spaces.
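One of the metrics the abstract uses to compare unpaired snapshots is the sliced Wasserstein distance. The sketch below is a generic Monte Carlo estimator of the sliced 2-Wasserstein distance between two equally sized point clouds, not the authors' code; the number of projections and the Gaussian toy snapshots are arbitrary assumptions. Because each 1-D projection is compared after sorting, the estimate does not depend on how particles are ordered or paired across snapshots.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between point clouds of shape (n, d)."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random directions, uniformly distributed on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both clouds onto every direction: shape (n, n_projections).
    px, py = x @ theta.T, y @ theta.T
    # In 1-D, optimal transport between equal-size samples matches sorted values.
    px.sort(axis=0)
    py.sort(axis=0)
    # Average squared 1-D W2 over directions, then take the square root.
    return np.sqrt(np.mean((px - py) ** 2))


rng = np.random.default_rng(1)
snapshot_a = rng.normal(size=(500, 2))
snapshot_b = snapshot_a + np.array([3.0, 0.0])           # translated ensemble
shuffled = snapshot_a[rng.permutation(len(snapshot_a))]  # same ensemble, reordered

print(sliced_wasserstein(snapshot_a, shuffled))
print(sliced_wasserstein(snapshot_a, snapshot_b))
```

In a training loop like the one described, a distance of this kind would be evaluated between generated and observed snapshots and used as a loss; that part is not shown.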


Well-Conditioned Methods for Ill-Conditioned Systems: Linear Regression with Semi-Random Noise

arXiv.org Machine Learning

Classical iterative algorithms for linear system solving and regression are brittle to the condition number of the data matrix. Even a semi-random adversary, constrained to only give additional consistent information, can arbitrarily hinder the resulting computational guarantees of existing solvers. We show how to overcome this barrier by developing a framework which takes state-of-the-art solvers and "robustifies" them to achieve comparable guarantees against a semi-random adversary. Given a matrix which contains an (unknown) well-conditioned submatrix, our methods obtain computational and statistical guarantees as if the entire matrix were well-conditioned. We complement our theoretical results with preliminary experimental evidence, showing that our methods are effective in practice.
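The brittleness described here can be illustrated with a small NumPy experiment. It assumes a simple adversary that appends rows consistent with the true solution but with one badly scaled column; it demonstrates the problem only, not the authors' robustification framework. The exact least-squares solution is unchanged, yet the condition number grows by orders of magnitude and plain gradient descent with a fixed iteration budget stalls.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
x_true = rng.normal(size=d)

# Well-conditioned design with consistent targets.
A = rng.normal(size=(n, d))
b = A @ x_true

# Assumed adversary: extra rows that agree exactly with x_true
# but have a hugely scaled first column.
extra = rng.normal(size=(20, d))
extra[:, 0] *= 1e4
A_adv = np.vstack([A, extra])
b_adv = np.concatenate([b, extra @ x_true])

print("cond(A)     =", np.linalg.cond(A))
print("cond(A_adv) =", np.linalg.cond(A_adv))

# A direct solver still recovers x_true, because the new rows are consistent.
x_hat, *_ = np.linalg.lstsq(A_adv, b_adv, rcond=None)


def gradient_descent(M, rhs, iters=500):
    """Plain gradient descent on 0.5 * ||M x - rhs||^2 with step size 1/L."""
    L = np.linalg.norm(M, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(M.shape[1])
    for _ in range(iters):
        x -= M.T @ (M @ x - rhs) / L
    return x


err_clean = np.abs(gradient_descent(A, b) - x_true).max()
err_adv = np.abs(gradient_descent(A_adv, b_adv) - x_true).max()
print("GD error, clean matrix:    ", err_clean)
print("GD error, adversarial rows:", err_adv)
```

The adversarial rows only add information, yet the iterative solver's convergence rate, governed by the condition number, collapses; this is the gap the paper's framework is designed to close.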


How to Build a Machine Learning Model

#artificialintelligence

A Visual Guide to Learning Data Science (Jul 25 · 13 min read). Learning data science may seem intimidating, but it doesn't have to be that way. Let's make learning data science fun and easy. So the challenge is: how exactly do we make learning data science both fun and easy? Cartoons are fun, and since "a picture is worth a thousand words", why not make a cartoon about data science? With that goal in mind, I've set out to doodle on my iPad the elements that are required for building a machine learning model.


Stacking Ensemble Machine Learning With Python

#artificialintelligence

Stacking, or Stacked Generalization, is an ensemble machine learning algorithm. It uses a meta-learning algorithm to learn how best to combine the predictions from two or more base machine learning algorithms. The benefit of stacking is that it can harness the capabilities of a range of well-performing models on a classification or regression task and make predictions that perform better than any single model in the ensemble. In this tutorial, you will discover the stacked generalization ensemble, or stacking, in Python.
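A minimal stacking sketch using scikit-learn's `StackingClassifier` (available since scikit-learn 0.22). This is not necessarily the tutorial's own code; the synthetic dataset and the choice of base models are illustrative assumptions. The meta-learner (`LogisticRegression`) is fit on out-of-fold predictions of the base models, which `cv=5` makes the class generate internally.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=20, n_informative=10, random_state=1)

# Level-0 (base) models whose predictions will be combined.
base_models = [
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=1)),
    ("svm", SVC()),
]

# Level-1 meta-learner trained on out-of-fold base-model predictions.
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression(), cv=5)

# Compare each base model with the stacked ensemble using cross-validation.
results = {}
for name, model in base_models + [("stacking", stack)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:>8}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Stacking is not guaranteed to beat the best base model on every dataset, so comparing cross-validated scores in this way is a sensible check before adopting it.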


Deep Learning Prerequisites: Logistic Regression in Python

#artificialintelligence

Created by Lazy Programmer Inc. This course is a lead-in to deep learning and neural networks. It covers logistic regression, a popular and fundamental technique used in machine learning, data science and statistics. We cover the theory from the ground up: derivation of the solution and applications to real-world problems. We show you how one might code their own logistic regression module in Python. This course does not require any external materials.
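As a rough idea of what a home-made logistic regression module can look like, here is a minimal NumPy sketch (not the course's code). It minimizes the mean cross-entropy loss by batch gradient descent, using the fact that the gradient with respect to the weights is X^T(sigmoid(Xw) - y) / n. The learning rate, epoch count, and two-blob toy data are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    """Prepend a column of ones so w[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                # predicted P(y = 1 | x)
        grad = Xb.T @ (p - y) / len(y)     # gradient of the mean cross-entropy
        w -= lr * grad
    return w


def predict_proba(w, X):
    return sigmoid(add_bias(X) @ w)


# Toy data: two Gaussian blobs, class 0 around (-2, -2) and class 1 around (2, 2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)), rng.normal(2.0, 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w = fit_logistic_regression(X, y)
accuracy = np.mean((predict_proba(w, X) >= 0.5) == y)
print("weights:", w, "training accuracy:", accuracy)
```

Plain gradient descent keeps the sketch close to the derivation; production libraries typically use second-order or quasi-Newton solvers and add regularization.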