AITopics

Country: North America > United States > Virginia (0.44)

Industry:

Government > Space Agency (0.94)
Government > Regional Government > North America Government > United States Government (0.94)

Technology:

Information Technology > Communications > Social Media (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Yamada, Makoto, Umezu, Yuta, Fukumizu, Kenji, Takeuchi, Ichiro

Post Selection Inference with Kernels

arXiv.org Machine Learning Oct-13-2016

We propose a novel kernel-based post selection inference (PSI) algorithm that can handle not only non-linearity in data but also structured outputs such as multi-dimensional and multi-label outputs. Specifically, we develop a PSI algorithm for independence measures and propose the Hilbert-Schmidt Independence Criterion (HSIC) based PSI algorithm (hsicInf). The novelty of the proposed algorithm is that it can handle non-linearity and/or structured data through kernels. Namely, the proposed algorithm can be used for a wider range of applications, including nonlinear multi-class classification and multivariate regression, which existing PSI algorithms cannot handle. Through synthetic experiments, we show that the proposed approach can find a set of statistically significant features for both regression and classification problems. Moreover, we apply the hsicInf algorithm to real-world data and show that hsicInf can successfully identify important features.
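
As a rough illustration of the independence measure named in the abstract, here is a minimal sketch of the biased empirical HSIC estimate, (1/n^2) tr(K H L H), for two scalar samples. The Gaussian kernel, the bandwidth `sigma`, the 1/n^2 normalization, and all function names are assumptions made for this example. It computes only the HSIC statistic and is not the hsicInf selective-inference procedure from the paper.

```python
import math

def gaussian_gram(values, sigma=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples (sigma is an assumed bandwidth)."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values]
            for a in values]

def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row_mean = [sum(r) / n for r in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: (1/n^2) * tr(K H L H)."""
    n = len(x)
    Kc = center(gaussian_gram(x, sigma))
    Lc = center(gaussian_gram(y, sigma))
    # Both centered matrices are symmetric, so the trace of their product
    # equals the sum of elementwise products.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / n ** 2

x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y_dependent = [v ** 2 for v in x]   # nonlinear function of x
y_constant = [3.0] * len(x)         # carries no information about x

dep = hsic(x, y_dependent)
const = hsic(x, y_constant)
```

A constant second sample gives a centered Gram matrix of zeros, so `const` is zero, while the nonlinearly related sample gives a strictly positive value.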

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1610.03725

Country: Asia > Japan > Honshū (0.46)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

#artificialintelligence Oct-12-2016, 15:31:09 GMT

Multiple Linear Regression in Machine Learning

A couple of weeks ago I wrote an article on simple linear regression, which I recommend reading before this one. Machine learning is a very interesting topic, and I have been studying it in my free time. I hope this article sparks your interest in the subject or helps keep fueling it. In simple linear regression there is a one-to-one relationship between the input variable and the output variable. In multiple linear regression, as the name implies, there is a many-to-one relationship: instead of using just one input variable, you use several.
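
To make the many-to-one idea concrete, here is a minimal sketch that fits y = b0 + b1*x1 + b2*x2 by solving the least-squares normal equations. The data, the function names, and the choice of Gauss-Jordan elimination are assumptions made for this example and are not taken from the article.

```python
def fit_linear(X, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for k in range(p):
            if k != c:
                factor = M[k][c] / M[c][c]
                M[k] = [a - factor * b for a, b in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(coef, x):
    """Apply fitted coefficients to one row of input variables."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

# Illustrative data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * a + 3 * b for a, b in X]

coef = fit_linear(X, y)
prediction = predict(coef, [6, 7])
```

Because the illustrative data is noise-free, the fitted coefficients recover the generating values up to floating-point error. With real data they would be the least-squares compromise instead.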

artificial intelligence, input variable, machine learning, (10 more...)

Industry: Energy > Oil & Gas (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Kawano, Shuichi, Fujisawa, Hironori, Takada, Toyoyuki, Shiroishi, Toshihiko

Sparse principal component regression for generalized linear models

arXiv.org Machine Learning Oct-12-2016

Principal component regression (PCR) is a widely used two-stage procedure: principal component analysis (PCA), followed by regression in which the selected principal components are regarded as new explanatory variables in the model. Note that PCA is based only on the explanatory variables, so the principal components are not selected using the information on the response variable. In this paper, we propose a one-stage procedure for PCR in the framework of generalized linear models. The basic loss function is based on a combination of the regression loss and PCA loss. An estimate of the regression parameter is obtained as the minimizer of the basic loss function with a sparse penalty. We call the proposed method sparse principal component regression for generalized linear models (SPCR-glm). Taking the two loss functions into consideration simultaneously, SPCR-glm enables us to obtain sparse principal component loadings that are related to a response variable. A combination of loss functions may cause a parameter identification problem, but this potential problem is avoided by virtue of the sparse penalty. Thus, the sparse penalty plays two roles in this method. The parameter estimation procedure is proposed using various update algorithms with the coordinate descent algorithm. We apply SPCR-glm to two real datasets, doctor visits data and mouse consomic strain data. SPCR-glm provides more easily interpretable principal component (PC) scores and clearer classification on PC plots than the usual PCA.

artificial intelligence, machine learning, regression, (17 more...)

arXiv.org Machine Learning

1609.08886

Country:

Asia > Japan (0.28)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)

@machinelearnbot Oct-10-2016, 22:26:36 GMT

Guide To Linear Regression

Linear regression is one of the first things you should try if you're modeling a linear relationship (actually, non-linear relationships too!). It's fairly simple, and probably the first thing to learn when tackling machine learning.

artificial intelligence, linear regression, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)

#artificialintelligence Oct-10-2016, 01:26:03 GMT

Machine Learning with InsightEdge: Part II - DZone Big Data

Now that we have training and test datasets sampled, initially preprocessed, and available in the data grid, we can close Web Notebook and start experimenting with different techniques and algorithms by submitting Spark applications. For our first baseline approach, let's take a single feature, device_conn_type, and the logistic regression algorithm. We will explain in a little more detail what happens here. First, we load the training dataset from the data grid, which we prepared and saved earlier with Web Notebook. Then we use StringIndexer and OneHotEncoder to map a column of categories to a column of binary vectors. For example, with 4 categories of device_conn_type, an input value of the second category would map to an output vector of [0.0, 1.0, 0.0, 0.0].
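
The category-to-binary-vector mapping can be sketched in plain Python as follows. This is not the Spark StringIndexer or OneHotEncoder API. The category labels and the `one_hot` helper are hypothetical, and Spark's own encoder may produce a shorter vector depending on its configuration (for example, when it is set to drop the last category).

```python
def one_hot(value, categories):
    """Map a category to a binary vector with a single 1.0 at its index."""
    index = categories.index(value)  # raises ValueError for unseen categories
    return [1.0 if i == index else 0.0 for i in range(len(categories))]

# Hypothetical labels standing in for the 4 values of device_conn_type
categories = ["conn_a", "conn_b", "conn_c", "conn_d"]

encoded = one_hot("conn_b", categories)  # second category -> [0.0, 1.0, 0.0, 0.0]
```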

artificial intelligence, insightedge, machine learning, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)

#artificialintelligence Oct-9-2016, 08:00:47 GMT

Learning from Disaster – The Random Forest Approach.

Having tried logistic regression the first time around, I moved on to decision trees and KNN. Unfortunately, those models performed horribly and had to be scrapped. Random Forest seemed to be the buzzword around the Kaggle forums, so I obviously had to try it out next. I took a couple of days to read up on it and worked through a few examples on my own before taking another stab at the Titanic dataset. The 'caret' package is a beauty.

artificial intelligence, decision tree learning, machine learning, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)

#artificialintelligence Oct-9-2016, 02:46:50 GMT

Logistic model tree - Wikipedia, the free encyclopedia

In computer science, a logistic model tree (LMT) is a classification model with an associated supervised training algorithm that combines logistic regression (LR) and decision tree learning.[1][2] Logistic model trees are based on the earlier idea of a model tree: a decision tree that has linear regression models at its leaves to provide a piecewise linear regression model (where ordinary decision trees with constants at their leaves would produce a piecewise constant model).[1] In the logistic variant, the LogitBoost algorithm is used to produce an LR model at every node in the tree; the node is then split using the C4.5 criterion. Each LogitBoost invocation is warm-started from its results in the parent node. Finally, the tree is pruned.[3]

artificial intelligence, machine learning, model tree, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

@machinelearnbot Oct-4-2016, 22:50:52 GMT

Proper train and test sets when using ML on a dataset? • /r/MachineLearning

I just completed a take-home assessment as part of the interview process for a company. I was told I didn't pass because my answer lacked proper training and test sets. The data set consisted of a mix of categorical and numerical predictors, with the dependent variable being numerical. I then removed all rows with NA values and generated boxplots for each predictor. For one variable, I replaced all of its outliers with the median. For some other variables that held percentage values, I did not remove the outliers because they did not seem like obvious outliers (for example, the boxplot showed that values greater than .1 were outliers, but all of those outliers still ranged from 0 to 1, so I didn't think they were typos). I then ran a Lasso linear regression model.
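
One common reading of "proper training and test sets" is a holdout split made before any data-dependent preprocessing, so that statistics such as a replacement median come from the training rows only. The sketch below illustrates that reading with made-up values. The function names, the 80/20 split, and the seed are assumptions, and the post does not say what the reviewers actually expected.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

rows = list(range(10))  # stand-in for one numeric predictor column
train, test = train_test_split(rows, test_fraction=0.2, seed=42)

# Compute preprocessing statistics on the training rows only, then reuse
# the same value when cleaning the test rows.
train_median = median(train)
```

The model would then be fit on `train` and scored once on `test`, which plays no part in choosing the preprocessing values or the model settings.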

artificial intelligence, machine learning, proper train and test, (7 more...)

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.78)

@machinelearnbot Oct-4-2016, 04:05:13 GMT

Introduction to Logistic Regression in R

In my previous blog post I explained linear regression. In today's post I will explain logistic regression. Consider a scenario where we need to predict a patient's medical condition (HBP), either HAVE HIGH BP or NO HIGH BP, based on some observed symptoms: Age, weight, Issmoking, Systolic value, Diastolic value, RACE, etc. In this scenario we have to build a model that takes the above-mentioned symptoms as input values and HBP as the response variable. Note that the response variable (HBP) is a value from a fixed set of classes, HAVE HIGH BP or NO HIGH BP.
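
The way a logistic regression model turns input values into one of the two classes can be sketched as follows. The article itself works in R; Python is used here for illustration only. The weights, bias, 0.5 threshold, and the choice of Age and Systolic value as the two inputs are invented for the example and are not fitted to any data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_hbp(features, weights, bias, threshold=0.5):
    """Return (probability, class label) for one patient."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(score)
    label = "HAVE HIGH BP" if probability >= threshold else "NO HIGH BP"
    return probability, label

# Hypothetical coefficients for [Age, Systolic value]; not estimated from data
weights = [0.03, 0.05]
bias = -9.0

p_high, label_high = predict_hbp([60, 150], weights, bias)
p_low, label_low = predict_hbp([30, 110], weights, bias)
```

In practice the weights and bias would be estimated from labeled patient records rather than chosen by hand.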

artificial intelligence, machine learning, response variable, (9 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)