AITopics

1608.06048

Genre: Research Report (0.53)

Industry: Law Enforcement & Public Safety > Fraud (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

#artificialintelligence, Aug-21-2016, 08:05:31 GMT

The Gentlest Introduction to Tensorflow – Part 2

Editor's note: You may want to check out part 1 of this tutorial before proceeding. In the previous article, we used Tensorflow (TF) to build and learn a linear regression model with a single feature so that, given a feature value (house size in sqm), we can predict the outcome (house price). In machine learning (ML) literature we come across the term 'training' very often; let us look at what that literally means in TF. The goal in linear regression is to find W and b such that, given any feature value (x), we can find the prediction (y) by substituting the W, x, and b values into the model. However, to find W and b that give accurate predictions, we need to 'train' the model using available data: multiple pairs of an actual feature (x) and an actual outcome (y_, note the underscore).
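The training idea described above can be sketched in plain Python, without TF. This is only an illustration under stated assumptions: the model form y = W * x + b and the names W, b, x, y_ come from the text, while the mean-squared-error cost, the learning rate, the step count, and the toy data are assumptions made for the example.

```python
# Minimal sketch: fit y = W * x + b by gradient descent on mean squared error.
# The cost function, hyperparameters, and data are illustrative assumptions.

def train_linear(xs, ys_, learning_rate=0.05, steps=2000):
    W, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Predictions y for the current W and b.
        preds = [W * x + b for x in xs]
        # Gradients of the mean squared error with respect to W and b.
        grad_W = sum(2 * (y - y_) * x for y, y_, x in zip(preds, ys_, xs)) / n
        grad_b = sum(2 * (y - y_) for y, y_ in zip(preds, ys_)) / n
        # Move W and b a small step against the gradient.
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b
    return W, b


xs = [1.0, 2.0, 3.0, 4.0]    # actual feature values (x)
ys_ = [3.0, 5.0, 7.0, 9.0]   # actual outcomes (y_), generated as 2 * x + 1
W, b = train_linear(xs, ys_)
```

On this toy data the loop should recover W close to 2 and b close to 1, since the outcomes were generated from exactly that line.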

artificial intelligence, feature value, machine learning, (9 more...)

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.06)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.80)

#artificialintelligence, Aug-20-2016, 11:20:28 GMT

R FUNCTIONS FOR REGRESSION ANALYSIS – Step Up Analytics

Here are some helpful R functions for regression analysis, grouped by their goal. The name of the package is in parentheses. Base has a method for objects inheriting from class "lm" (stats). This is a generic function, but currently only has methods for objects inheriting from classes "lm" and "glm" (stats). AIC: Generic function calculating the Akaike information criterion for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion) (stats). Four plots (selectable by which) are currently provided: a plot of residuals against fitted values, a Scale-Location plot of sqrt(|residuals|) against fitted values, a Normal Q-Q plot, and a plot of Cook's distances versus row labels (stats). Performs Bartlett's test of the null that the variances in each of the groups (samples) are the same (stats). bgtest: Breusch-Godfrey Test (lmtest). bptest: Breusch-Pagan Test (lmtest).
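The information-criterion formula quoted above is easy to check by hand. The sketch below is written in Python rather than R and is not the R implementation; the function name `information_criterion` and the numeric inputs are hypothetical, chosen only to show how k = 2 gives the AIC and k = log(n) gives the BIC.

```python
import math


def information_criterion(log_likelihood, npar, k=2.0):
    """Return -2 * log-likelihood + k * npar, as in the formula above."""
    return -2.0 * log_likelihood + k * npar


# Hypothetical fitted model: log-likelihood -120.5, 3 parameters, 100 observations.
loglik, npar, n = -120.5, 3, 100

aic = information_criterion(loglik, npar)                   # k = 2
bic = information_criterion(loglik, npar, k=math.log(n))    # k = log(n)
```

With these made-up inputs the AIC is 241 + 6 = 247, and the BIC is larger because log(100) exceeds 2, so each parameter is penalized more heavily.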

artificial intelligence, machine learning, stat, (17 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

#artificialintelligence, Aug-20-2016, 09:35:38 GMT

Linear Regression Analysis using R – Step Up Analytics

One of the most frequently used techniques in statistics is linear regression, where we investigate the potential relationship between a variable of interest (often called the response variable, but there are many other names in use) and a set of one or more variables (known as the independent variables or some other term). Unsurprisingly, there are flexible facilities in R for fitting a range of linear models, from the simple case of a single variable to more complex relationships. In this post we will consider the case of simple linear regression with one response variable and a single independent variable. The purpose of using this data is to determine whether there is a relationship, described by a simple linear regression model, between the variables. As you can see in the image, I first checked my working directory and then changed it to another directory, because the working data files are stored in a different location.
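For the single-variable case described above, the least-squares line has a closed form. The following is a minimal Python sketch of that computation, not the post's R code (in R the analogous fit is usually obtained with `lm`); the function name `fit_simple_linear` and the data are made up for illustration.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustrative data: one independent variable and one response variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_simple_linear(xs, ys)
```

For this data the slope works out to 19.9 / 10 = 1.99 and the intercept to 6.02 - 1.99 * 3 = 0.05, up to floating-point rounding.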

artificial intelligence, linear regression analysis, machine learning, (3 more...)

Genre:

Research Report > New Finding (0.40)
Research Report > Experimental Study (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

#artificialintelligence, Aug-19-2016, 09:05:55 GMT

Simple Logistic Regression using Keras

This post basically takes the tutorial on Classifying MNIST digits using Logistic Regression which is primarily written for Theano and attempts to port it to Keras. So, what better way to put that claim to the test than to write some code! Keras comes with great documentation. One can really get up and running in a matter of minutes. Everything needed to accomplish the goal can be found on the Guide to Sequential Model page (assuming of course the initial setup and configuration is all taken care of).

artificial intelligence, logistic regression, machine learning, (2 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

arXiv.org Machine Learning, Aug-19-2016

High-dimensional Mixed Graphical Models

Cheng, Jie, Li, Tianxi, Levina, Elizaveta, Zhu, Ji

High-Dimensional Mixed Graphical Models. Jie Cheng†, Tianxi Li‡, Elizaveta Levina‡, Ji Zhu‡ (†Google, Inc.; ‡Department of Statistics, University of Michigan). March 22, 2018. arXiv:1304.2810v3. Abstract: While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models for data sets with both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted lasso penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation data set (CAL500), obtaining a sparse and interpretable graphical model relating the continuous features of the audio signal to binary variables such as genre, emotions, and usage associated with particular songs. Key Words: Conditional Gaussian density, Graphical model, Group lasso, Mixed variables, Music annotation. 1 Introduction. Graphical models have proven to be a useful tool in representing the conditional dependency structure of multivariate distributions. The undirected graphical model in particular, sometimes also referred to as the Markov network, has drawn a notable amount of attention over the past decade. In an undirected graphical model, nodes in the graph represent the variables, while an edge between a pair of variables indicates that they are dependent conditional on all other variables.
The properties of these models are by now well understood and studied both in the classical and the high-dimensional settings. Both these models can only deal with variables of one kind - either all continuous variables in Gaussian models or all binary variables in the Ising model (extensions of the Ising model to general discrete data, while possible in principle, are rarely used in practice). In many applications, however, data sources are complex and varied, and frequently result in mixed types of data, with both continuous and discrete variables present in the same dataset. In this paper, we will focus on graphical models for this type of mixed data (mixed graphical models).

artificial intelligence, machine learning, regression, (15 more...)

1304.281

Country: North America > United States > Michigan (0.24)

Genre: Research Report (0.66)

Industry:

Media > Music (0.86)
Leisure & Entertainment (0.86)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Yi, Xinyang, Caramanis, Constantine, Sanghavi, Sujay

Solving a Mixture of Many Random Linear Equations by Tensor Decomposition and Alternating Minimization

arXiv.org Machine Learning, Aug-19-2016

We consider the problem of solving mixed random linear equations with $k$ components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample corresponds to which model) are not observed. We give a tractable algorithm for the mixed linear equation problem, and show that under some technical conditions, our algorithm is guaranteed to solve the problem exactly with sample complexity linear in the dimension, and polynomial in $k$, the number of components. Previous approaches have required either exponential dependence on $k$, or super-linear dependence on the dimension. The proposed algorithm is a combination of tensor decomposition and alternating minimization. Our analysis involves proving that the initialization provided by the tensor method allows alternating minimization, which is equivalent to EM in our setting, to converge to the global optimum at a linear rate.
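The alternating-minimization step mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it handles only scalar equations y = w_j * x, and it replaces the tensor-decomposition initialization with hand-picked starting guesses (`init_ws`), which is an assumption made purely so the example is self-contained. All names and data are hypothetical.

```python
def alternating_minimization(xs, ys, init_ws, iters=20):
    """Toy alternating minimization for mixed scalar equations y = w_j * x."""
    ws = list(init_ws)
    for _ in range(iters):
        # Step 1: assign each sample to the component with the smallest residual.
        groups = [[] for _ in ws]
        for x, y in zip(xs, ys):
            j = min(range(len(ws)), key=lambda c: (y - ws[c] * x) ** 2)
            groups[j].append((x, y))
        # Step 2: refit each component by least squares on its assigned samples.
        for j, pts in enumerate(groups):
            sxx = sum(x * x for x, _ in pts)
            if sxx > 0:
                ws[j] = sum(x * y for x, y in pts) / sxx
    return ws


# Noiseless samples from two hidden components; the labels are not given to the solver.
xs = [1.0, 2.0, 3.0, -1.0, -2.0, 4.0]
true_ws = [2.0, -1.0]
hidden_labels = [0, 1, 0, 1, 0, 1]
ys = [true_ws[z] * x for x, z in zip(xs, hidden_labels)]

# Hand-picked initialization standing in for the tensor method (an assumption).
ws = alternating_minimization(xs, ys, init_ws=[1.5, -0.5])
```

With this starting point the first assignment step already groups the samples correctly, so the refit recovers the two coefficients exactly. A poor initialization could leave the iteration stuck at a wrong grouping, which is the motivation the abstract gives for pairing alternating minimization with a good initializer.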

artificial intelligence, machine learning, probability, (13 more...)

1608.05749

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

#artificialintelligence, Aug-18-2016, 08:40:30 GMT

NCBI-Hackathons/Machine_Learning_Immunogenicity

This project looks into the application of Machine Learning (ML) techniques to the prediction of immunogenicity (categorical: positive or negative) based on a peptide and its associated amino acid properties. This study uses peptide data from the Immune Epitope Database (IEDB). The R package "Peptides" has been used to compute the amino acid properties and combine them with the peptide data to enable the use of ML algorithms for immunogenicity analysis, particularly the algorithms that are more efficient with numeric and categorical data than with string sequences. Tensorflow is an open source software library for ML (open sourced by Google in Nov 2015) that provides linear regression and classification algorithms for multi-dimensional arrays (aka "Tensors"). K-fold cross-validation as well as hold-out of test data was used to train and test the generated models.
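As a rough illustration of the K-fold scheme mentioned above, the sketch below splits sample indices into k folds, each used once as the test set. It is not the project's code: the function name `k_fold_indices` is hypothetical, and the split is contiguous with no shuffling or stratification, which a real study would likely add.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


# Example: 10 samples split into 3 folds of sizes 4, 3, and 3.
folds = k_fold_indices(10, 3)
```

Each model would then be trained on `train_idx` and evaluated on `test_idx`, with the scores averaged over the folds; a separate hold-out set would be kept out of this loop entirely.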

artificial intelligence, immunogenicity, machine learning, (10 more...)

Genre: Contests & Prizes (0.43)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.39)

arXiv.org Machine Learning, Aug-17-2016

Conditional Sparse Linear Regression

Juba, Brendan

Linear regression, the fitting of linear relationships among variables in a data set, is a standard tool in data analysis. In particular, for the sake of interpretability and utility in further analysis, we desire to find highly sparse linear relationships, i.e., involving only a few variables. Of course, such simple linear relationships often will not hold across an entire population. But, more frequently there will exist conditions - perhaps a range of parameters or a segment of a larger population - under which such sparse models fit the data quite well. For example, Rosenfeld et al. [16] used data mining heuristics to identify small segments of a population in which a few additional risk factors were highly predictive of certain kinds of cancer, whereas these same risk factors were not significant in the overall population. Simple rules for special cases may also hint at the more complex general rules. More generally, we need to develop new techniques to reason about populations in which most members are atypical in some way, which are colloquially (and somewhat abusively) referred to as long-tailed distributions. We are seeking principled alternatives to ad-hoc approaches such as trying a variety of methods for clustering the data and hoping that the identified clusters can be modeled well.

algorithm, artificial intelligence, machine learning, (17 more...)

1608.05152

Country:

North America > United States (0.14)
Africa (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.63)

Futoma, Joseph, Sendak, Mark, Cameron, C. Blake, Heller, Katherine

Scalable Modeling of Multivariate Longitudinal Data for Prediction of Chronic Kidney Disease Progression

arXiv.org Machine Learning, Aug-16-2016

Prediction of the future trajectory of a disease is an important challenge for personalized medicine and population health management. However, many complex chronic diseases exhibit large degrees of heterogeneity, and furthermore there is not always a single readily available biomarker to quantify disease severity. Even when such a clinical variable exists, there are often additional related biomarkers routinely measured for patients that may better inform the predictions of their future disease state. To this end, we propose a novel probabilistic generative model for multivariate longitudinal data that captures dependencies between multivariate trajectories. We use a Gaussian process based regression model for each individual trajectory, and build off ideas from latent class models to induce dependence between their mean functions. We fit our method using a scalable variational inference algorithm to a large dataset of longitudinal electronic patient health records, and find that it improves dynamic predictions compared to a recent state of the art method. Our local accountable care organization then uses the model predictions during chart reviews of high risk patients with chronic kidney disease.

machine learning, prediction, trajectory, (13 more...)

1608.04615

Country: North America > United States (0.15)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Nephrology (1.00)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)