AITopics

1508.01217

Country: North America > United States > Michigan (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

@machinelearnbot Jun-8-2017, 22:00:12 GMT

[P] Extracting input-to-output gradients from a Keras model • r/MachineLearning

Hi, so I am coming from a background in linear algebra and traditional numerical gradient-based optimization, but I'm excited by the advancements that have been made in deep learning. To get my feet wet, I made a pretty simple NN model to do some non-linear regressions for me. I uploaded my Jupyter notebook as a gist here (it renders properly on GitHub); it's pretty short and to the point. It just fits the 1D function y = (x - 5)^2 / 25. I know that Theano and TensorFlow are, at their core, graph-based frameworks for passing derivatives (gradients).
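The gradient the poster is after, dy/dx of the fitted network, is the chain rule applied through the layers, which is what the Theano/TensorFlow graph computes automatically. A minimal framework-free sketch, assuming a one-hidden-layer tanh network with made-up weights (not the poster's notebook and not any Keras API), computes it by hand and checks it against a central finite difference:

```python
import math

# Hypothetical weights for a 1-input, 3-hidden-unit (tanh), 1-output network.
# They stand in for trained values; they are NOT taken from the poster's notebook.
W1 = [0.8, -1.2, 0.5]  # input -> hidden weights
B1 = [0.1, 0.3, -0.2]  # hidden biases
W2 = [1.5, -0.7, 2.0]  # hidden -> output weights
B2 = 0.05              # output bias


def forward(x):
    """Network output: y(x) = sum_j W2[j] * tanh(W1[j] * x + B1[j]) + B2."""
    return sum(w2 * math.tanh(w1 * x + b1) for w1, b1, w2 in zip(W1, B1, W2)) + B2


def input_gradient(x):
    """Chain rule: dy/dx = sum_j W2[j] * (1 - tanh(z_j)^2) * W1[j], with z_j = W1[j] * x + B1[j]."""
    grad = 0.0
    for w1, b1, w2 in zip(W1, B1, W2):
        h = math.tanh(w1 * x + b1)
        grad += w2 * (1.0 - h * h) * w1
    return grad


def finite_difference(f, x, eps=1e-6):
    """Central-difference estimate of f'(x), used only as a sanity check."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 2.5):
        print(x, input_gradient(x), finite_difference(forward, x))
```

For a network that fits y = (x - 5)^2 / 25 well, this gradient should approach the analytic derivative 2(x - 5) / 25. In TensorFlow 2 the same quantity can be obtained from a Keras model with `tf.GradientTape` by watching the input tensor.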

artificial intelligence, deep learning, machine learning

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.61)

arXiv.org Machine Learning Jun-8-2017

Collaborative Filtering with Side Information: a Gaussian Process Perspective

Kim, Hyunjik, Lu, Xiaoyu, Flaxman, Seth, Teh, Yee Whye

We tackle the problem of collaborative filtering (CF) with side information, through the lens of Gaussian Process (GP) regression. Driven by the idea of using the kernel to explicitly model user-item similarities, we formulate the GP in a way that allows the incorporation of low-rank matrix factorisation, arriving at our model, the Tucker Gaussian Process (TGP). Consequently, TGP generalises classical Bayesian matrix factorisation models, and goes beyond them to give a natural and elegant method for incorporating side information, giving enhanced predictive performance for CF problems. Moreover, we show that it is a novel model for regression, especially well-suited to grid-structured data and problems where the dependence on covariates is close to being separable.
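The core idea of using the kernel to model user-item similarity, with side information entering through that kernel, can be sketched with ordinary GP regression and a separable product kernel. This is plain GP regression, not the Tucker Gaussian Process itself (there is no low-rank factorisation here); the RBF kernels, the noise variance, the zero-mean prior, and the toy users, items and ratings are all assumptions for illustration:

```python
import math


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def pair_kernel(p, q):
    """Separable user-item kernel: k((u, i), (u', i')) = k_user(u, u') * k_item(i, i')."""
    (u1, i1), (u2, i2) = p, q
    return rbf(u1, u2) * rbf(i1, i2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(train_pairs, ratings, test_pair, noise_var=0.1):
    """Zero-mean GP posterior mean: k_*^T (K + noise_var * I)^{-1} y."""
    n = len(train_pairs)
    K = [[pair_kernel(train_pairs[a], train_pairs[c]) + (noise_var if a == c else 0.0)
          for c in range(n)] for a in range(n)]
    alpha = solve(K, ratings)
    k_star = [pair_kernel(test_pair, p) for p in train_pairs]
    return sum(k * a for k, a in zip(k_star, alpha))


# Hypothetical side information: one feature per user and one per item.
users = {"u1": [0.0], "u2": [0.1], "u3": [3.0]}
items = {"i1": [0.0], "i2": [2.0]}
observed = [("u1", "i1", 5.0), ("u1", "i2", 1.0), ("u3", "i1", 1.0)]

train_pairs = [(users[u], items[i]) for u, i, _ in observed]
ratings = [r for _, _, r in observed]
# u2 has no ratings, but its side information places it next to u1.
print(gp_predict(train_pairs, ratings, (users["u2"], items["i1"])))
print(gp_predict(train_pairs, ratings, (users["u2"], items["i2"])))
```

Because u2 has no ratings, its predictions come entirely from side-information similarity to u1; in practice one would centre the ratings or add a mean function rather than rely on a zero-mean prior.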

kernel, side information, tgp

1605.07025

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Lash, Michael T., Lin, Qihang, Street, W. Nick, Robinson, Jennifer G.

A budget-constrained inverse classification framework for smooth classifiers

arXiv.org Machine Learning Jun-8-2017

Inverse classification is the process of manipulating an instance such that it is more likely to conform to a specific class. Past methods that address this problem have shortcomings. Greedy methods make changes that are overly radical, often relying on data that is strictly discrete. Other methods rely on certain data points, the presence of which cannot be guaranteed. In this paper we propose a general framework and method that overcomes these and other limitations. The formulation of our method can use any differentiable classification function. We demonstrate the method by using logistic regression and Gaussian kernel SVMs. We constrain the inverse classification to occur on features that can actually be changed, each of which incurs an individual cost. We further require these changes to fall within a specified level of cumulative change (a budget). Our framework can also accommodate the estimation of (indirectly changeable) features whose values change as a consequence of actions taken. Furthermore, we propose two methods for specifying feature-value ranges that result in different algorithmic behavior. We apply our method, and a proposed sensitivity-analysis-based benchmark method, to two freely available datasets: Student Performance from the UCI Machine Learning Repository and a real-world cardiovascular disease dataset. The results obtained demonstrate the validity and benefits of our framework and method.
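The paper's framework handles any differentiable classifier. For the special case of a logistic regression model with per-feature costs, feature-value ranges and a budget, the problem becomes simple enough to solve exactly: the sigmoid is monotone, so lowering P(y = 1) means lowering the linear score, which is a fractional knapsack solved greedily by the ratio |w_j| / cost_j. This is a simplified illustration, not the authors' algorithm; the weighted-absolute-change cost model, the function names, and all numbers are assumptions, and indirectly changeable features are omitted:

```python
import math


def inverse_classify(x, w, cost, lower, upper, budget):
    """Hypothetical helper: change features to lower the logit w.x + b of a
    logistic-regression model (the intercept b does not affect which changes are best).

    cost[j] > 0 is the price per unit of |change| in feature j (None = unchangeable);
    lower[j] <= x[j] <= upper[j] is the allowed range of feature j (x is assumed in range);
    the total cost sum_j cost[j] * |x_new[j] - x[j]| may not exceed budget.
    With a linear logit this is a fractional knapsack, so spending the budget greedily
    on the largest |w[j]| / cost[j] ratios first is optimal.
    """
    x_new = list(x)
    remaining = budget
    changeable = [j for j in range(len(x)) if cost[j] is not None and w[j] != 0]
    for j in sorted(changeable, key=lambda k: abs(w[k]) / cost[k], reverse=True):
        if remaining <= 0:
            break
        # Move against the sign of w[j], since that lowers the logit.
        target = lower[j] if w[j] > 0 else upper[j]
        step = min(abs(target - x[j]), remaining / cost[j])
        x_new[j] = x[j] + math.copysign(step, target - x[j])
        remaining -= step * cost[j]
    return x_new


def prob_positive(x, w, b):
    """P(y = 1 | x) under logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Made-up model and instance; feature 2 is treated as unchangeable.
w, b = [2.0, -1.0, 0.5], -1.0
x = [1.0, 2.0, 0.5]
cost = [1.0, 0.25, None]
lower, upper = [0.0, 0.0, 0.0], [3.0, 4.0, 1.0]
x_new = inverse_classify(x, w, cost, lower, upper, budget=1.5)
print(x_new, prob_positive(x, w, b), prob_positive(x_new, w, b))
```

For a non-linear classifier such as a Gaussian kernel SVM the greedy rule no longer applies, which is why a general gradient-based formulation is needed.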

artificial intelligence, machine learning, recommendation

1605.09068

Country: North America > United States > Iowa (0.28)

Genre:

Research Report > Experimental Study (0.88)
Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Education (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

@machinelearnbot Jun-7-2017, 15:10:06 GMT

Top 20 Data Science MOOCs

Introduce yourself to the basics of data science and leave armed with practical experience extracting value from big data. This course teaches the basic techniques of data science, including both SQL and NoSQL solutions for massive data management (e.g., MapReduce and contemporaries), algorithms for data mining (e.g., clustering and association rule mining), and basic statistical modelling (e.g., linear and non-linear regression).

artificial intelligence, data mining, machine learning

Country:

North America > United States > California (0.05)
North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > Illinois (0.05)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (0.87)
Education > Educational Technology > Educational Software > Computer Based Training (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

arXiv.org Machine Learning Jun-7-2017

A Convex Framework for Fair Regression

Berk, Richard, Heidari, Hoda, Jabbari, Shahin, Joseph, Matthew, Kearns, Michael, Morgenstern, Jamie, Neel, Seth, Roth, Aaron

The widespread use of machine learning to make consequential decisions about individual citizens (including in domains such as credit, employment, education and criminal sentencing [3, 4, 26, 29]) has been accompanied by increased reports of instances in which the algorithms and models employed can be unfair or discriminatory in a variety of ways [2, 30]. As a result, research on fairness in machine learning and statistics has seen rapid growth in recent years [1, 5-7, 9-11, 13, 14, 18-21, 25, 27], and several mathematical formulations have been proposed as metrics of (un)fairness for a number of different learning frameworks. While much of the attention to date has focused on (binary) classification settings, where standard fairness notions include equal false positive or negative rates across different populations, less attention has been paid to fairness in (linear and logistic) regression settings, where the target and/or predicted values are continuous, and the same value may not occur even twice in the training data. In this work, we introduce a rich family of fairness metrics for regression models that take the form of a fairness regularizer and apply them to the standard loss functions for linear and logistic regression. Since these loss functions and our fairness regularizer are convex, the combined objective functions obtained from our framework are also convex, and thus permit efficient optimization. Furthermore, our family of fairness metrics covers the spectrum from the type of group fairness that is common in classification formulations (where e.g.
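The convexity argument can be made concrete with a group-fairness-style penalty. For a linear model w, a weighted average of cross-group prediction gaps is linear in w (it equals w . v for a fixed vector v), so its square is a convex quadratic and the full objective has a closed-form minimiser. The cross-group weight exp(-(y_i - y_j)^2), the small ridge term, the names, and the toy data are assumptions for illustration; the paper's own family of regularizers should be consulted for the exact forms:

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def cross_group_vector(X1, y1, X2, y2):
    """v = average over pairs (i in group 1, j in group 2) of d(y_i, y_j) * (x_i - x_j),
    so the weighted cross-group prediction gap of a linear model w is w . v.
    The weight d(y_i, y_j) = exp(-(y_i - y_j)^2) is an assumption."""
    dim = len(X1[0])
    v = [0.0] * dim
    for xi, yi in zip(X1, y1):
        for xj, yj in zip(X2, y2):
            d = math.exp(-((yi - yj) ** 2))
            for k in range(dim):
                v[k] += d * (xi[k] - xj[k])
    pairs = len(X1) * len(X2)
    return [vk / pairs for vk in v]


def fair_least_squares(X, y, v, lam, gamma=1e-3):
    """Minimise (1/n)||Xw - y||^2 + lam * (w . v)^2 + gamma * ||w||^2.
    All terms are convex quadratics, so the minimiser solves
    (X^T X / n + lam * v v^T + gamma * I) w = X^T y / n."""
    n, dim = len(X), len(X[0])
    A = [[sum(X[r][a] * X[r][c] for r in range(n)) / n
          + lam * v[a] * v[c] + (gamma if a == c else 0.0)
          for c in range(dim)] for a in range(dim)]
    rhs = [sum(X[r][a] * y[r] for r in range(n)) / n for a in range(dim)]
    return solve(A, rhs)


def dot(u, w):
    return sum(a * b for a, b in zip(u, w))


# Toy data: columns are [group indicator, skill, intercept]; y = skill + group.
X1, y1 = [[1, 1, 1], [1, 2, 1], [1, 3, 1]], [2.0, 3.0, 4.0]
X2, y2 = [[0, 1, 1], [0, 2, 1], [0, 3, 1]], [1.0, 2.0, 3.0]
v = cross_group_vector(X1, y1, X2, y2)
w_plain = fair_least_squares(X1 + X2, y1 + y2, v, lam=0.0)
w_fair = fair_least_squares(X1 + X2, y1 + y2, v, lam=100.0)
print(w_plain, dot(w_plain, v))
print(w_fair, dot(w_fair, v))
```

Raising `lam` traces out the accuracy/fairness trade-off: the penalized gap w . v shrinks while the squared error grows.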

artificial intelligence, dataset, machine learning

1706.02409

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.88)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Banking & Finance > Credit (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)

@machinelearnbot Jun-6-2017, 22:10:06 GMT

What is the Role of the Activation Function in a Neural Network?

Sorry if this is too trivial, but let me start at the "very beginning": linear regression. The goal of (ordinary least-squares) linear regression is to find the optimal weights that, when linearly combined with the inputs, result in a model that minimizes the sum of squared vertical offsets between the observed targets and the model's predictions. But let's not get distracted by model fitting, which is a different topic ;). So, in linear regression, we compute a linear combination of weights and inputs, $z = \mathbf{w}^T\mathbf{x}$ (let's call this function the "net input function"). Next, let's consider logistic regression. Here, we put the net input $z$ through a non-linear "activation function", the logistic sigmoid function $\phi(z) = 1/(1 + e^{-z})$.
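The two steps can be sketched side by side: the same net input z is the final output of linear regression and the input to the sigmoid in logistic regression. The weights and inputs below are arbitrary illustrative values:

```python
import math


def net_input(w, x, b=0.0):
    """Net input z = w . x + b, the linear combination used by linear regression."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z):
    """Logistic sigmoid phi(z) = 1 / (1 + e^(-z)); squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


w = [0.5, -1.0]  # arbitrary illustrative weights
for x in ([2.0, 1.0], [0.0, 3.0], [4.0, 0.0]):
    z = net_input(w, x)
    # Linear regression would output z; logistic regression outputs phi(z).
    print(f"x={x}  linear: {z:+.2f}  logistic: {sigmoid(z):.3f}")
```

The sigmoid is what lets logistic regression output a probability-like value; without a non-linear activation, stacking such units would still compute only a linear function of the inputs.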

activation function, artificial intelligence, machine learning

Country: North America > United States > Michigan (0.06)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

@machinelearnbot Jun-6-2017, 17:01:32 GMT

24 Uses of Statistical Modeling (Part I)

Here I discuss general applications of statistical models, whether they arise from data science, operations research, engineering, machine learning or statistics. I do not discuss specific algorithms such as decision trees, logistic regression, Bayesian modeling, Markov models, data reduction or feature selection. Instead, I discuss frameworks, each one using its own types of techniques and algorithms, to solve real-life problems. Most of the entries below can be found on Wikipedia, and I have used a few definitions or extracts from the relevant Wikipedia articles, in addition to personal contributions. Spatial dependency is the co-variation of properties within geographic space: characteristics at proximal locations appear to be correlated, either positively or negatively.
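One standard way to quantify the spatial dependency described here (not named in the original post) is Moran's I: positive values indicate that neighboring locations have similar values, negative values that they have dissimilar ones. A sketch, assuming a hypothetical one-dimensional layout in which each location's neighbors are the adjacent positions:

```python
def morans_i(values, weights):
    """Moran's I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and W is the sum of all weights w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / total_weight) * num / den


def line_neighbors(n):
    """Hypothetical layout: n locations on a line; i and j are neighbors if |i - j| == 1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = line_neighbors(6)
print(morans_i([1, 2, 3, 4, 5, 6], w))  # smooth trend: positive dependency
print(morans_i([1, 0, 1, 0, 1, 0], w))  # alternating pattern: negative dependency
```

Under no spatial dependency the expected value of I is -1/(N - 1), so an observed value should be judged against that baseline rather than against zero.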

artificial intelligence, machine learning, statistics

Industry: Banking & Finance (0.98)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

@machinelearnbot Jun-5-2017, 15:25:12 GMT

Is Regression Analysis Really Machine Learning?

That's a broad topic which has been treated many times. Much of what has been written on this topic is good, much is bad. But I find that the stats vs. machine learning argument, at that level, tends to focus on the forest at the cost of completely overlooking the trees. Shah's definitions, which I believe are reflective of many approaches, tend to focus on different ends of the respective spectrums of each of these concepts, treating machine learning as a practical activity and statistics as a theoretical abstraction (and, yes, I'm lumping "statistical modeling" together with "statistics" in this case... at least, for now). The relationship between statistics and machine learning is actually a highly complex one, and merely defining the two concepts is not helpful in dissecting this connection.

artificial intelligence, machine learning, statistics

Country: Europe > Switzerland > Geneva > Geneva (0.05)

Genre:

Research Report > New Finding (0.42)
Research Report > Experimental Study (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)

Sur, Pragya, Chen, Yuxin, Candès, Emmanuel J.

The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

arXiv.org Machine Learning Jun-5-2017

Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks' theorem asserts that whenever we have a fixed number $p$ of variables, twice the log-likelihood ratio (LLR) $2\Lambda$ is distributed as a $\chi^2_k$ variable in the limit of large sample sizes $n$; here, $k$ is the number of variables being tested. In this paper, we prove that when $p$ is not negligible compared to $n$, Wilks' theorem does not hold and that the chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that $n$ and $p$ grow large in such a way that $p/n\rightarrow\kappa$ for some constant $\kappa < 1/2$. We prove that for a class of logistic models, the LLR converges to a rescaled chi-square, namely, $2\Lambda~\stackrel{\mathrm{d}}{\rightarrow}~\alpha(\kappa)\chi_k^2$, where the scaling factor $\alpha(\kappa)$ is greater than one as soon as the dimensionality ratio $\kappa$ is positive. Hence, the LLR is larger than classically assumed. For instance, when $\kappa=0.3$, $\alpha(\kappa)\approx1.5$. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations with two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, non-asymptotic random matrix theory and convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models such as the probit regression model.
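The practical effect of the rescaling shows up directly in a p-value. A sketch for a single tested coefficient ($k = 1$, where the chi-square tail has the closed form $P(\chi^2_1 > t) = \mathrm{erfc}(\sqrt{t/2})$), using the abstract's example $\alpha(\kappa) \approx 1.5$ at $\kappa = 0.3$ and a made-up observed statistic $2\Lambda = 5$. It does not compute $\alpha(\kappa)$ itself, which per the abstract requires solving a nonlinear system of two equations:

```python
import math


def chi2_1_sf(t):
    """Survival function of chi-square with 1 degree of freedom: P(chi2_1 > t) = erfc(sqrt(t / 2))."""
    return math.erfc(math.sqrt(t / 2.0))


def llr_p_values(two_lambda, alpha):
    """Classical Wilks p-value vs. the p-value under 2*Lambda ~ alpha * chi2_1.
    alpha must be supplied by the caller; it is not computed here."""
    classical = chi2_1_sf(two_lambda)
    rescaled = chi2_1_sf(two_lambda / alpha)
    return classical, rescaled


# Hypothetical observed statistic; alpha ~ 1.5 is the abstract's value for kappa = 0.3.
classical, rescaled = llr_p_values(two_lambda=5.0, alpha=1.5)
print(f"classical p = {classical:.4f}, rescaled p = {rescaled:.4f}")
```

Here the classical p-value falls below 0.05 while the rescaled one does not, illustrating the abstract's point that the chi-square approximation produces p-values that are too small when $p/n$ is not negligible.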

artificial intelligence, machine learning, probability

1706.01191

Country: North America > United States (0.92)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)