Regression
Optimal link prediction with matrix logistic regression
Baldin, Nicolai, Berthet, Quentin
We consider the problem of link prediction, based on partial observation of a large network, and on side information associated with its vertices. The generative model is formulated as a matrix logistic regression. The performance of the model is analysed in a high-dimensional regime under a structural assumption. The minimax rate for the Frobenius-norm risk is established and a combinatorial estimator based on the penalised maximum likelihood approach is shown to achieve it. Furthermore, it is shown that this rate cannot be attained by any (randomised) algorithm computable in polynomial time under a computational complexity assumption.
Asymmetric kernel in Gaussian Processes for learning target variance
Pintea, Silvia L., van Gemert, Jan C., Smeulders, Arnold W. M.
This work incorporates the multi-modality of the data distribution into a Gaussian Process regression model. We approach the problem from a discriminative perspective by learning, jointly over the training data, the target space variance in the neighborhood of a certain sample through metric learning. We start by using data centers rather than all training samples. Subsequently, each center selects an individualized kernel metric. This enables each center to adjust the kernel space in its vicinity in correspondence with the topology of the targets --- a multi-modal approach. We additionally add descriptiveness by allowing each center to learn a precision matrix. We demonstrate empirically the reliability of the model.
High Dimensional Linear Regression using Lattice Basis Reduction
We consider a high-dimensional linear regression problem where the goal is to efficiently recover an unknown vector $\beta^*$ from $n$ noisy linear observations $Y=X\beta^*+W \in \mathbb{R}^n$, for known $X \in \mathbb{R}^{n \times p}$ and unknown $W \in \mathbb{R}^n$. Unlike most of the literature on this model, we make no sparsity assumption on $\beta^*$. Instead we adopt a regularization based on assuming that the underlying vectors $\beta^*$ have rational entries with the same denominator $Q \in \mathbb{Z}_{>0}$. We call this the $Q$-rationality assumption. We propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm. We establish that under the $Q$-rationality assumption, our algorithm recovers exactly the vector $\beta^*$ for a large class of distributions for the i.i.d. entries of $X$ and non-zero noise $W$. We prove that it is successful under small noise, even when the learner has access to only one observation ($n=1$). Furthermore, we prove that in the case of Gaussian white noise for $W$, $n=o\left(p/\log p\right)$ and $Q$ sufficiently large, our algorithm tolerates a nearly optimal information-theoretic level of the noise.
Rare Feature Selection in High Dimensions
It is common in modern prediction problems for many predictor variables to be counts of rarely occurring events. This leads to design matrices in which many columns are highly sparse. The challenge posed by such "rare features" has received little attention despite its prevalence in diverse areas, ranging from natural language processing (e.g., rare words) to biology (e.g., rare species). We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. Our strategy leverages side information in the form of a tree that encodes feature similarity. We apply our method to data from TripAdvisor, in which we predict the numerical rating of a hotel based on the text of the associated review. Our method achieves high accuracy by making effective use of rare words; by contrast, the lasso is unable to identify highly predictive words if they are too rare. A companion R package, called rare, implements our new estimator, using the alternating direction method of multipliers.
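The idea of aggregating rare features along a tree can be illustrated with a toy sketch. This is not the estimator from the paper or the API of the rare package: it simply sums the counts of words that share a parent in a one-level tree. The word lists, the `parent` mapping, and the `aggregate` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical word counts, one dict per review (assumed data).
counts = [
    {"superb": 1, "good": 2},
    {"stellar": 1},
    {"awful": 1, "bad": 1},
    {"dreadful": 1},
]

# Hypothetical one-level tree encoding word similarity: leaf word -> parent group.
parent = {
    "superb": "positive", "stellar": "positive", "good": "positive",
    "awful": "negative", "dreadful": "negative", "bad": "negative",
}

def aggregate(doc_counts, parent):
    """Sum the counts of all words that share a parent node in the tree."""
    out = defaultdict(int)
    for word, c in doc_counts.items():
        # Words missing from the tree are kept as their own feature.
        out[parent.get(word, word)] += c
    return dict(out)

aggregated = [aggregate(d, parent) for d in counts]
print(aggregated)
```

Each rare word appears in only one review here, but the aggregated "positive" and "negative" features are non-zero in two reviews each, which is the sense in which aggregation yields denser columns.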
Double/De-Biased Machine Learning Using Regularized Riesz Representers
Chernozhukov, Victor, Newey, Whitney, Robins, James
We provide adaptive inference methods for linear functionals of sparse linear approximations to the conditional expectation function. Examples of such functionals include average derivatives, policy effects, average treatment effects, and many others. The construction relies on building Neyman-orthogonal equations that are approximately invariant to perturbations of the nuisance parameters, including the Riesz representer for the linear functionals. We use L1-regularized methods to learn approximations to the regression function and the Riesz representer, and construct the estimator for the linear functionals as the solution to the orthogonal estimating equations. We establish that under weak assumptions the estimator concentrates in a $1/\sqrt{n}$ neighborhood of the target with deviations controlled by the normal laws, and the estimator attains the semi-parametric efficiency bound in many cases. In particular, either the approximation to the regression function or the approximation to the Riesz representer can be "dense" as long as one of them is sufficiently "sparse". Our main results are non-asymptotic and imply asymptotic uniform validity over large classes of models.
Machine Learning With Python – Introduction – Developers Area
Python is a great programming language for data analysis. Scikit Learn is one of the scikit packages, and it's a very easy-to-use machine learning package. It implements many machine learning algorithms, and all you need to know is which algorithm solves your problem. We will use a linear regression model. It is the easiest algorithm to start with: you have a function f(x) = y, you have some pairs of (x, y) that match the function, and you want to predict y for other x values.
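A minimal sketch of this setup, assuming scikit-learn and NumPy are installed. The training pairs are made up for illustration and follow y = 2x + 1; they do not come from the original article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical (x, y) pairs that match the function y = 2x + 1.
# scikit-learn expects X with shape (n_samples, n_features).
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

# Fit an ordinary least-squares line to the known pairs.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict y for x values that were not in the training data.
X_new = np.array([[4.0], [5.0]])
y_pred = model.predict(X_new)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("predictions:", y_pred)
```

Because the made-up data lie exactly on a line, the fitted slope and intercept recover 2 and 1 up to floating-point error; with noisy data the fit would only approximate the underlying function.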
Credit Risk Prediction Using Artificial Neural Network Algorithm
Credit risk or credit default indicates the probability of non-repayment of bank financial services that have been given to the customers. Credit risk has always been an extensively studied area in bank lending decisions. Credit risk plays a crucial role for banks and financial institutions, especially for commercial banks and it is always difficult to interpret and manage. Due to the advancements in technology, banks have managed to reduce the costs, in order to develop robust and sophisticated systems and models to predict and manage credit risk. To predict the credit default, several methods have been created and proposed.
Impacts of Dirty Data: an Experimental Evaluation
Qi, Zhixin, Wang, Hongzhi, Li, Jianzhong, Gao, Hong
Data quality issues have attracted widespread attention due to the negative impacts of dirty data on data mining and machine learning results. The relationship between data quality and the accuracy of results could inform the selection of an appropriate algorithm given the data quality, and the determination of the share of data to clean. However, little research has explored this relationship. Motivated by this, this paper conducts an experimental comparison of the effects of missing, inconsistent, and conflicting data on classification, clustering, and regression algorithms. Based on the experimental findings, we provide guidelines for algorithm selection and data cleaning.
Ten Machine Learning Algorithms You Should Know to Become a Data Scientist
Let's say I am given an Excel sheet with data about various fruits and I have to tell which ones look like apples. What I will do is ask a question, "Which fruits are red and round?", and split the fruits according to whether the answer is yes or no. Now, not all red and round fruits are apples, and not all apples are red and round. So I will ask "Which fruits have red or yellow color hints on them?" of the red and round fruits, and "Which fruits are green and round?" of the fruits that are not red and round. Based on these questions I can tell with considerable accuracy which are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition.
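The cascade of questions above can be written out directly as nested conditions. The sketch below is a hand-written tree, not a learned one; the fruit records and the field names (`color`, `shape`, `has_red_or_yellow_hints`) are hypothetical.

```python
def looks_like_apple(fruit):
    """Hand-written decision tree mirroring the questions in the text."""
    is_round = fruit["shape"] == "round"
    if fruit["color"] == "red" and is_round:
        # Red-and-round branch: ask about red or yellow color hints.
        return fruit["has_red_or_yellow_hints"]
    # Not-red-and-round branch: ask whether the fruit is green and round.
    return fruit["color"] == "green" and is_round

# Hypothetical rows from the spreadsheet.
fruits = [
    {"name": "red apple", "color": "red", "shape": "round",
     "has_red_or_yellow_hints": True},
    {"name": "pomegranate", "color": "red", "shape": "round",
     "has_red_or_yellow_hints": False},
    {"name": "green apple", "color": "green", "shape": "round",
     "has_red_or_yellow_hints": False},
    {"name": "banana", "color": "yellow", "shape": "long",
     "has_red_or_yellow_hints": True},
]

labels = {f["name"]: looks_like_apple(f) for f in fruits}
print(labels)
```

A learning algorithm would choose the questions and their order from labeled data instead of relying on intuition, but the resulting model has the same if/else shape.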
Minimax optimal rates for Mondrian trees and forests
Mourtada, Jaouad, Gaïffas, Stéphane, Scornet, Erwan
Originally introduced by [7], Random Forests (RF) are state-of-the-art classification and regression algorithms that proceed by averaging the forecasts of a number of randomized decision trees grown in parallel. Despite their widespread use and remarkable success in practical applications, the theoretical properties of such algorithms are still not fully understood. For an overview of theoretical results on random forests, see [5]. As a result of the complexity of the procedure, which combines sampling steps and feature selection, Breiman's original algorithm has proved difficult to analyze. Consequently, most theoretical studies focus on modified and stylized versions of Random Forests. Among these methods, Purely Random Forests (PRF) [6, 4, 3, 13, 2] that grow the individual trees independently of the sample, are particularly amenable to theoretical analysis. The consistency of such estimates (as well as other idealized RF procedures) was first obtained by [4], as a byproduct of the consistency of individual tree estimates. A recent line of research [25, 28, 18, 27] has sought to obtain some theoretical guarantees for RF variants that more closely resembled the algorithm used in practice. It should be noted, however, that most of these theoretical guarantees come at the price of assumptions either on the data structure or on the Random Forest algorithm itself, being thus still far from explaining the excellent empirical performance of Random Forests.
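To make the notion of a purely random forest concrete, here is a toy one-dimensional sketch in which each tree partitions [0, 1] with cut points drawn uniformly at random, independently of the sample, and the forest averages the tree forecasts. It is an illustration under these simplifying assumptions, not Breiman's algorithm and not the Mondrian construction studied in the paper; all function names and the data are hypothetical.

```python
import bisect
import random

def fit_random_partition_tree(xs, ys, n_splits, rng):
    """Fit one tree whose cut points ignore the data (purely random partition)."""
    cuts = sorted(rng.random() for _ in range(n_splits))
    sums = [0.0] * (n_splits + 1)
    cnts = [0] * (n_splits + 1)
    for x, y in zip(xs, ys):
        cell = bisect.bisect_right(cuts, x)  # index of the cell containing x
        sums[cell] += y
        cnts[cell] += 1
    overall = sum(ys) / len(ys)
    # Each cell predicts the mean response of its points;
    # empty cells fall back to the overall mean.
    means = [s / c if c else overall for s, c in zip(sums, cnts)]
    return cuts, means

def predict_tree(tree, x):
    cuts, means = tree
    return means[bisect.bisect_right(cuts, x)]

def predict_forest(trees, x):
    """Average the forecasts of the individual randomized trees."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)

rng = random.Random(0)
xs = [i / 20 for i in range(20)]  # made-up design points in [0, 1)
ys = [2 * x for x in xs]          # made-up noiseless responses
trees = [fit_random_partition_tree(xs, ys, n_splits=5, rng=rng)
         for _ in range(50)]
print(predict_forest(trees, 0.5))
```

Each tree is a piecewise-constant estimate on its own random partition; averaging many such trees smooths out the dependence on any single set of cut points.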