AITopics | Regression

Regression

Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models

Chernozhukov, Victor, Nekipelov, Denis, Semenova, Vira, Syrgkanis, Vasilis

arXiv.org Machine Learning Jun-30-2018

We develop a theory for estimation of a high-dimensional sparse parameter $\theta$ defined as a minimizer of a population loss function $L_D(\theta,g_0)$ which, in addition to $\theta$, depends on a potentially infinite-dimensional nuisance parameter $g_0$. Our approach is based on estimating $\theta$ via an $\ell_1$-regularized minimization of a sample analog $L_S(\theta, \hat{g})$, plugging in a first-stage estimate $\hat{g}$ computed on a hold-out sample. We define a population loss to be (Neyman) orthogonal if the gradient of the loss with respect to $\theta$ has pathwise derivative with respect to $g$ equal to zero when evaluated at the true parameter and nuisance component. We show that orthogonality implies a second-order impact of the first-stage nuisance error on the second-stage target parameter estimate. Our approach applies to both convex and non-convex losses, although the latter case requires a small adaptation of our method with a preliminary estimation step of the target parameter. Our result enables oracle convergence rates for $\theta$ under assumptions on the first-stage rates, typically of the order of $n^{-1/4}$. We show how such an orthogonal loss can be constructed via a novel orthogonalization process for a general model defined by conditional moment restrictions. We apply our theory to high-dimensional versions of standard estimation problems in statistics and econometrics, such as estimation of conditional moment models with missing data, estimation of structural utilities in games of incomplete information, and estimation of treatment effects in regression models with non-linear link functions.
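
For readers who prefer symbols, the two central objects of the abstract can be written out roughly as follows. This is only a sketch of the notation: the penalty weight $\lambda$, the symbol $\theta_0$ for the true parameter, and the phrase "admissible $g$" are assumptions introduced here and do not appear in the abstract.

```latex
% Plug-in l1-regularized estimator; \lambda > 0 is an assumed penalty weight
\hat{\theta} \in \arg\min_{\theta} \; L_S(\theta, \hat{g}) + \lambda \lVert \theta \rVert_1

% Neyman orthogonality of the population loss at the true pair (\theta_0, g_0):
% the pathwise derivative in g of the \theta-gradient is zero
\frac{d}{dt} \, \nabla_{\theta} L_D\bigl(\theta_0,\; g_0 + t\,(g - g_0)\bigr) \Big|_{t=0} = 0
\quad \text{for all admissible } g
```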

artificial intelligence, estimation, machine learning, (18 more...)

1806.04823

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

A Tour of The Top 10 Algorithms for Machine Learning Newbies

#artificialintelligence Jun-29-2018, 18:31:55 GMT

In machine learning, there's something called the "No Free Lunch" theorem. In a nutshell, it states that no one algorithm works best for every problem, and it's especially relevant for supervised learning. For example, you can't say that neural networks are always better than decision trees or vice versa. There are many factors at play, such as the size and structure of your dataset. As a result, you should try many different algorithms for your problem, while using a hold-out "test set" of data to evaluate performance and select the winner.
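
The "try several algorithms and pick the winner on held-out data" advice can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the article: the three toy candidate models, the synthetic data, the 150/50 split, and all function names are hypothetical, and only the Python standard library is used.

```python
import random

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_1nn(xs, ys):
    """1-nearest-neighbour regressor."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_on_holdout(candidates, train, test):
    """Fit each candidate on train, score on the held-out test set, return the best."""
    scores = {name: mse(fit(*train), *test) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
train, test = (xs[:150], ys[:150]), (xs[150:], ys[150:])

candidates = {"mean": fit_mean, "line": fit_line, "1-nn": fit_1nn}
winner, scores = select_on_holdout(candidates, train, test)
print(winner, scores)
```

On a different dataset a different candidate may win, which is the point of the theorem.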

algorithm, artificial intelligence, machine learning, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)

Polynomial Regression As an Alternative to Neural Nets

Cheng, Xi, Khomtchouk, Bohdan, Matloff, Norman, Mohanty, Pete

arXiv.org Machine Learning Jun-29-2018

Despite the success of neural networks (NNs), there is still a concern among many over their "black box" nature. Why do they work? Here we present a simple analytic argument that NNs are in fact essentially polynomial regression models. This view will have various implications for NNs, e.g. providing an explanation for why convergence problems arise in NNs, and it gives rough guidance on avoiding overfitting. In addition, we use this phenomenon to predict and confirm a multicollinearity property of NNs not previously reported in the literature. Most importantly, given this loose correspondence, one may choose to routinely use polynomial models instead of NNs, thus avoiding some major problems of the latter, such as having to set many tuning parameters and dealing with convergence issues. We present a number of empirical results; in each case, the accuracy of the polynomial approach matches or exceeds that of NN approaches. A many-featured, open-source software package, polyreg, is available.

artificial intelligence, machine learning, polynomial, (16 more...)

1806.0685

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Education > Educational Setting > Online (0.68)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Probabilistic Bisection with Spatial Metamodels

Rodriguez, Sergio, Ludkovski, Mike

arXiv.org Machine Learning Jun-29-2018

The Probabilistic Bisection Algorithm (PBA) performs root finding based on knowledge acquired from noisy oracle responses. We consider the generalized PBA setting (G-PBA) where the statistical distribution of the oracle is unknown and location-dependent, so that model inference and Bayesian knowledge updating must be performed simultaneously. To this end, we propose to leverage the spatial structure of a typical oracle by constructing a statistical surrogate for the underlying logistic regression step. We investigate several non-parametric surrogates, including Binomial Gaussian Processes (B-GP), Polynomial, Kernel, and Spline Logistic Regression. In parallel, we develop sampling policies that adaptively balance learning the oracle distribution and learning the root. One of our proposals mimics active learning with B-GPs and provides a novel look-ahead predictive variance formula. The resulting gains of our Spatial PBA algorithm relative to earlier G-PBA models are illustrated with synthetic examples and a challenging stochastic root finding problem from Bermudan option pricing.
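
As background, the classical probabilistic bisection update can be sketched as below. This is not the paper's G-PBA or Spatial PBA: it assumes the simplest textbook setting, where the oracle is correct with a known, constant probability `p`, and it discretizes the posterior on a grid over [0, 1]. The function names, grid size, step count, and test root are all hypothetical choices made for illustration.

```python
import random

def posterior_median(grid, post):
    """Return the first grid point at which cumulative posterior mass reaches 0.5."""
    cum = 0.0
    for g, w in zip(grid, post):
        cum += w
        if cum >= 0.5:
            return g
    return grid[-1]

def probabilistic_bisection(oracle, p, n_steps=300, grid_size=1000):
    """Classical PBA on [0, 1] with a known oracle accuracy p (assumed, 0.5 < p < 1)."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    post = [1.0 / grid_size] * grid_size  # uniform prior over grid cells
    for _ in range(n_steps):
        x = posterior_median(grid, post)  # always query at the posterior median
        says_right = oracle(x)            # True means "the root lies to the right of x"
        # Bayes update: cells on the side the oracle points to are weighted by p,
        # cells on the other side by 1 - p.
        post = [w * (p if (g > x) == says_right else 1.0 - p)
                for g, w in zip(grid, post)]
        total = sum(post)
        post = [w / total for w in post]
    return posterior_median(grid, post)

def make_noisy_oracle(root, p, seed=0):
    """Oracle that reports the true direction with probability p, else the opposite."""
    rng = random.Random(seed)
    def oracle(x):
        truth = root > x
        return truth if rng.random() < p else not truth
    return oracle

ROOT = 0.3141  # hypothetical root used only for this demonstration
estimate = probabilistic_bisection(make_noisy_oracle(ROOT, p=0.8), p=0.8)
print(round(estimate, 3))
```

The generalized setting in the abstract differs precisely in that `p` is unknown and varies with the query location, so it has to be learned alongside the root.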

artificial intelligence, bayesian inference, machine learning, (18 more...)

1807.00095

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
Europe > Austria > Vienna (0.14)
North America > Mexico (0.04)
North America > United States > New York (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Single Index Latent Variable Models for Network Topology Inference

Mei, Jonathan, Moura, José M. F.

arXiv.org Machine Learning Jun-28-2018

A semi-parametric, non-linear regression model in the presence of latent variables is applied towards learning network graph structure. These latent variables can correspond to unmodeled phenomena or unmeasured agents in a complex system of interacting entities. This formulation jointly estimates non-linearities in the underlying data generation, the direct interactions between measured entities, and the indirect effects of unmeasured processes on the observed data. The learning is posed as regularized empirical risk minimization. Details of the algorithm for learning the model are outlined. Experiments demonstrate the performance of the learned model on real data.

artificial intelligence, latent variable, machine learning, (16 more...)

1807.00002

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Rocky Mountains (0.04)
North America > United States > North Dakota > Cass County > Fargo (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Tight Prediction Intervals Using Expanded Interval Minimization

Su, Dongqi, Ting, Ying Yin, Ansel, Jason

arXiv.org Machine Learning Jun-28-2018

Prediction intervals are a valuable way of quantifying uncertainty in regression problems. Good prediction intervals should be both correct, containing the actual value between the lower and upper bounds at least a target percentage of the time, and tight, having a small mean width. Many prior techniques for generating prediction intervals make assumptions about the distribution of error, which causes them to work poorly for problems with asymmetric distributions. This paper presents Expanded Interval Minimization (EIM), a novel loss function for generating prediction intervals using neural networks. This loss function uses minibatch statistics to estimate the coverage and optimize the width of the prediction intervals. It does not make the same assumptions about the distributions of data and error as prior work. We compare to three published techniques and show that EIM produces on average 1.37x tighter prediction intervals, and in the worst case 1.06x tighter intervals, across two large real-world datasets and varying coverage levels.
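
The two quality criteria named in the abstract, coverage ("correct") and mean width ("tight"), are easy to compute for any set of intervals. The sketch below shows only these evaluation metrics, not the EIM loss itself; the function name and the toy numbers are hypothetical.

```python
def coverage_and_width(lower, upper, actual):
    """Return (fraction of actual values inside [lower, upper], mean interval width)."""
    n = len(actual)
    covered = sum(1 for lo, hi, y in zip(lower, upper, actual) if lo <= y <= hi)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width

# Toy example: four intervals of width 2, one of which misses its target.
lower = [0.0, 1.0, 2.0, 3.0]
upper = [2.0, 3.0, 4.0, 5.0]
actual = [1.0, 3.5, 2.0, 4.0]

coverage, width = coverage_and_width(lower, upper, actual)
print(coverage, width)  # prints: 0.75 2.0
```

A set of intervals meets a target such as 90% only if the first number is at least 0.9; among methods that meet the target, the one with the smaller second number is the tighter one.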

artificial intelligence, machine learning, prediction interval, (15 more...)

1806.11222

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

Logistic Regression from scratch (and how to make it nonlinear)

#artificialintelligence Jun-27-2018, 02:52:09 GMT

Logistic Regression is a staple of the data science workflow. Below, I show how to implement Logistic Regression with Stochastic Gradient Descent (SGD) in a few dozen lines of Python code, using NumPy. Then I will show how to build a nonlinear decision boundary with Logistic Regression by using feature crosses. Here is the repo with the full code shown below. Although in many applications Logistic Regression has been replaced by more advanced techniques, such as ensemble tree-based methods (like gradient boosting) or deep neural networks, it is still commonly used due to its simplicity and interpretability.
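
The excerpt does not include the article's code, so here is a minimal stand-in sketch of the same two ideas: logistic regression trained by SGD, and a feature cross that makes the decision boundary nonlinear. It is not the author's implementation. To stay dependency-free it uses plain Python lists rather than NumPy, and the XOR-style dataset, learning rate, epoch count, and function names are all assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def add_cross(x):
    """Feature cross: append the product x1 * x2 to the raw features."""
    return [x[0], x[1], x[0] * x[1]]

def train_sgd(X, y, lr=0.1, epochs=50, seed=0):
    """Logistic regression fitted by SGD on the log-loss, one example per update."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log-loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b

def accuracy(w, b, X, y):
    hits = sum((sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5) == bool(t)
               for x, t in zip(X, y))
    return hits / len(X)

# XOR-style data (an assumption): label is 1 when x1 and x2 share a sign.
rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(600)]
y = [1 if x[0] * x[1] > 0 else 0 for x in X]
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

# Plain linear model versus the same model with the crossed feature added.
w_plain, b_plain = train_sgd(X_train, y_train)
acc_plain = accuracy(w_plain, b_plain, X_test, y_test)

Xc_train, Xc_test = [add_cross(x) for x in X_train], [add_cross(x) for x in X_test]
w_cross, b_cross = train_sgd(Xc_train, y_train)
acc_cross = accuracy(w_cross, b_cross, Xc_test, y_test)

print(acc_plain, acc_cross)
```

The model stays linear in its inputs; the boundary becomes nonlinear in the original coordinates only because one of those inputs is the product `x1 * x2`.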

artificial intelligence, logistic regression, machine learning, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Contextual bandits with surrogate losses: Margin bounds and efficient algorithms

Foster, Dylan J., Krishnamurthy, Akshay

arXiv.org Machine Learning Jun-27-2018

We introduce a new family of margin-based regret guarantees for adversarial contextual bandit learning. Our results are based on multiclass surrogate losses. Using the ramp loss, we derive a universal margin-based regret bound in terms of the sequential metric entropy for a benchmark class of real-valued regression functions. The new margin bound serves as a complete contextual bandit analogue of the classical margin bound from statistical learning. The result applies to large nonparametric classes, improving on the best known results for Lipschitz contextual bandits (Cesa-Bianchi et al., 2017) and, as a special case, generalizes the dimension-independent Banditron regret bound (Kakade et al., 2008) to arbitrary linear classes with smooth norms. On the algorithmic side, we use the hinge loss to derive an efficient algorithm with a $\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regression functions. This provides the first hinge loss-based solution to the open problem of Abernethy and Rakhlin (2009). With an additional i.i.d. assumption we give a simple oracle-efficient algorithm whose regret matches our generic metric entropy-based bound for sufficiently complex nonparametric classes. Under realizability assumptions our results also yield classical regret bounds.

algorithm, artificial intelligence, machine learning, (13 more...)

1806.10745

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
(3 more...)

Genre: Research Report > New Finding (0.54)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Uncoupled isotonic regression via minimum Wasserstein deconvolution

Rigollet, Philippe, Weed, Jonathan

arXiv.org Machine Learning Jun-27-2018

Isotonic regression is a standard problem in shape-constrained estimation where the goal is to estimate an unknown nondecreasing regression function $f$ from independent pairs $(x_i, y_i)$ where $\mathbb{E}[y_i]=f(x_i)$, $i=1, \ldots, n$. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart, where one is given only the unordered sets $\{x_1, \ldots, x_n\}$ and $\{y_1, \ldots, y_n\}$. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moment conditions on $y_i$ and to give an efficient algorithm achieving optimal rates. Both upper and lower bounds employ moment-matching arguments that are also pertinent to learning mixtures of distributions and deconvolution.
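
For context, the standard (coupled) isotonic regression problem mentioned in the first sentence has a well-known least-squares solution, the pool-adjacent-violators algorithm. The sketch below covers only that classical case, not the paper's uncoupled estimator, and it assumes the responses are already sorted by increasing $x_i$; the function name is hypothetical.

```python
def isotonic_regression(y):
    """Least-squares nondecreasing fit via pool-adjacent-violators.

    Assumes y is ordered by increasing x. Returns a list of fitted values
    with the same length as y.
    """
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every point in a block gets the block mean
    return fit

fitted = isotonic_regression([1, 3, 2, 4, 3, 5])
print(fitted)  # prints: [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

In the uncoupled setting this procedure is not available, because the pairing between each $x_i$ and its $y_i$ is lost.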

artificial intelligence, machine learning, regression, (18 more...)

1806.10648

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

TopoReg: A Topological Regularizer for Classifiers

Chen, Chao, Ni, Xiuyan, Bai, Qinxun, Wang, Yusu

arXiv.org Machine Learning Jun-27-2018

Regularization plays a crucial role in supervised learning. A successfully regularized model strikes a balance between a perfect description of the training data and the ability to generalize to unseen data. Most existing methods enforce a global regularization in a structure-agnostic manner. In this paper, we initiate a new direction and propose to enforce the structural simplicity of the classification boundary by regularizing over its topological complexity. In particular, our measurement of topological complexity incorporates the importance of topological features (e.g., connected components, handles, and so on) in a meaningful manner, and provides direct control over spurious topological structures. We incorporate the new measurement as a topological loss in training classifiers. We also propose an efficient algorithm to compute the gradient. Our method provides a novel way to topologically simplify the global structure of the model without having to sacrifice too much of its flexibility. We demonstrate the effectiveness of our new topological regularizer on a range of synthetic and real-world datasets.

artificial intelligence, classifier, machine learning, (15 more...)

1806.10714

Country:

North America > United States > New York (0.05)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)

Genre: Research Report (0.65)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
