 Regression


How to get the most out of Twinned Regression Methods

arXiv.org Artificial Intelligence

Twinned regression methods are designed to solve the dual problem to the original regression problem, predicting differences between regression targets rather than the targets themselves. A solution to the original regression problem can be obtained by ensembling predicted differences between the targets of an unknown data point and multiple known anchor data points. We explore different aspects of twinned regression methods: (1) we decompose different steps in twinned regression algorithms and examine their contributions to the final performance, (2) we examine the intrinsic ensemble quality, (3) we combine twin neural network regression with k-nearest neighbor regression to design a more accurate and efficient regression method, and (4) we develop a simplified semi-supervised regression scheme.
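The idea of predicting differences and ensembling over anchors can be illustrated with a minimal sketch. It is not the authors' implementation: a one-parameter linear difference model stands in for the twin neural network, and the function names and toy data are hypothetical.

```python
from itertools import permutations


def fit_difference_model(xs, ys):
    """Fit slope a so that y_i - y_j is approximately a * (x_i - x_j).

    Assumption: a 1-D linear difference model replaces the twin neural
    network described in the abstract. Trained on all ordered pairs.
    """
    num = 0.0
    den = 0.0
    for i, j in permutations(range(len(xs)), 2):
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        num += dx * dy
        den += dx * dx
    return num / den


def predict(x_new, anchors_x, anchors_y, slope):
    """Ensemble over anchors: each anchor gives y_anchor + predicted difference."""
    estimates = [
        y_a + slope * (x_new - x_a)
        for x_a, y_a in zip(anchors_x, anchors_y)
    ]
    return sum(estimates) / len(estimates)


# Hypothetical toy data generated from y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

slope = fit_difference_model(train_x, train_y)
prediction = predict(5.0, train_x, train_y, slope)
```

Here every training point serves as an anchor, and the final prediction is the mean of the per-anchor estimates. Restricting the anchors to the nearest neighbors of `x_new` would be one way to approximate the k-nearest-neighbor combination mentioned in the abstract.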


Mining the Factor Zoo: Estimation of Latent Factor Models with Sufficient Proxies

arXiv.org Artificial Intelligence

Latent factor model estimation typically relies on either using domain knowledge to manually pick several observed covariates as factor proxies, or purely conducting multivariate analysis such as principal component analysis. However, the former approach may suffer from bias while the latter cannot incorporate additional information. We propose to bridge these two approaches while allowing the number of factor proxies to diverge, and hence make the latent factor model estimation robust, flexible, and statistically more accurate. As a bonus, the number of factors is also allowed to grow. At the heart of our method is a penalized reduced rank regression to combine information. To further deal with heavy-tailed data, a computationally attractive penalized robust reduced rank regression method is proposed. We establish faster rates of convergence compared with the benchmark. Extensive simulations and real examples are used to illustrate the advantages.


Times Series and Trends with Plotly and Pandas

#artificialintelligence

Of all the graphing libraries in the land, Plotly is one of the best -- it is also one of the most frustrating. On the positive side, Plotly is capable of producing excellent visualizations, allows you to avoid Java (if that's not your thing), and natively integrates with HTML. On the negative side, the syntax can be quite confusing when switching between single plots and mixed plots. For example, with plotly_express (px) you might pass an entire dataframe as a parameter; however, with graph_objects (go), the inputs change and may require the use of dictionaries and Pandas Series instead of DataFrames.


Online Linearized LASSO

arXiv.org Artificial Intelligence

Sparse regression has been a popular approach to perform variable selection and enhance the prediction accuracy and interpretability of the resulting statistical model. Existing approaches focus on offline regularized regression, while the online scenario has rarely been studied. In this paper, we propose a novel online sparse linear regression framework for analyzing streaming data when data points arrive sequentially. Our proposed method is memory efficient and requires less stringent restricted strong convexity assumptions. Theoretically, we show that with a properly chosen regularization parameter, the $\ell_2$-norm statistical error of our estimator diminishes to zero in the optimal order of $\tilde{O}({\sqrt{s/t}})$, where $s$ is the sparsity level, $t$ is the streaming sample size, and $\tilde{O}(\cdot)$ hides logarithmic terms. Numerical experiments demonstrate the practical efficiency of our algorithm.


Weight Decay in Multilayer Perceptrons in Deep Learning Computation

#artificialintelligence

Weight decay, also known as L2 regularization, is a technique used in machine learning to prevent overfitting by adding a penalty term to the objective function that is being optimized. The goal of weight decay is to reduce the complexity of the model by limiting the size of the weights, which can help to prevent overfitting and improve the generalization ability of the model. Weight decay is typically implemented by adding a term to the objective function that is proportional to the sum of the squares of the weights. The strength of the weight decay penalty is controlled by a hyperparameter called the decay rate or regularization strength, which determines the amount of weight decay applied to the model. For example, let's say we are training a linear regression model to predict the price of a house based on the number of bedrooms and the square footage.
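As a rough illustration of the house-price setup, the sketch below adds an L2 penalty to a mean-squared-error objective and shows its effect on one gradient step. The function names, the decay rate `lam`, the learning rate `lr`, and the data are all hypothetical; leaving the bias unpenalized is a common convention assumed here, not something stated in the text.

```python
def predict(weights, bias, row):
    """Linear model: weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(weights, row)) + bias


def penalized_loss(weights, bias, X, y, lam):
    """Mean squared error plus lam times the sum of squared weights."""
    n = len(X)
    mse = sum((predict(weights, bias, row) - t) ** 2 for row, t in zip(X, y)) / n
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty


def gradient_step(weights, bias, X, y, lam, lr):
    """One gradient-descent step on the penalized loss."""
    n = len(X)
    residuals = [predict(weights, bias, row) - t for row, t in zip(X, y)]
    new_weights = []
    for j, w in enumerate(weights):
        grad_data = 2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
        grad_decay = 2.0 * lam * w  # gradient of the L2 penalty term
        new_weights.append(w - lr * (grad_data + grad_decay))
    new_bias = bias - lr * 2.0 / n * sum(residuals)
    return new_weights, new_bias


# Hypothetical data: [bedrooms, square footage in thousands] -> price.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
weights, bias = [1.0, 2.0], 0.0  # fits this toy data exactly

shrunk_weights, _ = gradient_step(weights, bias, X, y, lam=0.1, lr=0.5)
```

Because the model already fits the toy data exactly, the data gradient is zero and the step only shrinks the weights, by a factor of `1 - 2 * lr * lam`. That shrinkage toward zero is where the name "weight decay" comes from.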


Confidence Sets under Generalized Self-Concordance

arXiv.org Machine Learning

This paper revisits a fundamental problem in statistical inference from a non-asymptotic theoretical viewpoint: the construction of confidence sets. We establish a finite-sample bound for the estimator, characterizing its asymptotic behavior in a non-asymptotic fashion. An important feature of our bound is that its dimension dependency is captured by the effective dimension (the trace of the limiting sandwich covariance), which can be much smaller than the parameter dimension in some regimes. We then illustrate how the bound can be used to obtain a confidence set whose shape is adapted to the optimization landscape induced by the loss function. Unlike previous works that rely heavily on the strong convexity of the loss function, we only assume the Hessian is lower bounded at the optimum and allow it to gradually become degenerate. This property is formalized by the notion of generalized self-concordance, which originated from convex optimization. Moreover, we demonstrate how the effective dimension can be estimated from data and characterize its estimation accuracy. We apply our results to maximum likelihood estimation with generalized linear models, score matching with exponential families, and hypothesis testing with Rao's score test.


An Efficient Hierarchical Kriging Modeling Method for High-dimension Multi-fidelity Problems

arXiv.org Artificial Intelligence

The multi-fidelity Kriging model is a promising technique in surrogate-based design, as it can balance model accuracy against the cost of sample preparation by fusing low- and high-fidelity data. However, the cost of building a multi-fidelity Kriging model increases significantly with the problem dimension. To address this issue, an efficient Hierarchical Kriging modeling method is proposed. In building the low-fidelity model, the maximal information coefficient is used to calculate the relative value of the hyperparameter. With this, the maximum likelihood estimation problem for determining the hyperparameters is transformed into a one-dimensional optimization problem, which can be solved efficiently and thus improves the modeling efficiency significantly. A local search is then used to further exploit the search space of hyperparameters and improve the model accuracy. The high-fidelity model is built in a similar manner, with the hyperparameter of the low-fidelity model serving as the relative value of the hyperparameter for the high-fidelity model. The performance of the proposed method is compared with the conventional tuning strategy by testing both on ten analytic problems and an engineering problem of modeling the isentropic efficiency of a compressor rotor. The empirical results demonstrate that the modeling time of the proposed method is reduced significantly without sacrificing model accuracy. For the modeling of the isentropic efficiency of the compressor rotor, the cost saving associated with the proposed method is about 90% compared with the conventional strategy. Meanwhile, the proposed method achieves higher accuracy.


Mastering the Art of Linear Regression: A Comprehensive Guide

#artificialintelligence

Linear regression is a statistical technique for modeling the relationship between a dependent variable and one or more independent variables. At its core, linear regression is a method for predicting a numerical outcome based on a set of input variables. But what exactly is linear regression and how does it work? In this article, we'll delve into the fundamentals of linear regression and explore its applications in a variety of fields, including economics, finance, and machine learning. We'll also discuss some of the key challenges and limitations of using linear regression, and provide practical tips for implementing it in your own analyses.


How do noise tails impact on deep ReLU networks?

arXiv.org Machine Learning

This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used. This optimal rate of convergence cannot be obtained by ordinary least squares but can be achieved by the Huber loss with a properly chosen parameter that adapts to the sample size, smoothness, and moment parameters. A concentration inequality for the adaptive Huber ReLU neural network estimators with allowable optimization errors is also derived. To establish a matching lower bound within the class of neural network estimators using the Huber loss, we employ a different strategy from the traditional route: constructing a deep ReLU network estimator that has a better empirical loss than the true function, so that the difference between these two functions furnishes a lower bound. This step is related to the Huberization bias, yet more critically to the approximability of deep ReLU networks. As a result, we also contribute some new results on the approximation theory of deep ReLU neural networks.
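For readers unfamiliar with the Huber loss, the sketch below shows its standard form and why it is less sensitive to heavy-tailed noise than squared error. The threshold `tau` is a fixed, hypothetical value here; the paper's rule for adapting it to the sample size, smoothness, and moment parameters is not reproduced.

```python
def huber_loss(residual, tau):
    """Standard Huber loss: quadratic near zero, linear in the tails."""
    a = abs(residual)
    if a <= tau:
        return 0.5 * a * a
    return tau * (a - 0.5 * tau)


def squared_loss(residual):
    """Half squared error, for comparison on the same scale."""
    return 0.5 * residual * residual


def mean_huber(y_true, y_pred, tau):
    """Average Huber loss over a sample."""
    losses = [huber_loss(t - p, tau) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# A small residual is treated like squared error; a large one grows only linearly.
small = huber_loss(0.5, tau=1.0)
large = huber_loss(3.0, tau=1.0)
large_squared = squared_loss(3.0)
```

With `tau = 1.0`, a residual of 3 contributes 2.5 under the Huber loss but 4.5 under half squared error, so a single outlier pulls the fit less strongly.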


Essential Number of Principal Components and Nearly Training-Free Model for Spectral Analysis

arXiv.org Artificial Intelligence

Through a study of multi-gas mixture datasets, we show that in multi-component spectral analysis, the number of functional or non-functional principal components required to retain the essential information is the same as the number of independent constituents in the mixture set. Due to the mutual independence among different gas molecules, a near one-to-one projection from the principal component to the mixture constituent can be established, leading to a significant simplification of spectral quantification. Further, with knowledge of the molar extinction coefficients of each constituent, a complete principal component set can be extracted from the coefficients directly, and few to no training samples are required for the learning model. Compared to other approaches, the proposed methods provide fast and accurate spectral quantification solutions with a small memory footprint.