Regression
Twin Neural Network Regression is a Semi-Supervised Regression Algorithm
Wetzel, Sebastian J., Melko, Roger G., Tamblyn, Isaac
Twin neural network regression (TNNR) is a semi-supervised regression algorithm: it can be trained on unlabelled data points as long as other, labelled anchor data points are present. TNNR is trained to predict differences between the target values of two different data points rather than the targets themselves. By ensembling predicted differences between the targets of an unseen data point and all training data points, it is possible to obtain a very accurate prediction for the original regression problem. Since any loop of predicted differences should sum to zero, loops can be supplied to the training data, even if the data points themselves within loops are unlabelled. Semi-supervised training significantly improves TNNR performance, which is already state of the art.
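The difference-ensembling idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: a linear least-squares model stands in for the twin neural network, the data are synthetic, and the function names `fit_difference_model` and `predict_by_ensembling` are hypothetical.

```python
import numpy as np

def fit_difference_model(X, y):
    """Fit w so that w . (x_i - x_j) approximates y_i - y_j over all ordered pairs.

    Assumption: a linear model replaces the twin neural network for brevity.
    """
    n = len(X)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    i, j = i.ravel(), j.ravel()
    dX = X[i] - X[j]          # pairwise feature differences
    dy = y[i] - y[j]          # pairwise target differences
    w, *_ = np.linalg.lstsq(dX, dy, rcond=None)
    return w

def predict_by_ensembling(w, X_train, y_train, x_new):
    """Average (predicted difference + known anchor target) over all anchors."""
    predicted_diffs = (x_new - X_train) @ w   # estimates of y_new - y_j
    return float(np.mean(predicted_diffs + y_train))

# Synthetic, noise-free data (an assumption made for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 3.0

w = fit_difference_model(X, y)
x_new = np.array([0.5, -1.0])
y_hat = predict_by_ensembling(w, X, y, x_new)
print(y_hat)
```

A linear difference model satisfies the loop condition automatically, because the feature differences around any closed loop cancel. A neural network does not, which is why the abstract describes supplying loops as additional training data.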
Understanding the Under-Coverage Bias in Uncertainty Estimation
Bai, Yu, Mei, Song, Wang, Huan, Xiong, Caiming
This paper is concerned with the problem of uncertainty estimation in regression problems. Uncertainty estimation is an increasingly important task in modern machine learning applications: models should not only make high-accuracy predictions, but also have a sense of how much the true label may deviate from the prediction. This capability is crucial for deploying machine learning in the real world, in particular in risk-sensitive domains such as medical AI [15, 29], self-driving cars [47], and so on. A common approach for uncertainty estimation in regression is to learn a quantile function or a prediction interval of the true label conditioned on the input, which provides useful distributional information about the label. Such learned quantiles are typically evaluated by their coverage, i.e., the probability that they cover the true label on a new test example. For example, a learned 90% upper quantile function should be an actual upper bound of the true label at least 90% of the time. Algorithms for learning quantiles date back to the classical quantile regression [35], which estimates the quantile function by solving an empirical risk minimization problem with a suitable loss function that depends on the desired quantile level α.
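As a rough illustration of the two quantities mentioned above, the sketch below implements the standard pinball (quantile) loss at level α and an empirical coverage check. The constant predictor, the grid search, and the synthetic Gaussian labels are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def pinball_loss(y_true, y_pred, alpha):
    """Average pinball loss at quantile level alpha."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(alpha * diff, (alpha - 1.0) * diff)))

def coverage(y_true, y_pred):
    """Fraction of labels that lie at or below the predicted upper quantile."""
    return float(np.mean(y_true <= y_pred))

# Synthetic labels with no input features (an assumption for this sketch),
# so the "quantile function" reduces to a single constant.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
alpha = 0.9

# Empirical risk minimization over a grid of candidate constants.
candidates = np.linspace(-3.0, 3.0, 601)
losses = [pinball_loss(y, c, alpha) for c in candidates]
q_hat = float(candidates[int(np.argmin(losses))])

print(q_hat, coverage(y, q_hat))
```

Here coverage is measured on the same sample used for fitting. The abstract defines coverage on a new test example, so a faithful evaluation would use held-out data.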
Support Recovery of Sparse Signals from a Mixture of Linear Measurements
Gandikota, Venkata, Mazumdar, Arya, Pal, Soumyabrata
Recovery of support of a sparse vector from simple measurements is a widely studied problem, considered under the frameworks of compressed sensing, 1-bit compressed sensing, and more general single index models. We consider generalizations of this problem: mixtures of linear regressions, and mixtures of linear classifiers, where the goal is to recover supports of multiple sparse vectors using only a small number of possibly noisy linear, and 1-bit measurements respectively. The key challenge is that the measurements from different vectors are randomly mixed. Both of these problems were also extensively studied recently. In mixtures of linear classifiers, the observations correspond to the side of queried hyperplane a random unknown vector lies in, whereas in mixtures of linear regressions we observe the projection of a random unknown vector on the queried hyperplane. The primary step in recovering the unknown vectors from the mixture is to first identify the support of all the individual component vectors. In this work, we study the number of measurements sufficient for recovering the supports of all the component vectors in a mixture in both these models. We provide algorithms that use a number of measurements polynomial in $k, \log n$ and quasi-polynomial in $\ell$, to recover the support of all the $\ell$ unknown vectors in the mixture with high probability when each individual component is a $k$-sparse $n$-dimensional vector.
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
Camuto, Alexander, Deligiannidis, George, Erdogdu, Murat A., Gürbüzbalaban, Mert, Şimşekli, Umut, Zhu, Lingjiong
Understanding generalization in deep learning has been one of the major challenges in statistical learning theory over the last decade. While recent work has illustrated that the dataset and the training algorithm must be taken into account in order to obtain meaningful generalization bounds, it is still theoretically not clear which properties of the data and the algorithm determine the generalization performance. In this study, we approach this problem from a dynamical systems theory perspective and represent stochastic optimization algorithms as random iterated function systems (IFS). Well studied in the dynamical systems literature, under mild assumptions, such IFSs can be shown to be ergodic with an invariant measure that is often supported on sets with a fractal structure. As our main contribution, we prove that the generalization error of a stochastic optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure. Leveraging results from dynamical systems theory, we show that the generalization error can be explicitly linked to the choice of the algorithm (e.g., stochastic gradient descent -- SGD), algorithm hyperparameters (e.g., step-size, batch-size), and the geometry of the problem (e.g., Hessian of the loss). We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden-layered neural networks) and algorithms (e.g., SGD and preconditioned variants), and obtain analytical estimates for our bound. For modern neural networks, we develop an efficient algorithm to compute the developed bound and support our theory with various experiments on neural networks.
On the Use of Minimum Penalties in Statistical Learning
Sherwood, Ben, Price, Bradley S.
Modern multivariate machine learning and statistical methodologies estimate parameters of interest while leveraging prior knowledge of the association between outcome variables. The methods that do allow for estimation of relationships do so typically through an error covariance matrix in multivariate regression which does not scale to other types of models. In this article we propose the MinPen framework to simultaneously estimate regression coefficients associated with the multivariate regression model and the relationships between outcome variables using mild assumptions. The MinPen framework utilizes a novel penalty based on the minimum function to exploit detected relationships between responses. An iterative algorithm that generalizes current state of the art methods is proposed as a solution to the non-convex optimization that is required to obtain estimates. Theoretical results such as high dimensional convergence rates, model selection consistency, and a framework for post selection inference are provided. We extend the proposed MinPen framework to other exponential family loss functions, with a specific focus on multiple binomial responses. Tuning parameter selection is also addressed. Finally, simulations and two data examples are presented to show the finite sample properties of this framework.
Fully differentiable model discovery
Model discovery aims at autonomously discovering differential equations underlying a dataset. Approaches based on Physics Informed Neural Networks (PINNs) have shown great promise, but a fully-differentiable model which explicitly learns the equation has remained elusive. In this paper we propose such an approach by combining neural network based surrogates with Sparse Bayesian Learning (SBL). We start by reinterpreting PINNs as multitask models, applying multitask learning using uncertainty, and show that this leads to a natural framework for including Bayesian regression techniques. We then construct a robust model discovery algorithm by using SBL, which we showcase on various datasets. Concurrently, the multitask approach allows the use of probabilistic approximators, and we show a proof of concept using normalizing flows to directly learn a density model from single particle data. Our work expands PINNs to various types of neural network architectures, and connects neural network-based surrogates to the rich field of Bayesian parameter inference.
Lasso (l1) and Ridge (l2) Regularization Techniques
What is the need for Ridge and Lasso Regression? When we fit a linear model with a best-fit line and then move to the testing phase, the model may turn out to be over-fitted because of high variance, so it will not work well on future data or provide adequate accuracy. Ridge and lasso regression were introduced to reduce this over-fitting. Both are powerful techniques, with a slight difference between them, for building models that are efficient and less prone to over-fitting. Regularization is a process that provides additional information to the model in order to prevent over-fitting.
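The slight difference between the two techniques is the penalty: ridge adds the squared l2 norm of the coefficients to the least-squares loss, while lasso adds their l1 norm. The sketch below is one possible NumPy illustration under assumptions of our own: synthetic data, ridge solved in closed form, and lasso solved by proximal gradient descent (ISTA). The function names and the scaling of the objectives are choices made for this example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimise ||y - Xw||^2 + lam * ||w||_2^2 via the closed-form solution."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink towards zero, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_fit(X, y, lam, n_iter=2000):
    """Minimise 0.5 * ||y - Xw||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data with a sparse true coefficient vector (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

w_ols = ridge_fit(X, y, lam=0.0)      # no penalty: ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)   # l2 penalty shrinks all coefficients
w_lasso = lasso_fit(X, y, lam=10.0)   # l1 penalty can zero some coefficients

print(w_ols, w_ridge, w_lasso, sep="\n")
```

Ridge shrinks every coefficient but rarely makes any of them exactly zero, whereas the soft-thresholding step in lasso can set coefficients exactly to zero, which is why lasso is often used for feature selection.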
Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style
von Kügelgen, Julius, Sharma, Yash, Gresele, Luigi, Brendel, Wieland, Schölkopf, Bernhard, Besserve, Michel, Locatello, Francesco
Self-supervised representation learning has shown remarkable success in a number of domains. A common practice is to perform data augmentation via hand-crafted transformations intended to leave the semantics of the data invariant. We seek to understand the empirical success of this approach from a theoretical perspective. We formulate the augmentation process as a latent variable model by postulating a partition of the latent representation into a content component, which is assumed invariant to augmentation, and a style component, which is allowed to change. Unlike prior work on disentanglement and independent component analysis, we allow for both nontrivial statistical and causal dependencies in the latent space. We study the identifiability of the latent representation based on pairs of views of the observations and prove sufficient conditions that allow us to identify the invariant content partition up to an invertible mapping in both generative and discriminative settings. We find numerical simulations with dependent latent variables are consistent with our theory. Lastly, we introduce Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, which we use to study the effect of data augmentations performed in practice.
Inference for Network Regression Models with Community Structure
Pan, Mengjie, McCormick, Tyler H., Fosdick, Bailey K.
Network regression models, where the outcome comprises the valued edge in a network and the predictors are actor or dyad-level covariates, are used extensively in the social and biological sciences. Valid inference relies on accurately modeling the residual dependencies among the relations. Frequently, homogeneity assumptions are placed on the errors; these assumptions are commonly incorrect and ignore critical, natural clustering of the actors. In this work, we present a novel regression modeling framework that models the errors as resulting from a community-based dependence structure and exploits the subsequent exchangeability properties of the error distribution to obtain parsimonious standard errors for regression parameters.
Diving Deep into Linear Regression and Polynomial Regression
I'm almost certain that now you might want to learn about these branches in greater detail. Worry not, I'll surely open the gates to these subsets in the posts to come. If you missed my post, you can find it at the following link: Branches of Artificial Intelligence. Previously, we discussed Machine Learning. We also discussed its subsets -- Supervised Learning, Unsupervised Learning, and Reinforcement Learning.