Regression
PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates
Frangella, Zachary, Rathore, Pratik, Zhao, Shipu, Udell, Madeleine
This paper introduces PROMISE (Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates), a suite of sketching-based preconditioned stochastic gradient algorithms for solving large-scale convex optimization problems arising in machine learning. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha; each algorithm comes with a strong theoretical analysis and effective default hyperparameter values. In contrast, traditional stochastic gradient methods require careful hyperparameter tuning to succeed, and degrade in the presence of ill-conditioning, a ubiquitous phenomenon in machine learning. Empirically, we verify the superiority of the proposed algorithms by showing that, using default hyperparameter values, they outperform or match popular tuned stochastic gradient optimizers on a test bed of 51 ridge and logistic regression problems assembled from benchmark machine learning repositories. On the theoretical side, this paper introduces the notion of quadratic regularity in order to establish linear convergence of all proposed methods even when the preconditioner is updated infrequently. The speed of linear convergence is determined by the quadratic regularity ratio, which often provides a tighter bound on the convergence rate compared to the condition number, both in theory and in practice, and explains the fast global linear convergence of the proposed methods.
Convergence analysis of online algorithms for vector-valued kernel regression
Griebel, Michael, Oswald, Peter
We consider the problem of approximating the regression function from noisy vector-valued data by an online learning algorithm using an appropriate reproducing kernel Hilbert space (RKHS) as prior. In an online algorithm, i.i.d. samples become available one by one by a random process and are successively processed to build approximations to the regression function. We are interested in the asymptotic performance of such online approximation algorithms and show that the expected squared error in the RKHS norm can be bounded by $C^2 (m+1)^{-s/(2+s)}$, where $m$ is the current number of processed data, the parameter $0
GP-BART: a novel Bayesian additive regression trees approach using Gaussian processes
Maia, Mateus, Murphy, Keefe, Parnell, Andrew C.
The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines "weak" tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and the absence of an explicit covariance structure over the observations in standard BART can yield poor performance in cases where such assumptions would be necessary. The Gaussian processes Bayesian additive regression trees (GP-BART) model is an extension of BART which addresses this limitation by assuming Gaussian process (GP) priors for the predictions of each terminal node among all trees. The model's effectiveness is demonstrated through applications to simulated and real-world data, surpassing the performance of traditional modeling approaches in various scenarios.
Remote Inference of Cognitive Scores in ALS Patients Using a Picture Description
Agurto, Carla, Cecchi, Guillermo, Wen, Bo, Fraenkel, Ernest, Berry, James, Navar, Indu, Norel, Raquel
Amyotrophic lateral sclerosis is a fatal disease that not only affects movement, speech, and breath but also cognition. Recent studies have focused on the use of language analysis techniques to detect ALS and infer scales for monitoring functional progression. In this paper, we focused on another important aspect, cognitive impairment, which affects 35-50% of the ALS population. In an effort to reach the ALS population, which frequently exhibits mobility limitations, we implemented the digital version of the Edinburgh Cognitive and Behavioral ALS Screen (ECAS) test for the first time. This test which is designed to measure cognitive impairment was remotely performed by 56 participants from the EverythingALS Speech Study. As part of the study, participants (ALS and non-ALS) were asked to describe weekly one picture from a pool of many pictures with complex scenes displayed on their computer at home. We analyze the descriptions performed within +/- 60 days from the day the ECAS test was administered and extract different types of linguistic and acoustic features. We input those features into linear regression models to infer 5 ECAS sub-scores and the total score. Speech samples from the picture description are reliable enough to predict the ECAS subs-scores, achieving statistically significant Spearman correlation values between 0.32 and 0.51 for the model's performance using 10-fold cross-validation.
Optimizing Offensive Gameplan in the National Basketball Association with Machine Learning
Throughout the analytical revolution that has occurred in the NBA, the development of specific metrics and formulas has given teams, coaches, and players a new way to see the game. However - the question arises - how can we verify any metrics? One method would simply be eyeball approximation (trying out many different gameplans) and/or trial and error - an estimation-based and costly approach. Another approach is to try to model already existing metrics with a unique set of features using machine learning techniques. The key to this approach is that with these features that are selected, we can try to gauge the effectiveness of these features combined, rather than using individual analysis in simple metric evaluation. If we have an accurate model, it can particularly help us determine the specifics of gameplan execution. In this paper, the statistic ORTG (Offensive Rating, developed by Dean Oliver) was found to have a correlation with different NBA playtypes using both a linear regression model and a neural network regression model, although ultimately, a neural network worked slightly better than linear regression. Using the accuracy of the models as a justification, the next step was to optimize the output of the model with test examples, which would demonstrate the combination of features to best achieve a highly functioning offense.
A Consistent and Scalable Algorithm for Best Subset Selection in Single Index Models
Tang, Borui, Zhu, Jin, Zhu, Junxian, Wang, Xueqin, Zhang, Heping
Analysis of high-dimensional data has led to increased interest in both single index models (SIMs) and best subset selection. SIMs provide an interpretable and flexible modeling framework for high-dimensional data, while best subset selection aims to find a sparse model from a large set of predictors. However, best subset selection in high-dimensional models is known to be computationally intractable. Existing methods tend to relax the selection, but do not yield the best subset solution. In this paper, we directly tackle the intractability by proposing the first provably scalable algorithm for best subset selection in high-dimensional SIMs. Our algorithmic solution enjoys the subset selection consistency and has the oracle property with a high probability. The algorithm comprises a generalized information criterion to determine the support size of the regression coefficients, eliminating the model selection tuning. Moreover, our method does not assume an error distribution or a specific link function and hence is flexible to apply. Extensive simulation results demonstrate that our method is not only computationally efficient but also able to exactly recover the best subset in various settings (e.g., linear regression, Poisson regression, heteroscedastic models).
Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for Adaptive Learning
Beachy, Atticus, Bae, Harok, Camberos, Jose, Grandhi, Ramana
Emulator embedded neural networks, which are a type of physics informed neural network, leverage multi-fidelity data sources for efficient design exploration of aerospace engineering systems. Multiple realizations of the neural network models are trained with different random initializations. The ensemble of model realizations is used to assess epistemic modeling uncertainty caused due to lack of training samples. This uncertainty estimation is crucial information for successful goal-oriented adaptive learning in an aerospace system design exploration. However, the costs of training the ensemble models often become prohibitive and pose a computational challenge, especially when the models are not trained in parallel during adaptive learning. In this work, a new type of emulator embedded neural network is presented using the rapid neural network paradigm. Unlike the conventional neural network training that optimizes the weights and biases of all the network layers by using gradient-based backpropagation, rapid neural network training adjusts only the last layer connection weights by applying a linear regression technique. It is found that the proposed emulator embedded neural network trains near-instantaneously, typically without loss of prediction accuracy. The proposed method is demonstrated on multiple analytical examples, as well as an aerospace flight parameter study of a generic hypersonic vehicle.
AKEM: Aligning Knowledge Base to Queries with Ensemble Model for Entity Recognition and Linking
Lu, Di, Liang, Zhongping, Yuan, Caixia, Wang, Xiaojie
This paper presents a novel approach to address the Entity Recognition and Linking Challenge at NLPCC 2015. The task involves extracting named entity mentions from short search queries and linking them to entities within a reference Chinese knowledge base. To tackle this problem, we first expand the existing knowledge base and utilize external knowledge to identify candidate entities, thereby improving the recall rate. Next, we extract features from the candidate entities and utilize Support Vector Regression and Multiple Additive Regression Tree as scoring functions to filter the results. Additionally, we apply rules to further refine the results and enhance precision. Our method is computationally efficient and achieves an F1 score of 0.535.
Temporal-spatial model via Trend Filtering
Padilla, Carlos Misael Madrid, Padilla, Oscar Hernan Madrid, Wang, Daren
This research focuses on the estimation of a non-parametric regression function designed for data with simultaneous time and space dependencies. In such a context, we study the Trend Filtering, a nonparametric estimator introduced by \cite{mammen1997locally} and \cite{rudin1992nonlinear}. For univariate settings, the signals we consider are assumed to have a kth weak derivative with bounded total variation, allowing for a general degree of smoothness. In the multivariate scenario, we study a $K$-Nearest Neighbor fused lasso estimator as in \cite{padilla2018adaptive}, employing an ADMM algorithm, suitable for signals with bounded variation that adhere to a piecewise Lipschitz continuity criterion. By aligning with lower bounds, the minimax optimality of our estimators is validated. A unique phase transition phenomenon, previously uncharted in Trend Filtering studies, emerges through our analysis. Both Simulation studies and real data applications underscore the superior performance of our method when compared with established techniques in the existing literature.
Liu-type Shrinkage Estimators for Mixture of Poisson Regressions with Experts: A Heart Disease Study
Ghanem, Elsayed, Yoosefi, Moein, Hatefi, Armin
Count data play a critical role in medical research, such as heart disease. The Poisson regression model is a common technique for evaluating the impact of a set of covariates on the count responses. The mixture of Poisson regression models with experts is a practical tool to exploit the covariates, not only to handle the heterogeneity in the Poisson regressions but also to learn the mixing structure of the population. Multicollinearity is one of the most common challenges with regression models, leading to ill-conditioned design matrices of Poisson regression components and expert classes. The maximum likelihood method produces unreliable and misleading estimates for the effects of the covariates in multicollinearity. In this research, we develop Ridge and Liu-type methods as two shrinkage approaches to cope with the ill-conditioned design matrices of the mixture of Poisson regression models with experts. Through various numerical studies, we demonstrate that the shrinkage methods offer more reliable estimates for the coefficients of the mixture model in multicollinearity while maintaining the classification performance of the ML method. The shrinkage methods are finally applied to a heart study to analyze the heart disease rate stages.