Regression
Learning Decisions Offline from Censored Observations with {\epsilon}-insensitive Operational Costs
Chen, Minxia, Fu, Ke, Huang, Teng, Bai, Miao
Many important managerial decisions are made based on censored observations. Making decisions without adequately handling the censoring leads to inferior outcomes. We investigate the data-driven decision-making problem with an offline dataset containing the feature data and the censored historical data of the variable of interest without the censoring indicators. Without assuming the underlying distribution, we design and leverage {\epsilon}-insensitive operational costs to deal with the unobserved censoring in an offline data-driven fashion. We demonstrate the customization of the {\epsilon}-insensitive operational costs for a newsvendor problem and use such costs to train two representative ML models, including linear regression (LR) models and neural networks (NNs). We derive tight generalization bounds for the custom LR model without regularization (LR-{\epsilon}NVC) and with regularization (LR-{\epsilon}NVC-R), and a high-probability generalization bound for the custom NN (NN-{\epsilon}NVC) trained by stochastic gradient descent. The theoretical results reveal the stability and learnability of LR-{\epsilon}NVC, LR-{\epsilon}NVC-R and NN-{\epsilon}NVC. We conduct extensive numerical experiments to compare LR-{\epsilon}NVC-R and NN-{\epsilon}NVC with two existing approaches, estimate-as-solution (EAS) and integrated estimation and optimization (IEO). The results show that LR-{\epsilon}NVC-R and NN-{\epsilon}NVC outperform both EAS and IEO, with maximum cost savings up to 14.40% and 12.21% compared to the lowest cost generated by the two existing approaches. In addition, LR-{\epsilon}NVC-R's and NN-{\epsilon}NVC's order quantities are statistically significantly closer to the optimal solutions should the underlying distribution be known.
Theoretical and Practical Progress in Hyperspectral Pixel Unmixing with Large Spectral Libraries from a Sparse Perspective
Preston, Jade, Basener, William
Hyperspectral unmixing is the process of determining the presence of individual materials and their respective abundances from an observed pixel spectrum. Unmixing is a fundamental process in hyperspectral image analysis, and is growing in importance as increasingly large spectral libraries are created and used. Unmixing is typically done with ordinary least squares (OLS) regression. However, unmixing with large spectral libraries where the materials present in a pixel are not a priori known, solving for the coefficients in OLS requires inverting a non-invertible matrix from a large spectral library. A number of regression methods are available that can produce a numerical solution using regularization, but with considerably varied effectiveness. Also, simple methods that are unpopular in the statistics literature (i.e. step-wise regression) are used with some level of effectiveness in hyperspectral analysis. In this paper, we provide a thorough performance evaluation of the methods considered, evaluating methods based on how often they select the correct materials in the models. Investigated methods include ordinary least squares regression, non-negative least squares regression, ridge regression, lasso regression, step-wise regression and Bayesian model averaging. We evaluated these unmixing approaches using multiple criteria: incorporation of non-negative abundances, model size, accurate mineral detection and root mean squared error (RMSE). We provide a taxonomy of the regression methods, showing that most methods can be understood as Bayesian methods with specific priors. We conclude that methods that can be derived with priors that correspond to the phenomenology of hyperspectral imagery outperform those with priors that are optimal for prediction performance under the assumptions of ordinary least squares linear regression.
Optimizing Emotion Recognition with Wearable Sensor Data: Unveiling Patterns in Body Movements and Heart Rate through Random Forest Hyperparameter Tuning
Nur, Zikri Kholifah, Wijaya, Rifki, Wulandari, Gia Septiana
This research delves into the utilization of smartwatch sensor data and heart rate monitoring to discern individual emotions based on body movement and heart rate. Emotions play a pivotal role in human life, influencing mental well-being, quality of life, and even physical and physiological responses. The data were sourced from prior research by Juan C. Quiroz, PhD. The study enlisted 50 participants who donned smartwatches and heart rate monitors while completing a 250-meter walk. Emotions were induced through both audio-visual and audio stimuli, with participants' emotional states evaluated using the PANAS questionnaire. The study scrutinized three scenarios: viewing a movie before walking, listening to music before walking, and listening to music while walking. Personal baselines were established using DummyClassifier with the 'most_frequent' strategy from the sklearn library, and various models, including Logistic Regression and Random Forest, were employed to gauge the impacts of these activities. Notably, a novel approach was undertaken by incorporating hyperparameter tuning to the Random Forest model using RandomizedSearchCV. The outcomes showcased substantial enhancements with hyperparameter tuning in the Random Forest model, yielding mean accuracies of 86.63% for happy vs. sad and 76.33% for happy vs. neutral vs. sad.
Alpha-Trimming: Locally Adaptive Tree Pruning for Random Forests
Surjanovic, Nikola, Henrey, Andrew, Loughin, Thomas M.
We demonstrate that adaptively controlling the size of individual regression trees in a random forest can improve predictive performance, contrary to the conventional wisdom that trees should be fully grown. A fast pruning algorithm, alpha-trimming, is proposed as an effective approach to pruning trees within a random forest, where more aggressive pruning is performed in regions with a low signal-to-noise ratio. The amount of overall pruning is controlled by adjusting the weight on an information criterion penalty as a tuning parameter, with the standard random forest being a special case of our alpha-trimmed random forest. A remarkable feature of alpha-trimming is that its tuning parameter can be adjusted without refitting the trees in the random forest once the trees have been fully grown once. In a benchmark suite of 46 example data sets, mean squared prediction error is often substantially lowered by using our pruning algorithm and is never substantially increased compared to a random forest with fully-grown trees at default parameter settings.
Federated Smoothing Proximal Gradient for Quantile Regression with Non-Convex Penalties
Mirzaeifard, Reza, Ghaderyan, Diyako, Werner, Stefan
Distributed sensors in the internet-of-things (IoT) generate vast amounts of sparse data. Analyzing this high-dimensional data and identifying relevant predictors pose substantial challenges, especially when data is preferred to remain on the device where it was collected for reasons such as data integrity, communication bandwidth, and privacy. This paper introduces a federated quantile regression algorithm to address these challenges. Quantile regression provides a more comprehensive view of the relationship between variables than mean regression models. However, traditional approaches face difficulties when dealing with nonconvex sparse penalties and the inherent non-smoothness of the loss function. For this purpose, we propose a federated smoothing proximal gradient (FSPG) algorithm that integrates a smoothing mechanism with the proximal gradient framework, thereby enhancing both precision and computational speed. This integration adeptly handles optimization over a network of devices, each holding local data samples, making it particularly effective in federated learning scenarios. The FSPG algorithm ensures steady progress and reliable convergence in each iteration by maintaining or reducing the value of the objective function. By leveraging nonconvex penalties, such as the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD), the proposed method can identify and preserve key predictors within sparse models. Comprehensive simulations validate the robust theoretical foundations of the proposed algorithm and demonstrate improved estimation precision and reliable convergence.
Method-of-Moments Inference for GLMs and Doubly Robust Functionals under Proportional Asymptotics
Chen, Xingyu, Liu, Lin, Mukherjee, Rajarshi
In this paper, we consider the estimation of regression coefficients and signal-to-noise (SNR) ratio in high-dimensional Generalized Linear Models (GLMs), and explore their implications in inferring popular estimands such as average treatment effects in high-dimensional observational studies. Under the ``proportional asymptotic'' regime and Gaussian covariates with known (population) covariance $\Sigma$, we derive Consistent and Asymptotically Normal (CAN) estimators of our targets of inference through a Method-of-Moments type of estimators that bypasses estimation of high dimensional nuisance functions and hyperparameter tuning altogether. Additionally, under non-Gaussian covariates, we demonstrate universality of our results under certain additional assumptions on the regression coefficients and $\Sigma$. We also demonstrate that knowing $\Sigma$ is not essential to our proposed methodology when the sample covariance matrix estimator is invertible. Finally, we complement our theoretical results with numerical experiments and comparisons with existing literature.
On the Convergence of a Federated Expectation-Maximization Algorithm
Tao, Zhixu, Chandak, Rajita, Kulkarni, Sanjeev
Data heterogeneity has been a long-standing bottleneck in studying the convergence rates of Federated Learning algorithms. In order to better understand the issue of data heterogeneity, we study the convergence rate of the Expectation-Maximization (EM) algorithm for the Federated Mixture of $K$ Linear Regressions model. We fully characterize the convergence rate of the EM algorithm under all regimes of $m/n$ where $m$ is the number of clients and $n$ is the number of data points per client. We show that with a signal-to-noise-ratio (SNR) of order $\Omega(\sqrt{K})$, the well-initialized EM algorithm converges within the minimax distance of the ground truth under each of the regimes. Interestingly, we identify that when $m$ grows exponentially in $n$, the EM algorithm only requires a constant number of iterations to converge. We perform experiments on synthetic datasets to illustrate our results. Surprisingly, the results show that rather than being a bottleneck, data heterogeneity can accelerate the convergence of federated learning algorithms.
Fever Detection with Infrared Thermography: Enhancing Accuracy through Machine Learning Techniques
Razmara, Parsa, Khezresmaeilzadeh, Tina, Jenkins, B. Keith
The COVID-19 pandemic has underscored the necessity for advanced diagnostic tools in global health systems. Infrared Thermography (IRT) has proven to be a crucial non-contact method for measuring body temperature, vital for identifying febrile conditions associated with infectious diseases like COVID-19. Traditional non-contact infrared thermometers (NCITs) often exhibit significant variability in readings. To address this, we integrated machine learning algorithms with IRT to enhance the accuracy and reliability of temperature measurements. Our study systematically evaluated various regression models using heuristic feature engineering techniques, focusing on features' physiological relevance and statistical significance. The Convolutional Neural Network (CNN) model, utilizing these techniques, achieved the lowest RMSE of 0.2223, demonstrating superior performance compared to results reported in previous literature. Among non-neural network models, the Binning method achieved the best performance with an RMSE of 0.2296. Our findings highlight the potential of combining advanced feature engineering with machine learning to improve diagnostic tools' effectiveness, with implications extending to other non-contact or remote sensing biomedical applications. This paper offers a comprehensive analysis of these methodologies, providing a foundation for future research in the field of non-invasive medical diagnostics.
Deep-change at AXOLOTL-24: Orchestrating WSD and WSI Models for Semantic Change Modeling
Kokosinskii, Denis, Kuklin, Mikhail, Arefyev, Nikolay
This paper describes our solution of the first subtask from the AXOLOTL-24 shared task on Semantic Change Modeling. The goal of this subtask is to distribute a given set of usages of a polysemous word from a newer time period between senses of this word from an older time period and clusters representing gained senses of this word. We propose and experiment with three new methods solving this task. Our methods achieve SOTA results according to both official metrics of the first substask. Additionally, we develop a model that can tell if a given word usage is not described by any of the provided sense definitions. This model serves as a component in one of our methods, but can potentially be useful on its own.
Better Locally Private Sparse Estimation Given Multiple Samples Per User
Ma, Yuheng, Jia, Ke, Yang, Hanfang
Previous studies yielded discouraging results for item-level locally differentially private linear regression with $s^*$-sparsity assumption, where the minimax rate for $nm$ samples is $\mathcal{O}(s^{*}d / nm\varepsilon^2)$. This can be challenging for high-dimensional data, where the dimension $d$ is extremely large. In this work, we investigate user-level locally differentially private sparse linear regression. We show that with $n$ users each contributing $m$ samples, the linear dependency of dimension $d$ can be eliminated, yielding an error upper bound of $\mathcal{O}(s^{*2} / nm\varepsilon^2)$. We propose a framework that first selects candidate variables and then conducts estimation in the narrowed low-dimensional space, which is extendable to general sparse estimation problems with tight error bounds. Experiments on both synthetic and real datasets demonstrate the superiority of the proposed methods. Both the theoretical and empirical results suggest that, with the same number of samples, locally private sparse estimation is better conducted when multiple samples per user are available.