Regression
Revisiting Quantum Algorithms for Linear Regressions: Quadratic Speedups without Data-Dependent Parameters
Song, Zhao, Yin, Junze, Zhang, Ruizhe
Linear regression is one of the most fundamental linear algebra problems. Given a dense matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b$, the goal is to find $x'$ such that $ \| Ax' - b \|_2^2 \leq (1+\epsilon) \min_{x} \| A x - b \|_2^2 $. The best classical algorithm takes $O(nd) + \mathrm{poly}(d/\epsilon)$ time [Clarkson and Woodruff STOC 2013, Nelson and Nguyen FOCS 2013]. On the other hand, quantum linear regression algorithms can achieve exponential quantum speedups, as shown in [Wang Phys. Rev. A 96, 012335, Kerenidis and Prakash ITCS 2017, Chakraborty, Gily{\'e}n and Jeffery ICALP 2019]. However, the running times of these algorithms depend on some quantum linear algebra-related parameters, such as $\kappa(A)$, the condition number of $A$. In this work, we develop a quantum algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) + \mathrm{poly}(d/\epsilon)$ time. It provides a quadratic quantum speedup in $n$ over the classical lower bound without any dependence on data-dependent parameters. In addition, we also show our result can be generalized to multiple regression and ridge linear regression.
Unveiling The Factors of Aesthetic Preferences with Explainable AI
Soydaner, Derya, Wagemans, Johan
The allure of aesthetic appeal in images captivates our senses, yet the underlying intricacies of aesthetic preferences remain elusive. In this study, we pioneer a novel perspective by utilizing machine learning models that focus on aesthetic attributes known to influence preferences. Through a data mining approach, our models process these attributes as inputs to predict the aesthetic scores of images. Moreover, to delve deeper and obtain interpretable explanations regarding the factors driving aesthetic preferences, we utilize the popular Explainable AI (XAI) technique known as SHapley Additive exPlanations (SHAP). Our methodology involves employing various machine learning models, including Random Forest, XGBoost, Support Vector Regression, and Multilayer Perceptron, to compare their performances in accurately predicting aesthetic scores, and consistently observing results in conjunction with SHAP. We conduct experiments on three image aesthetic benchmarks, providing insights into the roles of attributes and their interactions. Ultimately, our study aims to shed light on the complex nature of aesthetic preferences in images through machine learning and provides a deeper understanding of the attributes that influence aesthetic judgements.
A latent linear model for nonlinear coupled oscillators on graphs
Goyal, Agam, Wu, Zhaoxing, Yim, Richard P., Chen, Binhao, Xu, Zihong, Lyu, Hanbaek
A system of coupled oscillators on an arbitrary graph is locally driven by the tendency to mutual synchronization between nearby oscillators, but can and often exhibit nonlinear behavior on the whole graph. Understanding such nonlinear behavior has been a key challenge in predicting whether all oscillators in such a system will eventually synchronize. In this paper, we demonstrate that, surprisingly, such nonlinear behavior of coupled oscillators can be effectively linearized in certain latent dynamic spaces. The key insight is that there is a small number of `latent dynamics filters', each with a specific association with synchronizing and non-synchronizing dynamics on subgraphs so that any observed dynamics on subgraphs can be approximated by a suitable linear combination of such elementary dynamic patterns. Taking an ensemble of subgraph-level predictions provides an interpretable predictor for whether the system on the whole graph reaches global synchronization. We propose algorithms based on supervised matrix factorization to learn such latent dynamics filters. We demonstrate that our method performs competitively in synchronization prediction tasks against baselines and black-box classification algorithms, despite its simple and interpretable architecture.
Sharing pattern submodels for prediction with missing values
Stempfle, Lena, Panahi, Ashkan, Johansson, Fredrik D.
Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time. When variables are missing in recurring patterns, fitting separate pattern submodels have been proposed as a solution. However, fitting models independently does not make efficient use of all available data. Conversely, fitting a single shared model to the full data set relies on imputation which often leads to biased results when missingness depends on unobserved factors. We propose an alternative approach, called sharing pattern submodels, which i) makes predictions that are robust to missing values at test time, ii) maintains or improves the predictive power of pattern submodels, and iii) has a short description, enabling improved interpretability. Parameter sharing is enforced through sparsity-inducing regularization which we prove leads to consistent estimation. Finally, we give conditions for when a sharing model is optimal, even when both missingness and the target outcome depend on unobserved variables. Classification and regression experiments on synthetic and real-world data sets demonstrate that our models achieve a favorable tradeoff between pattern specialization and information sharing.
Machine Learning For An Explainable Cost Prediction of Medical Insurance
Orji, Ugochukwu, Ukwandu, Elochukwu
Predictive modeling in healthcare continues to be an active actuarial research topic as more insurance companies aim to maximize the potential of Machine Learning approaches to increase their productivity and efficiency. In this paper, the authors deployed three regression-based ensemble ML models that combine variations of decision trees through Extreme Gradient Boosting, Gradient-boosting Machine, and Random Forest) methods in predicting medical insurance costs. Explainable Artificial Intelligence methods SHapley Additive exPlanations and Individual Conditional Expectation plots were deployed to discover and explain the key determinant factors that influence medical insurance premium prices in the dataset. The dataset used comprised 986 records and is publicly available in the KAGGLE repository. The models were evaluated using four performance evaluation metrics, including R-squared, Mean Absolute Error, Root Mean Squared Error, and Mean Absolute Percentage Error. The results show that all models produced impressive outcomes; however, the XGBoost model achieved a better overall performance although it also expanded more computational resources, while the RF model recorded a lesser prediction error and consumed far fewer computing resources than the XGBoost model. Furthermore, we compared the outcome of both XAi methods in identifying the key determinant features that influenced the PremiumPrices for each model and whereas both XAi methods produced similar outcomes, we found that the ICE plots showed in more detail the interactions between each variable than the SHAP analysis which seemed to be more high-level. It is the aim of the authors that the contributions of this study will help policymakers, insurers, and potential medical insurance buyers in their decision-making process for selecting the right policies that meet their specific needs.
BrainZ-BP: A Non-invasive Cuff-less Blood Pressure Estimation Approach Leveraging Brain Bio-impedance and Electrocardiogram
Yang, Bufang, Liu, Le, Wu, Wenxuan, Zhou, Mengliang, Liu, Hongxing, Ning, Xinbao
Accurate and continuous blood pressure (BP) monitoring is essential to the early prevention of cardiovascular diseases. Non-invasive and cuff-less BP estimation algorithm has gained much attention in recent years. Previous studies have demonstrated that brain bio-impedance (BIOZ) is a promising technique for non-invasive intracranial pressure (ICP) monitoring. Clinically, treatment for patients with traumatic brain injuries (TBI) requires monitoring the ICP and BP of patients simultaneously. Estimating BP by brain BIOZ directly can reduce the number of sensors attached to the patients, thus improving their comfort. To address the issues, in this study, we explore the feasibility of leveraging brain BIOZ for BP estimation and propose a novel cuff-less BP estimation approach called BrainZ-BP. Two electrodes are placed on the forehead and occipital bone of the head in the anterior-posterior direction for brain BIOZ measurement. Various features including pulse transit time and morphological features of brain BIOZ are extracted and fed into four regression models for BP estimation. Results show that the mean absolute error, root mean square error, and correlation coefficient of random forest regression model are 2.17 mmHg, 3.91 mmHg, and 0.90 for systolic pressure estimation, and are 1.71 mmHg, 3.02 mmHg, and 0.89 for diastolic pressure estimation. The presented BrainZ-BP can be applied in the brain BIOZ-based ICP monitoring scenario to monitor BP simultaneously.
Risk Bounds of Accelerated SGD for Overparameterized Linear Regression
Li, Xuheng, Deng, Yihe, Wu, Jingfeng, Zhou, Dongruo, Gu, Quanquan
Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning and often achieves better generalization performance than SGD. However, existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression, which is possibly the simplest setting of learning with overparameterization. We establish an instance-dependent excess risk bound for ASGD within each eigen-subspace of the data covariance matrix. Our analysis shows that (i) ASGD outperforms SGD in the subspace of small eigenvalues, exhibiting a faster rate of exponential decay for bias error, while in the subspace of large eigenvalues, its bias error decays slower than SGD; and (ii) the variance error of ASGD is always larger than that of SGD. Our result suggests that ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues. Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for bias error than the best-known result.
Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus
Maroni, Gabriele, Cannelli, Loris, Piga, Dario
Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the tradeoff between minimizing the fitting error and the norm of the learned model coefficients. As this hyperparameter is scalar, it can be easily selected via random or grid search optimizing a cross-validation criterion. However, using a scalar hyperparameter limits the algorithm's flexibility and potential for better generalization. In this paper, we address the problem of linear regression with l2-regularization, where a different regularization hyperparameter is associated with each input variable. We optimize these hyperparameters using a gradient-based approach, wherein the gradient of a cross-validation criterion with respect to the regularization hyperparameters is computed analytically through matrix differential calculus. Additionally, we introduce two strategies tailored for sparse model learning problems aiming at reducing the risk of overfitting to the validation data. Numerical examples demonstrate that our multi-hyperparameter regularization approach outperforms LASSO, Ridge, and Elastic Net regression. Moreover, the analytical computation of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation, especially when handling a large number of input variables. Application to the identification of over-parameterized Linear Parameter-Varying models is also presented.
Stability and L2-penalty in Model Averaging
Model averaging has received much attention in the past two decades, which integrates available information by averaging over potential models. Although various model averaging methods have been developed, there are few literatures on the theoretical properties of model averaging from the perspective of stability, and the majority of these methods constrain model weights to a simplex. The aim of this paper is to introduce stability from statistical learning theory into model averaging. Thus, we define the stability, asymptotic empirical risk minimizer, generalization, and consistency of model averaging and study the relationship among them. Our results indicate that stability can ensure that model averaging has good generalization performance and consistency under reasonable conditions, where consistency means model averaging estimator can asymptotically minimize the mean squared prediction error. We also propose a L2-penalty model averaging method without limiting model weights and prove that it has stability and consistency. In order to reduce the impact of tuning parameter selection, we use 10-fold cross-validation to select a candidate set of tuning parameters and perform a weighted average of the estimators of model weights based on estimation errors. The Monte Carlo simulation and an illustrative application demonstrate the usefulness of the proposed method.
Optimal Linear Signal: An Unsupervised Machine Learning Framework to Optimize PnL with Linear Signals
This study presents an unsupervised machine learning approach for optimizing Profit and Loss (PnL) in quantitative finance. Our algorithm, akin to an unsupervised variant of linear regression, maximizes the Sharpe Ratio of PnL generated from signals constructed linearly from exogenous variables. The methodology employs a linear relationship between exogenous variables and the trading signal, with the objective of maximizing the Sharpe Ratio through parameter optimization. Empirical application on an ETF representing U.S. Treasury bonds demonstrates the model's effectiveness, supported by regularization techniques to mitigate overfitting. The study concludes with potential avenues for further development, including generalized time steps and enhanced corrective terms.