Regression
Classification Metrics Walkthrough: Logistic Regression with Accuracy, Precision, Recall, and ROC - KDnuggets
Metrics are an important element of machine learning. In regard to classification tasks, there are different types of metrics that allow you to assess the performance of machine learning models. However, it can be difficult to choose the right one for your task at hand. In this article, I will be going through 4 common classification metrics: Accuracy, Precision, Recall, and ROC in relation to Logistic Regression. Logistic Regression is a form of Supervised Learning - when the algorithm learns on a labeled dataset and analyses the training data.
Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics
We propose to smooth out the calibration score, which measures how good a forecaster is, by combining nearby forecasts. While regular calibration can be guaranteed only by randomized forecasting procedures, we show that smooth calibration can be guaranteed by deterministic procedures. As a consequence, it does not matter if the forecasts are leaked, i.e., made known in advance: smooth calibration can nevertheless be guaranteed (while regular calibration cannot). Moreover, our procedure has finite recall, is stationary, and all forecasts lie on a finite grid. To construct the procedure, we deal also with the related setups of online linear regression and weak calibration. Finally, we show that smooth calibration yields uncoupled finite-memory dynamics in n-person games "smooth calibrated learning" in which the players play approximate Nash equilibria in almost all periods (by contrast, calibrated learning, which uses regular calibration, yields only that the time-averages of play are approximate correlated equilibria).
Reduced regression models and tests for linear hypotheses
On a SAS discussion forum, a statistical programmer asked about how to understand the statistics that are displayed when you use the TEST statement in PROC REG (or other SAS regression procedures) to test for linear relationships between regression coefficients. The documentation for the TEST statement in PROC REG explains the F test in terms of a matrix of linear constraints. However, the programmer wanted a simpler explanation. Fortunately, there is an easy way to explain the TEST statement from first principles. The explanation involves running two regression models.
FCT-GAN: Enhancing Table Synthesis via Fourier Transform
Zhao, Zilong, Birke, Robert, Chen, Lydia Y.
Synthetic tabular data emerges as an alternative for sharing knowledge while adhering to restrictive data access regulations, e.g., European General Data Protection Regulation (GDPR). Mainstream state-of-the-art tabular data synthesizers draw methodologies from Generative Adversarial Networks (GANs), which are composed of a generator and a discriminator. While convolution neural networks are shown to be a better architecture than fully connected networks for tabular data synthesizing, two key properties of tabular data are overlooked: (i) the global correlation across columns, and (ii) invariant synthesizing to column permutations of input data. To address the above problems, we propose a Fourier conditional tabular generative adversarial network (FCT-GAN). We introduce feature tokenization and Fourier networks to construct a transformer-style generator and discriminator, and capture both local and global dependencies across columns. The tokenizer captures local spatial features and transforms original data into tokens. Fourier networks transform tokens to frequency domains and element-wisely multiply a learnable filter. Extensive evaluation on benchmarks and real-world data shows that FCT-GAN can synthesize tabular data with high machine learning utility (up to 27.8% better than state-of-the-art baselines) and high statistical similarity to the original data (up to 26.5% better), while maintaining the global correlation across columns, especially on high dimensional dataset.
A Neural Mean Embedding Approach for Back-door and Front-door Adjustment
We consider the estimation of average and counterfactual treatment effects, under two settings: back-door adjustment and front-door adjustment. The goal in both cases is to recover the treatment effect without having an access to a hidden confounder. This objective is attained by first estimating the conditional mean of the desired outcome variable given relevant covariates (the "first stage" regression), and then taking the (conditional) expectation of this function as a "second stage" procedure. We propose to compute these conditional expectations directly using a regression function to the learned input features of the first stage, thus avoiding the need for sampling or density estimation. All functions and features (and in particular, the output features in the second stage) are neural networks learned adaptively from data, with the sole requirement that the final layer of the first stage should be linear. The proposed method is shown to converge to the true causal parameter, and outperforms the recent state-of-the-art methods on challenging causal benchmarks, including settings involving high-dimensional image data.
A Momentum Accelerated Adaptive Cubic Regularization Method for Nonconvex Optimization
The cubic regularization method (CR) and its adaptive version (ARC) are popular Newton-type methods in solving unconstrained non-convex optimization problems, due to its global convergence to local minima under mild conditions. The main aim of this paper is to develop a momentum-accelerated adaptive cubic regularization method (ARCm) to improve the convergent performance. With the proper choice of momentum step size, we show the global convergence of ARCm and the local convergence can also be guaranteed under the \KL property. Such global and local convergence can also be established when inexact solvers with low computational costs are employed in the iteration procedure. Numerical results for non-convex logistic regression and robust linear regression models are reported to demonstrate that the proposed ARCm significantly outperforms state-of-the-art cubic regularization methods (e.g., CR, momentum-based CR, ARC) and the trust region method. In particular, the number of iterations required by ARCm is less than 10\% to 50\% required by the most competitive method (ARC) in the experiments.
Predicting housing prices and analyzing real estate market in the Chicago suburbs using Machine Learning
The pricing of housing properties is determined by a variety of factors. However, post-pandemic markets have experienced volatility in the Chicago suburb area, which have affected house prices greatly. In this study, analysis was done on the Naperville/Bolingbrook real estate market to predict property prices based on these housing attributes through machine learning models, and to evaluate the effectiveness of such models in a volatile market space. Gathering data from Redfin, a real estate website, sales data from 2018 up until the summer season of 2022 were collected for research. By analyzing these sales in this range of time, we can also look at the state of the housing market and identify trends in price. For modeling the data, the models used were linear regression, support vector regression, decision tree regression, random forest regression, and XGBoost regression. To analyze results, comparison was made on the MAE, RMSE, and R-squared values for each model. It was found that the XGBoost model performs the best in predicting house prices despite the additional volatility sponsored by post-pandemic conditions. After modeling, Shapley Values (SHAP) were used to evaluate the weights of the variables in constructing models.
From Optimization Dynamics to Generalization Bounds via {\L}ojasiewicz Gradient Inequality
Liu, Fusheng, Yang, Haizhao, Hayou, Soufiane, Li, Qianxiao
Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the optimization trajectory under the gradient flow algorithm. The key ingredient of this framework is the Uniform-LGI, a property that is generally satisfied when training machine learning models. Leveraging the Uniform-LGI, we first derive convergence rates for gradient flow algorithm, then we give generalization bounds for a large class of machine learning models. We further apply our framework to three distinct machine learning models: linear regression, kernel regression, and two-layer neural networks. Through our approach, we obtain generalization estimates that match or extend previous results.
Walk a Mile in Their Shoes: a New Fairness Criterion for Machine Learning
The old empathetic adage, ``Walk a mile in their shoes,'' asks that one imagine the difficulties others may face. This suggests a new ML counterfactual fairness criterion, based on a \textit{group} level: How would members of a nonprotected group fare if their group were subject to conditions in some protected group? Instead of asking what sentence would a particular Caucasian convict receive if he were Black, take that notion to entire groups; e.g. how would the average sentence for all White convicts change if they were Black, but with their same White characteristics, e.g. same number of prior convictions? We frame the problem and study it empirically, for different datasets. Our approach also is a solution to the problem of covariate correlation with sensitive attributes.
Neuro-symbolic Explainable Artificial Intelligence Twin for Zero-touch IoE in Wireless Network
Munir, Md. Shirajum, Kim, Ki Tae, Adhikary, Apurba, Saad, Walid, Shetty, Sachin, Park, Seong-Bae, Hong, Choong Seon
Explainable artificial intelligence (XAI) twin systems will be a fundamental enabler of zero-touch network and service management (ZSM) for sixth-generation (6G) wireless networks. A reliable XAI twin system for ZSM requires two composites: an extreme analytical ability for discretizing the physical behavior of the Internet of Everything (IoE) and rigorous methods for characterizing the reasoning of such behavior. In this paper, a novel neuro-symbolic explainable artificial intelligence twin framework is proposed to enable trustworthy ZSM for a wireless IoE. The physical space of the XAI twin executes a neural-network-driven multivariate regression to capture the time-dependent wireless IoE environment while determining unconscious decisions of IoE service aggregation. Subsequently, the virtual space of the XAI twin constructs a directed acyclic graph (DAG)-based Bayesian network that can infer a symbolic reasoning score over unconscious decisions through a first-order probabilistic language model. Furthermore, a Bayesian multi-arm bandits-based learning problem is proposed for reducing the gap between the expected explained score and the current obtained score of the proposed neuro-symbolic XAI twin. To address the challenges of extensible, modular, and stateless management functions in ZSM, the proposed neuro-symbolic XAI twin framework consists of two learning systems: 1) an implicit learner that acts as an unconscious learner in physical space, and 2) an explicit leaner that can exploit symbolic reasoning based on implicit learner decisions and prior evidence. Experimental results show that the proposed neuro-symbolic XAI twin can achieve around 96.26% accuracy while guaranteeing from 18% to 44% more trust score in terms of reasoning and closed-loop automation.