

Variable Selection Using Relative Importance Rankings

arXiv.org Machine Learning

Although conceptually related, variable selection and relative importance (RI) analysis have been treated quite differently in the literature. While RI is typically used for post-hoc model explanation, this paper explores its potential for variable ranking and filter-based selection before model creation. Specifically, we anticipate strong performance from the RI measures because they incorporate both direct and combined effects of predictors, addressing a key limitation of marginal correlation, which ignores dependencies among predictors. We implement and evaluate the RI-based variable selection methods using general dominance (GD), comprehensive relative importance (CRI), and a newly proposed, computationally efficient variant termed CRI.Z. We first demonstrate how the RI measures rank the variables more accurately than the marginal correlation, especially when there are suppressed or weak predictors. We then show that predictive models built on these rankings are highly competitive, often outperforming state-of-the-art methods such as the lasso and relaxed lasso. The proposed RI-based methods are particularly effective in challenging cases involving clusters of highly correlated predictors, a setting known to cause failures in many benchmark methods. Although lasso methods have dominated the recent literature on variable selection, our study reveals that the RI-based method is a powerful and competitive alternative. We believe these underutilized tools deserve greater attention in the statistics and machine learning communities. The code is available at: https://github.com/tien-endotchang/RI-variable-selection.
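
To make the contrast between marginal correlation and an RI measure concrete, here is a minimal sketch of general dominance as it is commonly defined: the incremental R^2 of a predictor, averaged within each subset size and then across sizes. It is not the authors' implementation (see their repository for that), and CRI and CRI.Z are not shown. The helper names and the two-predictor correlation values are illustrative assumptions.

```python
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(subset, rxx, rxy):
    """R^2 of regressing y on the predictors in `subset`, from correlations."""
    if not subset:
        return 0.0
    idx = list(subset)
    a = [[rxx[i][j] for j in idx] for i in idx]
    b = [rxy[i] for i in idx]
    beta = solve(a, b)  # standardized regression coefficients
    return sum(bi * ri for bi, ri in zip(beta, b))

def general_dominance(rxx, rxy):
    """Average incremental R^2 of each predictor over all subset sizes."""
    p = len(rxy)
    gd = []
    for j in range(p):
        others = [i for i in range(p) if i != j]
        per_size = []
        for k in range(p):
            gains = [r_squared(s + (j,), rxx, rxy) - r_squared(s, rxx, rxy)
                     for s in combinations(others, k)]
            per_size.append(sum(gains) / len(gains))
        gd.append(sum(per_size) / p)
    return gd

# Hypothetical suppressor setup: x2 is uncorrelated with y but correlated with x1.
rxx = [[1.0, 0.5], [0.5, 1.0]]   # predictor correlation matrix (assumed values)
rxy = [0.5, 0.0]                 # marginal correlations with y (assumed values)
gd = general_dominance(rxx, rxy)
full_r2 = r_squared((0, 1), rxx, rxy)
```

In this toy setup the marginal correlation of the second predictor is zero, yet its general dominance weight is positive because it raises R^2 once the first predictor is in the model. The weights also sum to the full-model R^2. The enumeration over all subsets grows exponentially in the number of predictors, so this sketch is only practical for small problems.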


Interference-Aware Super-Constellation Design for NOMA

arXiv.org Artificial Intelligence

Non-orthogonal multiple access (NOMA) has gained significant attention as a potential next-generation multiple access technique. However, its implementation with finite-alphabet inputs faces challenges. In particular, due to inter-user interference, superimposed constellations may have overlapping symbols, leading to high bit error rates when successive interference cancellation (SIC) is applied. To tackle this issue, this paper employs autoencoders to design interference-aware super-constellations. Unlike conventional methods, where superimposed constellations may have overlapping symbols, the proposed autoencoder-based NOMA (AE-NOMA) is trained to design super-constellations with distinguishable symbols at receivers, regardless of channel gains. The proposed architecture removes the need for SIC, allowing maximum likelihood-based approaches to be used instead. The paper presents the conceptual architecture, loss functions, and training strategies for AE-NOMA. Various test results are provided to demonstrate the effectiveness of interference-aware constellations in improving the bit error rate, indicating the adaptability of AE-NOMA to different channel scenarios and its promising potential for implementing NOMA systems.
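
The overlap problem described above can be illustrated with a small sketch. It assumes two users sending real-valued BPSK symbols that are superimposed with power fractions `p1` and `p2`; this toy model and the function names are assumptions for illustration and are not taken from the paper, whose AE-NOMA design is learned rather than hand-constructed.

```python
from itertools import product
import math

def super_constellation(p1, p2, symbols=(-1.0, 1.0)):
    """Map each symbol pair (s1, s2) to the superimposed point sqrt(p1)*s1 + sqrt(p2)*s2."""
    return {(s1, s2): math.sqrt(p1) * s1 + math.sqrt(p2) * s2
            for s1, s2 in product(symbols, repeat=2)}

def min_distance(points):
    """Smallest distance between any two superimposed points."""
    vals = list(points.values())
    return min(abs(a - b) for i, a in enumerate(vals) for b in vals[i + 1:])

# Equal power split: the pairs (+1, -1) and (-1, +1) collapse onto the same point.
d_equal = min_distance(super_constellation(0.5, 0.5))

# Unequal split (assumed values): all four superimposed points stay distinct.
d_unequal = min_distance(super_constellation(0.8, 0.2))
```

With the equal split, the minimum distance is zero, so a receiver cannot tell two symbol pairs apart no matter how it decodes. With the unequal split, every pair maps to a distinct point, which is the property a distinguishable super-constellation needs.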


COMSPLIT: A Communication-Aware Split Learning Design for Heterogeneous IoT Platforms

arXiv.org Artificial Intelligence

The significance of distributed learning and inference algorithms in Internet of Things (IoT) networks is growing since they flexibly distribute computation load between IoT devices and the infrastructure, enhance data privacy, and minimize latency. However, a notable challenge stems from the influence of communication channel conditions on their performance. In this work, we introduce COMSPLIT: a novel communication-aware design for the split learning (SL) and inference paradigm, tailored to processing time series data in IoT networks. COMSPLIT provides a versatile framework for deploying adaptable SL in IoT networks affected by diverse channel conditions. By integrating an early-exit strategy and addressing IoT scenarios containing devices with heterogeneous computational capabilities, COMSPLIT represents a comprehensive design solution for communication-aware SL in IoT networks. Numerical results show the superior performance of COMSPLIT compared to vanilla SL approaches (which assume an ideal communication channel), demonstrating its ability to offer both design simplicity and adaptability to different channel conditions.


Multi-User Semantic Fusion for Semantic Communications over Degraded Broadcast Channels

arXiv.org Artificial Intelligence

Degraded broadcast channels (DBC) are a typical multiuser communication scenario, yet semantic communications over DBC still lack in-depth research. In this paper, we design a semantic communications approach based on multi-user semantic fusion for wireless image transmission over DBC. In the proposed method, the transmitter extracts semantic features for two users separately. It then effectively fuses these semantic features for broadcasting by leveraging semantic similarity. Unlike traditional allocation of time, power, or bandwidth, the semantic fusion scheme can dynamically control the weight of the semantic features of the two users to balance the performance between them. Considering the different channel state information (CSI) of both users over DBC, a DBC-Aware method is developed that embeds the CSI of both users into the joint source-channel coding encoder and fusion module to adapt to the channel. Experimental results show that the proposed system outperforms the traditional broadcasting schemes.


Improved theoretical guarantee for rank aggregation via spectral method

arXiv.org Machine Learning

Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. As it is generally NP-hard to find a global ranking that minimizes the mismatch (known as the Kemeny optimization), we focus on the Erdős-Rényi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms that are based on unnormalized and normalized data matrices. The key is to understand their performance in recovering the underlying scores of each item from the observed data. This reduces to deriving an entry-wise perturbation error bound between the top eigenvectors of the unnormalized/normalized data matrix and its population counterpart. By using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an error bound on the maximum displacement for each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon the state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.


A Weighted Autoencoder-Based Approach to Downlink NOMA Constellation Design

arXiv.org Artificial Intelligence

End-to-end design of communication systems using deep autoencoders (AEs) is gaining attention due to its flexibility and excellent performance. Beyond single-user transmission, AE-based design has recently been explored in multi-user setups, e.g., for designing constellations for non-orthogonal multiple access (NOMA). In this paper, we further advance the design of AE-based downlink NOMA by introducing a weighted loss function in the AE training. By changing the weight coefficients, one can flexibly tune the constellation design to balance the error probability of different users, without relying on explicit information about their channel quality. Combined with the SICNet decoder, we demonstrate a significant improvement in achievable levels and flexible control of the error probability of different users using the proposed weighted AE-based framework.
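
As a rough illustration of how weight coefficients can shift the balance between users, the sketch below combines per-user cross-entropy terms into a weighted sum. The abstract does not give the exact form of the loss, so the weighted-sum form, the function names, and the example probabilities are all assumptions.

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy of one decoded symbol: negative log-probability of the true label."""
    return -math.log(probs[label])

def weighted_loss(user_probs, user_labels, weights):
    """Weighted sum of per-user cross-entropy terms (assumed form of the loss)."""
    return sum(w * cross_entropy(p, y)
               for w, p, y in zip(weights, user_probs, user_labels))

# Hypothetical decoder outputs for two users; user 2 is decoded less reliably.
user_probs = [[0.9, 0.1], [0.6, 0.4]]
user_labels = [0, 0]

balanced = weighted_loss(user_probs, user_labels, (0.5, 0.5))
favor_user2 = weighted_loss(user_probs, user_labels, (0.2, 0.8))
```

Raising the weight on user 2 makes that user's errors dominate the objective, so a training procedure minimizing it would be pushed toward constellations that protect user 2, at some cost to user 1.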


On the Benefits of Early Fusion in Multimodal Representation Learning

arXiv.org Artificial Intelligence

Intelligently reasoning about the world often requires integrating data from multiple modalities, as any individual modality may contain unreliable or incomplete information. On the other hand, the brain performs multimodal processing almost immediately. This divide between conventional multimodal learning and neuroscience suggests that a detailed study of early multimodal fusion could improve artificial multimodal representations. To facilitate the study of early multimodal fusion, we create a convolutional LSTM network architecture that simultaneously processes both audio and visual inputs, and allows us to select the layer at which audio and visual information combines. Our results demonstrate that immediate fusion of audio and visual inputs in the initial C-LSTM layer results in higher performing networks that are more robust to the addition of white noise in both audio and visual inputs. In many cases, an individual modality does not contain sufficient information to classify the scene.


Surprises in High-Dimensional Ridgeless Least Squares Interpolation

arXiv.org Machine Learning

Modern deep learning models involve a huge number of parameters. In nearly all applications of these models, current practice suggests that we should design the network to be sufficiently complex so that the model (as trained, typically, by gradient descent) interpolates the data, i.e., achieves zero training error. Indeed, in a thought-provoking experiment, Zhang et al. (2016) showed that state-of-the-art deep neural network architectures can be trained to interpolate the data even when the actual labels are replaced by entirely random ones. Despite their enormous complexity, deep neural networks are frequently seen to generalize well in meaningful practical problems. At first sight, this seems to defy conventional statistical wisdom: interpolation (vanishing training error) is usually taken to be a proxy for overfitting or poor generalization (large gap between training and test error). In an insightful series of papers, Belkin et al. (2018b,c,a) pointed out that these concepts are, in general, distinct, and interpolation does not contradict generalization. For example, kernel ridge regression is a relatively well-understood setting in which interpolation can coexist with good generalization (Liang and Rakhlin, 2018). In this paper, we examine the prediction risk of minimum $\ell_2$ norm or "ridgeless" least squares regression, under


High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

arXiv.org Machine Learning

Penalized likelihood methods are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well-developed, the relative efficacy of different methods in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users of these methods. In this paper we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 1,800 data-generating scenarios, allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely-used methods (Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector as well as Stability Selection). We find considerable variation in performance between methods, with results dependent on details of the data-generating scenario and the specific goal. Our results support a 'no panacea' view, with no unambiguous winner across all scenarios, even in this restricted setting where all data align well with the assumptions underlying the methods. Lasso is well-behaved, performing competitively in many scenarios, while SCAD is highly variable. Substantial benefits from a Ridge penalty are only seen in the most challenging scenarios with strong multicollinearity. The results are supported by semi-synthetic analyses using gene expression data from cancer samples. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.


Sparse Variational Bayesian Approximations for Nonlinear Inverse Problems: applications in nonlinear elastography

arXiv.org Machine Learning

This paper presents an efficient Bayesian framework for solving nonlinear, high-dimensional model calibration problems. It is based on a Variational Bayesian formulation that aims at approximating the exact posterior by solving an optimization problem over an appropriately selected family of distributions. The goal is two-fold: first, to find lower-dimensional representations of the unknown parameter vector that capture as much as possible of the associated posterior density, and second, to enable the computation of the approximate posterior density with as few forward calls as possible. We discuss how these objectives can be achieved by using a fully Bayesian argumentation and employing the marginal likelihood or evidence as the ultimate model validation metric for any proposed dimensionality reduction. We demonstrate the performance of the proposed methodology for problems in nonlinear elastography, where the identification of the mechanical properties of biological materials can inform non-invasive medical diagnosis. Finally, an Importance Sampling scheme is employed to validate the results and assess the efficacy of the approximations provided.