class separation




Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction

Neural Information Processing Systems

Nonlinear dimensionality reduction of high-dimensional data is challenging: the low-dimensional embedding will necessarily contain distortions, and it can be hard to determine which distortions are the most important to avoid. When annotation of the data into known relevant classes is available, it can be used to guide the embedding away from distortions that worsen class separation. The supervised mapping method introduced in the present paper, called ClassNeRV, is built on an original stress function that takes class annotation into account and evaluates embedding quality in terms of both false neighbors and missed neighbors. ClassNeRV shares the theoretical framework of the family of methods descended from Stochastic Neighbor Embedding (SNE). Our approach has a key advantage over previous ones: in the literature, supervised methods often emphasize class separation at the price of distorting the neighborhood structure of the data; conversely, unsupervised methods preserve structure better but often mix classes. Experiments show that ClassNeRV can preserve both neighbor structure and class separation, outperforming nine state-of-the-art alternatives.
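The precision/recall tradeoff the abstract alludes to can be illustrated with a minimal sketch of a NeRV-style stress: two opposed KL divergences between neighborhood distributions, one penalizing missed neighbors and one penalizing false neighbors, balanced by a tradeoff parameter. This is a simplified illustration of the SNE/NeRV framework, not ClassNeRV's actual class-aware stress function; the fixed bandwidth `sigma` and the weight `lam` are illustrative choices.

```python
import numpy as np

def neighbor_probs(X, sigma=1.0):
    # Gaussian neighborhood probabilities p_{j|i}, as in SNE:
    # each row i is a distribution over its potential neighbors j.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbor
    w = np.exp(-d2 / (2 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def nerv_stress(P, Q, lam=0.5, eps=1e-12):
    # NeRV-style tradeoff between the two kinds of distortion:
    # KL(P||Q) penalizes missed neighbors (close in data, far in embedding),
    # KL(Q||P) penalizes false neighbors (far in data, close in embedding).
    kl_pq = (P * np.log((P + eps) / (Q + eps))).sum()
    kl_qp = (Q * np.log((Q + eps) / (P + eps))).sum()
    return lam * kl_pq + (1 - lam) * kl_qp
```

A perfect embedding (identical neighborhood distributions) gives zero stress; any perturbation of the embedded positions raises it.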




On the (In)Significance of Feature Selection in High-Dimensional Datasets

Neekhra, Bhavesh, Gupta, Debayan, Chakravarti, Partha Pratim

arXiv.org Machine Learning

Extensive research has been done on feature selection (FS) algorithms for high-dimensional datasets, aiming to improve model performance, reduce computational cost, and identify features of interest. As a null model, we compare features selected by FS algorithms against randomly selected features to validate the performance of the former. Our results show that FS on high-dimensional datasets (in particular gene expression) in classification tasks is not useful. We find that (1) models trained on small subsets (0.02%-1% of all features) of randomly selected features almost always perform comparably to those trained on all features, and (2) a "typical"-sized random subset provides comparable or superior performance to that of the top-k features selected in various published studies. Our work thus challenges many feature selection results on high-dimensional datasets, particularly in computational genomics. It raises serious concerns about studies that propose drug design or targeted interventions based on computationally selected genes, without further validation in a wet lab.


f0bf4a2da952528910047c31b6c2e951-Paper.pdf

Neural Information Processing Systems

Previous work has proposed many new loss functions and regularizers that improve test accuracy on image classification tasks. However, it is not clear whether these loss functions learn better representations for downstream tasks. This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks. Using centered kernel alignment to measure similarity between hidden representations of networks, we find that differences among loss functions are apparent only in the last few layers of the network. We delve deeper into representations of the penultimate layer, finding that different objectives and hyperparameter combinations lead to dramatically different levels of class separation. Representations with higher class separation obtain higher accuracy on the original task, but their features are less useful for downstream tasks. Our results suggest there exists a trade-off between learning invariant features for the original task and features relevant for transfer tasks.
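The representation-similarity measure mentioned in the abstract, linear centered kernel alignment (CKA), can be sketched in a few lines for two representation matrices (examples by features). This is the standard linear-CKA formula, shown here as an illustration; the matrices compared in the paper are hidden activations of ImageNet-trained networks.

```python
import numpy as np

def linear_cka(X, Y):
    # Linear centered kernel alignment between two representation
    # matrices of shape (n_examples, n_features); columns are centered
    # first, then similarity is a normalized Frobenius inner product.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, 'fro') ** 2
    den = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return num / den
```

CKA is 1 for identical representations and lies in [0, 1] in general, which makes it convenient for comparing layers across networks trained with different objectives.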


A Convex Formulation for Linear Discriminant Analysis

Surineela, Sai Vijay Kumar, Kanakamalla, Prathyusha, Harikumar, Harigovind, Ghosh, Tomojit

arXiv.org Artificial Intelligence

The recent surge in multisource data collection has drastically increased data dimensionality, particularly in omics analysis, where gene expression data from microarrays or next-generation sequencing can exceed 50,000 measurements [32]. High-dimensional datasets often contain noisy, redundant, missing, or irrelevant features, which can degrade the performance of pattern recognition tasks [28]. The acquisition of such high-dimensional datasets necessitates innovative techniques that can effectively handle large-scale data while remaining robust to noise [30]. Dimensionality reduction (DR) is widely applied as an essential step to extract meaningful features, enabling more effective data visualization, feature extraction, and improved downstream predictive performance [12, 24]. With the advent of deep neural networks (DNNs) such as large language models (LLMs), convolutional neural networks (CNNs), and transformers, DR techniques may seem less prominent. However, despite the success of these complex architectures, linear dimensionality reduction remains a powerful and practical approach due to its interpretability, computational efficiency, and robustness in high-dimensional, low-sample-size (HDLSS) regimes [28]. Deep learning models excel at learning hierarchical representations but pose significant challenges: they require large amounts of labeled data, extensive hyperparameter tuning, and substantial computational resources. Additionally, these models often function as black boxes, offering little interpretability of their decision-making processes [26].
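For reference, the classic two-class Fisher LDA that this line of work builds on can be sketched directly: the discriminant direction is the within-class scatter inverse applied to the difference of class means. This is the textbook formulation, not the convex formulation proposed in the paper; the ridge term `reg` is an illustrative safeguard for the HDLSS regime, where the scatter matrix is singular.

```python
import numpy as np

def fisher_lda_direction(X, y, reg=1e-6):
    # Classic two-class Fisher discriminant: w = Sw^{-1} (mu1 - mu0),
    # where Sw is the pooled within-class scatter. A small ridge term
    # keeps Sw invertible when features outnumber samples.
    X0, X1 = X[y == 0], X[y == 1]
    Sw = np.cov(X0.T) + np.cov(X1.T) + reg * np.eye(X.shape[1])
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
```

Projecting the data onto `w` yields a one-dimensional representation in which the two class means are maximally separated relative to the within-class spread.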


Class-Aware Contrastive Optimization for Imbalanced Text Classification

Khvatskii, Grigorii, Moniz, Nuno, Doan, Khoa, Chawla, Nitesh V

arXiv.org Artificial Intelligence

The unique characteristics of text data make classification tasks a complex problem. Advances in unsupervised and semi-supervised learning and autoencoder architectures addressed several challenges. However, they still struggle with imbalanced text classification tasks, a common scenario in real-world applications, demonstrating a tendency to produce embeddings with unfavorable properties, such as class overlap. In this paper, we show that leveraging class-aware contrastive optimization combined with denoising autoencoders can successfully tackle imbalanced text classification tasks, achieving better performance than the current state-of-the-art. Concretely, our proposal combines reconstruction loss with contrastive class separation in the embedding space, allowing a better balance between the truthfulness of the generated embeddings and the model's ability to separate different classes. Compared with an extensive set of traditional and state-of-the-art competing methods, our proposal demonstrates a notable increase in performance across a wide variety of text datasets.
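The balance described in the abstract, between faithful reconstruction and class separation in the embedding space, can be illustrated with a minimal combined objective: a reconstruction MSE plus a class-aware contrastive term that pulls same-class embeddings together and hinge-pushes different-class embeddings apart. This is a generic sketch, not the paper's actual loss; the weight `alpha` and `margin` are illustrative hyperparameters.

```python
import numpy as np

def combined_loss(z, x_hat, x, y, alpha=0.5, margin=1.0):
    # Reconstruction term: how truthfully the autoencoder rebuilds x.
    recon = ((x_hat - x) ** 2).mean()
    # Pairwise distances between embeddings.
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    same = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)                # exclude self-pairs
    diff = 1.0 - (y[:, None] == y[None, :])
    # Pull same-class pairs together; push different-class pairs
    # apart until they are at least `margin` away.
    pull = (same * d ** 2).sum() / max(same.sum(), 1.0)
    push = (diff * np.maximum(0.0, margin - d) ** 2).sum() / max(diff.sum(), 1.0)
    return recon + alpha * (pull + push)
```

With perfect reconstruction and classes already separated by more than the margin, the loss vanishes; overlapping classes or lossy reconstruction raise it, which is the tradeoff the embedding must negotiate.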

