AITopics | feature redundancy

Collaborating Authors

feature redundancy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder

Xu, Zhen, Tan, Zhen, Wang, Song, Xu, Kaidi, Chen, Tianlong

arXiv.org Artificial IntelligenceNov-11-2025

Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models (LLMs) by decomposing token activations into combinations of human-understandable features. While SAEs provide crucial insights into LLM explanations, their practical adoption faces a fundamental challenge: better interpretability demands that SAEs' hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by partitioning SAEs into narrower expert networks with gated activation, thereby reducing computation. In a well-designed MoE, each expert should focus on learning a distinct set of features. However, we identify a \textit{critical limitation} in MoE-SAE: Experts often fail to specialize, which means they frequently learn overlapping or identical features. To deal with it, we propose two key innovations: (1) Multiple Expert Activation that simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling that enhances diversity through adaptive high-frequency scaling. Experiments demonstrate a 24\% lower reconstruction error and a 99\% reduction in feature redundancy compared to existing MoE-SAE methods. This work bridges the interpretability-efficiency gap in LLM analysis, allowing transparent model inspection without compromising computational feasibility.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.05745

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FRET: Feature Redundancy Elimination for Test Time Adaptation

You, Linjing, Lu, Jiabao, Huang, Xiayuan, Nie, Xiangli

arXiv.org Artificial IntelligenceMay-19-2025

Test-Time Adaptation (TTA) aims to enhance the generalization of deep learning models when faced with test data that exhibits distribution shifts from the training data. In this context, only a pre-trained model and unlabeled test data are available, making it particularly relevant for privacy-sensitive applications. In practice, we observe that feature redundancy in embeddings tends to increase as domain shifts intensify in TTA. However, existing TTA methods often overlook this redundancy, which can hinder the model's adaptability to new data. To address this issue, we introduce Feature Redundancy Elimination for Test-time Adaptation (FRET), a novel perspective for TTA. A straightforward approach (S-FRET) is to directly minimize the feature redundancy score as an optimization objective to improve adaptation. Despite its simplicity and effectiveness, S-FRET struggles with label shifts, limiting its robustness in real-world scenarios. To mitigate this limitation, we further propose Graph-based FRET (G-FRET), which integrates a Graph Convolutional Network (GCN) with contrastive learning. This design not only reduces feature redundancy but also enhances feature discriminability in both the representation and prediction layers. Extensive experiments across multiple model architectures, tasks, and datasets demonstrate the effectiveness of S-FRET and show that G-FRET achieves state-of-the-art performance. Further analysis reveals that G-FRET enables the model to extract non-redundant and highly discriminative features during inference, thereby facilitating more robust test-time adaptation.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.10641

Country: Asia (0.28)

Genre: Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks

Luo, Xu, Zou, Difan, Gao, Lianli, Xu, Zenglin, Song, Jingkuan

arXiv.org Artificial IntelligenceOct-5-2023

Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data, that is, training a linear classifier upon frozen features extracted from the pretrained model. As there may exist significant gaps between pretraining and downstream datasets, one may ask whether all dimensions of the pretrained features are useful for a given downstream task. We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce, or few-shot. For some cases such as 5-way 1-shot tasks, using only 1\% of the most important feature dimensions is able to recover the performance achieved by using the full representation. Interestingly, most dimensions are redundant only under few-shot settings and gradually become useful when the number of shots increases, suggesting that feature redundancy may be the key to characterizing the "few-shot" nature of few-shot transfer problems. We give a theoretical understanding of this phenomenon and show how dimensions with high variance and small distance between class centroids can serve as confounding factors that severely disturb classification results under few-shot settings. As an attempt at solving this problem, we find that the redundant features are difficult to identify accurately with a small number of training samples, but we can instead adjust feature magnitude with a soft mask based on estimated feature importance. We show that this method can generally improve few-shot transfer performance across various pretrained models and downstream datasets.

feature redundancy, few-shot task, pretrained model, (1 more...)

arXiv.org Artificial Intelligence

2310.03843

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)

Add feedback

Feature Decomposition for Reducing Negative Transfer: A Novel Multi-task Learning Method for Recommender System

Zhou, Jie, Yu, Qian, Luo, Chuan, Zhang, Jing

arXiv.org Artificial IntelligenceFeb-9-2023

In recent years, thanks to the rapid development of deep learning (DL), DL-based multi-task learning (MTL) has made significant progress, and it has been successfully applied to recommendation systems (RS). However, in a recommender system, the correlations among the involved tasks are complex. Therefore, the existing MTL models designed for RS suffer from negative transfer to different degrees, which will injure optimization in MTL. We find that the root cause of negative transfer is feature redundancy that features learned for different tasks interfere with each other. To alleviate the issue of negative transfer, we propose a novel multi-task learning method termed Feature Decomposition Network (FDN). The key idea of the proposed FDN is reducing the phenomenon of feature redundancy by explicitly decomposing features into task-specific features and task-shared features with carefully designed constraints. We demonstrate the effectiveness of the proposed method on two datasets, a synthetic dataset and a public datasets (i.e., Ali-CCP). Experimental results show that our proposed FDN can outperform the state-of-the-art (SOTA) methods by a noticeable margin.

artificial intelligence, constraint, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2302.05031

Country: Asia > China (0.05)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

One Bit Matters: Understanding Adversarial Examples as the Abuse of Redundancy

Wang, Jingkang, Jia, Ruoxi, Friedland, Gerald, Li, Bo, Spanos, Costas

arXiv.org Machine LearningOct-23-2018

Despite the great success achieved in machine learning (ML), adversarial examples have caused concerns with regards to its trustworthiness: A small perturbation of an input results in an arbitrary failure of an otherwise seemingly well-trained ML model. While studies are being conducted to discover the intrinsic properties of adversarial examples, such as their transferability and universality, there is insufficient theoretic analysis to help understand the phenomenon in a way that can influence the design process of ML experiments. In this paper, we deduce an information-theoretic model which explains adversarial attacks as the abuse of feature redundancies in ML algorithms. We prove that feature redundancy is a necessary condition for the existence of adversarial examples. Our model helps to explain some major questions raised in many anecdotal studies on adversarial examples. Our theory is backed up by empirical measurements of the information content of benign and adversarial examples on both image and text datasets. Our measurements show that typical adversarial examples introduce just enough redundancy to overflow the decision making of an ML model trained on corresponding benign examples. We conclude with actionable recommendations to improve the robustness of machine learners against adversarial examples.

adversarial example, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1810.0965

Country: North America > United States > California > Alameda County > Berkeley (0.15)

Genre: Research Report (1.00)

Industry:

Government (1.00)
Information Technology > Security & Privacy (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Feature Selection via L1-Penalized Squared-Loss Mutual Information

Jitkrittum, Wittawat, Hachiya, Hirotaka, Sugiyama, Masashi

arXiv.org Machine LearningOct-6-2012

Feature selection is a technique to screen out less important features. Many existing supervised feature selection algorithms use redundancy and relevancy as the main criteria to select features. However, feature interaction, potentially a key characteristic in real-world problems, has not received much attention. As an attempt to take feature interaction into account, we propose L1-LSMI, an L1-regularization based algorithm that maximizes a squared-loss variant of mutual information between selected features and outputs. Numerical results show that L1-LSMI performs well in handling redundancy, detecting non-linear dependency, and considering feature interaction.

artificial intelligence, machine learning, selection, (16 more...)

arXiv.org Machine Learning

doi: 10.1587/transinf.E96.D.1513

1210.196

Country:

Asia (0.68)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback