feature reduction


Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets

Neural Information Processing Systems

Sparse-Group Lasso (SGL) has been shown to be a powerful regression technique for simultaneously discovering group and within-group sparse patterns by using a combination of the l1 and l2 norms. However, in large-scale applications, the complexity of the regularizers entails great computational challenges. In this paper, we propose a novel two-layer feature reduction method (TLFre) for SGL via a decomposition of its dual feasible set. The two-layer reduction is able to quickly identify the inactive groups and the inactive features, respectively, which are guaranteed to be absent from the sparse representation and can be removed from the optimization. Existing feature reduction methods are only applicable to sparse models with one sparsity-inducing regularizer. To the best of our knowledge, TLFre is the first that is capable of dealing with multiple sparsity-inducing regularizers. Moreover, TLFre has a very low computational cost and can be integrated with any existing solvers. Experiments on both synthetic and real data sets show that TLFre improves the efficiency of SGL by orders of magnitude.
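As a reference for the "combination of the l1 and l2 norms" the abstract describes, here is a minimal sketch of the standard SGL penalty, lam1*||beta||_1 + lam2 * sum_g sqrt(p_g)*||beta_g||_2; the sqrt(p_g) group weights are the common convention and are an assumption here, not stated in the abstract:

```python
import math

def sgl_penalty(beta, groups, lam1=0.1, lam2=0.1):
    """Sparse-Group Lasso regularizer: an l1 term over all coefficients
    plus a group-wise l2 term, weighted by sqrt(group size).

    beta   -- list of coefficients
    groups -- list of index lists, one per group
    """
    l1 = sum(abs(b) for b in beta)
    group_l2 = sum(
        math.sqrt(len(idx)) * math.sqrt(sum(beta[j] ** 2 for j in idx))
        for idx in groups
    )
    return lam1 * l1 + lam2 * group_l2

# Two groups of coefficients; the second group is entirely inactive,
# so it contributes nothing to either term -- this is the group sparsity
# that the two-layer screening in TLFre exploits.
beta = [1.0, -2.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
print(sgl_penalty(beta, groups, lam1=1.0, lam2=1.0))  # 3.0 + sqrt(10) ≈ 6.162
```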


FeatureCuts: Feature Selection for Large Data by Optimizing the Cutoff

Hu, Andy, Prasad, Devika, Pizzato, Luiz, Foord, Nicholas, Abrahamyan, Arman, Leontjeva, Anna, Doyle, Cooper, Jermyn, Dan

arXiv.org Artificial Intelligence

In machine learning, the process of feature selection involves finding a reduced subset of features that captures most of the information required to train an accurate and efficient model. This work presents FeatureCuts, a novel feature selection algorithm that adaptively selects the optimal feature cutoff after performing filter ranking. Evaluated on 14 publicly available datasets and one industry dataset, FeatureCuts achieved, on average, 15 percentage points more feature reduction and up to 99.6% less computation time while maintaining model performance, compared to existing state-of-the-art methods. When the selected features are used in a wrapper method such as Particle Swarm Optimization (PSO), it enables 25 percentage points more feature reduction, requires 66% less computation time, and maintains model performance when compared to PSO alone. The minimal overhead of FeatureCuts makes it scalable for large datasets typically seen in enterprise applications. Traditional machine learning methods work best when their prediction signals come from data with a small, but highly informative set of features.
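The abstract does not specify how FeatureCuts chooses its cutoff, so the following is only an illustrative stdlib sketch of the general pattern it describes (filter ranking followed by an adaptive cutoff); the largest-gap "knee" rule below is a placeholder assumption, not the paper's actual criterion:

```python
def filter_rank(scores):
    """Rank feature indices by descending filter score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def adaptive_cutoff(scores):
    """Keep features up to the largest drop between consecutive sorted
    scores -- a simple knee rule standing in for the (unspecified)
    FeatureCuts cutoff criterion."""
    order = filter_rank(scores)
    ranked = [scores[i] for i in order]
    gaps = [ranked[k] - ranked[k + 1] for k in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1  # keep everything before the biggest drop
    return order[:cut]

# Three clearly informative features, then a sharp drop in filter score.
scores = [0.91, 0.88, 0.85, 0.30, 0.28, 0.05]
print(adaptive_cutoff(scores))  # → [0, 1, 2]
```

Because ranking plus a single pass over the gaps is O(n log n) in the number of features, the overhead stays negligible even for wide enterprise datasets, which is consistent with the scalability claim above.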


Feature Cross-Substitution in Adversarial Classification

Bo Li, Yevgeniy Vorobeychik

Neural Information Processing Systems

The success of machine learning, particularly in supervised settings, has led to numerous attempts to apply it in adversarial settings such as spam and malware detection. The core challenge in this class of applications is that adversaries are not static data generators, but make a deliberate effort to evade the classifiers deployed to detect them. We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries. In particular, we demonstrate severe shortcomings of feature reduction in adversarial settings using several natural adversarial objective functions, an observation that is particularly pronounced when the adversary is able to substitute across similar features (for example, replace words with synonyms or replace letters in words). We offer a simple heuristic method for making learning more robust to feature cross-substitution attacks. We then present a more general approach based on mixed-integer linear programming with constraint generation, which implicitly trades off overfitting and feature selection in an adversarial setting using a sparse regularizer along with an evasion model. Our approach is the first method for combining an adversarial classification algorithm with a very general class of models of adversarial classifier evasion. We show that our algorithmic approach significantly outperforms state-of-the-art alternatives.


Deep Learning Descriptor Hybridization with Feature Reduction for Accurate Cervical Cancer Colposcopy Image Classification

Saini, Saurabh, Ahuja, Kapil, Chennareddy, Siddartha, Boddupalli, Karthik

arXiv.org Artificial Intelligence

Cervical cancer stands as a predominant cause of female mortality, underscoring the need for regular screenings to enable early diagnosis and preemptive treatment of pre-cancerous conditions. The transformation zone in the cervix, where cellular differentiation occurs, plays a critical role in the detection of abnormalities. Colposcopy has emerged as a pivotal tool in cervical cancer prevention since it provides a meticulous examination of cervical abnormalities. However, challenges in visual evaluation necessitate the development of Computer Aided Diagnosis (CAD) systems. We propose a novel CAD system that combines the strengths of various deep-learning descriptors (ResNet50, ResNet101, and ResNet152) with appropriate feature normalization (min-max) as well as a feature reduction technique (LDA). The combination of different descriptors ensures that all the features (low-level, like edges and colour; high-level, like shape and texture) are captured, feature normalization prevents biased learning, and feature reduction avoids overfitting. We conduct experiments on the IARC dataset provided by the WHO. The dataset is initially segmented and balanced. Our approach achieves exceptional performance in the range of 97%-100% for both the normal-abnormal and the type classification. A competitive approach for type classification on the same dataset achieved 81%-91% performance.


Neural Networks Optimizations Against Concept and Data Drift in Malware Detection

Maillet, William, Marais, Benjamin

arXiv.org Artificial Intelligence

Traditional malware detection methods rely on signatures, heuristics and behaviors [1, 2]. However, these solutions are not suitable in the long term due to the significant number of malware samples present in cyberspace, and creating new detection rules becomes an impractical and unscalable approach. As an alternative, machine learning models have demonstrated great success in various tasks, such as classification, computer vision, and anomaly detection, making them promising solutions for the future of malicious software detection. In particular, neural networks and LightGBM [3] have shown particularly encouraging results [4, 5, 6]. Such machine learning models can use static characteristics extracted from malicious files, such as imports, strings, and header information, or dynamic characteristics, such as network activity or registry modifications, collected during file execution. While these models perform well, they face the challenge of constant malware evolution.


Dimensionality Reduction for Machine Learning

#artificialintelligence

What is High Dimensional Data? How does it affect your Machine Learning models? Have you ever wondered why your model isn't meeting your expectations, even after tuning hyperparameters to the ends of the earth with no improvement? Understanding your data and your model may be key. Underneath such an immense and complicated hood, you may be concerned that there are few to no ways of gaining more insight into your data, as well as your model.


PCA, LDA, and SVD: Model Tuning Through Feature Reduction for Transportation POI Classification

#artificialintelligence

PCA is a dimension reduction method that takes datasets with a large number of features and reduces them to a few underlying features. The sklearn PCA package performs this process for us. In the snippet of code below, we reduce the dataset's initial 75 features to 8. This snippet serves to show the optimal number of components for the feature reduction algorithm to fit to. The snippets below show how to use the Gaussian Naive Bayes, Decision Tree, and K-Nearest Neighbors classifiers with the reduced features.
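The sklearn snippet the article refers to is not reproduced in this extract. As a stand-in for the underlying idea, here is a minimal stdlib sketch that finds the leading principal component by power iteration on the covariance matrix; in practice sklearn's PCA would be used as the article describes, and this toy version handles only the top component:

```python
import math

def top_principal_component(rows, iters=200):
    """Power iteration on the sample covariance matrix to find the
    leading principal component (a stdlib stand-in for sklearn's PCA)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(centered[i][a] * centered[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize each step
    return v

# Toy data whose variance lies almost entirely along the x = y direction,
# so the leading component should point roughly along (1, 1) / sqrt(2).
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
pc = top_principal_component(data)
```

Projecting each centered row onto `pc` would yield the one-dimensional reduced feature, which is exactly what the classifiers mentioned above would then consume in place of the raw columns.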


EEF: Exponentially Embedded Families with Class-Specific Features for Classification

Tang, Bo, Kay, Steven, He, Haibo, Baggenstoss, Paul M.

arXiv.org Machine Learning

Classification is one of the fundamental problems in the fields of machine learning and signal processing. The commonly used classifier assigns a sample or a signal to the class with maximum posterior probability, which usually requires probability density function (PDF) estimation in either a model-driven or a data-driven manner [1] [2] [3]. For high-dimensional data sets, it is necessary to perform feature reduction to estimate the PDFs robustly in a low-dimensional feature subspace. However, feature reduction may lose pertinent information for discrimination. For example, data samples from different classes that could be well separated in the raw data space may overlap in the feature subspace, causing classification errors. The PDF reconstruction approach provides a solution to this information loss in feature reduction by reconstructing the PDF on the raw data and classifying in the raw data space, which can improve classification performance. Several approaches have been developed along this track.
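The maximum-posterior rule described above can be sketched in a few lines. This is a generic illustration with assumed 1-D Gaussian class-conditional densities, not the EEF construction from the paper:

```python
import math

def gaussian_pdf(x, mean, var):
    """1-D Gaussian density, used here as the class-conditional PDF."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def map_classify(x, priors, params):
    """Assign x to the class with maximum posterior probability,
    posterior ∝ prior * class-conditional PDF."""
    posteriors = {c: priors[c] * gaussian_pdf(x, *params[c]) for c in priors}
    return max(posteriors, key=posteriors.get)

# Two classes with equal priors and well-separated means (illustrative values).
priors = {"a": 0.5, "b": 0.5}
params = {"a": (0.0, 1.0), "b": (4.0, 1.0)}  # (mean, variance) per class
print(map_classify(1.0, priors, params))  # → a
```

When the classes overlap after a lossy feature reduction, these posteriors become close and errors appear, which is precisely the failure mode the PDF reconstruction approach targets.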

