AITopics | boruta

boruta

BoMGene: Integrating Boruta-mRMR feature selection for enhanced Gene expression classification

Phan, Bich-Chung, Ma, Thanh, Nguyen, Huu-Hoa, Do, Thanh-Nghi

arXiv.org Artificial Intelligence | Oct-2-2025

Feature selection is a crucial step in analyzing gene expression data: it improves classification performance and reduces computational cost on high-dimensional datasets. This paper proposes BoMGene, a hybrid feature selection method that integrates two popular techniques: Boruta and Minimum Redundancy Maximum Relevance (mRMR). The method aims to optimize the feature space and enhance classification accuracy. Experiments were conducted on 25 publicly available gene expression datasets, employing widely used classifiers such as Support Vector Machine (SVM), Random Forest, XGBoost (XGB), and Gradient Boosting Machine (GBM). The results show that the Boruta-mRMR combination selects fewer features than mRMR alone, which shortens training time while maintaining or even improving classification accuracy relative to the individual feature selection methods. The proposed approach demonstrates clear advantages in accuracy, stability, and practical applicability for multi-class gene expression data analysis.

artificial intelligence, machine learning, selection, (15 more...)

arXiv.org Artificial Intelligence

2510.00907

Country: Asia > Vietnam (0.15)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

BOLIMES: Boruta and LIME optiMized fEature Selection for Gene Expression Classification

Phan, Bich-Chung, Ma, Thanh, Nguyen, Huu-Hoa, Do, Thanh-Nghi

arXiv.org Artificial Intelligence | Feb-18-2025

Gene expression classification is a pivotal yet challenging task in bioinformatics, primarily due to the high dimensionality of genomic data and the risk of overfitting. To address this challenge, we propose BOLIMES, a novel feature selection algorithm designed to enhance gene expression classification by systematically refining the feature subset. Unlike conventional methods that rely solely on statistical ranking or classifier-specific selection, we integrate the robustness of Boruta with the interpretability of LIME, ensuring that only the most relevant and influential genes are retained. BOLIMES first employs Boruta to filter out non-informative genes by comparing each feature against its randomized counterpart, thus preserving valuable information. It then uses LIME to rank the remaining genes based on their local importance to the classifier. Finally, an iterative classification evaluation determines the optimal feature subset by selecting the number of genes that maximizes predictive accuracy. By combining exhaustive feature selection with interpretability-driven refinement, our approach balances dimensionality reduction with high classification performance, offering a powerful solution for high-dimensional gene expression analysis.
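The final step described in this abstract, choosing how many top-ranked genes to keep by evaluating accuracy for each candidate size, might be sketched as follows. This is a minimal illustration and not the authors' implementation: `best_top_k`, `toy_accuracy`, and the gene names are hypothetical, the ranking is assumed to be given (the paper obtains it from LIME), and `toy_accuracy` stands in for a real cross-validated classifier score.

```python
from typing import Callable, Sequence, Tuple


def best_top_k(ranked: Sequence[str],
               evaluate: Callable[[Sequence[str]], float]) -> Tuple[int, float]:
    """Return (k, score) for the prefix of `ranked` with the highest score.

    Ties are resolved in favour of the smaller k, i.e. the smaller gene set.
    """
    if not ranked:
        raise ValueError("ranked must contain at least one feature")
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:  # strict '>' keeps the smallest k on ties
            best_k, best_score = k, score
    return best_k, best_score


def toy_accuracy(genes: Sequence[str]) -> float:
    """Hypothetical stand-in for cross-validated accuracy.

    Rewards the two informative genes and slightly penalises every gene kept.
    """
    informative = {"g1", "g2"}
    hits = sum(1 for g in genes if g in informative)
    return 0.5 + 0.2 * hits - 0.01 * len(genes)


# Assumed importance order, most important first.
ranking = ["g1", "g2", "g3", "g4"]
k, acc = best_top_k(ranking, toy_accuracy)
```

With this toy score the search stops favouring larger subsets once the uninformative genes start to cost more than they add, so `k` ends up at 2.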

artificial intelligence, classification, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2502.1308

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Noise-Augmented Boruta: The Neural Network Perturbation Infusion with Boruta Feature Selection

Gharoun, Hassan, Yazdanjoe, Navid, Khorshidi, Mohammad Sadegh, Gandomi, Amir H.

arXiv.org Artificial Intelligence | Sep-18-2023

With the surge in data generation, both vertically (i.e., volume of data) and horizontally (i.e., dimensionality), the burden of the curse of dimensionality has become increasingly palpable. Feature selection, a key facet of dimensionality reduction techniques, has advanced considerably to address this challenge. One such advancement is the Boruta feature selection algorithm, which successfully discerns meaningful features by contrasting them with their permuted counterparts, known as shadow features. However, the significance of a feature is shaped more by the data's overall traits than by its intrinsic value, a sentiment echoed in the conventional Boruta algorithm, where shadow features closely mimic the characteristics of the original ones. Building on this premise, this paper introduces an innovative approach to the Boruta feature selection algorithm by incorporating noise into the shadow variables. Drawing parallels from the perturbation analysis framework of artificial neural networks, this evolved version of the Boruta method is presented. Rigorous testing on four publicly available benchmark datasets revealed that the proposed technique outperforms the classic Boruta algorithm, underscoring its potential for enhanced, accurate feature selection.
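The shadow-feature comparison, with optional noise added to the shadows, might look roughly like the sketch below. Several things here are assumptions and not taken from the paper: importance is measured by absolute Pearson correlation instead of a random forest or neural network, only a single round is run (no repeated trials or statistical test), and `noise_scale` is a hypothetical parameter for Gaussian noise proportional to each feature's standard deviation.

```python
import random
from statistics import mean, pstdev
from typing import Dict, List


def abs_corr(x: List[float], y: List[float]) -> float:
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(cov / (sx * sy))


def shadow_round(features: Dict[str, List[float]], target: List[float],
                 rng: random.Random, noise_scale: float = 0.0) -> List[str]:
    """Run one shadow-feature round and return the features that pass.

    Each shadow is a shuffled copy of a real feature, optionally perturbed
    with Gaussian noise. A real feature passes only if its importance
    exceeds the best importance achieved by any shadow.
    """
    shadow_scores = []
    for values in features.values():
        shadow = values[:]
        rng.shuffle(shadow)  # permutation breaks any link to the target
        if noise_scale > 0:
            sd = pstdev(values)
            shadow = [v + rng.gauss(0.0, noise_scale * sd) for v in shadow]
        shadow_scores.append(abs_corr(shadow, target))
    threshold = max(shadow_scores)
    return [name for name, values in features.items()
            if abs_corr(values, target) > threshold]


# Toy data: one column that tracks the target and one constant column.
rng = random.Random(0)
target = [float(i) for i in range(30)]
features = {"signal": target[:], "constant": [1.0] * 30}
kept = shadow_round(features, target, rng, noise_scale=0.5)
```

The constant column has zero importance and so can never exceed the shadow threshold, while the column that tracks the target does.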

algorithm, dataset, feature selection, (16 more...)

arXiv.org Artificial Intelligence

2309.09694

Country: North America > United States > Ohio (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry:

Health & Medicine > Therapeutic Area (0.72)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning Machine Learning Part 1: Introduction and Revoke-Obfuscation

#artificialintelligence | Dec-20-2022, 04:50:46 GMT

The more a feature decreases the impurity, the more important the feature is. In random forests, the impurity decrease from each feature can be averaged across trees to determine the final importance of the variable.
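As a small worked illustration of impurity decrease, the sketch below uses Gini impurity, one common choice (an assumption, since the excerpt does not name the measure). The function names are hypothetical, and the per-tree averaging is shown in its simplest form; real random forest implementations typically also weight each split by the share of samples reaching the node.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent: Sequence[Hashable],
                      left: Sequence[Hashable],
                      right: Sequence[Hashable]) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


def mean_decrease(per_tree: Sequence[float]) -> float:
    """Average a feature's impurity decrease over the trees of a forest."""
    return sum(per_tree) / len(per_tree)


parent = ["a", "a", "b", "b"]
perfect = impurity_decrease(parent, ["a", "a"], ["b", "b"])   # classes fully separated
useless = impurity_decrease(parent, ["a", "b"], ["a", "b"])   # children as mixed as parent
importance = mean_decrease([perfect, useless])
```

A split that separates the classes completely removes all of the parent's impurity (0.5 here), while a split that leaves both children as mixed as the parent removes none.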

algorithm, dataset, logistic regression, (15 more...)

#artificialintelligence

Country: North America > United States (0.04)

Industry:

Leisure & Entertainment > Games (0.93)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)

Feature Selection Using Boruta

#artificialintelligence | Nov-20-2021, 19:00:56 GMT

Feature selection is a crucial step in machine learning: we select the features that are relevant to our model. Features that give useful information about the data and improve the accuracy of the model are all a data scientist needs. Finding the relevant features is the hard part of end-to-end projects. There are many methods for feature selection.

boruta, feature selection, relevant feature, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Feature Importance -- How's and Why's

#artificialintelligence | Oct-30-2020, 18:36:26 GMT

In this article, we will explore various feature selection techniques that you need to be familiar with in order to get the best performance out of your model. SelectKBest is a method provided by sklearn to rank features of a dataset by their "importance" with respect to the target variable. This "importance" is calculated using a score function, of which several are available. All of these scoring functions are based on statistics. For instance, the f_regression function arranges the p_values of each of the variables in increasing order and picks the best K columns with the least p_value. Features with a p_value of less than 0.05 are considered "significant" and only these features should be used in the predictive model.
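To make the idea concrete, here is a minimal pure-Python sketch of univariate F-test ranking, the kind of scoring a regression F-test applies to each feature. It is not the scikit-learn code; `f_statistic` and `select_k_best` are hypothetical names and the data are made up. The statistic is F = r² / (1 − r²) · (n − 2), where r is the Pearson correlation between a feature and the target. Because every feature shares the same sample size, ranking by largest F is equivalent to ranking by smallest p-value.

```python
from statistics import mean
from typing import Dict, List


def f_statistic(x: List[float], y: List[float]) -> float:
    """Univariate regression F-statistic of feature x against target y."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    r2 = (sxy * sxy) / (sxx * syy)
    if r2 >= 1.0:
        return float("inf")  # perfect linear relationship
    return r2 / (1.0 - r2) * (n - 2)


def select_k_best(features: Dict[str, List[float]],
                  target: List[float], k: int) -> List[str]:
    """Return the names of the k features with the largest F-statistic."""
    scores = {name: f_statistic(col, target) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
columns = {
    "strong": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],    # nearly linear in y
    "weak": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],      # almost unrelated to y
    "constant": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],  # no variation at all
}
top_one = select_k_best(columns, y, k=1)
top_two = select_k_best(columns, y, k=2)
```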

artificial intelligence, feature importance, machine learning, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.39)

Sequential Feature Classification in the Context of Redundancies

Pfannschmidt, Lukas, Hammer, Barbara

arXiv.org Machine Learning | Apr-1-2020

The problem of all-relevant feature selection is concerned with finding a relevant feature set with preserved redundancies. There exist several approximations to solve this problem but only one could give a distinction between strong and weak relevance. This approach was limited to the case of linear problems. In this work, we present a new solution for this distinction in the non-linear case through the use of random forest models and statistical methods.

feature selection, importance value, relevant feature, (14 more...)

arXiv.org Machine Learning

2004.00658

Country: Europe > Germany (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.37)

Feature Selection: Beyond feature importance? - KDnuggets

#artificialintelligence | Oct-26-2019, 18:28:48 GMT

In machine learning, feature selection is the process of choosing the features that are most useful for your prediction. Although it sounds simple, it is one of the most complex problems in building a new machine learning model. In this post, I will share with you some of the approaches that were researched during the last project I led at Fiverr. You will get some ideas on the basic method I tried and also the more complex approach, which got the best results: removing over 60% of the features while maintaining accuracy and achieving more stability for our model. I'll also be sharing our improvement to this algorithm.

algorithm, feature selection, random feature, (13 more...)

#artificialintelligence

Country: Europe > Poland > Masovia Province > Warsaw (0.05)

Genre: Research Report (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.33)

varrank: an R package for variable ranking based on mutual information with applications to observed systemic datasets

Kratzer, Gilles, Furrer, Reinhard

arXiv.org Machine Learning | Apr-19-2018

This article describes the R package varrank. It has a flexible implementation of heuristic approaches which perform variable ranking based on mutual information. The package is particularly suitable for exploring multivariate datasets requiring a holistic analysis. The core functionality is a general implementation of the minimum redundancy maximum relevance (mRMRe) model. This approach is based on information theory metrics. It is compatible with discrete and continuous data which are discretised using a large choice of possible rules. The two main problems that can be addressed by this package are the selection of the most representative variables for modeling a collection of variables of interest, i.e., dimension reduction, and variable ranking with respect to a set of variables of interest.
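A greedy minimum-redundancy maximum-relevance ranking over discrete variables might be sketched as below. This is not the varrank implementation, and it is written in Python, not R: the function names are hypothetical, mutual information is estimated from raw counts, and the score "relevance minus mean redundancy" is one common mRMR variant chosen here as an assumption.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information in bits between two discrete sequences (plug-in estimate)."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr_rank(features: Dict[str, Sequence[Hashable]],
              target: Sequence[Hashable], k: int) -> List[str]:
    """Greedily pick k features, trading relevance against redundancy."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)  # first candidate wins on ties
        selected.append(best)
        remaining.remove(best)
    return selected


a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * ai + bi for ai, bi in zip(a, b)]  # depends on both a and b
variables = {"a": a, "a_copy": list(a), "b": b}
ranking = mrmr_rank(variables, target, k=2)
```

In this toy data all three variables are equally relevant on their own, but `a_copy` adds nothing once `a` is selected, so the redundancy term pushes `b` ahead of it.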

artificial intelligence, machine learning, varrank, (16 more...)

arXiv.org Machine Learning

1804.07134

Country:

Europe > Austria (0.28)
Europe > Switzerland > Zürich > Zürich (0.15)

Genre: Research Report (0.50)

Industry: Health & Medicine > Epidemiology (0.96)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Information Management (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

R Addict Blog

#artificialintelligence | Jun-22-2016, 22:33:33 GMT

Feature selection is the process of extracting valuable features that have a significant influence on the dependent variable. This is still an active field of research and machine wandering. In this post I compare a few feature selection algorithms: traditional GLM with regularization, the computationally demanding Boruta, and an entropy-based filter from the FSelectorRcpp (free of Java/Weka) package. Check out the comparison on a Venn diagram carried out on data from the RTCGA factory of R data packages. I would like to thank Magda Sobiczewska and pbiecek for the inspiration for this comparison.

algorithm, artificial intelligence, machine learning, (17 more...)

#artificialintelligence

Industry: Health & Medicine > Therapeutic Area > Oncology (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
