AITopics | nominal feature

Collaborating Authors

nominal feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RO-FIGS: Efficient and Expressive Tree-Based Ensembles for Tabular Data

Matjašec, Urška, Simidjievski, Nikola, Jamnik, Mateja

arXiv.org Artificial IntelligenceMay-8-2025

Tree-based models are often robust to uninformative features and can accurately capture non-smooth, complex decision boundaries. Consequently, they often outperform neural network-based models on tabular datasets at a significantly lower computational cost. Nevertheless, the capability of traditional tree-based ensembles to express complex relationships efficiently is limited by using a single feature to make splits. To improve the efficiency and expressiveness of tree-based methods, we propose Random Oblique Fast Interpretable Greedy-Tree Sums (RO-FIGS). RO-FIGS builds on Fast Interpretable Greedy-Tree Sums, and extends it by learning trees with oblique or multivariate splits, where each split consists of a linear combination learnt from random subsets of features. This helps uncover interactions between features and improves performance. The proposed method is suitable for tabular datasets with both numerical and categorical features. We evaluate RO-FIGS on 22 real-world tabular datasets, demonstrating superior performance and much smaller models over other tree- and neural network-based methods. Additionally, we analyse their splits to reveal valuable insights into feature interactions, enriching the information learnt from SHAP summary plots, and thereby demonstrating the enhanced interpretability of RO-FIGS models. The proposed method is well-suited for applications, where balance between accuracy and interpretability is essential.

artificial intelligence, decision tree learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CITREx64975.2025.10974939

2504.06927

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.69)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

R2VF: A Two-Step Regularization Algorithm to Cluster Categories in GLMs

Dror, Yuval Ben

arXiv.org Artificial IntelligenceMar-3-2025

Over recent decades, extensive research has aimed to overcome the restrictive underlying assumptions required for a Generalized Linear Model to generate accurate and meaningful predictions. These efforts include regularizing coefficients, selecting features, and clustering ordinal categories, among other approaches. Despite these advances, efficiently clustering nominal categories in GLMs without incurring high computational costs remains a challenge. This paper introduces Ranking to Variable Fusion (R2VF), a two-step method designed to efficiently fuse nominal and ordinal categories in GLMs. By first transforming nominal features into an ordinal framework via regularized regression and then applying variable fusion, R2VF strikes a balance between model complexity and interpretability. We demonstrate the effectiveness of R2VF through comparisons with other methods, highlighting its performance in addressing overfitting and finding a proper set of covariates.

categorical feature, category, coefficient, (16 more...)

arXiv.org Artificial Intelligence

2503.01521

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

In to Decision Trees Part: 2. Hi! Hello and thanks for reading this…

#artificialintelligenceDec-5-2022, 20:45:06 GMT

If you missed the previous part of this blog "In to Decision Trees Part: 1", please visit it. In this blog, we will explore more about Decision Trees Algorithm and its capability. The Process of Decision trees on Numerical (Discrete/Continuous) features is slightly different than categorical features. While the Decision trees can handle categorical variables with ease. There are two types of categorical values.

categorical feature, decision tree part, overfitting, (7 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Generalization Properties of Decision Trees on Real-valued and Categorical Features

Leboeuf, Jean-Samuel, LeBlanc, Frédéric, Marchand, Mario

arXiv.org Artificial IntelligenceOct-18-2022

We revisit binary decision trees from the perspective of partitions of the data. We introduce the notion of partitioning function, and we relate it to the growth function and to the VC dimension. We consider three types of features: real-valued, categorical ordinal and categorical nominal, with different split rules for each. For each feature type, we upper bound the partitioning function of the class of decision stumps before extending the bounds to the class of general decision tree (of any fixed structure) using a recursive approach. Using these new results, we are able to find the exact VC dimension of decision stumps on examples of $\ell$ real-valued features, which is given by the largest integer $d$ such that $2\ell \ge \binom{d}{\lfloor\frac{d}{2}\rfloor}$. Furthermore, we show that the VC dimension of a binary tree structure with $L_T$ leaves on examples of $\ell$ real-valued features is in $O(L_T \log(L_T\ell))$. Finally, we elaborate a pruning algorithm based on these results that performs better than the cost-complexity and reduced-error pruning algorithms on a number of data sets, with the advantage that no cross-validation is required.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2210.10781

Country:

North America > Canada > Quebec (0.04)
North America > United States > Wisconsin (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

SMOTE-ENC: A novel SMOTE-based method to generate synthetic data for nominal and continuous features

Mukherjee, Mimi, Khushi, Matloob

arXiv.org Artificial IntelligenceMar-12-2021

Real world datasets are heavily skewed where some classes are significantly outnumbered by the other classes. In these situations, machine learning algorithms fail to achieve substantial efficacy while predicting these under-represented instances. To solve this problem, many variations of synthetic minority over-sampling methods (SMOTE) have been proposed to balance the dataset which deals with continuous features. However, for datasets with both nominal and continuous features, SMOTE-NC is the only SMOTE-based over-sampling technique to balance the data. In this paper, we present a novel minority over-sampling method, SMOTE-ENC (SMOTE - Encoded Nominal and Continuous), in which, nominal features are encoded as numeric values and the difference between two such numeric value reflects the amount of change of association with minority class. Our experiments show that the classification model using SMOTE-ENC method offers better prediction than model using SMOTE-NC when the dataset has a substantial number of nominal features and also when there is some association between the categorical features and the target class. Additionally, our proposed method addressed one of the major limitations of SMOTE-NC algorithm. SMOTE-NC can be applied only on mixed datasets that have features consisting of both continuous and nominal features and cannot function if all the features of the dataset are nominal. Our novel method has been generalized to be applied on both mixed datasets and on nominal only datasets. The code is available from mkhushi.github.io

dataset, nominal feature, smote-enc, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/asi4010018

2103.07612

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Network Intrusion Detection based on LSTM and Feature Embedding

Gwon, Hyeokmin, Lee, Chungjun, Keum, Rakun, Choi, Heeyoul

arXiv.org Machine LearningNov-26-2019

Growing number of network devices and services have led to increasing demand for protective measures as hackers launch attacks to paralyze or steal information from victim systems. Intrusion Detection System (IDS) is one of the essential elements of network perimeter security which detects the attacks by inspecting network traffic packets or operating system logs. While existing works demonstrated effectiveness of various machine learning techniques, only few of them utilized the time-series information of network traffic data. Also, categorical information has not been included in neural network based approaches. In this paper, we propose network intrusion detection models based on sequential information using long short-term memory (LSTM) network and categorical information using the embedding technique. We have experimented the models with UNSW-NB15, which is a comprehensive network traffic dataset. The experiment results confirm that the proposed method improve the performance, observing binary classification accuracy of 99.72\%.

dataset, information, packet, (13 more...)

arXiv.org Machine Learning

1911.11552

Country:

Oceania > Australia (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Why You Should Build XGBoost Models Within H2O - Sefik Ilkin Serengil

#artificialintelligenceNov-7-2019, 15:10:43 GMT

XGBoost triggered the rise of the tree based models in the machine learning world. It earns reputation with its robust models. Its built models mostly get almost 2% more accuracy. On the other hand, it is a fact that XGBoost is almost 10 times slower than LightGBM. Speed means a lot in a data challenge.

building id, regular xgboost, xgboost, (11 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Add feedback

Introduction to Machine Learning - CodeProject

#artificialintelligenceOct-23-2016, 06:21:27 GMT

"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed" Arthur Smauel. We can think of machine learning as approach to automate tasks like predictions or modelling. For example, consider an email spam filter system, instead of having programmers manually looking at the emails and coming up with spam rules. We can use a machine learning algorithm and feed it input data (emails) and it will automatically discover rules that are powerful enough to distinguish spam emails. Machine learning is used in many application nowadays like spam detection in emails or movie recommendation systems that tells you movies that you might like based on your viewing history.

artificial intelligence, dataset, machine learning, (14 more...)

#artificialintelligence

Industry: Education > Curriculum > Subject-Specific Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.54)

Add feedback

A Hierarchical Multi-Output Nearest Neighbor Model for Multi-Output Dependence Learning

Morris, Richard G., Martinez, Tony, Smith, Michael R.

arXiv.org Machine LearningOct-17-2014

Multi-Output Dependence (MOD) learning is a generalization of standard classification problems that allows for multiple outputs that are dependent on each other. A primary issue that arises in the context of MOD learning is that for any given input pattern there can be multiple correct output patterns. This changes the learning task from function approximation to relation approximation. Previous algorithms do not consider this problem, and thus cannot be readily applied to MOD problems. To perform MOD learning, we introduce the Hierarchical Multi-Output Nearest Neighbor model (HMONN) that employs a basic learning model for each output and a modified nearest neighbor approach to refine the initial results.

artificial intelligence, machine learning, mod problem, (19 more...)

arXiv.org Machine Learning

1410.4777

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback