AITopics | categorical data

Collaborating Authors

categorical data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

4620a66570e554a3ff0e39dc59bcb07a-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 16:26:16 GMT

By design, COLP is a parsimonious classifier, which gives rise to a provably identifiable causal model.

artificial intelligence, causal model, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Tasmania (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)

Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings

Neural Information Processing Systems | Dec-24-2025, 05:17:59 GMT

Research on adversarial robustness is primarily focused on image and text data. Yet many scenarios in which a lack of robustness can result in serious risks, such as fraud detection, medical diagnosis, or recommender systems, often rely not on images or text but on tabular data. Adversarial robustness in tabular data poses two serious challenges. First, tabular datasets often contain categorical features, and therefore cannot be tackled directly with existing optimization procedures. Second, in the tabular domain, algorithms that are not based on deep networks are widely used and offer great performance, but algorithms to enhance robustness are tailored to neural networks (e.g.

categorical data, tabular data, transferable adversarial robustness, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)

Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation

Neural Information Processing Systems | Dec-24-2025, 03:22:36 GMT

Causal discovery for quantitative data has been extensively studied but less is known for categorical data. We propose a novel causal model for categorical data based on a new classification model, termed classification with optimal label permutation (COLP). By design, COLP is a parsimonious classifier, which gives rise to a provably identifiable causal model. A simple learning algorithm via comparing likelihood functions of causal and anti-causal models suffices to learn the causal direction. Through experiments with synthetic and real data, we demonstrate the favorable performance of the proposed COLP-based causal model compared to state-of-the-art methods. We also make available an accompanying R package COLP, which contains the proposed causal discovery algorithm and a benchmark dataset of categorical cause-effect pairs.
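
One way to see why the abstract stresses a parsimonious classifier: if both directions are fitted with unconstrained (saturated) probability tables, the causal and anti-causal likelihoods coincide, so comparing them cannot reveal a direction. The sketch below checks this on toy data. It is not COLP and not the authors' R package; the function name `saturated_loglik` and the sample pairs are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]


def saturated_loglik(pairs: Sequence[Pair], cause_index: int) -> float:
    """Maximized log-likelihood of p(cause) * p(effect | cause).

    Both factors are unconstrained tables estimated by relative
    frequencies (an assumption made for this illustration only).
    cause_index=0 treats the first variable as the cause,
    cause_index=1 treats the second variable as the cause.
    """
    n = len(pairs)
    joint = Counter(pairs)                             # counts of (x, y)
    cause = Counter(p[cause_index] for p in pairs)     # marginal counts of the cause
    loglik = 0.0
    for pair, count in joint.items():
        cause_count = cause[pair[cause_index]]
        log_p_cause = math.log(cause_count / n)        # log p(cause)
        log_p_effect = math.log(count / cause_count)   # log p(effect | cause)
        loglik += count * (log_p_cause + log_p_effect)
    return loglik


# Hypothetical toy sample of (x, y) category pairs.
toy_pairs = [("a", "u"), ("a", "u"), ("a", "v"), ("b", "v"), ("b", "w"), ("c", "w")]
forward = saturated_loglik(toy_pairs, cause_index=0)   # model x -> y
backward = saturated_loglik(toy_pairs, cause_index=1)  # model y -> x
print(forward, backward)
```

Both values agree up to floating-point rounding, because each factorization reproduces the empirical joint distribution. A likelihood comparison only becomes informative once the conditional model is restricted, which is the role the abstract assigns to COLP.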

bivariate causal discovery, categorical data, classification, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

Break the Tie: Learning Cluster-Customized Category Relationships for Categorical Data Clustering

Zhao, Mingjie, Huang, Zhanpei, Lu, Yang, Li, Mengke, Zhang, Yiqun, Su, Weifeng, Cheung, Yiu-ming

arXiv.org Artificial Intelligence | Nov-13-2025

Categorical attributes with qualitative values are ubiquitous in cluster analysis of real datasets. Unlike numerical attributes, which admit the Euclidean distance, categorical attributes lack well-defined relationships among their possible values (also called categories), which hampers the exploration of compact categorical data clusters. Although most attempts focus on developing appropriate distance metrics, they typically assume a fixed topological relationship between categories when learning those metrics, which limits their adaptability to varying cluster structures and often leads to suboptimal clustering performance. This paper, therefore, breaks the intrinsic relationship tie of attribute categories and learns customized distance metrics suitable for flexibly and accurately revealing various cluster distributions. As a result, the fitting ability of the clustering algorithm is significantly enhanced, benefiting from the learnable category relationships. Moreover, the learned category relationships are proved to be Euclidean distance metric-compatible, enabling a seamless extension to mixed datasets that include both numerical and categorical attributes. Comparative experiments on 12 real benchmark datasets with significance tests show the superior clustering accuracy of the proposed method, with an average ranking of 1.25, which is significantly better than the 5.21 ranking of the current best-performing method. Code and extended version with detailed proofs are provided below.

artificial intelligence, category relationship, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.09049

Country: Asia > China > Guangdong Province (0.46)

Genre: Research Report > Experimental Study (0.34)

Industry:

Education > Educational Setting (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.67)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering

Chen, Taixi, Cheung, Yiu-ming, Zhang, Yiqun

arXiv.org Machine Learning | Nov-11-2025

An appropriate distance metric is crucial for categorical data clustering, as the distance between categorical data cannot be directly calculated. However, the distances between attribute values usually vary across clusters because the values are distributed differently in each cluster. Existing metrics have not taken this into account, which leads to unreasonable distance measurement. Therefore, we propose a cluster-customized distance metric for categorical data clustering, which can competitively update distances based on different distributions of attributes in each cluster. In addition, we extend the proposed distance metric to mixed data that contains both numerical and categorical attributes. Experiments demonstrate the efficacy of the proposed method, which achieves an average ranking close to first across fourteen datasets. The source code is available at https://anonymous.4open.science/r/CADM-47D8/

artificial intelligence, categorical data, machine learning, (15 more...)

arXiv.org Machine Learning

2511.05826

Country:

North America > United States > New York > Broome County > Binghamton (0.04)
Asia > China > Hong Kong (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Exploratory Analysis of Cyberattack Patterns on E-Commerce Platforms Using Statistical Methods

Adeniya, Fatimo Adenike

arXiv.org Artificial Intelligence | Nov-11-2025

Cyberattacks on e-commerce platforms have grown in sophistication, threatening consumer trust and operational continuity. This research presents a hybrid analytical framework that integrates statistical modelling and machine learning for detecting and forecasting cyberattack patterns in the e-commerce domain. Using the Verizon Community Data Breach (VCDB) dataset, the study applies Auto ARIMA for temporal forecasting and significance testing, including a Mann-Whitney U test (U = 2579981.5, p = 0.0121), which confirmed that holiday shopping events experienced significantly more severe cyberattacks than non-holiday periods. ANOVA was also used to examine seasonal variation in threat severity, while ensemble machine learning models (XGBoost, LightGBM, and CatBoost) were employed for predictive classification. Results reveal recurrent attack spikes during high-risk periods such as Black Friday and holiday seasons, with breaches involving Personally Identifiable Information (PII) exhibiting elevated threat indicators. Among the models, CatBoost achieved the highest performance (accuracy = 85.29%, F1 score = 0.2254, ROC AUC = 0.8247). The framework uniquely combines seasonal forecasting with interpretable ensemble learning, enabling temporal risk anticipation and breach-type classification. Ethical considerations, including responsible use of sensitive data and bias assessment, were incorporated. Despite class imbalance and reliance on historical data, the study provides insights for proactive cybersecurity resource allocation and outlines directions for future real-time threat detection research.
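
For readers unfamiliar with the Mann-Whitney U statistic quoted in the abstract, the following is a minimal sketch of how U is computed from pooled ranks, with tied values sharing their average rank. The severity scores are made-up toy numbers, not VCDB data, and the helper names `average_ranks` and `mann_whitney_u` are hypothetical. The p-value step is omitted.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U_a, U_b), where U_a + U_b == len(a) * len(b)."""
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[: len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2.0
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Toy severity scores (illustrative only).
holiday = [3, 4, 5, 5]
non_holiday = [1, 2, 3, 3]
print(mann_whitney_u(holiday, non_holiday))  # (15.0, 1.0)
```

Here U_a counts the pairs in which a holiday score exceeds a non-holiday score, with ties counted as one half, so a value near len(a) * len(b) indicates that the first group tends to rank higher.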

data mining, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2511.0302

Country:

North America > United States > California (0.27)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.26)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Services > e-Commerce Services (1.00)
Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
(5 more...)

Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation

Neural Information Processing Systems | Aug-14-2025, 13:40:06 GMT

By design, COLP is a parsimonious classifier, which gives rise to a provably identifiable causal model.

causal model, ordinal regression, regression, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.05)
Oceania > Australia > Tasmania (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Fiducial Matching: Differentially Private Inference for Categorical Data

Romanus, Ogonnaya Michael, Boulaguiem, Younes, Molinari, Roberto

arXiv.org Machine Learning | Jul-17-2025

The task of statistical inference, which includes the building of confidence intervals and tests for parameters and effects of interest to a researcher, is still an open area of investigation in a differentially private (DP) setting. Indeed, in addition to the randomness due to data sampling, DP delivers another source of randomness consisting of the noise added to protect an individual's data from being disclosed to a potential attacker. As a result of this convolution of noises, in many cases it is too complicated to determine the stochastic behavior of the statistics and parameters resulting from a DP procedure. In this work, we contribute to this line of investigation by employing a simulation-based matching approach, solved through tools from the fiducial framework, which aims to replicate the data generation pipeline (including the DP step) and retrieve an approximate distribution of the estimates resulting from this pipeline. For this purpose, we focus on the analysis of categorical (nominal) data that is common in national surveys, for which sensitivity is naturally defined, and on additive privacy mechanisms. We prove the validity of the proposed approach in terms of coverage and highlight its good computational and statistical performance for different inferential tasks in simulated and applied data settings.
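
As an illustration of the "additive privacy mechanisms" mentioned in the abstract, the sketch below adds Laplace noise to category counts. The choice of the Laplace mechanism, the add-or-remove-one-record neighbouring relation (which gives a histogram L1 sensitivity of 1), and all function names are assumptions made here, not details taken from the paper. The fiducial matching step itself is not shown.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Sequence


def category_counts(records: Sequence[Hashable],
                    categories: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Count how often each listed category occurs (zero if absent)."""
    counts = Counter(records)
    return {c: counts.get(c, 0) for c in categories}


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two
    independent exponential draws, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(records: Sequence[Hashable],
                      categories: Sequence[Hashable],
                      epsilon: float,
                      rng: random.Random,
                      sensitivity: float = 1.0) -> Dict[Hashable, float]:
    """Release noisy counts with Laplace scale = sensitivity / epsilon.

    sensitivity=1.0 assumes neighbouring datasets differ by adding or
    removing one record (an assumption for this sketch).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    true_counts = category_counts(records, categories)
    return {c: n + laplace_noise(scale, rng) for c, n in true_counts.items()}


# Hypothetical survey answers; the seed only makes the demo repeatable.
answers = ["yes", "no", "yes", "yes", "unsure"]
print(private_histogram(answers, ["yes", "no", "unsure"], epsilon=1.0,
                        rng=random.Random(0)))
```

The released counts combine sampling randomness with the injected noise, which is the "convolution of noises" the abstract says complicates inference. A production release would also need a cryptographically secure noise source rather than Python's `random` module.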

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2507.11762

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Alabama > Lee County > Auburn (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.96)
Health & Medicine > Therapeutic Area > Immunology > HIV (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
