AITopics | mixed data

mixed data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PretopoMD: Pretopology-based Mixed Data Hierarchical Clustering

Levy, Loup-Noe, Guerard, Guillaume, Djebali, Sonia, Amor, Soufian Ben

arXiv.org Artificial Intelligence, Dec-4-2025

This article presents a novel pretopology-based algorithm designed to address the challenges of clustering mixed data without the need for dimensionality reduction. Leveraging Disjunctive Normal Form, our approach formulates customizable logical rules and adjustable hyperparameters that allow for user-defined hierarchical cluster construction and facilitate tailored solutions for heterogeneous datasets. Through hierarchical dendrogram analysis and comparative clustering metrics, our method demonstrates superior performance by accurately and interpretably delineating clusters directly from raw data, thus preserving data integrity. Empirical findings highlight the algorithm's robustness in constructing meaningful clusters and reveal its potential in overcoming issues related to clustered data explainability. The novelty of this work lies in its departure from traditional dimensionality reduction techniques and its innovative use of logical rules that enhance both cluster formation and clarity, thereby contributing a significant advancement to the discourse on clustering mixed data.
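The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of a pretopological pseudo-closure driven by a rule in Disjunctive Normal Form. The records, the predicates (`age_close`, `same_city`, `same_job`), the threshold, and the rule itself are all hypothetical and are not taken from the paper; the hierarchical construction is not shown.

```python
# Hypothetical mixed-type records: one numeric and two categorical features.
PEOPLE = {
    "a": {"age": 25, "city": "Paris", "job": "nurse"},
    "b": {"age": 28, "city": "Paris", "job": "clerk"},
    "c": {"age": 32, "city": "Lyon", "job": "clerk"},
    "d": {"age": 60, "city": "Lyon", "job": "clerk"},
    "e": {"age": 58, "city": "Paris", "job": "nurse"},
}

def age_close(candidate, members, max_gap=5):
    # Numeric predicate: within max_gap years of at least one member.
    return any(abs(candidate["age"] - m["age"]) <= max_gap for m in members)

def same_city(candidate, members):
    # Categorical predicate: shares a city with at least one member.
    return any(candidate["city"] == m["city"] for m in members)

def same_job(candidate, members):
    # Categorical predicate: shares a job with at least one member.
    return any(candidate["job"] == m["job"] for m in members)

# DNF rule (assumed): (age_close AND same_city) OR (age_close AND same_job).
RULE = [[age_close, same_city], [age_close, same_job]]

def pseudo_closure(subset, items, clauses):
    """Add every outside item that satisfies at least one conjunctive clause."""
    members = [items[name] for name in subset]
    grown = set(subset)
    for name, record in items.items():
        if name in subset:
            continue
        if any(all(pred(record, members) for pred in clause) for clause in clauses):
            grown.add(name)
    return grown

def closure(seed, items, clauses):
    """Iterate the pseudo-closure until the set stops growing."""
    current = set(seed)
    while True:
        expanded = pseudo_closure(current, items, clauses)
        if expanded == current:
            return current
        current = expanded

first_step = pseudo_closure({"a"}, PEOPLE, RULE)
cluster = closure({"a"}, PEOPLE, RULE)
print(sorted(first_step), sorted(cluster))
```

In this toy data one application of the pseudo-closure to the seed gives a smaller set than the full closure, which illustrates why the operator has to be iterated. The predicates work directly on the raw numeric and categorical values, with no encoding or dimensionality reduction step.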

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10489-025-06770-1

2512.03071

Country:

Europe (0.28)
Asia (0.28)

Genre:

Research Report (0.82)
Overview (0.67)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Mixed Data Clustering Survey and Challenges

Guerard, Guillaume, Djebali, Sonia

arXiv.org Artificial Intelligence, Dec-4-2025

This paradigm challenges traditional data management and analysis techniques by demanding innovative solutions capable of processing, analyzing, and deriving insights from vast and diverse datasets. In particular, the inclusion of mixed data types, such as numerical and categorical variables, poses significant challenges to conventional methodologies, necessitating the development of novel approaches to effectively leverage the wealth of information available [2]. Traditionally, data handling methods were designed around homogeneous datasets, typically consisting of numerical values. However, the big data paradigm introduces a multitude of data types, including structured, unstructured, and semi-structured data, which demand a departure from traditional approaches. Moreover, the three primary characteristics of big data--volume, velocity, and variety--amplify the complexity of data analysis, requiring scalable and adaptable solutions capable of processing large volumes of data at high speeds while accommodating diverse data formats and structures. These methods for handling mixed data often involve separate analyses of categorical and numerical variables, treating them as distinct entities rather than integrating their interdependencies.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s42979-025-04439-7

2512.0307

Country: Asia (0.28)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Modeling Psychological Profiles in Volleyball via Mixed-Type Bayesian Networks

Iannario, Maria, Lee, Dae-Jin, Leonelli, Manuele

arXiv.org Artificial Intelligence, Sep-29-2025

Psychological attributes rarely operate in isolation: coaches reason about networks of related traits. We analyze a new dataset of 164 female volleyball players from Italy's C and D leagues that combines standardized psychological profiling with background information. To learn directed relationships among mixed-type variables (ordinal questionnaire scores, categorical demographics, continuous indicators), we introduce latent Max-Min Hill-Climbing (latent MMHC), a hybrid structure learner that couples a latent Gaussian copula and a constraint-based skeleton with a constrained score-based refinement to return a single DAG. We also study a bootstrap-aggregated variant for stability. In simulations spanning sample size, sparsity, and dimension, latent MMHC attains lower structural Hamming distance and higher edge recall than recent copula-based learners while maintaining high specificity. Applied to volleyball, the learned network organizes mental skills around goal setting and self-confidence, with emotional arousal linking motivation and anxiety, and locates Big-Five traits (notably neuroticism and extraversion) upstream of skill clusters. Scenario analyses quantify how improvements in specific skills propagate through the network to shift preparation, confidence, and self-esteem. The approach provides an interpretable, data-driven framework for profiling psychological traits in sport and for decision support in athlete development.
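The two evaluation metrics named in the abstract can be sketched as follows. This is an illustrative implementation under one common convention (a reversed edge counts as a single error, graphs have no self-loops), which may differ from the convention used in the paper. The node names and both edge lists are made up for the example.

```python
def structural_hamming_distance(true_edges, learned_edges):
    """Count node pairs whose edge status differs (missing, extra, or reversed)."""
    true_edges, learned_edges = set(true_edges), set(learned_edges)
    # Compare the graphs pair by pair, ignoring direction when collecting pairs.
    pairs = {frozenset(edge) for edge in true_edges | learned_edges}
    distance = 0
    for pair in pairs:
        u, v = tuple(pair)
        in_true = {(u, v), (v, u)} & true_edges
        in_learned = {(u, v), (v, u)} & learned_edges
        if in_true != in_learned:
            distance += 1  # a reversal counts once under this convention
    return distance

def edge_recall(true_edges, learned_edges):
    """Fraction of true directed edges recovered with the correct orientation."""
    true_edges, learned_edges = set(true_edges), set(learned_edges)
    return len(true_edges & learned_edges) / len(true_edges)

# Hypothetical ground-truth and learned DAGs over nodes A..D.
true_dag = [("A", "B"), ("B", "C"), ("C", "D")]
learned_dag = [("A", "B"), ("C", "B"), ("A", "D")]

shd = structural_hamming_distance(true_dag, learned_dag)
recall = edge_recall(true_dag, learned_dag)
print(shd, recall)
```

Here the learned graph has one reversed edge, one missing edge, and one extra edge, so the distance is 3, and one of the three true edges is recovered with the right orientation.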

artificial intelligence, latent mmhc, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.22111

Country: Europe > Italy (0.34)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.50)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Identity Disorder (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

8a7e7f5ed2aee24e98d65b5efdde8e1f-Paper-Conference.pdf

Neural Information Processing Systems, Aug-16-2025, 19:02:08 GMT

data mining, imputation, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Data Science > Data Mining (0.94)

Density Ratio-based Causal Discovery from Bivariate Continuous-Discrete Data

Maeda, Takashi Nicholas, Shimizu, Shohei, Matsui, Hidetoshi

arXiv.org Machine Learning, May-19-2025

This paper proposes a causal discovery method for mixed bivariate data consisting of one continuous and one discrete variable. Existing constraint-based approaches are ineffective in the bivariate setting, as they rely on conditional independence tests that are not suited to bivariate data. Score-based methods either impose strong distributional assumptions or face challenges in fairly comparing causal directions between variables of different types, due to differences in their information content. We introduce a novel approach that determines causal direction by analyzing the monotonicity of the conditional density ratio of the continuous variable, conditioned on different values of the discrete variable. Our theoretical analysis shows that the conditional density ratio exhibits monotonicity when the continuous variable causes the discrete variable, but not in the reverse direction. This property provides a principled basis for comparing causal directions between variables of different types, free from strong distributional assumptions and bias arising from differences in their information content. We demonstrate its effectiveness through experiments on both synthetic and real-world datasets, showing superior accuracy compared to existing methods.
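To make the monotonicity property concrete, here is a small numerical sketch that evaluates a conditional density ratio p(x | y=1) / p(x | y=0) in closed form for two hand-picked toy models. It is not the paper's estimator and does not reproduce its assumptions: the logistic mechanism, the Gaussian parameters, and the grid are all illustrative choices, and the reverse-direction model was chosen specifically so that its ratio is non-monotone (other choices, such as a pure mean shift, would give a monotone ratio).

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def ratio_when_x_causes_y(x, slope=1.5, intercept=-0.3):
    # Toy model: X ~ N(0, 1), Y | X=x ~ Bernoulli(sigmoid(slope * x + intercept)).
    # By Bayes' rule the density ratio equals the odds P(y=1|x) / P(y=0|x)
    # up to the positive constant P(y=0) / P(y=1), which cannot change monotonicity.
    p1 = sigmoid(slope * x + intercept)
    return p1 / (1.0 - p1)

def ratio_when_y_causes_x(x):
    # Toy model: Y ~ Bernoulli(0.5), X | Y=0 ~ N(0, 1), X | Y=1 ~ N(0, 2^2).
    return normal_pdf(x, 0.0, 2.0) / normal_pdf(x, 0.0, 1.0)

def is_monotone(values, tol=1e-12):
    # True if the sequence is entirely non-decreasing or entirely non-increasing.
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= -tol for d in diffs) or all(d <= tol for d in diffs)

grid = [i / 10.0 for i in range(-30, 31)]
forward_monotone = is_monotone([ratio_when_x_causes_y(x) for x in grid])
reverse_monotone = is_monotone([ratio_when_y_causes_x(x) for x in grid])
print(forward_monotone, reverse_monotone)
```

In the first model the ratio is an exponential of a linear function of x, hence monotone. In the second it is proportional to exp(3x²/8), which decreases and then increases. A practical method would have to estimate the ratio from samples rather than evaluate it analytically.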

artificial intelligence, causal relationship, machine learning, (15 more...)

arXiv.org Machine Learning

2505.08371

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
(4 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)
Health & Medicine > Therapeutic Area > Endocrinology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)

Spectral Clustering of Categorical and Mixed-type Data via Extra Graph Nodes

Soemitro, Dylan, Neto, Jeova Farias Sales Rocha

arXiv.org Machine Learning, Mar-8-2024

Clustering data objects into homogeneous groups is one of the most important tasks in data mining. Spectral clustering is arguably one of the most important algorithms for clustering, as it is appealing for its theoretical soundness and is adaptable to many real-world data settings. For example, mixed data, where the data is composed of numerical and categorical features, is typically handled via numerical discretization, dummy coding, or similarity computation that takes into account both data types. This paper explores a more natural way to incorporate both numerical and categorical information into the spectral clustering algorithm, avoiding the need for data preprocessing or the use of sophisticated similarity functions. We propose adding extra nodes corresponding to the different categories the data may belong to and show that it leads to an interpretable clustering objective function. Furthermore, we demonstrate that this simple framework leads to a linear-time spectral clustering algorithm for categorical-only data. Finally, we compare the performance of our algorithms against other related methods and show that they provide a competitive alternative in terms of performance and runtime.
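The extra-node idea can be sketched for categorical-only data as follows: each object becomes a graph node, each (feature, category) pair becomes an additional node, and an object is linked to the categories it takes. The unit edge weights, the unnormalized Laplacian, the power-iteration solver, and the toy rows are all assumptions made for this illustration. The abstract does not give the paper's weighting, normalization, or its linear-time algorithm, and this sketch is not linear-time.

```python
def build_augmented_graph(rows):
    """Adjacency matrix with one node per object and one per (feature, category)."""
    categories = sorted({(j, value) for row in rows for j, value in enumerate(row)})
    index = {cat: len(rows) + k for k, cat in enumerate(categories)}
    n = len(rows) + len(categories)
    adjacency = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            k = index[(j, value)]
            adjacency[i][k] = adjacency[k][i] = 1.0  # unit weight: an assumption
    return adjacency

def fiedler_vector(adjacency, iterations=1000):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    shift = 2.0 * max(degrees)  # upper bound on the largest Laplacian eigenvalue
    v = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector
        # Multiply by (shift * I - L), where L = D - W is the graph Laplacian.
        w = [
            (shift - degrees[i]) * v[i]
            + sum(adjacency[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical categorical rows: (color, shape, size).
rows = [
    ("red", "circle", "small"),
    ("red", "circle", "small"),
    ("red", "square", "small"),
    ("blue", "square", "large"),
    ("blue", "square", "large"),
    ("blue", "square", "large"),
]

vector = fiedler_vector(build_augmented_graph(rows))
labels = [1 if x > 0 else 0 for x in vector[: len(rows)]]  # keep object nodes only
print(labels)
```

Only the entries for the object nodes are turned into labels. The category nodes receive coordinates too, which is one way to read a cluster in terms of the categories that sit on the same side of the cut.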

algorithm, categorical data, category, (15 more...)

arXiv.org Machine Learning

2403.05669

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Asia > Vietnam (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Explainable Machine Learning for Categorical and Mixed Data with Lossless Visualization

Kovalerchuk, Boris, McCoy, Elijah

arXiv.org Artificial Intelligence, Nov-22-2023

Building accurate and interpretable Machine Learning (ML) models for heterogeneous/mixed data is a long-standing challenge for algorithms designed for numeric data. This work focuses on developing numeric coding schemes for non-numeric attributes for ML algorithms to support accurate and explainable ML models, methods for lossless visualization of n-D non-numeric categorical data with visual rule discovery in these visualizations, and accurate and explainable ML models for categorical data. This study proposes a classification of mixed data types and analyzes their important role in Machine Learning. It presents a toolkit for enforcing interpretability of all internal operations of ML algorithms on mixed data with a visual data exploration on mixed data. A new Sequential Rule Generation (SRG) algorithm for explainable rule generation with categorical data is proposed and successfully evaluated in multiple computational experiments. This work is one of the steps to the full scope ML algorithms for mixed data supported by lossless visualization of n-D data in General Line Coordinates beyond Parallel Coordinates.

algorithm, data type, visualization, (17 more...)

arXiv.org Artificial Intelligence

2305.18437

Country:

North America > United States > California > San Diego County > Carlsbad (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Conditional Feature Importance for Mixed Data

Blesch, Kristin, Watson, David S., Wright, Marvin N.

arXiv.org Artificial Intelligence, May-2-2023

Despite the popularity of feature importance (FI) measures in interpretable machine learning, the statistical adequacy of these methods is rarely discussed. From a statistical perspective, a major distinction is between analyzing a variable's importance before and after adjusting for covariates - i.e., between marginal and conditional measures. Our work draws attention to this rarely acknowledged, yet crucial distinction and showcases its implications. Further, we reveal that for testing conditional FI, only a few methods are available and practitioners have hitherto been severely restricted in method application due to mismatching data requirements. Most real-world data exhibits complex feature dependencies and incorporates both continuous and categorical data (mixed data). Both properties are oftentimes neglected by conditional FI measures. To fill this gap, we propose to combine the conditional predictive impact (CPI) framework with sequential knockoff sampling. The CPI enables conditional FI measurement that controls for any feature dependencies by sampling valid knockoffs - hence, generating synthetic data with similar statistical properties - for the data to be analyzed. Sequential knockoffs were deliberately designed to handle mixed data and thus allow us to extend the CPI approach to such datasets. We demonstrate through numerous simulations and a real-world example that our proposed workflow controls type I error, achieves high power and is in line with results given by other conditional FI measures, whereas marginal FI metrics result in misleading interpretations. Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.

conditional feature importance, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10182-023-00477-9

2210.03047

Country:

Europe > Germany > Bremen > Bremen (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
(4 more...)

Model Based Co-clustering of Mixed Numerical and Binary Data

Bouchareb, Aichetou, Boullé, Marc, Clérot, Fabrice, Rossi, Fabrice

arXiv.org Artificial Intelligence, Dec-22-2022

The goal of co-clustering is to jointly perform a clustering of rows and a clustering of columns of a data table. Proposed by [Good, 1965] then by [Hartigan, 1975], co-clustering is an extension of standard clustering that extracts the underlying structure in the data in the form of clusters of rows and clusters of columns. The advantage of this technique over standard clustering lies in the joint (simultaneous) analysis of the rows and columns, which enables extracting the maximum of information about the interdependence between the two entities. The utility of co-clustering lies in its capacity to create easily interpretable clusters and its capability to reduce a large data table into a significantly smaller matrix having the same structure as the original ...

Aichetou Bouchareb, Marc Boullé and Fabrice Clérot: Orange Labs, 2 Avenue Pierre Marzin, 22300 Lannion, France, e-mail: firstname.
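The reduction of a large table to a smaller matrix can be illustrated with a short sketch. It assumes the row and column partitions are already known and simply averages each block; it does not perform the model-based inference of the partitions that the paper is about. The table and both partitions are made up for the example.

```python
def block_summary(table, row_clusters, col_clusters):
    """Average the table entries inside each (row cluster, column cluster) block."""
    n_row_clusters = max(row_clusters) + 1
    n_col_clusters = max(col_clusters) + 1
    sums = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    counts = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            r, c = row_clusters[i], col_clusters[j]
            sums[r][c] += value
            counts[r][c] += 1
    # Assumes every block is non-empty, which holds when each cluster has a member.
    return [
        [sums[r][c] / counts[r][c] for c in range(n_col_clusters)]
        for r in range(n_row_clusters)
    ]

# Hypothetical binary table with a hidden two-by-two block structure.
table = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
row_clusters = [0, 0, 1, 1]  # assumed row partition
col_clusters = [0, 1, 1, 0]  # assumed column partition

summary = block_summary(table, row_clusters, col_clusters)
print(summary)
```

The 4 x 4 table collapses to a 2 x 2 matrix of block means that carries the same pattern as the original, which is the sense in which the smaller matrix keeps the structure of the data.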

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-030-18129-1_1

2212.11725

Country:

Europe > France > Île-de-France > Paris > Paris (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Why We Should Be Careful When Developing AI

#artificialintelligence, Aug-23-2022, 23:31:13 GMT

Artificial intelligence offers organisations many advantages: it makes them more efficient, improves customer service through conversational AI, and reduces a wide variety of risks in different industries. Although we are only at the start of the AI revolution, we can already see that artificial intelligence will have a profound effect on our lives, both positively and negatively. The financial impact of AI on the global economy is estimated to reach US$15.7 trillion by 2030, with 40% of jobs expected to be lost due to artificial intelligence, and global venture capital investment in AI grew to more than US$27 billion in 2018. Such estimates of AI potential relate to a broad understanding of its nature and applicability. AI will eventually consist of entirely novel and unrecognisable forms of intelligence, and we can see the first signals of this in the rapid developments of AI. In 2017, Google's DeepMind developed AlphaGo Zero, an AI agent that learned Go, an abstract strategy board game with a far more expansive range of moves than chess.

algorithm, artificial intelligence, ethics, (13 more...)

#artificialintelligence

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Industry: Leisure & Entertainment > Games > Go (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
