AITopics | mixed-type data

Collaborating Authors

mixed-type data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

e8dbeb1c947a30576c699e7f5c73d3e3-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 14:22:34 GMT

However, within this specific application domain, existing VAE methods are restricted by using only one layer of latent variables andstrictly Gaussian posterior approximations.

artificial intelligence, machine learning, pmlr, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Clustering Approaches for Mixed-Type Data: A Comparative Study

Ghattas, Badih, San-Benito, Alvaro Sanchez

arXiv.org Machine LearningNov-26-2025

Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study presents the state-of-the-art of these approaches and compares them using various simulation models. The compared methods include the distance-based approaches k-prototypes, PDQ, and convex k-means, and the probabilistic methods KAy-means for MIxed LArge data (KAMILA), the mixture of Bayesian networks (MBNs), and latent class model (LCM). The aim is to provide insights into the behavior of different methods across a wide range of scenarios by varying some experimental factors such as the number of clusters, cluster overlap, sample size, dimension, proportion of continuous variables in the dataset, and clusters' distribution. The degree of cluster overlap and the proportion of continuous variables in the dataset and the sample size have a significant impact on the observed performances. When strong interactions exist between variables alongside an explicit dependence on cluster membership, none of the evaluated methods demonstrated satisfactory performance. In our experiments KAMILA, LCM, and k-prototypes exhibited the best performance, with respect to the adjusted rand index (ARI). All the methods are available in R.

categorical variable, continuous variable, dataset, (14 more...)

arXiv.org Machine Learning

doi: 10.1155/jpas/2242100

2511.19755

Country:

Europe > Austria > Vienna (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
North America > United States > New Jersey (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Industry: Aerospace & Defense (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

MMM: Clustering Multivariate Longitudinal Mixed-type Data

Amato, Francesco, Jacques, Julien

arXiv.org Machine LearningSep-16-2025

Multivariate longitudinal data of mixed-type are increasingly collected in many science domains. However, algorithms to cluster this kind of data remain scarce, due to the challenge to simultaneously model the within- and between-time dependence structures for multivariate data of mixed kind. We introduce the Mixture of Mixed-Matrices (MMM) model: reorganizing the data in a three-way structure and assuming that the non-continuous variables are observations of underlying latent continuous variables, the model relies on a mixture of matrix-variate normal distributions to perform clustering in the latent dimension. The MMM model is thus able to handle continuous, ordinal, binary, nominal and count data and to concurrently model the heterogeneity, the association among the responses and the temporal dependence structure in a parsimonious way and without assuming conditional independence. The inference is carried out through an MCMC-EM algorithm, which is detailed. An evaluation of the model through synthetic data shows its inference abilities. A real-world application on financial data is presented.

algorithm, journal, longitudinal data, (17 more...)

arXiv.org Machine Learning

2509.12166

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)

Add feedback

A Deterministic Information Bottleneck Method for Clustering Mixed-Type Data

Costa, Efthymios, Papatsouma, Ioanna, Markos, Angelos

arXiv.org Machine LearningJul-3-2024

In this paper, we present an information-theoretic method for clustering mixed-type data, that is, data consisting of both continuous and categorical variables. The method is a variant of the Deterministic Information Bottleneck algorithm which optimally compresses the data while retaining relevant information about the underlying structure. We compare the performance of the proposed method to that of three well-established clustering methods (KAMILA, K-Prototypes, and Partitioning Around Medoids with Gower's dissimilarity) on simulated and real-world datasets. The results demonstrate that the proposed approach represents a competitive alternative to conventional clustering techniques under specific conditions.

algorithm, deterministic information bottleneck method, mixed-type data, (9 more...)

arXiv.org Machine Learning

2407.03389

Country:

Europe > United Kingdom > England > Greater London > London (0.05)
North America > United States > California > Orange County > Irvine (0.04)
Europe > Greece (0.04)
Asia (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Towards Generating Adversarial Examples on Mixed-type Data

Xu, Han, Pan, Menghai, Jiang, Zhimeng, Chen, Huiyuan, Li, Xiaoting, Das, Mahashweta, Yang, Hao

arXiv.org Artificial IntelligenceOct-17-2022

The existence of adversarial attacks (or adversarial examples) brings huge concern about the machine learning (ML) model's safety issues. For many safety-critical ML tasks, such as financial forecasting, fraudulent detection, and anomaly detection, the data samples are usually mixed-type, which contain plenty of numerical and categorical features at the same time. However, how to generate adversarial examples with mixed-type data is still seldom studied. In this paper, we propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data. Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples. More importantly, by adding designed regularizations, our generated adversarial examples can evade potential detection models, which makes the attack indeed insidious. Through extensive empirical studies, we validate the effectiveness and efficiency of our attack method and evaluate the robustness of existing classification models against our proposed attack. The experimental results highlight the feasibility of generating adversarial examples toward machine learning models in real-world applications.

adversarial example, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2210.09405

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Who wins the Miss Contest for Imputation Methods? Our Vote for Miss BooPF

Ramosaj, Burim, Pauly, Markus

arXiv.org Machine LearningNov-30-2017

Missing data is an expected issue when large amounts of data is collected, and several imputation techniques have been proposed to tackle this problem. Beneath classical approaches such as MICE, the application of Machine Learning techniques is tempting. Here, the recently proposed missForest imputation method has shown high imputation accuracy under the Missing (Completely) at Random scheme with various missing rates. In its core, it is based on a random forest for classification and regression, respectively. In this paper we study whether this approach can even be enhanced by other methods such as the stochastic gradient tree boosting method, the C5.0 algorithm or modified random forest procedures. In particular, other resampling strategies within the random forest protocol are suggested. In an extensive simulation study, we analyze their performances for continuous, categorical as well as mixed-type data. Therein, MissBooPF, a combination of the stochastic gradient tree boosting method together with the parametrically bootstrapped random forest method, appeared to be promising. Finally, an empirical analysis focusing on credit information and Facebook data is conducted.

artificial intelligence, machine learning, stochastic gradient tree, (16 more...)

arXiv.org Machine Learning

1711.11394

Country: Europe (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Add feedback

Causal Inference on Multivariate and Mixed-Type Data

Marx, Alexander, Vreeken, Jilles

arXiv.org Machine LearningOct-16-2017

Given data over the joint distribution of two random variables $X$ and $Y$, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. In particular, we consider the general case where both $X$ and $Y$ may be univariate or multivariate, and of the same or mixed data types. We take an information theoretic approach, based on Kolmogorov complexity, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. The ideal score is not computable, but can be approximated through the Minimum Description Length (MDL) principle. Based on MDL, we propose two scores, one for when both $X$ and $Y$ are of the same single data type, and one for when they are mixed-type. We model dependencies between $X$ and $Y$ using classification and regression trees. As inferring the optimal model is NP-hard, we propose Crack, a fast greedy algorithm to determine the most likely causal direction directly from the data. Empirical evaluation on a wide range of data shows that Crack reliably, and with high accuracy, infers the correct causal direction on both univariate and multivariate cause-effect pairs over both single and mixed-type data.

artificial intelligence, dependency, machine learning, (16 more...)

arXiv.org Machine Learning

1702.06385

Country: Europe > Germany (0.68)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Outlier Detection on Mixed-Type Data: An Energy-based Approach

Do, Kien, Tran, Truyen, Phung, Dinh, Venkatesh, Svetha

arXiv.org Machine LearningAug-16-2016

Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for single data type such as continuous or discrete. However, real world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use \emph{free-energy} derived from Mv.RBM as outlier score to detect outliers as those data points lying in low density regions. The method is fast to learn and compute, is scalable to massive datasets. At the same time, the outlier score is identical to data negative log-density up-to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) a proper handling mixed-types is necessary in outlier detection, and (b) free-energy of Mv.RBM is a powerful and efficient outlier scoring method, which is highly competitive against state-of-the-arts.

data mining, detection, machine learning, (16 more...)

arXiv.org Machine Learning

1608.0483

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

Add feedback

A Generalized Student-t Based Approach to Mixed-Type Anomaly Detection

Lu, Yen-Cheng (Virginia Tech) | Chen, Feng (Carnegie Mellon University) | Chen, Yang (Virginia Tech) | Lu, Chang-Tien (Virginia Tech)

AAAI ConferencesJul-9-2013

Anomaly detection for mixed-type data is an important problem that has not been well addressed in the machine learning field. There are two challenging issues for mixed-type datasets, namely modeling mutual correlations between mixed-type attributes and capturing large variations due to anomalies. This paper presents BuffDetect, a robust error buffering approach for anomaly detection in mixed-type datasets. A new variant of the generalized linear model is proposed to model the dependency between mixed-type attributes. The model incorporates an error buffering component based on Student-t distribution to absorb the variations caused by anomalies. However, because of the non- Gaussian design, the problem becomes analytically intractable. We propose a novel Bayesian inference approach, which integrates Laplace approximation and several computational optimizations, and is able to efficiently approximate the posterior of high dimensional latent variables by iteratively updating the latent variables in groups. Extensive experimental evaluations based on 13 benchmark datasets demonstrate the effectiveness and efficiency of BuffDetect.

anomaly, dataset, detection, (16 more...)

AAAI Conferences

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > New York (0.04)
North America > United States > Virginia (0.04)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)

Add feedback