main cluster
Hierarchical topological clustering
Topological methods can explore data clouds without making assumptions about their structure. Here we propose a hierarchical topological clustering algorithm that can be implemented with any choice of distance. The persistence of outliers and of clusters of arbitrary shape is inferred from the resulting hierarchy. We demonstrate the potential of the algorithm on selected datasets in which outliers play relevant roles, consisting of image, medical, and economic data. These methods can provide meaningful clusters in situations where other techniques fail to do so.
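The abstract does not spell out the algorithm, so the following is only a minimal sketch of a related, well-known construction: single-linkage clustering, which tracks the connected components of the graph linking points closer than a growing threshold. It is not the authors' method. The function names `merge_thresholds` and `clusters_at`, the sample points, and the Manhattan distance are all assumptions chosen for illustration; any callable distance can be passed in.

```python
from itertools import combinations


def _find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def _sorted_edges(points, dist):
    # All pairwise distances, smallest first.
    idx = range(len(points))
    return sorted((dist(points[i], points[j]), i, j) for i, j in combinations(idx, 2))


def merge_thresholds(points, dist):
    """Distances at which two components merge (the single-linkage hierarchy)."""
    parent = list(range(len(points)))
    merges = []
    for d, i, j in _sorted_edges(points, dist):
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[rj] = ri
            merges.append(d)
    return merges


def clusters_at(points, dist, threshold):
    """Connected components when points at distance <= threshold are linked."""
    parent = list(range(len(points)))
    for d, i, j in _sorted_edges(points, dist):
        if d > threshold:
            break
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[rj] = ri
    groups = {}
    for i in range(len(points)):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())


def manhattan(p, q):
    # Example distance; any function of two points would work here.
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical data: two small groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(merge_thresholds(points, manhattan))
print(clusters_at(points, manhattan, threshold=5))
```

In this sketch, a point that stays in its own component until a large threshold (here the point at index 5) is a candidate outlier, which is one simple way to read "persistence" off a hierarchy.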
GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification
Pei, Te, Alican, Fuat, Yin, Aaron Ontoyin, Ihlamur, Yigit
Decision trees are fundamental tools in machine learning (ML), prized for their interpretability and simplicity in classification tasks. By providing clear decision paths, they enable users to understand and trust the reasoning behind predictions. However, their effectiveness diminishes when applied to heterogeneous datasets comprising entities with varying characteristics. Uniform decision paths often fail to account for the nuanced differences among diverse segments, leading to oversimplified or misleading classifications. Unsupervised clustering methods, on the other hand, excel in discovering latent structures within complex datasets. These methods, including hierarchical clustering, k-means, and DBSCAN, are powerful tools for segmenting populations into meaningful clusters without requiring predefined labels. While they are effective for uncovering hidden patterns, their primary drawback is a lack of explainability. Clusters produced by unsupervised methods often lack intuitive descriptions or actionable insights, making it difficult to interpret their relevance or apply them in practical decision-making scenarios.
- North America > United States > New York (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology (0.68)
- Banking & Finance (0.48)
A New Robust Multivariate Mode Estimator for Eye-tracking Calibration
Brilhault, Adrien, Neuenschwander, Sergio, Rios, Ricardo Araujo
In this work, we propose a new method for estimating the main mode of multivariate distributions, with application to eye-tracking calibration. When performing eye-tracking experiments with poorly cooperative subjects, such as infants or monkeys, the calibration data generally suffer from high contamination. Outliers are typically organized in clusters, corresponding to the time intervals when subjects were not looking at the calibration points. In this type of multimodal distribution, most central tendency measures fail to estimate the principal fixation coordinates (the first mode), resulting in errors and inaccuracies when mapping the gaze to the screen coordinates. Here, we developed a new algorithm, named BRIL, to identify the first mode of multivariate distributions; it relies on recursive depth-based filtering. This novel approach was tested on artificial mixtures of Gaussian and Uniform distributions, and compared to existing methods (conventional depth medians, robust estimators of location and scatter, and clustering-based approaches). It performed very well, even for distributions containing very high proportions of outliers, whether grouped in clusters or randomly distributed. Finally, we demonstrate the strength of our method in a real-world scenario using experimental data from eye-tracking calibrations with Capuchin monkeys, especially for distributions where other algorithms typically lack accuracy.
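The idea of recursive filtering toward a main mode can be illustrated with a deliberately simplified sketch. This is not BRIL: the abstract gives no algorithmic details, and the code below replaces a statistical depth function with plain Euclidean distance to the coordinate-wise median. The function name `recursive_trim_mode`, the parameters `keep` and `min_points`, and the sample data are all assumptions made for illustration.

```python
import math
from statistics import median


def coordinate_median(points):
    # Median taken independently along each coordinate.
    return tuple(median(coord) for coord in zip(*points))


def recursive_trim_mode(points, keep=0.5, min_points=3):
    """Repeatedly keep the points closest to the current center.

    Stand-in for depth-based filtering: "depth" here is just closeness
    to the coordinate-wise median of the points still retained.
    """
    current = list(points)
    while len(current) > min_points:
        center = coordinate_median(current)
        current.sort(key=lambda p: math.dist(p, center))
        n_keep = max(min_points, int(len(current) * keep))
        if n_keep >= len(current):
            break  # nothing left to trim (e.g. keep=1.0)
        current = current[:n_keep]
    return coordinate_median(current)


# Hypothetical calibration samples: six fixations near (0, 0) plus a
# cluster of four outliers near (10, 10).
samples = [
    (0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.05, 0.05),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (9.9, 10.0),
]
estimate = recursive_trim_mode(samples)
naive_mean = tuple(sum(c) / len(samples) for c in zip(*samples))
print(estimate, naive_mean)
```

With this data the plain mean is pulled toward the outlier cluster, while the trimmed estimate stays near the main group. A simplification like this one only works while the main mode holds a majority of the points, which is a far weaker guarantee than the abstract claims for BRIL.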
- North America > Canada > Ontario > Toronto (0.14)
- South America > Brazil > Rio Grande do Norte > Natal (0.04)
- South America > Brazil > Bahia > Salvador (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)