GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification

Pei, Te, Alican, Fuat, Yin, Aaron Ontoyin, Ihlamur, Yigit

arXiv.org Artificial Intelligence 

Decision trees are fundamental tools in machine learning (ML), prized for their interpretability and simplicity in classification tasks. By providing clear decision paths, they enable users to understand and trust the reasoning behind predictions. However, their effectiveness diminishes when applied to heterogeneous datasets comprising entities with varying characteristics. Uniform decision paths often fail to account for the nuanced differences among diverse segments, leading to oversimplified or misleading classifications. Unsupervised clustering methods, on the other hand, excel in discovering latent structures within complex datasets. These methods, including hierarchical clustering, k-means, and DBSCAN, are powerful tools for segmenting populations into meaningful clusters without requiring predefined labels. While they are effective for uncovering hidden patterns, their primary drawback is a lack of explainability. Clusters produced by unsupervised methods often lack intuitive descriptions or actionable insights, making it difficult to interpret their relevance or apply them in practical decision-making scenarios.