An Improved Deep Learning Model for Word Embeddings Based Clustering for Large Text Datasets

Feb-22-2025–arXiv.org Artificial Intelligence

In this paper, an improved clustering technique for large textual datasets by leveraging fine-tuned word embeddings is presented. WEClustering technique is used as the base model. WEClustering model is fur-ther improvements incorporating fine-tuning contextual embeddings, advanced dimensionality reduction methods, and optimization of clustering algorithms. Experimental results on benchmark datasets demon-strate significant improvements in clustering metrics such as silhouette score, purity, and adjusted rand index (ARI). An increase of 45% and 67% of median silhouette score is reported for the proposed WE-Clustering_K++ (based on K-means) and WEClustering_A++ (based on Agglomerative models), respec-tively. The proposed technique will help to bridge the gap between semantic understanding and statistical robustness for large-scale text-mining tasks.

dataset, metric, representation, (12 more...)

arXiv.org Artificial Intelligence

Feb-22-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York (0.05)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
- Asia
  - Vietnam > Long An Province (0.04)
  - India > Karnataka
    - Bengaluru (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Clustering (1.00)
  - Neural Networks > Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found