AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Improving Emotion Recognition Accuracy with Personalized Clustering

Gutierrez-Martin, Laura, Ongil, Celia Lopez, Lanza-Gutierrez, Jose M., Calero, Jose A. Miranda

arXiv.org Artificial Intelligence, Sep-23-2024

Emotion recognition through artificial intelligence and smart sensing of physical and physiological signals (Affective Computing) is achieving very interesting results in terms of accuracy, inference times, and user-independent models. Even so, applications related to the safety and well-being of people (sexual aggression, gender-based violence, child and elder abuse, mental health, etc.) require further improvements. Emotion detection should be done with fast, discreet, and affordable systems working in real time and in real life (wearable devices, wireless communications, battery-powered). Furthermore, emotional reactions to violence are not the same in all people. Consequently, large general models cannot be applied to a multiuser system for protecting people, and customized, simple AI models would be welcomed by health and social workers and law enforcement agents. These customized models will be applicable to clusters of subjects sharing similarities in their emotional reactions to external stimuli. This customization requires several steps: creating clusters of subjects with similar behaviors, creating AI models for every cluster, continually updating these models with new data, and enrolling new subjects in clusters when required. This work presents a methodology for clustering the compiled data (physical and physiological data, together with emotional labels), as well as a method for including new subjects once the AI model is generated. Experimental results demonstrate an improvement of 4% in accuracy and 3% in F1-score with respect to the general model, along with a 14% reduction in variability.

artificial intelligence, emotion, machine learning, (16 more...)

arXiv:2410.03696

Country:

Europe > Spain > Galicia > Madrid (0.05)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Montenegro (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.88)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
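
Illustrative sketch (not taken from the paper): the abstract above lists the steps of clustering subjects and enrolling new subjects into an existing cluster. The abstract does not name the clustering algorithm, so plain k-means is swapped in here as an assumption, and the subject names and two-dimensional feature vectors are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

# Hypothetical per-subject summaries of physiological reactions (made-up values).
subjects = {
    "s1": (0.9, 0.8), "s2": (1.0, 0.9), "s3": (0.1, 0.2), "s4": (0.2, 0.1),
}
centroids = kmeans(list(subjects.values()), k=2)
assignment = {name: nearest(vec, centroids) for name, vec in subjects.items()}

# Enrolling a new subject: assign to the nearest existing cluster, no re-clustering.
new_subject = (0.15, 0.25)
new_cluster = nearest(new_subject, centroids)
```

In a setup like this, each cluster index would select which per-cluster model to apply to a subject; how the paper actually enrolls new subjects may differ.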

Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling

Kapoor, Satya, Gil, Alex, Bhaduri, Sreyoshi, Mittal, Anshul, Mulkar, Rutu

arXiv.org Artificial Intelligence, Sep-23-2024

Topic modeling is a widely used technique for uncovering thematic structures from large text corpora. However, most topic modeling approaches, e.g., Latent Dirichlet Allocation (LDA), struggle to capture the nuanced semantics and contextual understanding required to accurately model complex narratives. Recent advancements in this area include methods like BERTopic, which have demonstrated significantly improved topic coherence and thus established a new standard for benchmarking. In this paper, we present a novel approach, the Qualitative Insights Tool (QualIT), which integrates large language models (LLMs) with existing clustering-based topic modeling approaches. Our method leverages the deep contextual understanding and powerful language generation capabilities of LLMs to enrich the topic modeling process using clustering. We evaluate our approach on a large corpus of news articles and demonstrate substantial improvements in topic coherence and topic diversity compared to baseline topic modeling techniques. On the 20 ground-truth topics, our method shows 70% topic coherence (vs. 65% and 57% for the benchmarks) and 95.5% topic diversity (vs. 85% and 72% for the benchmarks). Our findings suggest that the integration of LLMs can unlock new opportunities for topic modeling of dynamic and complex text data, as is common in talent management research contexts.

bertopic, llm enhanced topic modeling, qualitative insight tool, (8 more...)

arXiv:2409.15626

Country:

North America > United States > Virginia > Arlington County > Arlington (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
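
Illustrative sketch (not taken from the paper): the abstract above reports topic diversity as a percentage but does not define it. The code assumes one common definition, the fraction of unique words among the top words of all topics; the paper may compute it differently. The function name, the `top_k` parameter, and the toy topics are hypothetical.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    Assumes at least one non-empty topic; 1.0 means no word is shared.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)

# Toy topics (made up): "vote" appears in both, so 5 of 6 words are unique.
toy_topics = [
    ["election", "vote", "senate"],
    ["game", "team", "vote"],
]
diversity = topic_diversity(toy_topics, top_k=3)
```

Under this definition, a score near 1 indicates that topics rarely share their top words, while a low score indicates redundant topics.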

Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling

Clavié, Benjamin, Chaffin, Antoine, Adams, Griffin

arXiv.org Artificial Intelligence, Sep-22-2024

Over the last few years, multi-vector retrieval methods, spearheaded by ColBERT, have become an increasingly popular approach to Neural IR. By storing representations at the token level rather than at the document level, these methods have demonstrated very strong retrieval performance, especially in out-of-domain settings. However, the storage and memory required to hold the large number of associated vectors remain an important drawback, hindering practical adoption. In this paper, we introduce a simple clustering-based token pooling approach to aggressively reduce the number of vectors that need to be stored. This method can reduce the space and memory footprint of ColBERT indexes by 50% with virtually no retrieval performance degradation. It also allows for further reductions, cutting the vector count by 66% to 75%, with degradation remaining below 5% on a vast majority of datasets. Importantly, this approach requires neither architectural changes nor query-time processing, and can be used as a simple drop-in during indexing with any ColBERT-like model.

dataset, degradation, vector, (14 more...)

arXiv:2409.14683

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.95)
Information Technology > Information Management (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
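
Illustrative sketch (not taken from the paper): the abstract above describes clustering a document's token vectors and storing one pooled vector per cluster. The abstract does not say which clustering algorithm is used, so a simple agglomerative merge with centroid linkage is assumed here; `pool_tokens`, `pool_factor`, and the toy vectors are hypothetical names and data.

```python
import math

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def pool_tokens(token_vectors, pool_factor=2):
    """Shrink a document's token vectors to about len / pool_factor vectors.

    Repeatedly merges the two clusters whose centroids are closest, then
    returns the mean vector of each remaining cluster.
    """
    target = max(1, len(token_vectors) // pool_factor)
    clusters = [[v] for v in token_vectors]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [mean(c) for c in clusters]

# Four toy token embeddings forming two obvious groups.
doc_tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pooled = pool_tokens(doc_tokens, pool_factor=2)
```

Keeping one vector out of every f is a reduction of 1 - 1/f, so a factor of 2 gives 50%, a factor of 3 about 66%, and a factor of 4 gives 75%, which matches the reduction levels quoted in the abstract. Because pooling happens only when documents are indexed, queries are left untouched in this sketch.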
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A High-Performance External Validity Index for Clustering with a Large Number of Clusters

Karbasian, Mohammad Yasin, Javadi, Ramin

arXiv.org Artificial Intelligence, Sep-22-2024

This paper introduces the Stable Matching Based Pairing (SMBP) algorithm, a high-performance external validity index for clustering evaluation in large-scale datasets with a large number of clusters. SMBP leverages the stable matching framework to pair clusters across different clustering methods, significantly reducing computational complexity to $O(N^2)$, compared to traditional Maximum Weighted Matching (MWM) with $O(N^3)$ complexity. Through comprehensive evaluations on real-world and synthetic datasets, SMBP demonstrates comparable accuracy to MWM and superior computational efficiency. It is particularly effective for balanced, unbalanced, and large-scale datasets with a large number of clusters, making it a scalable and practical solution for modern clustering tasks. Additionally, SMBP is easily implementable within machine learning frameworks like PyTorch and TensorFlow, offering a robust tool for big data applications. The algorithm is validated through extensive experiments, showcasing its potential as a powerful alternative to existing methods such as Maximum Match Measure (MMM) and Centroid Ratio (CR).

dataset, matching, stable matching, (11 more...)

arXiv:2409.14455

Country:

Asia > Middle East > Iran > Isfahan Province > Isfahan (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
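
Illustrative sketch (not taken from the paper): the abstract above says SMBP pairs clusters across two clusterings with a stable matching. The code below is a generic Gale-Shapley pairing in which each cluster prefers the partner it shares the most points with. Those preference rules, the function name, and the agreement fraction at the end are assumptions, not the paper's index.

```python
from collections import Counter

def stable_pairing(labels_a, labels_b):
    """Pair clusters of two labelings via Gale-Shapley, ranking partners by overlap size."""
    overlap = Counter(zip(labels_a, labels_b))  # missing pairs count as 0
    clusters_a = sorted(set(labels_a))
    clusters_b = sorted(set(labels_b))
    # Each A-cluster lists B-clusters from largest to smallest overlap.
    prefs = {a: sorted(clusters_b, key=lambda b: -overlap[(a, b)]) for a in clusters_a}
    next_choice = {a: 0 for a in clusters_a}
    engaged = {}  # B-cluster -> A-cluster currently holding it
    free = list(clusters_a)
    while free:
        a = free.pop()
        if next_choice[a] >= len(prefs[a]):
            continue  # a has proposed to every B-cluster and stays unpaired
        b = prefs[a][next_choice[a]]
        next_choice[a] += 1
        current = engaged.get(b)
        if current is None:
            engaged[b] = a
        elif overlap[(a, b)] > overlap[(current, b)]:
            engaged[b] = a        # b prefers the newcomer
            free.append(current)  # the displaced cluster proposes again later
        else:
            free.append(a)
    return {a: b for b, a in engaged.items()}

labels_a = [0, 0, 0, 1, 1, 2, 2, 2]
labels_b = [1, 1, 0, 0, 0, 2, 2, 2]
pairing = stable_pairing(labels_a, labels_b)
# Simple illustrative score: share of points whose two clusters are paired.
matched = sum(1 for a, b in zip(labels_a, labels_b) if pairing.get(a) == b)
agreement = matched / len(labels_a)
```

With N clusters per side, the proposal loop makes at most N^2 proposals, which is where the quadratic cost of stable matching comes from; sorting the preference lists in this sketch adds a logarithmic factor.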

VLM-Vac: Enhancing Smart Vacuums through VLM Knowledge Distillation and Language-Guided Experience Replay

Mirjalili, Reihaneh, Krawez, Michael, Walter, Florian, Burgard, Wolfram

arXiv.org Artificial Intelligence, Sep-21-2024

In this paper, we propose VLM-Vac, a novel framework designed to enhance the autonomy of smart robot vacuum cleaners. Our approach integrates the zero-shot object detection capabilities of a Vision-Language Model (VLM) with a Knowledge Distillation (KD) strategy. By leveraging the VLM, the robot can categorize objects into actionable classes (either to avoid or to suck) across diverse backgrounds. However, frequently querying the VLM is computationally expensive and impractical for real-world deployment. To address this issue, we implement a KD process that gradually transfers the essential knowledge of the VLM to a smaller, more efficient model. Our real-world experiments demonstrate that this smaller model progressively learns from the VLM and requires significantly fewer queries over time. Additionally, we tackle the challenge of continual learning in dynamic home environments by exploiting a novel experience replay method based on language-guided sampling. Our results show that this approach is not only energy-efficient but also surpasses conventional vision-based clustering methods, particularly in detecting small objects across diverse backgrounds.

detection, learning, robot, (15 more...)

arXiv:2409.14096

Country: Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(3 more...)

Quantum enhanced stratification of Breast Cancer: exploring quantum expressivity for real omics data

Repetto, Valeria, Ceroni, Elia Giuseppe, Buonaiuto, Giuseppe, D'Aurizio, Romina

arXiv.org Artificial Intelligence, Sep-21-2024

Quantum Machine Learning (QML) is considered one of the most promising applications of Quantum Computing in the Noisy Intermediate Scale Quantum (NISQ) era for the impact it is thought to have in the near future. Despite promising theoretical assumptions, the exploration of how QML could foster new discoveries in medicine and biology is still in its infancy, with few examples. In this study, we aimed to assess whether Quantum Kernels (QK) could effectively classify subtypes of Breast Cancer (BC) patients on the basis of molecular characteristics. We performed a heuristic exploration of encoding configurations with different entanglement levels to determine a trade-off between kernel expressivity and performance. Our results show that QKs yield clustering results comparable to those of classical methods while using fewer data points, and are able to fit the data with a higher number of clusters. Additionally, we conducted the experiments on a Quantum Processing Unit (QPU) to evaluate the effect of noise on the outcome. We found that less expressive encodings showed a higher resilience to noise, indicating that the computational pipeline can be reliably implemented on NISQ devices. Our findings suggest that QK methods show promise for application in Precision Oncology, especially in scenarios where the dataset is limited in size and a granular, non-trivial stratification of complex molecular data cannot be achieved classically.

feature map, quantum, stratification, (16 more...)

arXiv:2409.14089

Country:

Europe > Italy > Piedmont > Turin Province > Turin (0.14)
Asia > India > Maharashtra > Mumbai (0.05)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Europe > Italy > Campania > Naples (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.71)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

A New Perspective on ADHD Research: Knowledge Graph Construction with LLMs and Network Based Insights

Otal, Hakan T., Faraone, Stephen V., Canbaz, M. Abdullah

arXiv.org Artificial Intelligence, Sep-19-2024

To gain deeper insight into ADHD, we performed a network analysis on a comprehensive knowledge graph (KG) of the disorder, constructed by integrating scientific literature and clinical data with the help of cutting-edge large language models. The analysis, including k-core techniques, identified critical nodes and relationships that are central to understanding the disorder. Building on these findings, we developed a context-aware chatbot using Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG), enabling accurate and informed interactions. Our knowledge graph not only advances the understanding of ADHD but also provides a powerful tool for research and clinical applications.

adhd, graph, knowledge graph, (15 more...)

arXiv:2409.12853

Country:

North America > United States (0.28)
Europe > Netherlands > South Holland > Leiden (0.05)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Attention Deficit/Hyperactivity Disorder (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
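
Illustrative sketch (not taken from the paper): the abstract above mentions k-core techniques for finding central nodes in a knowledge graph. The k-core of a graph is the largest subgraph in which every node has at least k neighbours, and it can be found by repeatedly pruning low-degree nodes. The function name and the toy graph are hypothetical, and the adjacency is assumed to be undirected and symmetric.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core by repeatedly dropping nodes with degree < k.

    adjacency maps each node to the set of its neighbours and must be
    symmetric (undirected graph). The input is not modified.
    """
    graph = {node: set(neigh) for node, neigh in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, neigh in graph.items() if len(neigh) < k]:
            for other in graph.pop(node):
                graph[other].discard(node)  # removed nodes no longer count as neighbours
            changed = True
    return set(graph)

# Toy undirected graph (made up): a triangle A-B-C with a chain A-D-E attached.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}
core = k_core(toy_graph, k=2)
```

Here the chain is peeled away one node at a time (first E, then D), leaving the densely connected triangle as the 2-core.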

Hybrid Ensemble Deep Graph Temporal Clustering for Spatiotemporal Data

Nji, Francis Ndikum, Faruque, Omar, Cham, Mostafa, Vandana, Janeja, Wang, Jianwu

arXiv.org Artificial Intelligence, Sep-19-2024

Classifying subsets based on spatial and temporal features is crucial to the analysis of spatiotemporal data given the inherent spatial and temporal variability. Since no single clustering algorithm ensures optimal results, researchers have increasingly explored the effectiveness of ensemble approaches. Ensemble clustering has attracted much attention due to increased diversity, better generalization, and overall improved clustering performance. While ensemble clustering may yield promising results on simple datasets, it has not been fully explored on complex multivariate spatiotemporal data. We propose a novel hybrid ensemble deep graph temporal clustering (HEDGTC) method for multivariate spatiotemporal data. HEDGTC integrates homogeneous and heterogeneous ensemble methods and adopts a dual consensus approach to address noise and misclassification from traditional clustering. It further applies a graph attention autoencoder network to improve clustering performance and stability. When evaluated on three real-world multivariate spatiotemporal datasets, HEDGTC outperforms state-of-the-art ensemble clustering models, showing improved performance and stability with consistent results. This indicates that HEDGTC can effectively capture implicit temporal patterns in complex spatiotemporal data.

algorithm, ensemble, matrix, (14 more...)

arXiv:2409.1259

Country:

North America > United States > Maryland > Baltimore County (0.14)
North America > United States > Maryland > Baltimore (0.14)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Counterfactual Explanations for Clustering Models

Spagnol, Aurora, Sokol, Kacper, Barbiero, Pietro, Langheinrich, Marc, Gjoreski, Martin

arXiv.org Artificial Intelligence, Sep-19-2024

Clustering algorithms rely on complex optimisation processes that may be difficult to comprehend, especially for individuals who lack technical expertise. While many explainable artificial intelligence techniques exist for supervised machine learning, unsupervised learning, and clustering in particular, has been largely neglected. To complicate matters further, the notion of a "true" cluster is inherently challenging to define. These facets of unsupervised learning and its explainability make it difficult to foster trust in such methods and curtail their adoption. To address these challenges, we propose a new, model-agnostic technique for explaining clustering algorithms with counterfactual statements. Our approach relies on a novel soft-scoring method that captures the spatial information utilised by clustering models. It builds upon a state-of-the-art Bayesian counterfactual generator for supervised learning to deliver high-quality explanations. We evaluate its performance on five datasets and two clustering algorithms, and demonstrate that introducing soft scores to guide counterfactual search significantly improves the results.

algorithm, counterfactual, dataset, (17 more...)

arXiv:2409.12632

Country:

North America > Canada > Alberta (0.14)
Europe > Austria > Vienna (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Sustainable Visions: Unsupervised Machine Learning Insights on Global Development Goals

García-Rodríguez, Alberto, Núñez, Matias, Pérez, Miguel Robles, Govezensky, Tzipe, Barrio, Rafael A., Gershenson, Carlos, Kaski, Kimmo K., Tagüeña, Julia

arXiv.org Artificial Intelligence, Sep-18-2024

The United Nations 2030 Agenda for Sustainable Development outlines 17 goals to address global challenges. However, progress has been slower than expected, so the reasons behind this need to be investigated. In this study, we used a novel data-driven methodology to analyze data from 107 countries (2000-2022) using unsupervised machine learning techniques. Our analysis reveals strong positive and negative correlations between certain SDGs. The findings show that progress toward the SDGs is heavily influenced by geographical, cultural and socioeconomic factors, with no country on track to achieve all goals by 2030. This highlights the need for a region-specific, systemic approach to sustainable development that acknowledges the complex interdependencies of the goals and the diverse capacities of nations. Our approach provides a robust framework for developing efficient and data-informed strategies to promote cooperative and targeted initiatives for sustainable progress.

artificial intelligence, machine learning, sdg, (17 more...)

arXiv:2409.12427

Country:

South America > Uruguay (0.04)
North America > Mexico > Mexico City > Coyoacan (0.04)
North America > Haiti (0.04)
(101 more...)

Genre: Research Report > New Finding (1.00)

Industry: Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
