AITopics

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Neural Information Processing SystemsDec-23-2025, 21:37:21 GMT

On the Power of Louvain in the Stochastic Block Model

A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected. In practice, the most popular approaches rely on local search algorithms; not only for the ease of implementation and the efficiency, but also because of the accuracy of these methods on many real world graphs. For example, the Louvain algorithm -- a local search based algorithm -- has quickly become the method of choice for clustering in social networks. However, explaining the success of these methods remains an open problem: in the worst-case, the runtime can be up to \Omega(n^2), much worse than what is typically observed in practice, and no guarantee on the quality of its output can be established. The goal of this paper is to shed light on the inner-workings of Louvain; only if we understand Louvain, can we rely on it and further improve it. To achieve this goal, we study the behavior of Louvain in the famous two-bloc Stochastic Block Model, which has a clear ground-truth and serves as the standard testbed for graph clustering algorithms. We provide valuable tools for the analysis of Louvain, but also for many other combinatorial algorithms. For example, we show that the probability for a node to have more edges towards its own community is 1/2 + \Omega( \min( \Delta(p-q)/\sqrt{np},1)) in the SBM(n,p,q), where \Delta is the imbalance. Note that this bound is asymptotically tight and useful for the analysis of a wide range of algorithms (Louvain, Kernighan-Lin, Simulated Annealing etc).

algorithm, louvain, stochastic block model, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.58)

Brisson, Laurent, Bothorel, Cécile, Duminy, Nicolas

DynBenchmark: Customizable Ground Truths to Benchmark Community Detection and Tracking in Temporal Networks

arXiv.org Artificial IntelligenceOct-9-2025

Graph models help understand network dynamics and evolution. Creating graphs with controlled topology and embedded partitions is a common strategy for evaluating community detection algorithms. However, existing benchmarks often overlook the need to track the evolution of communities in real-world networks. To address this, a new community-centered model is proposed to generate customizable evolving community structures where communities can grow, shrink, merge, split, appear or disappear. This benchmark also generates the underlying temporal network, where nodes can appear, disappear, or move between communities. The benchmark has been used to test three methods, measuring their performance in tracking nodes' cluster membership and detecting community evolution. Python libraries, drawing utilities, and validation metrics are provided to compare ground truth with algorithm results for detecting dynamic communities.

algorithm, artificial intelligence, data mining, (14 more...)

2510.06245

Country: North America > United States (0.28)

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications (0.94)

Neural Information Processing SystemsOct-2-2025, 13:13:25 GMT

On the Power of Louvain in the Stochastic Block Model Vincent Cohen-Addad

A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected. In practice, the most popular approaches rely on local search algorithms; not only for the ease of implementation and the efficiency, but also because of the accuracy of these methods on many real world graphs. For example, the Louvain algorithm - a local search based algorithm - has quickly become the method of choice for clustering in social networks.

algorithm, artificial intelligence, machine learning, (15 more...)

Country:

Europe (0.93)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Pham, Ninh, Zheng, Yingtao, Phibbs, Hugo

Scalable Varied-Density Clustering via Graph Propagation

arXiv.org Artificial IntelligenceAug-6-2025

We propose a novel perspective on varied-density clustering for high-dimensional data by framing it as a label propagation process in neighborhood graphs that adapt to local density variations. Our method formally connects density-based clustering with graph connectivity, enabling the use of efficient graph propagation techniques developed in network science. To ensure scalability, we introduce a density-aware neighborhood propagation algorithm and leverage advanced random projection methods to construct approximate neighborhood graphs. Our approach significantly reduces computational cost while preserving clustering quality. Empirically, it scales to datasets with millions of points in minutes and achieves competitive accuracy compared to existing baselines.

artificial intelligence, data mining, machine learning, (19 more...)

2508.02989

Country: Oceania > New Zealand (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Huong, Vu Thi, Koch, Thorsten

Clustering scientific publications: lessons learned through experiments with a real citation network

arXiv.org Artificial IntelligenceMay-27-2025

Clustering scientific publications can reveal underlying research structures within bibliographic databases. Graph-based clustering methods, such as spectral, Louvain, and Leiden algorithms, are frequently utilized due to their capacity to effectively model citation networks. However, their performance may degrade when applied to real-world data. This study evaluates the performance of these clustering algorithms on a citation graph comprising approx. 700,000 papers and 4.6 million citations extracted from Web of Science. The results show that while scalable methods like Louvain and Leiden perform efficiently, their default settings often yield poor partitioning. Meaningful outcomes require careful parameter tuning, especially for large networks with uneven structures, including a dense core and loosely connected papers. These findings highlight practical lessons about the challenges of large-scale data, method selection and tuning based on specific structures of bibliometric clustering tasks.

algorithm, artificial intelligence, machine learning, (16 more...)

2505.1818

Country:

Europe > Netherlands > South Holland > Leiden (0.50)
Europe > Germany (0.15)
Asia > Vietnam (0.15)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Neural Information Processing SystemsJan-22-2025, 22:40:13 GMT

Review for NeurIPS paper: On the Power of Louvain in the Stochastic Block Model

However, in the main paper you only use the two-block version of it. This should be clarified in the abstract 3) In the abstract you mention "many other combinatorial algorithms".

algorithm, louvain, stochastic block model, (6 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.37)

Neural Information Processing SystemsOct-9-2024, 20:12:23 GMT

On the Power of Louvain in the Stochastic Block Model

A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected. In practice, the most popular approaches rely on local search algorithms; not only for the ease of implementation and the efficiency, but also because of the accuracy of these methods on many real world graphs. For example, the Louvain algorithm -- a local search based algorithm -- has quickly become the method of choice for clustering in social networks. However, explaining the success of these methods remains an open problem: in the worst-case, the runtime can be up to \Omega(n 2), much worse than what is typically observed in practice, and no guarantee on the quality of its output can be established. The goal of this paper is to shed light on the inner-workings of Louvain; only if we understand Louvain, can we rely on it and further improve it.

algorithm, louvain, stochastic block model, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Achitouv, Ixandra, Chavalarias, David, Gaume, Bruno

Testing network clustering algorithms with Natural Language Processing

arXiv.org Artificial IntelligenceJun-24-2024

The advent of online social networks has led to the development of an abundant literature on the study of online social groups and their relationship to individuals' personalities as revealed by their textual productions. Social structures are inferred from a wide range of social interactions. Those interactions form complex -- sometimes multi-layered -- networks, on which community detection algorithms are applied to extract higher order structures. The choice of the community detection algorithm is however hardily questioned in relation with the cultural production of the individual they classify. In this work, we assume the entangled nature of social networks and their cultural production to propose a definition of cultural based online social groups as sets of individuals whose online production can be categorized as social group-related. We take advantage of this apparently self-referential description of online social groups with a hybrid methodology that combines a community detection algorithm and a natural language processing classification algorithm. A key result of this analysis is the possibility to score community detection algorithms using their agreement with the natural language processing classification. A second result is that we can assign the opinion of a random user at >85% accuracy.

category, correspond, tweet, (17 more...)

2406.17135

Country:

North America > United States (1.00)
Europe > France > Île-de-France > Paris > Paris (0.14)
Europe > United Kingdom (0.04)

Genre: Research Report (0.50)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Information Technology > Services (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.64)

arXiv.org Artificial IntelligenceJul-26-2022

An Adaptive Deep Clustering Pipeline to Inform Text Labeling at Scale

Chen, Xinyu, Beaver, Ian

Mining the latent intentions from large volumes of natural language inputs is a key step to help data analysts design and refine Intelligent Virtual Assistants (IVAs) for customer service and sales support. We created a flexible and scalable clustering pipeline within the Verint Intent Manager (VIM) that integrates the fine-tuning of language models, a high performing k-NN library and community detection techniques to help analysts quickly surface and organize relevant user intentions from conversational texts. The fine-tuning step is necessary because pre-trained language models cannot encode texts to efficiently surface particular clustering structures when the target texts are from an unseen domain or the clustering task is not topic detection. We describe the pipeline and demonstrate its performance and ability to scale on three real-world text mining tasks. As deployed in the VIM application, this clustering pipeline produces high quality results, improving the performance of data analysts and reducing the time it takes to surface intentions from customer service data, thereby reducing the time it takes to build and deploy IVAs in new domains.

data mining, machine learning, natural language, (15 more...)