AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)
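As a rough illustration of the definition above, the following minimal sketch groups scalar values so that members of a group are closer to each other than to members of the other group. It uses a Lloyd-style k-means loop, which is only one of many clustering techniques and is not taken from any paper listed on this page. The function name `kmeans_1d`, the sample values, and the starting centers are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd-style k-means on scalars; returns (centers, labels)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # leave an empty cluster's center where it is
                centers[k] = sum(members) / len(members)
    return centers, labels


# Hypothetical data: two visibly separated groups of values.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, labels = kmeans_1d(values, centers=[0.0, 5.0])
print(labels)
```

Here similarity is simply absolute distance on the number line; with real data, the choice of similarity measure and of the number of clusters usually matters more than the loop itself.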


Spectral Modification of Graphs for Improved Spectral Clustering

Ioannis Koutis, Huong Le

Neural Information Processing Systems | Aug-20-2025, 05:27:09 GMT

This section collects a number of required notions from spectral graph theory and puts spectral modification in perspective with important recent discoveries that inspire it.

graph, proceedings, spectral, (15 more...)


Country:

North America > United States > New Jersey > Essex County > Newark (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)


d98c1545b7619bd99b817cb3169cdfde-Paper.pdf

Neural Information Processing Systems | Aug-20-2025, 05:03:50 GMT

artificial intelligence, data mining, machine learning, (16 more...)


Country:

Asia > Afghanistan > Parwan Province > Charikar (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Making AI Forget You: Data Deletion in Machine Learning

Antonio Ginart, Melody Guan, Gregory Valiant, James Y. Zou

Neural Information Processing Systems | Aug-20-2025, 02:22:30 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, dataset, deletion, (16 more...)


Country:

North America > United States > California > Santa Clara County > Palo Alto (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Variance Reduction in Bipartite Experiments through Correlation Clustering

Jean Pouget-Abadie, Kevin Aydin, Warren Schudy, Kay Brodersen, Vahab Mirrokni

Neural Information Processing Systems | Aug-20-2025, 00:38:13 GMT

Causal inference in randomized experiments typically assumes that the units of randomization and the units of analysis are one and the same. In some applications, however, these two roles are played by distinct entities linked by a bipartite graph. The key challenge in such bipartite settings is how to avoid interference bias, which would typically arise if we simply randomized the treatment at the level of analysis units. One effective way of minimizing interference bias in standard experiments is through cluster randomization, but this design has not been studied in the bipartite setting where conventional clustering schemes can lead to poorly powered experiments. This paper introduces a novel clustering objective and a corresponding algorithm that partitions a bipartite graph so as to maximize the statistical power of a bipartite experiment on that graph. Whereas previous work relied on balanced partitioning, our formulation suggests the use of a correlation clustering objective. We use a publicly-available graph of Amazon user-item reviews to validate our solution and illustrate how it substantially increases the statistical power in bipartite experiments.

diversion unit, experiment, graph, (17 more...)


Country:

Asia > Afghanistan > Parwan Province > Charikar (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Maine (0.04)
(5 more...)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)


Ultrametric Fitting by Gradient Descent

Giovanni Chierchia, Benjamin Perret

Neural Information Processing Systems | Aug-20-2025, 00:13:53 GMT


artificial intelligence, machine learning, optimization problem, (19 more...)


Country: North America > United States (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)


... the liberty to group and reword some of the reviewers' comments (in blue italic) to save space. General answer on the usefulness of gradient descent, its theoretical guarantees, and its scalability

Neural Information Processing Systems | Aug-20-2025, 00:13:37 GMT

We thank the reviewers for the time they spent evaluating our manuscript and for their valuable comments. We agree that having theoretical guarantees would be a big plus. As for scalability, the bottleneck of our method is the single-linkage algorithm. Similarly to Monath et al. (NeurIPS 2017), our idea consists [...] Given the significant body of additional material, we feel that this topic is best left to a future publication. Lines 8, 56, 70, 93: I would suggest a more cautious usage of the word "equivalent".

dasgupta, gradient descent, theoretical guarantee, (14 more...)


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.33)


A Risk Manager for Intrusion Tolerant Systems: Enhancing HAL 9000 with New Scoring and Data Sources

Tadeu Freitas, Carlos Novo, Inês Dutra, João Soares, Manuel Correia, Benham Shariati, Rolando Martins

arXiv.org Artificial Intelligence | Aug-20-2025

Intrusion Tolerant Systems (ITSs) have become increasingly critical due to the rise of multi-domain adversaries exploiting diverse attack surfaces. ITS architectures aim to tolerate intrusions, ensuring system compromise is prevented or mitigated even with adversary presence. Existing ITS solutions often employ Risk Managers leveraging public security intelligence to adjust system defenses dynamically against emerging threats. However, these approaches rely heavily on databases like NVD and ExploitDB, which require manual analysis for newly discovered vulnerabilities. This dependency limits the system's responsiveness to rapidly evolving threats. HAL 9000, an ITS Risk Manager introduced in our prior work, addressed these challenges through machine learning. By analyzing descriptions of known vulnerabilities, HAL 9000 predicts and assesses new vulnerabilities automatically. To calculate the risk of a system, it also incorporates the Exploitability Probability Scoring system to estimate the likelihood of exploitation within 30 days, enhancing proactive defense capabilities. Despite its success, HAL 9000's reliance on NVD and ExploitDB knowledge is a limitation, considering the availability of other sources of information. This extended work introduces a custom-built scraper that continuously mines diverse threat sources, including security advisories, research forums, and real-time exploit proofs-of-concept. This significantly expands HAL 9000's intelligence base, enabling earlier detection and assessment of unverified vulnerabilities. Our evaluation demonstrates that integrating scraper-derived intelligence with HAL 9000's risk management framework substantially improves its ability to address emerging threats. This paper details the scraper's integration into the architecture, its role in providing additional information on new threats, and the effects on HAL 9000's management.

data mining, machine learning, natural language, (21 more...)


2508.13364

Country: North America > United States > Maryland (0.46)

Genre:

Research Report > New Finding (1.00)
Overview (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(6 more...)


A Recurrent Neural Network based Clustering Method for Binary Data Sets in Education

Mizuki Ohira, Toshimichi Saito

arXiv.org Artificial Intelligence | Aug-20-2025

This paper studies the application of a recurrent neural network to a clustering method for the S-P chart, a binary data set widely used in education. As the number of students increases, the S-P chart becomes hard to handle. To split a large chart into smaller charts, we present a simple clustering method based on the network dynamics. In this method, the network has multiple fixed points, and their basins of attraction give clusters corresponding to small S-P charts. To evaluate the clustering performance, we present an important feature quantity: the average caution index, which characterizes the singularity of students' answer patterns. Fundamental experiments confirm the effectiveness of the method.

artificial intelligence, machine learning, student, (17 more...)


2508.13224

Country: Asia > Japan > Honshū (0.15)

Genre: Research Report (0.70)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Understanding Distribution Structure on Calibrated Recommendation Systems

Diego Correa da Silva, Denis Robson Dantas Boaventura, Mayki dos Santos Oliveira, Eduardo Ferreira da Silva, Joel Machado Pires, Frederico Araújo Durão

arXiv.org Artificial Intelligence | Aug-20-2025

Traditional recommender systems aim to generate a recommendation list comprising the most relevant or similar items to the user's profile. These approaches can create recommendation lists that omit item genres from the less prominent areas of a user's profile, thereby undermining the user's experience. To solve this problem, the calibrated recommendation system provides a guarantee of including less representative areas in the recommended list. The calibrated context works with three distributions. The first is from the user's profile, the second is from the candidate items, and the last is from the recommendation list. These distributions are G-dimensional, where G is the total number of genres in the system. This high dimensionality requires a different evaluation method, considering that traditional recommenders operate in a one-dimensional data space. In this sense, we implement fifteen models that help to understand how these distributions are structured. We evaluate the users' patterns in three datasets from the movie domain. The results indicate that the models of outlier detection provide a better understanding of the structures. The calibrated system creates recommendation lists that act similarly to traditional recommendation lists, allowing users to change their groups of preferences to the same degree. Commonly, traditional recommender systems generate recommendations with miscalibration [1]. Miscalibration means that the recommendation lists do not follow the user's preference distribution and instead suggest items from the user's dominant area of interest. This creates an overspecialized recommendation list in which items from the less dominant areas are crowded out. This effect puts the user in a filter bubble or an echo chamber [2]. For instance, when a specific area dominates the recommended list, the user likely has few other options to interact with, aside from items within that dominant area. Subsequent lists are then recommended, with the dominant area becoming even more overspecialized. In recent years, calibrated recommendation systems have attracted attention [3]-[8] from the recommender system community as a way to overcome this issue. This type of system demonstrates the capacity to improve several objectives, such as diversity [3], control of popularity bias [4], item coverage [5], precision [6], and the reduction of miscalibration [7]. To illustrate how calibrated recommendation works, consider a scenario: if a user's preferences distribution indicates [...] (Corresponding author: Diego Corrêa da Silva.)
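The idea of miscalibration in the excerpt above, a recommendation list whose genre distribution drifts away from the user's profile distribution, can be sketched in a few lines. The excerpt does not say which divergence the authors use, so the smoothed Kullback-Leibler divergence below is an assumption, as are the helper names `genre_distribution` and `kl_divergence`, the smoothing weight `alpha`, and the toy three-genre data.

```python
import math


def genre_distribution(items, genres):
    """Share of each genre over a list of items (each item is a list of genres)."""
    counts = {g: 0.0 for g in genres}
    for item_genres in items:
        for g in item_genres:
            # An item with several genres splits its weight equally among them.
            counts[g] += 1.0 / len(item_genres)
    total = sum(counts.values())
    return [counts[g] / total for g in genres]


def kl_divergence(p, q, alpha=0.01):
    """KL(p || q~), where q~ = (1 - alpha) * q + alpha * p avoids division by zero."""
    total = 0.0
    for p_g, q_g in zip(p, q):
        if p_g > 0:
            q_smoothed = (1 - alpha) * q_g + alpha * p_g
            total += p_g * math.log(p_g / q_smoothed)
    return total


# Hypothetical G = 3 genre system and a user who mostly watches drama.
genres = ["drama", "comedy", "action"]
history = [["drama"]] * 7 + [["comedy"]] * 2 + [["action"]]
profile = genre_distribution(history, genres)

overspecialized = genre_distribution([["drama"]] * 10, genres)
calibrated = genre_distribution(history, genres)

print(kl_divergence(profile, overspecialized))  # larger: list ignores minor genres
print(kl_divergence(profile, calibrated))       # near zero: list follows the profile
```

A list made only of the dominant genre scores a clearly higher divergence than one that mirrors the profile, which is the overspecialization effect the excerpt describes.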

algorithm, artificial intelligence, machine learning, (16 more...)


2508.13568

Country: South America > Brazil (0.28)

Genre: Research Report > New Finding (1.00)

Industry: