AITopics | new cluster

A Beyond-Worst-Case Analysis of Greedy k-means++

Neural Information Processing Systems | Jun-20-2026, 10:27:45 GMT

Greedy k-means++ is a generalization of k-means++ where, in each iteration, a new seed is greedily chosen among multiple ℓ ≥ 2 sampled points, as opposed to a single seed being sampled in k-means++. While empirical studies consistently show the superior performance of greedy k-means++, making it a preferred method in practice, a discrepancy exists between theory and practice. No theoretical justification currently explains this improved performance. Indeed, the prevailing theory suggests that greedy k-means++ exhibits worse performance than k-means++ in worst-case scenarios. This paper presents an analysis demonstrating the outperformance of the greedy algorithm compared to k-means++ for a natural class of well-separated instances with exponentially decaying distributions, such as Gaussian, specifically when ℓ = ln k + Θ(1), a common parameter setting in practical applications.
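The seeding rule described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names, the uniform choice of the first seed, and the default ell = ceil(ln k) + 2 (one arbitrary instance of ln k + Θ(1)) are all assumptions made here.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def greedy_kmeans_pp(points, k, ell=None, rng=None):
    """Pick k seeds; each round samples ell candidates and keeps the best one.

    Assumptions (not from the abstract): the first seed is uniform at random,
    and ell defaults to ceil(ln k) + 2.
    """
    rng = rng or random.Random(0)
    if ell is None:
        ell = max(2, math.ceil(math.log(k)) + 2)
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest seed so far.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) > 0:
            # D^2 sampling, as in plain k-means++, but ell times instead of once.
            candidates = rng.choices(points, weights=d2, k=ell)
        else:
            # Every point already coincides with a seed; fall back to uniform.
            candidates = [rng.choice(points) for _ in range(ell)]
        best = None
        for c in candidates:
            new_d2 = [min(d, sq_dist(p, c)) for p, d in zip(points, d2)]
            cost = sum(new_d2)  # k-means cost if c were added as a seed
            if best is None or cost < best[0]:
                best = (cost, c, new_d2)
        _, chosen, d2 = best
        centers.append(chosen)
    return centers
```

Setting ell=1 recovers ordinary k-means++ seeding, which makes the two variants easy to compare on the same data.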

artificial intelligence, machine learning, probability, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems | Feb-6-2025, 20:14:22 GMT

Review Summary. Score: 6, marginally above the acceptance threshold. The proposed method for streaming, distributed inference of DP mixture models presents a nice solution to the cluster identification problem, backed by experiments that are convincing though not rock solid. I'm hesitant to recommend unconditional acceptance, because basic information about how new clusters are created at each minibatch is totally absent, hurting reproducibility. Summary of Paper. This paper develops a new algorithm for streaming, distributed variational inference for the DP mixture model, with some supplementary material suggesting how to use these insights for many other BNP models. Using a mean-field approximation, the authors consider how to allow multiple worker nodes to process data batches in parallel and then aggregate these results asynchronously. In particular, the authors offer a new solution to the "component identification" problem: how to find correspondence between new clusters created independently by two separate worker nodes.

author feedback and meta-review, minibatch, new cluster, (13 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.06)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.49)
Information Technology > Artificial Intelligence > Machine Learning (0.31)

Graph Community Augmentation with GMM-based Modeling in Latent Space

Fukushima, Shintaro; Yamanishi, Kenji

arXiv.org Machine Learning | Dec-2-2024

This study addresses the issue of graph generation with generative models. In particular, we are concerned with the graph community augmentation problem, which refers to the problem of generating unseen or unfamiliar graphs with a new community out of the probability distribution estimated with a given graph dataset. Graph community augmentation means that the generated graphs have a new community. There is a chance of discovering an unseen but important structure of graphs with a new community, for example, in a social network such as a purchaser network. Graph community augmentation may also help data mining models generalize in cases where it is difficult to collect enough real graph data. In fact, there are many ways to generate a new community in an existing graph. It is desirable to discover a new graph with a new community beyond the given graph while keeping the structure of the original graphs to some extent, so that the generated graphs are realistic. To this end, we propose an algorithm called graph community augmentation (GCA). The key ideas of GCA are (i) to fit a Gaussian mixture model (GMM) to data points in the latent space into which the nodes in the original graph are embedded, and (ii) to add data points in the new cluster in the latent space for generating a new community based on the minimum description length (MDL) principle. We empirically demonstrate the effectiveness of GCA for generating graphs with a new community structure on synthetic and real datasets.

dataset, graph, latent space, (17 more...)

arXiv.org Machine Learning

2412.01163

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
(2 more...)

Evolving Text Data Stream Mining

Kumar, Jay

arXiv.org Artificial Intelligence | Aug-15-2024

A text stream is an ordered sequence of text documents generated over time. A massive amount of such text data is generated by online social platforms every day. Designing an algorithm that extracts useful information from such text streams is a challenging task due to unique properties of the stream such as infinite length, data sparsity, and evolution. Therefore, learning useful information from such streaming data under the constraint of limited time and memory has gained increasing attention. Although many text stream mining algorithms have been proposed during the past decade, some issues remain. First, high-dimensional text data heavily degrades learning performance unless the model either works on a subspace or reduces the global feature space. The second issue is to extract a semantic text representation of documents and capture evolving topics over time. Moreover, labels are scarce, whereas existing approaches assume that labeled data is fully available. To deal with these issues, in this thesis, new learning models are proposed for clustering and multi-label learning on text streams.

dataset, electronic science and technology, text stream, (14 more...)

arXiv.org Artificial Intelligence

2409.0001

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
(42 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)
Research Report > Promising Solution (0.67)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(4 more...)

bca82e41ee7b0833588399b1fcd177c7-Reviews.html

Neural Information Processing Systems | Mar-13-2024, 20:00:41 GMT

The authors propose a parallel algorithm for the DPMM that parallelizes a RJMCMC sampler that jumps between finite models. While the parallelization and the RJMCMC sampler are proposed together, I will separate them for the purpose of this review, in order to ask questions about each part separately. First, the RJMCMC algorithm (by which I mean, the algorithm we would have on a single cluster). Here, we use a reversible-jump MCMC algorithm to jump between finite-dimensional Dirichlet distributions. As an aside, since \bar{\pi}_{K+1} is not used in the mixture model (the mixture model is defined on the renormalized occupied K components), it would seem to make more sense to define a K-dimensional, rather than a K-1 - dimensional, Dirichlet distribution; this is valid under marginalization properties of the Dirichlet distribution, since equation 10 samples from a distribution proportional to \pi_1 ... \pi_K. To jump between model dimensionalities, the authors propose a split/merge RJMCMC step that is reminiscent of that of Green and Richardson.

algorithm, dirichlet distribution, sampler, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.37)

Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

Miao Liu MIT

Neural Information Processing Systems | Mar-13-2024, 16:33:04 GMT

This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets.
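To make the idea of a hard clustering obtained from a low-variance limit concrete, here is a sketch of the simpler, static DP-means procedure. It is not the algorithm from this paper (it ignores the time evolution of clusters entirely); it only illustrates how such a limit gives a k-means-like rule that opens a new cluster whenever a point is farther than a threshold from every existing center. The names dp_means and lam, and the choice to start from the global mean, are assumptions made for this illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(pts):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(pts)
    return tuple(sum(p[d] for p in pts) / n for d in range(len(pts[0])))


def dp_means(points, lam, max_iter=100):
    """Static DP-means: k-means-style updates with a new-cluster penalty lam.

    A point whose squared distance to every center exceeds lam starts a
    new cluster located at that point.
    """
    centers = [mean(points)]       # start with one cluster at the global mean
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centers]
            j = min(range(len(centers)), key=dists.__getitem__)
            if dists[j] > lam:     # too far from everything: open a cluster
                centers.append(tuple(p))
                j = len(centers) - 1
            if labels[i] != j:
                labels[i] = j
                changed = True
        # Recompute centers as cluster means and drop clusters left empty.
        members = {}
        for lab, p in zip(labels, points):
            members.setdefault(lab, []).append(p)
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[old]) for old in kept]
        labels = [remap[lab] for lab in labels]
        if not changed:
            break
    return centers, labels
```

The threshold lam plays the role that the number of clusters k plays in k-means: larger values produce fewer, broader clusters.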

algorithm, mixture model, time step, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > North Carolina > Durham County > Durham (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.37)

Kernel KMeans clustering splits for end-to-end unsupervised decision trees

Ohl, Louis; Mattei, Pierre-Alexandre; Leclercq, Mickaël; Droit, Arnaud; Precioso, Frédéric

arXiv.org Machine Learning | Feb-19-2024

Trees are convenient models for obtaining explainable predictions on relatively small datasets. Although there are many proposals for the end-to-end construction of such trees in supervised learning, learning a tree end-to-end for clustering without labels remains an open challenge. As most works focus on interpreting with trees the result of another clustering algorithm, we present here a novel end-to-end trained unsupervised binary tree for clustering: Kauri. This method performs a greedy maximisation of the kernel KMeans objective without requiring the definition of centroids. We compare this model on multiple datasets with recent unsupervised trees and show that Kauri performs identically when using a linear kernel. For other kernels, Kauri often outperforms the concatenation of kernel KMeans and a CART decision tree.
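For readers unfamiliar with a centroid-free form of the kernel KMeans objective, the sketch below evaluates one standard version of it: the sum over clusters of the within-cluster kernel total divided by the cluster size. Maximising this quantity is equivalent to minimising the usual kernel KMeans distortion. This is only an illustration of the objective, not Kauri's tree-building procedure, and the function names are assumptions.

```python
def linear_kernel(x, y):
    """Plain dot product; any positive semi-definite kernel could be used."""
    return sum(a * b for a, b in zip(x, y))


def kernel_kmeans_objective(points, labels, kernel=linear_kernel):
    """Sum over clusters of (1 / |C|) * sum_{i, j in C} kernel(x_i, x_j).

    No centroid is ever computed: only pairwise kernel values are needed.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for members in clusters.values():
        within = sum(kernel(points[i], points[j])
                     for i in members for j in members)
        total += within / len(members)
    return total
```

A partition that groups nearby points scores higher than one that mixes distant points, which is the property a greedy splitting procedure can exploit.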

algorithm, dataset, kernel, (17 more...)

arXiv.org Machine Learning

2402.12232

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > El Salvador (0.04)
North America > Canada > Quebec (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Tensor Dirichlet Process Multinomial Mixture Model for Passenger Trajectory Clustering

Li, Ziyue; Yan, Hao; Zhang, Chen; Wang, Andi; Ketter, Wolfgang; Sun, Lijun; Tsung, Fugee

arXiv.org Artificial Intelligence | Jun-23-2023

Passenger clustering based on travel records is essential for transportation operators. However, existing methods cannot easily cluster the passengers due to the hierarchical structure of the passenger trip information, namely: each passenger has multiple trips, and each trip contains multi-dimensional multi-mode information. Furthermore, existing approaches rely on an accurate specification of the number of clusters to start, which is difficult when millions of commuters use the transport systems on a daily basis. In this paper, we propose a novel Tensor Dirichlet Process Multinomial Mixture model (Tensor-DPMM), which is designed to preserve the multi-mode and hierarchical structure of the multi-dimensional trip information via a tensor, and to cluster the passengers in a unified one-step manner. The model can also determine the number of clusters automatically by using the Dirichlet Process to decide the probabilities for a passenger to be either assigned to an existing cluster or to create a new cluster. This allows our model to grow the clusters as needed in a dynamic manner. Finally, existing methods do not consider spatial semantic graphs such as geographical proximity and functional similarity between the locations, which may cause inaccurate clustering. To this end, we further propose a variant of our model, namely the Tensor-DPMM with Graph. For the algorithm, we propose a tensor Collapsed Gibbs Sampling method, with an innovative "disband and relocate" step, which disbands clusters with too few members and relocates those members to the remaining clusters. This avoids uncontrolled growth in the number of clusters. A case study based on Hong Kong metro passenger data is conducted to demonstrate the automatic process of learning the number of clusters, and the learned clusters are better in within-cluster compactness and cross-cluster separateness.
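The choice between joining an existing cluster and creating a new one can be illustrated with the standard Chinese restaurant process prior of a Dirichlet Process. The sketch below shows the prior only; a collapsed Gibbs step such as the one the abstract mentions would additionally weight each option by how well the passenger's data fits that cluster. The function name and the concentration parameter alpha are assumptions made for this illustration.

```python
def crp_prior(cluster_sizes, alpha):
    """Chinese restaurant process prior for the next item.

    With n items already assigned, an existing cluster of size s is chosen
    with probability s / (n + alpha), and a brand-new cluster is created
    with probability alpha / (n + alpha).
    """
    n = sum(cluster_sizes)
    denom = n + alpha
    existing = [s / denom for s in cluster_sizes]
    new = alpha / denom
    return existing, new


# Example: two clusters holding 3 and 1 passengers, alpha = 1.
existing, new = crp_prior([3, 1], alpha=1.0)
print(existing, new)
```

Larger clusters attract new members more strongly, while alpha controls how readily a new cluster appears; this is what lets the number of clusters grow with the data instead of being fixed in advance.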

data mining, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2306.13794

Country:

North America > Canada > Quebec > Montreal (0.14)
Africa > Senegal > Kolda Region > Kolda (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(7 more...)

Genre: Research Report (0.50)

Industry: Transportation > Passenger (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(3 more...)

Hierarchical Clustering: A Practical Introduction of Agglomerative and Divisive Methods

#artificialintelligence | Jan-6-2023, 05:30:49 GMT

In this article, we discuss hierarchical clustering in detail: why we need it, how it works, what types exist, and which datasets it applies to. Before moving on to hierarchical clustering, we should ask why we need it at all when we already have K-Means clustering. If you have studied K-Means, you know that it builds clusters from each point's distance to a centroid. It works well on datasets with well-defined boundaries and few outliers. In the picture above, K-Means works well, but on more complex datasets problems arise and K-Means no longer works properly. As you can see in the picture below, K-Means fails to form the clusters.
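As a small illustration of the agglomerative (bottom-up) approach named in the article title, the sketch below starts with every point in its own cluster and repeatedly merges the two closest clusters. It uses single linkage (distance between the closest pair of members), which is one of several possible linkage choices; the function name and this choice are assumptions, not taken from the article.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def single_linkage(points, n_clusters):
    """Agglomerative clustering with single linkage.

    Returns a list of clusters, each a sorted list of point indices.
    This naive version is slow (fine for small examples only).
    """
    clusters = [[i] for i in range(len(points))]   # every point on its own
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance of the closest pair across a and b.
                d = min(sq_dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])            # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]
```

On two long parallel rows of points, this recovers each row as one cluster, a shape that centroid-based methods such as K-Means can struggle with when the rows are much longer than the gap between them.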

artificial intelligence, machine learning, matrix, (18 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

5 Clustering Algorithms Data Scientists Need To Know - The Key Is Always To Understand The Basic Approach Of Any Algorithm You Want To Use – Fly Spaceships With Your Mind

#artificialintelligence | Sep-17-2021, 11:45:19 GMT

As a data scientist, you have several basic tools at your disposal, which you can also apply in combination to a data set. More and more complex dependencies are formed. This makes it all the more difficult to recognize these similar properties and to assign the data to so-called clusters in a way that can be evaluated. You have certainly heard of these algorithms and maybe used one or the other, but do you really know what clustering algorithms are? So let's first clarify what these algorithms are in the first place.

algorithm, basic approach, clustering algorithm data scientist, (10 more...)

#artificialintelligence

Industry:

Government > Military > Air Force (0.40)
Aerospace & Defense (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)
