AITopics

2006.07115

Country:

Europe > United Kingdom (0.04)
Europe > Portugal > Porto > Porto (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry:

Government (1.00)
Energy > Renewable (1.00)
Energy > Power Industry > Utilities (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Rigon, Tommaso, Herring, Amy H., Dunson, David B.

A generalized Bayes framework for probabilistic clustering

arXiv.org Machine LearningJun-9-2020

Loss-based clustering methods, such as k-means and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative, but such methods face computational problems and large sensitivity to the choice of kernel. This article proposes a generalized Bayes framework that bridges between these two paradigms through the use of Gibbs posteriors. In conducting Bayesian updating, the log likelihood is replaced by a loss function for clustering, leading to a rich family of clustering methods. The Gibbs posterior represents a coherent updating of Bayesian beliefs without needing to specify a likelihood for the data, and can be used for characterizing uncertainty in clustering. We consider losses based on Bregman divergence and pairwise similarities, and develop efficient deterministic algorithms for point estimation along with sampling algorithms for uncertainty quantification. Several existing clustering algorithms, including k-means, can be interpreted as generalized Bayes estimators under our framework, and hence we provide a method of uncertainty quantification for these approaches.

algorithm, probability, uncertainty quantification, (15 more...)

2006.05451

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.63)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Biabiany, Emmanuel, Page, Vincent, Bernard, Didier, Paugam-Moisy, Hélène

Using an expert deviation carrying the knowledge of climate data in usual clustering algorithms

arXiv.org Machine LearningJun-9-2020

In order to help physicists to expand their knowledge of the climate in the Lesser Antilles, we aim to identify the spatio-temporal configurations using clustering analysis on wind speed and cumulative rainfall datasets. But we show that using the L2 norm in conventional clustering methods as K-Means (KMS) and Hierarchical Agglomerative Clustering (HAC) can induce undesirable effects. So, we propose to replace Euclidean distance (L2) by a dissimilarity measure named Expert Deviation (ED). Based on the symmetrized Kullback-Leibler divergence, the ED integrates the properties of the observed physical parameters and climate knowledge. This measure helps comparing histograms of four patches, corresponding to geographical zones, that are influenced by atmospheric structures. The combined evaluation of the internal homogeneity and the separation of the clusters obtained using ED and L2 was performed. The results, which are compared using the silhouette index, show five clusters with high indexes. For the two available datasets one can see that, unlike KMS-L2, KMS-ED discriminates the daily situations favorably, giving more physical meaning to the clusters discovered by the algorithm. The effect of patches is observed in the spatial analysis of representative elements for KMS-ED. The ED is able to produce different configurations which makes the usual atmospheric structures clearly identifiable. Atmospheric physicists can interpret the locations of the impact of each cluster on a specific zone according to atmospheric structures. KMS-L2 does not lead to such an interpretability, because the situations represented are spatially quite smooth. This climatological study illustrates the advantage of using ED as a new approach.

artificial intelligence, data mining, machine learning, (14 more...)

2006.05603

Country:

Europe > France (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Martinique (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Kleindessner, Matthäus, Awasthi, Pranjal, Morgenstern, Jamie

A Notion of Individual Fairness for Clustering

A common distinction in fair machine learning, in particular in fair classification, is between group fairness and individual fairness. In the context of clustering, group fairness has been studied extensively in recent years; however, individual fairness for clustering has hardly been explored. In this paper, we propose a natural notion of individual fairness for clustering. Our notion asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. We study several questions related to our proposed notion of individual fairness. On the negative side, we show that deciding whether a given data set allows for such an individually fair clustering in general is NP-hard. On the positive side, for the special case of a data set lying on the real line, we propose an efficient dynamic programming approach to find an individually fair clustering. For general data sets, we investigate heuristics aimed at minimizing the number of individual fairness violations and compare them to standard clustering approaches on real data sets.

artificial intelligence, data mining, machine learning, (15 more...)

2006.0496

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceJun-8-2020

CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

Li, Xiang, Kao, Ben, Shan, Caihua, Yin, Dawei, Ester, Martin

We study the problem of applying spectral clustering to cluster multi-scale data, which is data whose clusters are of various sizes and densities. Traditional spectral clustering techniques discover clusters by processing a similarity matrix that reflects the proximity of objects. For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart while those of a dense cluster have to be sufficiently close. Following [16], we solve the problem of spectral clustering on multi-scale data by integrating the concept of objects' "reachability similarity" with a given distance-based similarity to derive an objects' coefficient matrix. We propose the algorithm CAST that applies trace Lasso to regularize the coefficient matrix. We prove that the resulting coefficient matrix has the "grouping effect" and that it exhibits "sparsity". We show that these two characteristics imply very effective spectral clustering. We evaluate CAST and 10 other clustering methods on a wide range of datasets w.r.t. various measures. Experimental results show that CAST provides excellent performance and is highly robust across test cases of multi-scale data.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Artificial Intelligence

2006.04435

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceJun-8-2020

Outlier Detection Using a Novel method: Quantum Clustering

Liu, Ding, Li, Hui

We propose a new assumption in outlier detection: Normal data instances are commonly located in the area that there is hardly any fluctuation on data density, while outliers are often appeared in the area that there is violent fluctuation on data density. And based on this hypothesis, we apply a novel density-based approach to unsupervised outlier detection. This approach, called Quantum Clustering (QC), deals with unlabeled data processing and constructs a potential function to find the centroids of clusters and the outliers. The experiments show that the potential function could clearly find the hidden outliers in data points effectively. Besides, by using QC, we could find more subtle outliers by adjusting the parameter $\sigma$. Moreover, our approach is also evaluated on two datasets (Air Quality Detection and Darwin Correspondence Project) from two different research areas, and the results show the wide applicability of our method.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2006.0476

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Tianjin Province > Tianjin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.48)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Randomized spectral co-clustering for large-scale directed networks

Guo, Xiao, Qiu, Yixuan, Zhang, Hai, Chang, Xiangyu

Directed networks are generally used to represent asymmetric relationships among units. Co-clustering aims to cluster the senders and receivers of directed networks simultaneously. In particular, the well-known spectral clustering algorithm could be modified as the spectral co-clustering to co-cluster directed networks. However, large-scale networks pose computational challenge to it. In this paper, we leverage randomized sketching techniques to accelerate the spectral co-clustering algorithms in order to co-cluster large-scale directed networks more efficiently. Specifically, we derive two series of randomized spectral co-clustering algorithms, one is random-projection-based and the other is random-samplingbased. Theoretically, we analyze the resulting algorithms under two generative models-the stochastic co-block model and the degree corrected stochastic co-block model. The approximation error rates and misclustering error rates of proposed two randomized spectral co-clustering algorithms are established, which indicate better bounds than the state-ofthe-art results of co-clustering literature. Numerically, we conduct simulations to support our theoretical results and test the efficiency of the algorithms on real networks with up to tens of millions of nodes. In order to use the proposed algorithms more conveniently, a new R package called RandClust is developed and made available to the public.

algorithm, matrix, node, (13 more...)

2004.12164

Country:

North America > United States (0.28)
Asia > China > Shaanxi Province > Xi'an (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Bhuvaneswari, Revathi, Segalini, Antonio

Determining Secondary Attributes for Credit Evaluation in P2P Lending

There has been an increased need for secondary means of credit evaluation by both traditional banking organizations as well as peer-to-peer lending entities. This is especially important in the present technological era where sticking with strict primary credit histories doesn't help distinguish between a 'good' and a 'bad' borrower, and ends up hurting both the individual borrower as well as the investor as a whole. We utilized machine learning classification and clustering algorithms to accurately predict a borrower's creditworthiness while identifying specific secondary attributes that contribute to this score. While extensive research has been done in predicting when a loan would be fully paid, the area of feature selection for lending is relatively new. We achieved 65% F1 and 73% AUC on the LendingClub data while identifying key secondary attributes.

artificial intelligence, machine learning, test 0, (18 more...)

2006.13921

Country:

North America > United States > New York (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)

Genre: Research Report (0.54)

Industry:

Banking & Finance > Loans (1.00)
Information Technology > Services > e-Commerce Services (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Gonzalez-Torres, Bernardo A.

An Algorithmic Introduction to Clustering

This paper tries to present a more unified view of clustering, by identifying the relationships between five different clustering algorithms. Some of the results are not new, but they are presented in a cleaner, simpler and more concise way. To the best of my knowledge, the interpretation of DBSCAN as a climbing procedure, which introduces a theoretical connection between DBSCAN and Mean shift, is a novel result.

artificial intelligence, data mining, machine learning, (20 more...)

2006.04916

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

#artificialintelligenceJun-7-2020, 21:26:33 GMT

Introduction to Image Segmentation with K-Means clustering

Image segmentation is an important step in image processing, and it seems everywhere if we want to analyze what's inside the image. Many kinds of research have been done in the area of image segmentation using clustering. There are different methods and one of the most popular methods is K-Means clustering algorithm. Image segmentation is an important step in image processing, and it seems everywhere if we want to analyze what's inside the image. For example, if we seek to find if there is a chair or person inside an indoor image, we may need image segmentation to separate objects and analyze each object individually to check what it is.

algorithm, artificial intelligence, machine learning, (16 more...)

#artificialintelligence

Industry: Health & Medicine (0.96)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)