AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Interpret Principal Component Analysis (PCA)

#artificialintelligence | Mar-12-2020, 12:12:15 GMT

Data can tell us stories. That's what I've been told, anyway. As a data scientist working for Fortune 300 clients, I deal with large volumes of data daily, and I can tell you that data can indeed tell us stories. You can apply a regression, classification, or clustering algorithm to the data, but feature selection and engineering can be a daunting task. I have often seen data scientists take an automated approach to feature selection, such as Recursive Feature Elimination (RFE), or rely on feature importance scores from Random Forest or XGBoost.

interpret principal component analysis, principal component analysis, variance, (9 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.36)

A Global Constraint for the Exact Cover Problem: Application to Conceptual Clustering

Chabert, Maxime | Solnon, Christine (LIRIS, INSA Lyon)

Journal of Artificial Intelligence Research | Mar-12-2020

We introduce the exactCover global constraint dedicated to the exact cover problem, the goal of which is to select subsets such that each element of a given set belongs to exactly one selected subset. This NP-complete problem occurs in many applications, and we more particularly focus on a conceptual clustering application. We introduce three propagation algorithms for exactCover, called Basic, DL, and DL+: Basic ensures the same level of consistency as arc consistency on a classical decomposition of exactCover into binary constraints, without using any specific data structure; DL ensures the same level of consistency as Basic but uses Dancing Links to efficiently maintain the relation between elements and subsets; and DL+ is a stronger propagator which exploits an extra property to filter more values than DL. We also consider the case where the number of selected subsets is constrained to be equal to a given integer variable k, and we show that this may be achieved either by combining exactCover with existing constraints, or by designing a specific propagator that integrates algorithms designed for the NValues constraint. These different propagators are experimentally evaluated on conceptual clustering problems, and they are compared with state-of-the-art declarative approaches. In particular, we show that our global constraint is competitive with recent ILP and CP models for mono-criterion problems, and it has better scale-up properties for multi-criteria problems.
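To make the problem statement in the abstract above concrete, here is a minimal Python sketch of exact cover solved by plain backtracking. It is not the paper's Basic, DL, or DL+ propagators and it does not use Dancing Links; the function name `exact_covers` and the small instance are illustrative assumptions only.

```python
def exact_covers(universe, subsets):
    """Yield each selection of subset names covering every element exactly once.

    Plain backtracking sketch (an assumption for illustration), not the
    exactCover propagators described in the abstract.
    """
    subsets = {name: frozenset(s) for name, s in subsets.items()}
    # Map each element to the names of the subsets that contain it.
    containing = {e: [n for n, s in subsets.items() if e in s] for e in universe}

    def candidates(element, uncovered):
        # A subset is usable only if it does not touch already-covered elements.
        return [n for n in containing[element] if subsets[n] <= uncovered]

    def search(uncovered, chosen):
        if not uncovered:
            yield sorted(chosen)
            return
        # Branch on the element with the fewest usable subsets.
        element = min(uncovered, key=lambda e: len(candidates(e, uncovered)))
        for name in candidates(element, uncovered):
            yield from search(uncovered - subsets[name], chosen + [name])

    yield from search(frozenset(universe), [])


# Hypothetical instance: seven elements, six candidate subsets.
universe = range(1, 8)
subsets = {
    "A": {1, 4, 7},
    "B": {1, 4},
    "C": {4, 5, 7},
    "D": {3, 5, 6},
    "E": {2, 3, 6, 7},
    "F": {2, 7},
}
solutions = list(exact_covers(universe, subsets))
print(solutions)  # [['B', 'D', 'F']]
```

In a conceptual clustering reading, each candidate subset would be a candidate cluster, and a solution is a partition of the objects into selected clusters.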

constraint, ecq dl, subset, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11870

AI Access Foundation

11870

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(8 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Autoencoders

Bank, Dor, Koenigstein, Noam, Giryes, Raja

arXiv.org Machine Learning | Mar-12-2020

An autoencoder is a specific type of neural network, which is mainly designed to encode the input into a compressed and meaningful representation, and then decode it back such that the reconstructed input is as similar as possible to the original one. This chapter surveys the different types of autoencoders that are mainly used today. It also describes various applications and use-cases of autoencoders.
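The encode-then-decode idea can be sketched with a tiny linear autoencoder in plain Python. This is only an illustration under stated assumptions: a 2-D input, a 1-D code, no nonlinearity, full-batch gradient descent, and hypothetical names (`train_linear_autoencoder`, `w`, `v`); it is not taken from the surveyed chapter.

```python
def train_linear_autoencoder(data, epochs=200, lr=0.01):
    """Fit z = w.x (encoder) and x_hat = z * v (decoder) by gradient descent.

    Returns the weights plus the reconstruction loss before and after training.
    """
    w = [0.5, 0.5]  # encoder weights: 2-D input -> 1-D code
    v = [0.5, 0.5]  # decoder weights: 1-D code -> 2-D reconstruction

    def loss():
        total = 0.0
        for x in data:
            z = w[0] * x[0] + w[1] * x[1]
            total += (x[0] - z * v[0]) ** 2 + (x[1] - z * v[1]) ** 2
        return total

    initial = loss()
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z = w[0] * x[0] + w[1] * x[1]              # encode
            r = [x[0] - z * v[0], x[1] - z * v[1]]     # reconstruction residual
            rv = r[0] * v[0] + r[1] * v[1]
            for j in range(2):
                grad_v[j] += -2.0 * z * r[j]           # dLoss/dv_j
                grad_w[j] += -2.0 * rv * x[j]          # dLoss/dw_j
        for j in range(2):
            w[j] -= lr * grad_w[j]
            v[j] -= lr * grad_v[j]
    return w, v, initial, loss()


# Hypothetical data lying on the line y = 2x, so a 1-D code can represent it.
data = [(-1.0, -2.0), (-0.5, -1.0), (0.5, 1.0), (1.0, 2.0)]
w, v, initial_loss, final_loss = train_linear_autoencoder(data)
print(initial_loss, final_loss)
```

Because the data here are one-dimensional in disguise, a single code value per point is enough to reconstruct them; real autoencoders replace the two weight vectors with multi-layer nonlinear networks.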

autoencoder, international conference, representation, (13 more...)

arXiv.org Machine Learning

2003.05991

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Classical Statistics and Statistical Learning in Imaging Neuroscience

#artificialintelligence | Mar-10-2020, 04:29:03 GMT

Single subject prediction of brain disorders in neuroimaging: promises and pitfalls.

algorithm, hypothesis, inference, (16 more...)

#artificialintelligence

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material (1.00)
Overview (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(4 more...)

An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs

Rozemberczki, Benedek, Kiss, Oliver, Sarkar, Rik

arXiv.org Machine Learning | Mar-10-2020

We present Karate Club, a Python framework combining more than 30 state-of-the-art graph mining algorithms which can solve unsupervised machine learning tasks. The primary goal of the package is to make community detection, node embedding, and whole-graph embedding available to a wide audience of machine learning researchers and practitioners. We designed Karate Club with an emphasis on a consistent application interface, scalability, ease of use, sensible out-of-the-box model behaviour, standardized dataset ingestion, and output generation. This paper discusses the design principles behind this framework with practical examples. We show Karate Club's efficiency with respect to learning performance on a wide range of real-world clustering problems and classification tasks, and we provide supporting evidence with regard to its competitive speed.

algorithm, graph, karate club, (14 more...)

arXiv.org Machine Learning

2003.04819

Country:

Europe > United Kingdom > England (0.05)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Sets Clustering

Jubran, Ibrahim, Tukan, Murad, Maalouf, Alaa, Feldman, Dan

arXiv.org Machine Learning | Mar-9-2020

The input to the sets-$k$-means problem is an integer $k\geq 1$ and a set $\mathcal{P}=\{P_1,\cdots,P_n\}$ of sets in $\mathbb{R}^d$. The goal is to compute a set $C$ of $k$ centers (points) in $\mathbb{R}^d$ that minimizes the sum $\sum_{P\in \mathcal{P}} \min_{p\in P, c\in C}\left\| p-c \right\|^2$ of squared distances to these sets. An $\varepsilon$-core-set for this problem is a weighted subset of $\mathcal{P}$ that approximates this sum up to a $1\pm\varepsilon$ factor, for every set $C$ of $k$ centers in $\mathbb{R}^d$. We prove that such a core-set of $O(\log^2{n})$ sets always exists, and can be computed in $O(n\log{n})$ time, for every input $\mathcal{P}$ and every fixed $d,k\geq 1$ and $\varepsilon \in (0,1)$. The result easily generalizes to any metric space, distances to the power of $z>0$, and M-estimators that handle outliers. Applying an inefficient but optimal algorithm to this core-set allows us to obtain the first PTAS ($1+\varepsilon$ approximation) for the sets-$k$-means problem that takes time near linear in $n$. This is the first result even for sets-mean on the plane ($k=1$, $d=2$). Open source code and experimental results for document classification and facility locations are also provided.
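The objective in the abstract above can be evaluated directly for a given choice of centers. The sketch below only computes the cost $\sum_{P\in \mathcal{P}} \min_{p\in P, c\in C}\| p-c \|^2$; it does not build a core-set or search for optimal centers, and the function name `sets_kmeans_cost` and the toy input are assumptions for illustration.

```python
def sets_kmeans_cost(point_sets, centers):
    """Sum, over all input sets, of the smallest squared distance
    between any point of the set and any center."""
    total = 0.0
    for point_set in point_sets:
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            for p in point_set
            for c in centers
        )
    return total


# Hypothetical input: three sets of points in the plane (d = 2).
point_sets = [
    [(0, 0), (10, 10)],
    [(5, 5)],
    [(1, 0), (9, 9)],
]

cost_two_centers = sets_kmeans_cost(point_sets, [(0, 0), (5, 5)])  # k = 2
cost_one_center = sets_kmeans_cost(point_sets, [(5, 5)])           # k = 1
print(cost_two_centers, cost_one_center)  # 1.0 82.0
```

Each set contributes only through its single closest point-center pair, which is what distinguishes this objective from ordinary $k$-means on the union of all points.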

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2003.04135

Country:

North America > United States (0.46)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Deep Inverse Feature Learning: A Representation Learning of Error

Ghazanfari, Behzad, Afghah, Fatemeh

arXiv.org Machine Learning | Mar-9-2020

This paper introduces a novel perspective on error in machine learning and proposes inverse feature learning (IFL) as a representation learning approach that learns a set of high-level features based on the representation of error for classification or clustering purposes. The proposed perspective on error representation is fundamentally different from current learning methods: classification approaches interpret the error as a function of the differences between the true labels and the predicted ones, while clustering approaches use clustering objective functions such as compactness. The inverse feature learning method operates based on a deep clustering approach to obtain a qualitative form of the representation of error as features. The performance of the proposed IFL method is evaluated by applying the learned features along with the original features, or just the learned features, in different classification and clustering techniques on several data sets. The experimental results show that the proposed method leads to promising results in classification and especially in clustering. In classification, the proposed features along with the primary features improve the results of most of the classification methods on several popular data sets. In clustering, the performance of different clustering methods is considerably improved on different data sets. Interestingly, the results show that a few features of the representation of error capture highly informative aspects of the primary features. We hope this paper helps to utilize error representation learning in different feature learning domains.

classification, learning, representation, (16 more...)

arXiv.org Machine Learning

2003.04285

Country:

North America > United States > Colorado (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
North America > United States > Arizona > Coconino County > Flagstaff (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Collaborative Learning of Semi-Supervised Clustering and Classification for Labeling Uncurated Data

Mousavi, Sara, Lee, Dylan, Griffin, Tatianna, Steadman, Dawnie, Mockus, Audris

arXiv.org Machine Learning | Mar-9-2020

Domain-specific image collections present potential value in various areas of science and business but are often uncurated and offer no ready way to extract relevant content. To employ contemporary supervised image analysis methods on such image data, the images must first be cleaned and organized, and then manually labeled using the nomenclature employed in the specific domain, which is a time-consuming and expensive endeavor. To address this issue, we designed and implemented the Plud system. Plud provides an iterative semi-supervised workflow to minimize the effort spent by an expert and handles realistic large collections of images. We believe it can support labeling datasets regardless of their size and type. Plud is an iterative sequence of unsupervised clustering, human assistance, and supervised classification. With each iteration 1) the labeled dataset grows, 2) the generality of the classification method and its accuracy increase, and 3) manual effort is reduced. We evaluated the effectiveness of our system by applying it to over a million images documenting human decomposition. In our experiment comparing manual labeling with labeling conducted with the support of Plud, we found that it reduces the time needed to label data and produces highly accurate models for this new domain.

classifier, interface, iteration, (15 more...)

arXiv.org Machine Learning

2003.04261

Country: North America > United States > Tennessee > Knox County > Knoxville (0.04)

Genre: Research Report > New Finding (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Nearly Optimal Risk Bounds for Kernel K-Means

Liu, Yong, Ding, Lizhong, Zhang, Hua, Ren, Wenqi, Zhang, Xiao, Jiang, Shali, Liu, Xinwang, Wang, Weiping

arXiv.org Machine Learning | Mar-8-2020

In this paper, we study the statistical properties of kernel $k$-means and obtain a nearly optimal excess risk bound, substantially improving the state-of-the-art bounds in the existing clustering risk analyses. We further analyze the statistical effect of computational approximations of the Nyström kernel $k$-means, and demonstrate that it achieves the same statistical accuracy as the exact kernel $k$-means considering only $\sqrt{nk}$ Nyström landmark points. To the best of our knowledge, such sharp excess risk bounds for kernel (or approximate kernel) $k$-means have never been seen before.

calandriello & rosasco, excess risk, kernel k-means, (10 more...)

arXiv.org Machine Learning

2003.03888

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Inverse Feature Learning: Feature learning based on Representation Learning of Error

Ghazanfari, Behzad, Afghah, Fatemeh, Hajiaghayi, MohammadTaghi

arXiv.org Machine Learning | Mar-7-2020

This paper proposes inverse feature learning as a novel supervised feature learning technique that learns a set of high-level features for classification based on an error representation approach. The key contribution of this method is to learn the representation of error as high-level features, while current representation learning methods interpret error by loss functions which are obtained as a function of differences between the true labels and the predicted ones. One advantage of such a learning method is that the learned features for each class are independent of the learned features for other classes; therefore, this method can learn simultaneously, meaning that it can learn new classes without retraining. Error representation learning can also help with generalization and reduce the chance of over-fitting by adding a set of impactful features to the original data set which capture the relationships between each instance and different classes through an error generation and analysis process. This method can be particularly effective on data sets where the instances of each class have diverse feature representations, or on data sets with imbalanced classes. The experimental results show that the proposed method results in significantly better performance compared to the state-of-the-art classification techniques for several popular data sets. We hope this paper can open a new path to utilize the proposed perspective of error representation learning in different feature learning domains.

inverse feature, learning, representation, (13 more...)

arXiv.org Machine Learning

2003.03689

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > Colorado (0.04)
North America > United States > Arizona > Coconino County > Flagstaff (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
