AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


MaxMin Linear Initialization for Fuzzy C-Means

Oztürk, Aybükë, Lallich, Stéphane, Darmont, Jérôme, Waksman, Sylvie Yona

arXiv.org Machine Learning | Aug-1-2018

Clustering is an extensive research area in data science. The aim of clustering is to discover groups and to identify interesting patterns in datasets. Crisp (hard) clustering considers that each data point belongs to one and only one cluster. However, it is inadequate, as some data points may belong to several clusters, as is the case in text categorization. Thus, we need more flexible clustering. Fuzzy clustering methods, where each data point can belong to several clusters, are an interesting alternative. Yet, seeding iterative fuzzy algorithms to achieve high-quality clustering is an issue. In this paper, we propose a new linear and efficient initialization algorithm, MaxMin Linear, to deal with this problem. Then, we validate our theoretical results through extensive experiments on a variety of numerical real-world and artificial datasets. We also test several validity indices, including a new validity index that we propose, Transformed Standardized Fuzzy Difference (TSFD).
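To make the crisp versus fuzzy distinction concrete, here is a minimal sketch of the membership update from standard fuzzy c-means. It is not the MaxMin Linear initialization proposed in the paper; the function name `fuzzy_memberships`, the fuzzifier value `m=2.0`, and the toy points and centers are assumptions chosen for illustration.

```python
import math

def fuzzy_memberships(points, centers, m=2.0):
    """Return u[i][k], the degree to which point i belongs to center k.

    Standard fuzzy c-means update with fuzzifier m > 1 (assumed value: 2.0).
    Each row sums to 1, so a point can belong partly to several clusters.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centers:
            # share full membership among those centers only.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
               for d_k in dists]
        memberships.append(row)
    return memberships

# Hypothetical toy data: three points on a line, two centers at the ends.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
u = fuzzy_memberships(points, centers)
for point, row in zip(points, u):
    print(point, row)
```

The midpoint is equidistant from both centers, so it receives equal membership in each cluster, which a crisp assignment cannot express.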

artificial intelligence, initialization method, machine learning, (16 more...)

arXiv.org Machine Learning

1808.00197

Country:

North America > United States > Wisconsin (0.04)
Europe > France (0.04)
Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Fusion Subspace Clustering: Full and Incomplete Data

Pimentel-Alarcón, Daniel L., Mahmood, Usman

arXiv.org Machine Learning | Aug-1-2018

Inferring low-dimensional structures that explain high-dimensional data has become a cornerstone of discovery in virtually all fields of science. Principal component analysis (PCA), which identifies the low-dimensional linear subspace that best explains a dataset, is arguably the most prominent technique for this purpose. However, in many applications (computer vision, image processing, bioinformatics, linguistics, network analysis, and more [1-10]), data is often composed of a mixture of several classes, each of which can be explained with a different subspace. Clustering and inferring subspaces that explain data is an important unsupervised learning problem that has received tremendous attention in recent years, producing theory and algorithms to handle outliers, noisy measurements, privacy concerns, and data constraints, among other difficulties [11-22]. However, one major challenge in contemporary problems is that data is often incomplete. For example, in image inpainting, the values of some pixels are missing due to faulty sensors and image contamination [23]; in computer vision, features are often missing due to occlusions and tracking algorithm malfunctions [24]; in recommender systems, each user only rates a limited number of items [25]; in a network, most nodes communicate in subsets, producing only a handful of all the possible measurements [7].

artificial intelligence, machine learning, subspace, (10 more...)

arXiv.org Machine Learning

1808.00628

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)


Using Feature Grouping as a Stochastic Regularizer for High-Dimensional Noisy Data

Aydore, Sergul, Thirion, Bertrand, Grisel, Olivier, Varoquaux, Gael

arXiv.org Machine Learning | Jul-31-2018

The use of complex models, with many parameters, is challenging with high-dimensional small-sample problems: indeed, they face rapid overfitting. Such situations are common when data collection is expensive, as in neuroscience, biology, or geology. Dedicated regularization can be crafted to tame overfitting, typically via structured penalties. But rich penalties require mathematical expertise and entail large computational costs. Stochastic regularizers such as dropout are easier to implement: they prevent overfitting by random perturbations. Used inside a stochastic optimizer, they come with little additional cost. We propose a structured stochastic regularization that relies on feature grouping. Using a fast clustering algorithm, we define a family of groups of features that capture feature covariations. We then randomly select these groups inside a stochastic gradient descent loop. This procedure acts as a structured regularizer for high-dimensional correlated data without additional computational cost, and it has a denoising effect. We demonstrate the performance of our approach for logistic regression both on a sample-limited face image dataset with varying additive noise and on a typical high-dimensional learning problem, brain image classification.
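The idea of randomly selecting a feature grouping at each step can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the groupings are hard-coded here rather than produced by a clustering algorithm, features within a group are simply averaged, and the names `group_average` and `groupings` are hypothetical.

```python
import random

def group_average(x, groups):
    """Replace each group of feature indices by the mean of its features."""
    return [sum(x[i] for i in g) / len(g) for g in groups]

# Hypothetical family of groupings over four features. In the paper these
# come from a fast clustering of the features; here they are hard-coded.
groupings = [
    [[0, 1], [2, 3]],
    [[0, 2], [1, 3]],
]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
x = [1.0, 3.0, 5.0, 7.0]        # one toy sample

# Inside an SGD loop, a grouping would be drawn afresh at every step
# and the gradient computed on the reduced sample.
for step in range(3):
    chosen = rng.choice(groupings)
    reduced = group_average(x, chosen)
    print(step, chosen, reduced)
```

Averaging correlated features within a group is what would give the denoising effect mentioned in the abstract, while drawing a different grouping at each step supplies the random perturbation.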

artificial intelligence, machine learning, regularizer, (11 more...)

arXiv.org Machine Learning

1807.11718

Country:

Europe > France (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.91)
Health & Medicine > Health Care Technology (0.73)
Health & Medicine > Diagnostic Medicine > Imaging (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)


K-medoids Clustering of Data Sequences with Composite Distributions

Wang, Tiexing, Li, Qunwei, Bucci, Donald J., Liang, Yingbin, Chen, Biao, Varshney, Pramod K.

arXiv.org Machine Learning | Jul-30-2018

This paper studies clustering of data sequences using the k-medoids algorithm. All the data sequences are assumed to be generated from unknown continuous distributions, which form clusters with each cluster containing a composite set of closely located distributions (based on a certain distance metric between distributions). The maximum intra-cluster distance is assumed to be smaller than the minimum inter-cluster distance, and both values are assumed to be known. The goal is to group the data sequences together if their underlying generative distributions (which are unknown) belong to one cluster. K-medoids algorithms based on distribution distance metrics are proposed for both known and unknown numbers of distribution clusters. Upper bounds on the error probability and convergence results in the large sample regime are also provided. It is shown that the error probability decays exponentially fast as the number of samples in each data sequence goes to infinity. The error exponent has a simple form regardless of the distance metric applied when certain conditions are satisfied. In particular, the error exponent is characterized when either the Kolmogorov-Smirnov distance or the maximum mean discrepancy is used as the distance metric. Simulation results are provided to validate the analysis.
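As a rough illustration of clustering sequences by a distance between their distributions, the sketch below computes the empirical Kolmogorov-Smirnov distance between two samples and picks the medoid of a set of sequences. It covers only these two building blocks, not the paper's algorithms or error bounds; the function names and toy sequences are assumptions.

```python
from bisect import bisect_right

def ks_distance(a, b):
    """Empirical Kolmogorov-Smirnov distance between two samples.

    This is the largest gap between the two empirical CDFs. The gap can
    only change at observed values, so checking those is sufficient.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(fa - fb))
    return gap

def medoid(sequences):
    """Index of the sequence with the smallest total distance to the rest."""
    totals = [sum(ks_distance(s, t) for t in sequences) for s in sequences]
    return totals.index(min(totals))

# Hypothetical toy sequences: each is a shifted copy of the first.
sequences = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
print(ks_distance(sequences[0], sequences[1]))
print(ks_distance(sequences[0], sequences[2]))
print(medoid(sequences))
```

A full k-medoids loop would alternate between assigning each sequence to its nearest medoid and recomputing each cluster's medoid with a function like the one above.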

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1807.1162

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New York > Onondaga County > Syracuse (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.96)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Modeling & Simulation (0.67)


A Group-Theoretic Approach to Abstraction: Hierarchical, Interpretable, and Task-Free Clustering

Yu, Haizi, Mineyev, Igor, Varshney, Lav R.

arXiv.org Machine Learning | Jul-30-2018

Abstraction plays a key role in concept learning and knowledge discovery. While pervasive in both human and artificial intelligence, it remains mysterious how concepts are abstracted in the first place. We study the nature of abstraction through a group-theoretic approach, formalizing it as a hierarchical, interpretable, and task-free clustering problem. This clustering framework is data-free, feature-free, similarity-free, and globally hierarchical: the four key features that distinguish it from common clustering models. Beyond a theoretical foundation for abstraction, we also present a top-down and a bottom-up approach to establish an algorithmic foundation for practical abstraction-generating methods. Lastly, using both a theoretical explanation and a real-world application, we show that the coupling of our abstraction framework with statistics realizes Shannon's information lattice and, even further, brings learning into the picture. This gives a first step towards a principled and cognitive way of automatic concept learning and knowledge discovery.

abstraction, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1807.11167

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Wisconsin (0.04)
(4 more...)

Genre: Research Report (0.63)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Call Detail Records Driven Anomaly Detection and Traffic Prediction in Mobile Cellular Networks

Sultan, Kashif, Ali, Hazrat, Zhang, Zhongshan

arXiv.org Artificial Intelligence | Jul-30-2018

Mobile networks possess information about the users as well as the network. Such information is useful for making the network end-to-end visible and intelligent. Big data analytics can efficiently analyze user and network information and unearth meaningful insights with the help of machine learning tools. Utilizing big data analytics and machine learning, this work contributes in three ways. First, we utilize the call detail records (CDR) data to detect anomalies in the network. For authentication and verification of anomalies, we use k-means clustering, an unsupervised machine learning algorithm. Through effective detection of anomalies, we can proceed to suitable design for resource distribution as well as fault detection and avoidance. Second, we prepare anomaly-free data by removing anomalous activities and train a neural network model. By passing anomalous and anomaly-free data through this model, we observe the effect of anomalous activities on the training of the model and also observe the mean square error of anomalous and anomaly-free data. Lastly, we use an autoregressive integrated moving average (ARIMA) model to predict future traffic for a user. Through simple visualization, we show that anomaly-free data generalizes the learning models better and performs better on the prediction task.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ACCESS.2018.2859756

1807.11545

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.26)
North America > Haiti (0.14)
Asia > China > Beijing > Beijing (0.05)
(5 more...)

Genre: Research Report (0.50)

Industry:

Telecommunications (1.00)
Information Technology > Networks (1.00)
Education > Educational Setting (0.68)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)


Understanding V2V Driving Scenarios through Traffic Primitives

Wang, Wenshuo, Zhang, Weiyang, Zhao, Ding

arXiv.org Machine Learning | Jul-26-2018

Semantically understanding complex driving encounters, in which two or more vehicles are spatially close to each other, can potentially benefit the decision-making design of autonomous cars. This paper presents a framework for analyzing various encountering behaviors by decomposing driving encounter data into small building blocks, called driving primitives, using nonparametric Bayesian learning (NPBL) approaches, which offer a flexible way to gain insight into complex driving encounters without any prerequisite knowledge. The effectiveness of our proposed primitive-based framework is validated on 976 naturalistic driving encounters, from which more than 4000 driving primitives are learned using NPBL, specifically a sticky HDP-HMM, which combines a hidden Markov model (HMM) with a hierarchical Dirichlet process (HDP). A dynamic time warping method integrated with k-means clustering is then developed to cluster all these extracted driving primitives into groups. Experimental results show that there exist 20 kinds of driving primitives capable of representing the basic components of driving encounters in our database. This primitive-based analysis methodology potentially reveals underlying information of vehicle-vehicle encounters for self-driving applications.
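Dynamic time warping is what lets primitives of different lengths be compared before clustering. The following is a minimal sketch of the classic dynamic-programming formulation for one-dimensional sequences, assuming absolute difference as the local cost; it is not the paper's integrated DTW and k-means method, and the function name and toy sequences are illustrative.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    The local cost is the absolute difference (an assumption here).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]

# Hypothetical toy sequences: the second repeats one sample of the first.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([0, 0], [1, 1]))
```

Because the warping absorbs the repeated sample, the first pair aligns at zero cost even though the sequences differ in length, which a pointwise Euclidean distance could not handle.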

artificial intelligence, machine learning, vehicle, (15 more...)

arXiv.org Machine Learning

1807.10422

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.15)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)


Selective Clustering Annotated using Modes of Projections

Greene, Evan, Finak, Greg, Gottardo, Raphael

arXiv.org Machine Learning | Jul-26-2018

Selective clustering annotated using modes of projections (SCAMP) is a new clustering algorithm for data in $\mathbb{R}^p$. SCAMP is motivated from the point of view of non-parametric mixture modeling. Rather than maximizing a classification likelihood to determine cluster assignments, SCAMP casts clustering as a search and selection problem. One consequence of this problem formulation is that the number of clusters is not a SCAMP tuning parameter. The search phase of SCAMP consists of finding sub-collections of the data matrix, called candidate clusters, that obey shape constraints along each coordinate projection. An extension of the dip test of Hartigan and Hartigan (1985) is developed to assist the search. Selection occurs by scoring each candidate cluster with a preference function that quantifies prior belief about the mixture composition. Clustering proceeds by selecting candidates to maximize their total preference score. SCAMP concludes by annotating each selected cluster with labels that describe how cluster-level statistics compare to certain dataset-level quantities. SCAMP can be run multiple times on a single data matrix. Comparison of annotations obtained across iterations provides a measure of clustering uncertainty. Simulation studies and applications to real data are considered. A C++ implementation with R interface is available online at https://github.com/RGLab/scamp.

artificial intelligence, data matrix, machine learning, (15 more...)

arXiv.org Machine Learning

1807.10328

Country:

Europe > Italy > Apulia (0.04)
Europe > Italy > Sicily (0.04)
Europe > Italy > Calabria (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Semantically Meaningful View Selection

Guérin, Joris, Gibaru, Olivier, Nyiri, Eric, Thiery, Stéphane, Boots, Byron

arXiv.org Artificial Intelligence | Jul-26-2018

An understanding of the nature of objects could help robots both to solve high-level abstract tasks and to improve performance at lower-level concrete tasks. Although deep learning has facilitated progress in image understanding, a robot's performance in problems like object recognition often depends on the angle from which the object is observed. Traditionally, robot sorting tasks rely on a fixed top-down view of an object. By changing its viewing angle, a robot can select a more semantically informative view, leading to better performance for object recognition. In this paper, we introduce the problem of semantic view selection, which seeks to find good camera poses to gain semantic knowledge about an observed object. We propose a conceptual formulation of the problem, together with a solvable relaxation based on clustering. We then present a new image dataset consisting of around 10k images representing various views of 144 objects under different poses. Finally, we use this dataset to propose a first solution to the problem by training a neural network to predict a "semantic score" from a top view image and camera pose. The views predicted to have higher scores are then shown to provide better clustering results than fixed top-down views.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1807.10303

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)


Decentralized Task Allocation in Multi-Robot Systems via Bipartite Graph Matching Augmented with Fuzzy Clustering

Ghassemi, Payam, Chowdhury, Souma

arXiv.org Artificial Intelligence | Jul-20-2018

Robotic systems, working together as a team, are becoming valuable players in different real-world applications, from disaster response to warehouse fulfillment services. Centralized solutions for coordinating multi-robot teams often suffer from poor scalability and vulnerability to communication disruptions. This paper develops a decentralized multi-agent task allocation (Dec-MATA) algorithm for multi-robot applications. The task planning problem is posed as a maximum-weighted matching of a bipartite graph; solving it with the blossom algorithm allows each robot to autonomously identify the optimal sequence of tasks it should undertake. The graph weights are determined based on a soft clustering process, which also plays a problem decomposition role, seeking to reduce the complexity of the individual agents' task assignment problems. To evaluate the new Dec-MATA algorithm, a series of case studies (of varying complexity) are performed, with tasks being distributed randomly over an observable 2D environment. A centralized approach, based on a state-of-the-art MILP formulation of the multi-Traveling Salesman problem, is used for comparative analysis. While getting within 7-28% of the optimal cost obtained by the centralized algorithm, the Dec-MATA algorithm is found to be 1-3 orders of magnitude faster and minimally sensitive to task-to-robot ratios, unlike the centralized algorithm.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1807.07957

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > Erie County > Buffalo (0.04)

Genre:

Overview (0.93)
Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
