AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Successive Convex Approximation Algorithms for Sparse Signal Estimation with Nonconvex Regularizations

Yang, Yang, Pesavento, Marius, Chatzinotas, Symeon, Ottersten, Björn

arXiv.org Machine Learning, Jun-28-2018

In this paper, we propose a successive convex approximation framework for sparse optimization where the nonsmooth regularization function in the objective function is nonconvex and can be written as the difference of two convex functions. The proposed framework is based on a nontrivial combination of the majorization-minimization framework and the successive convex approximation framework proposed in the literature for a convex regularization function. The proposed framework has several attractive features, namely, i) flexibility, as different choices of the approximate function lead to different types of algorithms; ii) fast convergence, as the problem structure can be better exploited by a proper choice of the approximate function and the stepsize is calculated by the line search; iii) low complexity, as the approximate function is convex and the line search scheme is carried out over a differentiable function; iv) guaranteed convergence to a stationary point. We demonstrate these features by two example applications in subspace learning, namely, the network anomaly detection problem and the sparse subspace clustering problem. Customizing the proposed framework by adopting the best-response type approximation, we obtain soft-thresholding with exact line search algorithms for which all elements of the unknown parameter are updated in parallel according to closed-form expressions. The attractive features of the proposed algorithms are illustrated numerically.
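
The abstract mentions soft-thresholding updates computed elementwise in closed form. As a rough illustration only, the sketch below shows the standard soft-thresholding operator (the proximal operator of the l1 norm). It is not the paper's full algorithm: the function name `soft_threshold` and the threshold parameter `lam` are assumptions made for this example, and the line search is omitted.

```python
def soft_threshold(x, lam):
    """Elementwise soft-thresholding: sign(v) * max(|v| - lam, 0).

    `x` is a list of floats and `lam` a non-negative threshold
    (both names are hypothetical, chosen for this sketch).
    """
    out = []
    for v in x:
        mag = abs(v) - lam
        if mag <= 0:
            out.append(0.0)  # small entries are set exactly to zero
        else:
            out.append(mag if v > 0 else -mag)  # large entries shrink toward zero
    return out


print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```

Each output element depends only on the matching input element, which is why updates of this kind can be carried out in parallel.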

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1806.10773

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > New York (0.04)

Genre:

Overview (0.46)
Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

A probabilistic constrained clustering for transfer learning and image category discovery

Hsu, Yen-Chang, Lv, Zhaoyang, Schlosser, Joel, Odom, Phillip, Kira, Zsolt

arXiv.org Artificial Intelligence, Jun-28-2018

Neural network-based clustering has recently gained popularity, and in particular a constrained clustering formulation has been proposed to perform transfer learning and image category discovery using deep learning. The core idea is to formulate a clustering objective with pairwise constraints that can be used to train a deep clustering network; therefore the cluster assignments and their underlying feature representations are jointly optimized end-to-end. In this work, we provide a novel clustering formulation to address scalability issues of previous work in terms of optimizing deeper networks and larger numbers of categories. The proposed objective directly minimizes the negative log-likelihood of cluster assignment with respect to the pairwise constraints, has no hyper-parameters, and demonstrates improved scalability and performance on both supervised learning and unsupervised transfer learning.
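
To make the idea of a negative log-likelihood over pairwise constraints concrete, here is a minimal sketch. It assumes, without confirmation from the abstract, that the probability of two samples sharing a cluster is modeled as the inner product of their cluster-probability vectors. The function name `pairwise_nll` and the clipping constant `eps` are hypothetical.

```python
import math


def pairwise_nll(p_i, p_j, same, eps=1e-12):
    """Negative log-likelihood of one pairwise constraint.

    p_i, p_j: cluster-probability vectors (each sums to 1).
    same:     True for a must-link pair, False for a cannot-link pair.
    Assumption: P(same cluster) is the inner product of p_i and p_j.
    """
    s_hat = sum(a * b for a, b in zip(p_i, p_j))
    s_hat = min(max(s_hat, eps), 1.0 - eps)  # clip to keep log() finite
    return -math.log(s_hat) if same else -math.log(1.0 - s_hat)


# A cannot-link pair is penalized more when both samples favor the same cluster.
print(pairwise_nll([0.9, 0.1], [0.9, 0.1], same=False))
print(pairwise_nll([0.9, 0.1], [0.1, 0.9], same=False))
```

Under this assumed form the loss has no weighting terms to tune, which is consistent with the abstract's remark about having no hyper-parameters.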

artificial intelligence, image category discovery, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1806.11078

Country: North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.72)

Deep $k$-Means: Jointly Clustering with $k$-Means and Learning Representations

Fard, Maziar Moradi, Thonet, Thibaut, Gaussier, Eric

arXiv.org Machine Learning, Jun-26-2018

In this paper, we study the problem of jointly clustering and learning representations. As several previous studies have shown, learning representations that are both faithful to the data to be clustered and adapted to the clustering algorithm can lead to better clustering performance, all the more so when the two tasks are performed jointly. We propose such an approach for $k$-Means clustering, based on a continuous reparametrization of the objective function that leads to a truly joint solution. The behavior of our approach is illustrated on various datasets, showing its efficacy in learning representations for objects while clustering them.
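
One common way to make the hard nearest-center assignment of $k$-Means continuous is to replace it with a softmax over negative distances, controlled by an inverse temperature. The sketch below illustrates that general idea only. Whether it matches the reparametrization in the paper is an assumption, and the names `soft_assign` and `alpha` are hypothetical.

```python
import math


def soft_assign(sq_dists, alpha):
    """Softmax over negative squared distances to each cluster center.

    sq_dists: squared distances from one point to every center.
    alpha:    inverse temperature; alpha = 0 gives uniform weights and
              large alpha approaches the hard k-Means assignment.
    """
    d_min = min(sq_dists)  # subtract the minimum for numerical stability
    w = [math.exp(-alpha * (d - d_min)) for d in sq_dists]
    total = sum(w)
    return [x / total for x in w]


print(soft_assign([1.0, 2.0], alpha=0.0))
print(soft_assign([1.0, 2.0], alpha=50.0))
```

Because the soft weights are differentiable in the distances, a clustering loss built on them can be minimized by gradient descent together with a representation-learning loss.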

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

1806.10069

Country: Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)

Genre:

Research Report > New Finding (0.94)
Research Report > Experimental Study (0.94)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Hierarchical Graph Representation Learning with Differentiable Pooling

Ying, Rex, You, Jiaxuan, Morris, Christopher, Ren, Xiang, Hamilton, William L., Leskovec, Jure

arXiv.org Machine Learning, Jun-26-2018

Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs, a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DiffPool, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DiffPool learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DiffPool yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
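
The coarsening step described above can be sketched with plain matrix products. The abstract gives no formulas, so the form used here (pooled features `S^T Z` and pooled adjacency `S^T A S`, for an assignment matrix `S`) is an assumption for illustration. The helper names are hypothetical, and in a real model `S` would be produced by a GNN rather than fixed by hand.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def coarsen(adj, feats, assign):
    """Pool a graph given an n-by-k cluster assignment matrix.

    Returns (pooled adjacency, pooled features). Each row of `assign`
    holds one node's cluster weights; soft rows would sum to 1.
    """
    st = transpose(assign)
    return matmul(matmul(st, adj), assign), matmul(st, feats)


# Path graph 0-1-2-3; nodes {0, 1} go to cluster 0 and {2, 3} to cluster 1.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1.0], [2.0], [3.0], [4.0]]
assign = [[1, 0], [1, 0], [0, 1], [0, 1]]
pooled_adj, pooled_feats = coarsen(adj, feats, assign)
print(pooled_adj, pooled_feats)
```

A hard 0/1 assignment is used here only to keep the arithmetic easy to follow; the same products apply unchanged to fractional (soft) assignments.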

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Machine Learning

1806.08804

Country: North America > United States > California > Santa Clara County > Palo Alto (0.05)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Data Science in 90 seconds, Lesson 4: K-Means Clustering

#artificialintelligence, Jun-25-2018, 23:22:01 GMT

Data Science doesn't have to be boring or confusing.

artificial intelligence, k-means clustering, social media, (3 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

Deep $k$-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions

Wu, Junru, Wang, Yue, Wu, Zhenyu, Wang, Zhangyang, Veeraraghavan, Ashok, Lin, Yingyan

arXiv.org Machine Learning, Jun-24-2018

The current trend of pushing CNNs deeper with convolutions has created a pressing demand to achieve higher compression gains on CNNs where convolutions dominate the computation and parameter amount (e.g., GoogLeNet, ResNet and Wide ResNet). Further, the high energy consumption of convolutions limits its deployment on mobile devices. To this end, we proposed a simple yet effective scheme for compressing convolutions through applying k-means clustering on the weights; compression is achieved through weight sharing, by only recording K cluster centers and weight assignment indexes. We then introduced a novel spectrally relaxed k-means regularization, which tends to make hard assignments of convolutional layer weights to K learned cluster centers during retraining. We additionally propose an improved set of metrics to estimate energy consumption of CNN hardware implementations, whose estimation results are verified to be consistent with a previously proposed energy estimation tool extrapolated from actual hardware measurements. We finally evaluated Deep k-Means across several CNN models in terms of both compression ratio and energy consumption reduction, observing promising results without incurring accuracy loss. The code is available at https://github.
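
The weight-sharing idea (store only K cluster centers plus one index per weight) can be illustrated with a small one-dimensional k-means. This is a simplified sketch and not the paper's method: it omits the spectrally relaxed regularization and retraining, and the function names and initial centers are assumptions made for the example.

```python
def nearest(w, centers):
    """Index of the center closest to the scalar weight w."""
    return min(range(len(centers)), key=lambda c: (w - centers[c]) ** 2)


def kmeans_1d(weights, init_centers, iters=20):
    """Lloyd's algorithm on scalar weights; returns (centers, indexes)."""
    centers = list(init_centers)
    for _ in range(iters):
        assign = [nearest(w, centers) for w in weights]
        for c in range(len(centers)):
            members = [w for w, a in zip(weights, assign) if a == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = sum(members) / len(members)
    return centers, [nearest(w, centers) for w in weights]


weights = [0.10, 0.12, 0.09, -0.50, -0.52, -0.48]
codebook, indexes = kmeans_1d(weights, init_centers=[0.2, -0.6])

# Compressed form: K floats (codebook) plus one small integer per weight.
restored = [codebook[i] for i in indexes]
print(codebook, indexes, restored)
```

With K centers, each index needs only about log2(K) bits, so the saving grows with the number of weights that share a codebook.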

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1806.09228

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Hierarchical Graph Clustering using Node Pair Sampling

Bonald, Thomas, Charpentier, Bertrand, Galland, Alexis, Hollocou, Alexandre

arXiv.org Artificial Intelligence, Jun-22-2018

Many datasets can be represented as graphs, with the graph either explicitly embedded in the data (e.g., the friendship relation of a social network) or built through some suitable similarity measure between data items (e.g., the number of papers coauthored by two researchers). Such graphs often exhibit a complex, multi-scale community structure where each node is involved in many groups of nodes, so-called communities, of different sizes. One of the most popular graph clustering algorithms is known as Louvain, named after the university of its inventors [Blondel et al., 2008]. It is based on the greedy maximization of the modularity, a classical objective function introduced in [Newman and Girvan, 2004]. The Louvain algorithm is fast, memory-efficient, and provides meaningful clusters in practice. It does not enable an analysis of the graph at different scales, however [Fortunato and Barthelemy, 2007, Lancichinetti and Fortunato, 2011].
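
For reference, the modularity that Louvain greedily maximizes can be computed directly from an edge list. The sketch below uses the standard per-community form, Q = sum over communities of (internal edges / m) - (total degree / 2m)^2, for an undirected, unweighted graph without self-loops. The function name and the example graph are illustrative assumptions, and the code only evaluates Q; it does not implement Louvain itself.

```python
def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each edge listed once, no self-loops.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = {}  # number of edges with both ends inside the community
    degree = {}    # sum of node degrees inside the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the graph along the bridge edge scores higher than keeping everything in one community, which is the kind of comparison a greedy modularity optimizer makes at each step.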

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1806.01664

Country:

Europe > United Kingdom > England (0.05)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Tennessee (0.04)
(15 more...)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Media > Music (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Clustering App Attacks with Machine Learning Part 3: Algorithm Results - Security Boulevard

#artificialintelligence, Jun-20-2018, 03:46:50 GMT

In the previous blog posts in this series, we discussed the motivation for clustering attacks and the data used and how to calculate the distance between two attacks using different methods on each feature we extracted. In this final blog post, we'll discuss the clustering algorithm itself – how to use the distance we calculated to create clusters from the data. We will discuss clustering in real time when only a small amount of data can be stored in memory. Finally, we'll show some results of the algorithm based on real data from Imperva customers. Now we have all the basic ingredients to input into the algorithm.

algorithm, artificial intelligence, machine learning, (15 more...)

#artificialintelligence

Country:

South America (0.05)
Europe (0.05)
Asia > China (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.38)

A Scalable Framework for Trajectory Prediction

Rathore, Punit, Kumar, Dheeraj, Rajasegarar, Sutharshan, Palaniswami, Marimuthu, Bezdek, James C.

arXiv.org Artificial Intelligence, Jun-20-2018

Trajectory prediction (TP) is of great importance for a wide range of location-based applications in intelligent transport systems such as location-based advertising, route planning, traffic management, and early warning systems. In the last few years, the widespread use of GPS navigation systems and wireless communication technology enabled vehicles has resulted in huge volumes of trajectory data. The task of utilizing this data employing spatio-temporal techniques for trajectory prediction in an efficient and accurate manner is an ongoing research problem. Existing TP approaches are limited to short-term predictions. Moreover, they cannot handle a large volume of trajectory data for long-term prediction. To address these limitations, we propose a scalable clustering and Markov chain based hybrid framework, called Traj-clusiVAT-based TP, for both short-term and long-term trajectory prediction, which can handle a large number of overlapping trajectories in a dense road network. In addition, Traj-clusiVAT can also determine the number of clusters, which represent different movement behaviours in input trajectory data. In our experiments, we compare our proposed approach with a mixed Markov model (MMM)-based scheme, and a trajectory clustering, NETSCAN-based TP method for both short- and long-term trajectory predictions. We performed our experiments on two real, vehicle trajectory datasets, including a large-scale trajectory dataset consisting of 3.28 million trajectories obtained from 15,061 taxis in Singapore over a period of one month. Experimental results on two real trajectory datasets show that our proposed approach outperforms the existing approaches in terms of both short- and long-term prediction performances, based on prediction accuracy and distance error (in km).
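
The Markov chain half of such a hybrid framework can be sketched as a first-order transition model over discrete states such as road segments. This is a generic illustration and not the Traj-clusiVAT method: the clustering stage is omitted, and the function names and toy sequences are assumptions.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count first-order transitions (current state -> next state)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return counts


def predict_next(counts, state):
    """Most frequently observed successor of `state`, or None if unseen."""
    if state not in counts:
        return None
    return counts[state].most_common(1)[0][0]


# Toy trajectories over hypothetical road-segment labels.
trips = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
counts = fit_transitions(trips)
print(predict_next(counts, "b"))
```

In a cluster-based variant, one such transition table could be fitted per trajectory cluster, so that a prediction uses only trips with similar movement behaviour.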

data mining, machine learning, trajectory, (19 more...)

arXiv.org Artificial Intelligence

1806.03582

Country:

Asia > Singapore (0.27)
Asia > China > Beijing > Beijing (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(2 more...)

Genre:

Workflow (0.68)
Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (0.92)
Transportation > Infrastructure & Services (0.71)

Technology:

Information Technology > Geographic Information Systems (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(3 more...)

Multi-View Multi-Graph Embedding for Brain Network Clustering Analysis

Liu, Ye, He, Lifang, Cao, Bokai, Yu, Philip S., Ragin, Ann B., Leow, Alex D.

arXiv.org Machine Learning, Jun-19-2018

Network analysis of human brain connectivity is critically important for understanding brain function and disease states. Embedding a brain network as a whole graph instance into a meaningful low-dimensional representation can be used to investigate disease mechanisms and inform therapeutic interventions. Moreover, by exploiting information from multiple neuroimaging modalities or views, we are able to obtain an embedding that is more useful than the embedding learned from an individual view. Therefore, multi-view multi-graph embedding becomes a crucial task. Currently, only a few studies have been devoted to this topic, and most of them focus on the vector-based strategy, which causes the structural information contained in the original graphs to be lost. As a novel attempt to tackle this problem, we propose Multi-view Multi-graph Embedding (M2E) by stacking multi-graphs into multiple partially-symmetric tensors and using tensor techniques to simultaneously leverage the dependencies and correlations among multi-view and multi-graph brain networks. Extensive experiments on real HIV and bipolar disorder brain network datasets demonstrate the superior performance of M2E on clustering brain networks by leveraging the multi-view multi-graph interactions.

artificial intelligence, brain network, machine learning, (18 more...)

arXiv.org Machine Learning

1806.07703

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science (0.93)
