AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Reinforcement Learning with Policy Mixture Model for Temporal Point Processes Clustering

Wu, Weichang, Yan, Junchi, Yang, Xiaokang, Zha, Hongyuan

arXiv.org Artificial IntelligenceJun-4-2019

Temporal point process is an expressive tool for modeling event sequences over time. In this paper, we take a reinforcement learning view whereby the observed sequences are assumed to be generated from a mixture of latent policies. The purpose is to cluster the sequences with different temporal patterns into the underlying policies while learning each of the policy model. The flexibility of our model lies in: i) all the components are networks including the policy network for modeling the intensity function of temporal point process; ii) to handle varying-length event sequences, we resort to inverse reinforcement learning by decomposing the observed sequence into states (RNN hidden embedding of history) and actions (time interval to next event) in order to learn the reward function, thus achieving better performance or increasing efficiency compared to existing methods using rewards over the entire sequence such as log-likelihood or Wasserstein distance. We adopt an expectation-maximization framework with the E-step estimating the cluster labels for each sequence, and the M-step aiming to learn the respective policy. Extensive experiments show the efficacy of our method against state-of-the-arts.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1905.12345

Country:

Asia (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

The Most Important Data Science Tool for Market and Customer Segmentation

#artificialintelligenceJun-3-2019, 08:20:36 GMT

Market and customer segmentation are some of the most important tasks in any company. The segmentation done will influence marketing and sales decisions, and potentially the survival of a company. Surprisingly, despite the advance in machine learning, few marketers are using such technology to augment their all-important market and customer segmentation efforts. In this article, I will show you how to augment your segmentation analysis with a simple, yet powerful machine learning technique called K-means. Learning this will give you an edge over your competitors (and colleagues).

artificial intelligence, customer, machine learning, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.30)

Add feedback

K-Means Clustering: Unsupervised Learning for Recommender Systems

#artificialintelligenceJun-3-2019, 02:17:44 GMT

Unsupervised Learning has been called the closest thing we have to "actual" Artificial Intelligence, in the sense of General AI, with K-Means Clustering one of its simplest, but most powerful applications. I am not here to discuss whether those claims are true or not, as I am not an expert nor a philosopher. I will however state, that I am often amazed by how well unsupervised learning techniques, even the most rudimentary, capture patterns in the data that I would expect only people to find. Today we'll apply unsupervised learning on a Dataset I gathered myself. It's a database of professional Magic: The Gathering decks that I crawled from mtgtop8.com, an awesome website if you're into Magic: the Gathering. I scraped the MtgTop8 data from a few years of tournaments, and they're all available to be consumed in this GitHub repository.

artificial intelligence, k-means clustering, machine learning, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.76)

Add feedback

Big-Data Clustering: K-Means or K-Indicators?

Chen, Feiyu, Yang, Yuchen, Xu, Liwei, Zhang, Taiping, Zhang, Yin

arXiv.org Machine LearningJun-3-2019

The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as is in spectral clustering. Highly sensitive to initializations, however, K-means encounters a scalability bottleneck with respect to the number of clusters K as this number grows in big data applications. In this work, we promote a closely related model called K-indicators model and construct an efficient, semi-convex-relaxation algorithm that requires no randomized initializations. We present extensive empirical results to show advantages of the new algorithm when K is large. In particular, using the new algorithm to start the K-means algorithm, without any replication, can significantly outperform the standard K-means with a large number of currently state-of-the-art random replications.

algorithm, dataset, replication, (15 more...)

arXiv.org Machine Learning

1906.00938

Country:

Asia > China > Chongqing Province > Chongqing (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Load Balancing for Ultra-Dense Networks: A Deep Reinforcement Learning Based Approach

Xu, Yue, Xu, Wenjun, Wang, Zhi, Lin, Jiaru, Cui, Shuguang

arXiv.org Machine LearningJun-3-2019

In this paper, we propose a deep reinforcement learning (DRL) based mobility load balancing (MLB) algorithm along with a two-layer architecture to solve the large-scale load balancing problem for ultra-dense networks (UDNs). Our contribution is three-fold. First, this work proposes a two-layer architecture to solve the large-scale load balancing problem in a self-organized manner. The proposed architecture can alleviate the global traffic variations by dynamically grouping small cells into self-organized clusters according to their historical loads, and further adapt to local traffic variations through intra-cluster load balancing afterwards. Second, for the intra-cluster load balancing, this paper proposes an off-policy DRL-based MLB algorithm to autonomously learn the optimal MLB policy under an asynchronous parallel learning framework, without any prior knowledge assumed over the underlying UDN environments. Moreover, the algorithm enables joint exploration with multiple behavior policies, such that the traditional MLB methods can be used to guide the learning process thereby improving the learning efficiency and stability. Third, this work proposes an offline-evaluation based safeguard mechanism to ensure that the online system can always operate with the optimal and well-trained MLB policy, which not only stabilizes the online performance but also enables the exploration beyond current policies to make full use of machine learning in a safe way. Empirical results verify that the proposed framework outperforms the existing MLB methods in general UDN environments featured with irregular network topologies, coupled interferences, and random user movements, in terms of the load balancing performance.

behavior policy, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1906.00767

Country:

North America > United States (0.93)
Asia > China (0.67)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Clustering by Orthogonal NMF Model and Non-Convex Penalty Optimization

Wang, Shuai, Chang, Tsung-Hui, Cui, Ying, Pang, Jong-Shi

arXiv.org Machine LearningJun-3-2019

The non-negative matrix factorization (NMF) model with an additional orthogonality constraint on one of the factor matrices, called the orthogonal NMF (ONMF), has been found to provide improved clustering performance over the K-means. Solving the ONMF model is a challenging optimization problem due to the existence of both orthogonality and nonnegativity constraints, and most of the existing methods directly deal with the orthogonality constraint in its original form via various optimization techniques. In this paper, we propose a new ONMF based clustering formulation that equivalently transforms the orthogonality constraint into a set of norm-based non-convex equality constraints. We then apply a non-convex penalty (NCP) approach to add the non-convex equality constraints to the objective as penalty terms, leaving simple non-negativity constraints only in the penalized problem. One smooth penalty formulation and one non-smooth penalty formulation are respectively studied, and theoretical conditions for the penalized problems to provide feasible stationary solutions to the ONMF based clustering problem are presented. Experimental results based on both synthetic and real datasets are presented to show that the proposed NCP methods are computationally time efficient, and either match or outperform the existing K-means and ONMF based methods in terms of the clustering performance.

artificial intelligence, constraint, machine learning, (14 more...)

arXiv.org Machine Learning

1906.0057

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

An Effective Multi-Resolution Hierarchical Granular Representation based Classifier using General Fuzzy Min-Max Neural Network

Khuat, Thanh Tung, Chen, Fang, Gabrys, Bogdan

arXiv.org Machine LearningJun-3-2019

Motivated by the practical demands for simplification of data towards being consistent with human thinking and problem solving as well as tolerance of uncertainty, information granules are becoming important entities in data processing at different levels of data abstraction. This paper proposes a method to construct classifiers from multi-resolution hierarchical granular representations (MRHGRC) using hyperbox fuzzy sets. The proposed approach forms a series of granular inferences hierarchically through many levels of abstraction. An attractive characteristic of our classifier is that it can maintain relatively high accuracy at a low degree of granularity based on reusing the knowledge learned from lower levels of abstraction. In addition, our approach can reduce the data size significantly as well as handling the uncertainty and incompleteness associated with data in real-world applications. The construction process of the classifier consists of two phases. The first phase is to formulate the model at the greatest level of granularity, while the later stage aims to reduce the complexity of the constructed model and deduce it from data at higher abstraction levels. Experimental outcomes conducted comprehensively on both synthetic and real datasets indicated the efficiency of our method in terms of training time and predictive performance in comparison to other types of fuzzy min-max neural networks and common machine learning algorithms.

artificial intelligence, fuzzy logic, machine learning, (16 more...)

arXiv.org Machine Learning

1905.1217

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

K Means Clustering with Dask (Image Filters for Cat Pictures) Data Stuff

#artificialintelligenceJun-2-2019, 19:56:17 GMT

Applying filters to images is not a new concept to anyone. We take a picture, make a few changes to it, and now it looks cooler. But where does Artificial Intelligence come in? Let's try out a fun use for Unsupervised Machine Learning with K Means Clustering in Python. I've written before about K Means Clustering, so I will assume you're familiar with the algorithm this time.

artificial intelligence, machine learning, means clustering, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.97)

Add feedback

Similarity Preserving Representation Learning for Time Series Clustering

Lei, Qi, Yi, Jinfeng, Vaculin, Roman, Wu, Lingfei, Dhillon, Inderjit S.

arXiv.org Artificial IntelligenceJun-2-2019

A considerable amount of clustering algorithms take instance-feature matrices as their inputs. As such, they cannot directly analyze time series data due to its temporal nature, usually unequal lengths, and complex properties. This is a great pity since many of these algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representation learning framework that is able to convert a set of time series with various lengths to an instance-feature matrix. In particular, we guarantee that the pairwise similarities between time series are well preserved after the transformation, thus the learned feature representation is particularly suitable for the time series clustering task. Given a set of $n$ time series, we first construct an $n\times n$ partially-observed similarity matrix by randomly sampling $\mathcal{O}(n \log n)$ pairs of time series and computing their pairwise similarities. We then propose an efficient algorithm that solves a non-convex and NP-hard problem to learn new features based on the partially-observed similarity matrix. By conducting extensive empirical studies, we show that the proposed framework is more effective, efficient, and flexible, compared to other state-of-the-art time series clustering methods.

artificial intelligence, machine learning, time sery, (17 more...)

arXiv.org Artificial Intelligence

1702.03584

Country:

North America > United States > Oklahoma > Beaver County (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Learning low-dimensional state embeddings and metastable clusters from time series data

Sun, Yifan, Duan, Yaqi, Gong, Hao, Wang, Mengdi

arXiv.org Machine LearningJun-1-2019

This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank. In the spirit of diffusion map, we propose an efficient method for learning a low-dimensional state embedding and capturing the process's dynamics. This idea also leads to a kernel reshaping method for more accurate nonparametric estimation of the transition function. State embedding can be used to cluster states into metastable sets, thereby identifying the slow dynamics. Sharp statistical error bounds and misclassification rate are proved. Experiment on a simulated dynamical system shows that the state clustering method indeed reveals metastable structures. We also experiment with time series generated by layers of a Deep-Q-Network when playing an Atari game. The embedding method identifies game states to be similar if they share similar future events, even though their raw data are far different.

artificial intelligence, kernel, machine learning, (19 more...)

arXiv.org Machine Learning

1906.00302

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback