AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

How to Train a Machine Learning Model in JASP: Clustering - JASP - Free and User-Friendly Statistical Software

#artificialintelligence - Feb-6-2020, 16:47:32 GMT

This is a continuation of our series on machine learning methods that have been implemented in JASP (version 0.11 onwards). In this blog post we train a machine learning model to find clusters within our data set. The goal of a clustering task is to detect structures in the data. To do so, the algorithm needs to (1) identify the number of structures/groups in the data, and (2) figure out how the features are distributed in each group. For instance, clustering can be used to detect subgenres in electronic music, subgroups in a customer database, or to identify areas where there are greater incidences of particular types of crime.

algorithm, centroid, clustering, (13 more...)

#artificialintelligence

Country:

North America > United States > Indiana > Hamilton County > Fishers (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.72)

IIT Madras and Queen's University Belfast develop technology to make Artificial Intelligence fairer

#artificialintelligence - Feb-6-2020, 12:32:34 GMT

Indian Institute of Technology Madras students were part of an international research project led by a Queen's University Belfast researcher in the U.K. who has developed an innovative new algorithm to make Artificial Intelligence (AI) fairer and less biased when processing data. Dr. Deepak Padmanabhan, Researcher at Queen's University Belfast and Adjunct Faculty Member at IIT Madras, has been leading an international project, working with Ms. Savitha Abraham and Ms. Sowmya Sundaram, PhD Students, Department of Computer Science and Engineering, IIT Madras, to tackle the discrimination problem within clustering algorithms. Companies often use AI technologies to sift through huge amounts of data in situations such as an oversubscribed job vacancy or in policing when there is a large volume of CCTV data linked to a crime. However, while AI can save time, the process is often biased in terms of race, gender, age, religion and country of origin. Dr. Padmanabhan said that AI techniques for exploratory data analysis, known as 'clustering algorithms', are often criticised as being biased in terms of 'sensitive attributes' such as race, gender, age, religion and country of origin.

algorithm, iit madras, make artificial intelligence, (9 more...)

#artificialintelligence

Country:

Europe > United Kingdom (0.26)
Asia > India (0.19)
Europe > Denmark > Capital Region > Copenhagen (0.06)

Industry:

Government (0.77)
Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.60)

Who Is Your Golden Goose? Learn With Cohort Analysis

#artificialintelligence - Feb-5-2020, 23:22:55 GMT

Customer segmentation is the technique of dividing customers into groups based on their purchase patterns to identify the most profitable groups. Customers can also be segmented by other criteria depending on the market, such as geographic or demographic characteristics or behavior. This technique assumes that groups with different features require different marketing approaches, and it aims to find the groups that can boost profitability the most. Today, we are going to discuss how to do customer segmentation analysis with the online retail dataset from the UCI ML repo. This analysis focuses on two steps: getting the RFM values and making clusters with the K-means algorithm.
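As a rough illustration of the first step, the sketch below computes recency, frequency, and monetary (RFM) values per customer. The transaction records, the tuple layout, the `rfm_table` helper, and the snapshot date are all hypothetical; they are not taken from the UCI online retail dataset or from the original article.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("A", date(2020, 1, 5), 40.0),
    ("A", date(2020, 1, 20), 60.0),
    ("B", date(2019, 11, 2), 15.0),
    ("C", date(2020, 1, 28), 200.0),
    ("C", date(2019, 12, 30), 50.0),
    ("C", date(2020, 1, 10), 25.0),
]

def rfm_table(rows, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        # Recency is measured from the most recent purchase.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1      # number of purchases
        monetary[customer] += amount  # total amount spent
    return {
        c: ((snapshot - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }

rfm = rfm_table(transactions, snapshot=date(2020, 2, 1))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

Because K-means is distance-based, the three RFM columns are usually rescaled (for example standardized) before clustering, so that the monetary value does not dominate the distance.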

customer, recency, rfm value, (15 more...)

#artificialintelligence

Industry: Retail (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.32)

Provable Noisy Sparse Subspace Clustering using Greedy Neighbor Selection: A Coherence-Based Perspective

Wu, Jwo-Yuh, Li, Wen-Hsuan, Huang, Liang-Chi, Lin, Yen-Ping, Liu, Chun-Hung, Gau, Rung-Hung

arXiv.org Machine Learning - Feb-2-2020

Sparse subspace clustering (SSC) using greedy-based neighbor selection, such as matching pursuit (MP) and orthogonal matching pursuit (OMP), has been known as a popular computationally-efficient alternative to the conventional L1-minimization based methods. Under deterministic bounded noise corruption, in this paper we derive coherence-based sufficient conditions guaranteeing correct neighbor identification using MP/OMP. Our analyses exploit the maximum/minimum inner product between two noisy data points subject to a known upper bound on the noise level. The obtained sufficient condition clearly reveals the impact of noise on greedy-based neighbor recovery. Specifically, it asserts that, as long as noise is sufficiently small so that the resultant perturbed residual vectors stay close to the desired subspace, both MP and OMP succeed in returning a correct neighbor subset. A striking finding is that, when the ground truth subspaces are well-separated from each other and noise is not large, MP-based iterations, while enjoying lower algorithmic complexity, yield smaller perturbation of residuals, thereby better able to identify correct neighbors and, in turn, achieving higher global data clustering accuracy. Extensive numerical experiments are used to corroborate our theoretical study.
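The greedy neighbor selection mentioned in the abstract can be sketched roughly as follows. This is a minimal, noiseless illustration of OMP-style selection, not the authors' implementation: the `omp_neighbors` function, the fixed neighbor count `k`, and the toy data (two orthogonal 2-D subspaces in R^4) are assumptions made for the example, and the paper's coherence conditions and noise bounds are not modeled.

```python
import numpy as np

def omp_neighbors(X, i, k):
    """Pick k neighbors of column i of X by orthogonal matching pursuit.

    X holds one unit-norm data point per column. Illustrative sketch only;
    it uses a fixed number of iterations instead of a noise-aware stopping rule.
    """
    x = X[:, i]
    residual = x.copy()
    selected = []
    for _ in range(k):
        # Correlate the current residual with every candidate point.
        scores = np.abs(X.T @ residual)
        scores[i] = -np.inf              # a point may not represent itself
        if selected:
            scores[selected] = -np.inf   # never pick the same neighbor twice
        selected.append(int(np.argmax(scores)))
        # OMP step: re-fit x on all selected points by least squares.
        A = X[:, selected]
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        residual = x - A @ coef
    return selected

# Toy data: columns 0-3 lie in span(e1, e2), columns 4-7 in span(e3, e4).
angles = np.array([0.1, 0.7, 1.3, 2.0])
zeros = np.zeros(4)
S1 = np.vstack([np.cos(angles), np.sin(angles), zeros, zeros])
S2 = np.vstack([zeros, zeros, np.cos(angles), np.sin(angles)])
X = np.hstack([S1, S2])

neighbors = omp_neighbors(X, i=0, k=2)
print(neighbors)
```

Plain matching pursuit (MP) would differ in the update: instead of the least-squares re-fit, it subtracts from the residual only its projection onto the newly selected point.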

iteration, subspace, (13 more...)

arXiv.org Machine Learning

2002.00401

Country:

North America > United States > Mississippi (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Classifying Data Using Artificial Intelligence K-Means Clustering Algorithm

#artificialintelligence - Jan-30-2020, 15:59:50 GMT

"The primary aim of clustering is not just to make clusters, but to make good and meaningful ones" – Analytics Vidhya (https://www.analyticsvidhya.com/). Optimal multi-objective clustering is one of the most popular and, at the same time, curious unsupervised machine learning problems. It occurs in many fields of computer science, such as data and knowledge mining, data compression, vector quantization, pattern detection and classification, Voronoi diagrams, recommender engines (RE), etc. Clustering analysis allows us to reveal various trends and insights in the input dataset. The cluster analysis (CA) process allows us to determine the similarities and differences between specific data, partitioning the data in such a way that similar data normally belongs to a specific group or cluster. For example, we can perform clustering analysis of credit card customer data to reveal what special offers should be given to a specific customer, based on the balance and loan amount criteria. In this case, all we have to do is partition all customer data into a number of clusters and then give the same offer to similar customers. This is typically done by performing multi-variate numerical data clustering analysis. The main goal of clustering is to arrange a set of data items, each having an associated numeric n-dimensional feature vector, into a number of homogeneous groups called "clusters".

algorithm, centroid, k-means algorithm, (17 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Blocked Clusterwise Regression

Cytrynbaum, Max

arXiv.org Machine Learning - Jan-29-2020

Such models have been shown to allow estimation and inference by regression clustering methods. This paper is motivated by the finding that the clustered heterogeneity models studied in this literature can be badly misspecified, even when the panel has significant discrete cross-sectional structure. To address this issue, we generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple, imperfectly-correlated latent variables that describe its response-type to different covariates. We give inference results for a k-means style estimator of our model and develop information criteria to jointly select the number of clusters for each latent variable. Monte Carlo simulations confirm our theoretical results and give intuition about the finite-sample performance of estimation and model selection. We also contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting. Our results suggest that over-fitting can be severe in k-means style estimators when the number of clusters is over-specified.

artificial intelligence, assumption 3, machine learning, (18 more...)

arXiv.org Machine Learning

2001.1113

Country:

South America > Venezuela (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Python: Implementing a k-means algorithm with sklearn

#artificialintelligence - Jan-28-2020, 09:31:19 GMT

Originally posted by Michael Grogan. Below is an example of how sklearn in Python can be used to develop a k-means clustering algorithm. The purpose of k-means clustering is to partition the observations in a dataset into a specific number of clusters in order to aid analysis of the data. This makes it particularly valuable for data visualisation. The particular example used here is that of stock returns.
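A minimal sketch of what such a k-means fit with sklearn might look like is given here. The original post's stock data is not reproduced on this page, so the example uses synthetic points as a stand-in; the two features (mean return and volatility), the group sizes, and the choice of two clusters are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for stock data: each row is (mean return, volatility).
rng = np.random.default_rng(0)
low_risk = rng.normal(loc=[0.02, 0.05], scale=0.005, size=(20, 2))
high_risk = rng.normal(loc=[0.08, 0.25], scale=0.005, size=(20, 2))
X = np.vstack([low_risk, high_risk])

# Fit k-means with two clusters; n_init restarts guard against a bad start.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_

print(model.cluster_centers_)  # one centroid per cluster
print(labels)                  # cluster index assigned to each row of X
```

With real data, the number of clusters is usually not known in advance and the features are typically standardized before fitting.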

algorithm, sklearn, stock return, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process

Molina-Coronado, Borja, Mori, Usue, Mendiburu, Alexander, Miguel-Alonso, José

arXiv.org Artificial Intelligence - Jan-27-2020

The identification of cyberattacks which target information and communication systems has been a focus of the research community for years. Network intrusion detection is a complex problem which presents a diverse number of challenges. Many attacks currently remain undetected, while newer ones emerge due to the proliferation of connected devices and the evolution of communication technology. In this survey, we review the methods that have been applied to network data with the purpose of developing an intrusion detector, but contrary to previous reviews in the area, we analyze them from the perspective of the Knowledge Discovery in Databases (KDD) process. As such, we discuss the techniques used for the capture, preparation and transformation of the data, as well as the data mining and evaluation methods. In addition, we present the characteristics and motivations behind the use of each of these techniques and propose more adequate and up-to-date taxonomies and definitions for intrusion detectors based on the terminology used in the area of data mining and KDD. Special importance is given to the evaluation procedures followed to assess the different detectors, discussing their applicability in current real networks. Finally, as a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.

data mining, detection, machine learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TNSM.2020.3016246

2001.09697

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Spain > Basque Country (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
(8 more...)

Genre: Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Networks (1.00)
Government > Military > Cyberwarfare (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(4 more...)

Comprehensive Analysis of Time Series Forecasting Using Neural Networks

Tadayon, Manie, Iwashita, Yumi

arXiv.org Machine Learning - Jan-26-2020

Time series forecasting has gained lots of attention recently; this is because many real-world phenomena can be modeled as time series. The massive volume of data and recent advancements in the processing power of the computers enable researchers to develop more sophisticated machine learning algorithms such as neural networks to forecast the time series data. In this paper, we propose various neural network architectures to forecast the time series data using the dynamic measurements; moreover, we introduce various architectures on how to combine static and dynamic measurements for forecasting. We also investigate the importance of performing techniques such as anomaly detection and clustering on forecasting accuracy. Our results indicate that clustering can improve the overall prediction time as well as improve the forecasting performance of the neural network. Furthermore, we show that feature-based clustering can outperform the distance-based clustering in terms of speed and efficiency. Finally, our results indicate that adding more predictors to forecast the target variable will not necessarily improve the forecasting accuracy.

architecture, neural network, time series, (15 more...)

arXiv.org Machine Learning

2001.09547

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Asia > Middle East > Republic of Türkiye (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Energy (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Understanding K-Means Clustering using Python the easy way

#artificialintelligence - Jan-25-2020, 14:15:00 GMT

In the previous article, we studied k-NN. I believe that if we can relate a concept to ourselves or our lives, we have a greater chance of understanding it, so I will try to explain everything by relating it to humans. K-means tries to make the intra-cluster data points as similar as possible while keeping the clusters as different (as far apart) as possible. It assigns data points to clusters such that the sum of the squared distances between the data points and their cluster's centroid is at a minimum.
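To make the sum-of-squared-distances objective concrete, here is a small from-scratch sketch of the standard alternating (Lloyd) iterations using NumPy. It is not the article's code: the `kmeans` function, the six toy points, and the choice of the first two points as starting centroids are assumptions for illustration.

```python
import numpy as np

def kmeans(X, centroids, n_iter=10):
    """Plain Lloyd iterations; returns labels, centroids, and the objective history."""
    history = []
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Objective: sum of squared distances to the assigned centroid.
        history.append(d2[np.arange(len(X)), labels].sum())
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
    return labels, centroids, history

# Two visually obvious groups of three points each.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
labels, centroids, history = kmeans(X, centroids=X[[0, 1]].copy())

print(labels)
print(history)
```

Each assignment step and each update step can only lower the objective or leave it unchanged, which is why the recorded history never increases. The result can still depend on the starting centroids, so the procedure is usually run from several initializations.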

algorithm, centroid, classification, (13 more...)

#artificialintelligence

Industry:

Media > Music (0.40)
Leisure & Entertainment (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
