AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Spectral learning of multivariate extremes

Medina, Marco Avella, Davis, Richard A., Samorodnitsky, Gennady

arXiv.org Machine Learning, Nov-15-2021

We propose a spectral clustering algorithm for analyzing the dependence structure of multivariate extremes. More specifically, we focus on the asymptotic dependence of multivariate extremes characterized by the angular or spectral measure in extreme value theory. Our work studies the theoretical performance of spectral clustering based on a random $k$-nearest neighbor graph constructed from an extremal sample, i.e., the angular part of random vectors for which the radius exceeds a large threshold. In particular, we derive the asymptotic distribution of extremes arising from a linear factor model and prove that, under certain conditions, spectral clustering can consistently identify the clusters of extremes arising in this model. Leveraging this result we propose a simple consistent estimation strategy for learning the angular measure. Our theoretical findings are complemented with numerical experiments illustrating the finite sample performance of our methods.
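The "extremal sample" described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `extremal_angles`, the use of the Euclidean norm, and the choice of an empirical quantile as the "large threshold" are all assumptions made for illustration.

```python
import math


def extremal_angles(points, quantile=0.9):
    """Return (threshold, angles): the angular parts x / ||x|| of the points
    whose radius ||x|| exceeds an empirical quantile of all radii.

    Assumptions (not from the paper): Euclidean norm, quantile-based threshold.
    """
    radii = [math.sqrt(sum(c * c for c in p)) for p in points]
    ordered = sorted(radii)
    # Index of the empirical quantile, clamped to the last element.
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    threshold = ordered[idx]
    angles = []
    for p, r in zip(points, radii):
        # Keep only points strictly beyond the threshold (and never divide by 0).
        if r > threshold and r > 0:
            angles.append(tuple(c / r for c in p))
    return threshold, angles


if __name__ == "__main__":
    sample = [(1, 0), (0, 2), (3, 4), (0.1, 0.1), (6, 8)]
    threshold, angles = extremal_angles(sample, quantile=0.5)
    print(threshold, angles)
```

The returned angles lie on the unit sphere; a k-nearest neighbor graph built on them would be the input to the spectral clustering step that the abstract analyzes.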

algorithm, graph, spectral, (13 more...)

2111.07799

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

MACHINE LEARNING FOR BEGINNERS

#artificialintelligence, Nov-14-2021, 04:30:10 GMT

Hi everyone, I'm reposting all of my old blogs, as my account got hacked. This blog was originally published on March 6, 2019. Within the field of machine learning, there are two main types of tasks: supervised, and unsupervised. The main difference between the two types is that supervised learning is done using a ground truth, or in other words, we have prior knowledge of what the output values for our samples should be. Therefore, the goal of supervised learning is to learn a function that, given a sample of data and desired outputs, best approximates the relationship between input and output observable in the data.

complexity, learning, supervised learning, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.32)

Customer Segmentation With Clustering

#artificialintelligence, Nov-12-2021, 11:00:42 GMT

Let's say that you work with the sales and marketing team to reach your company's pre-set goals. While your company is doing well in terms of generating revenue and retaining customers, you cannot help but think that it can do better. As things stand, the advertisements, promotions, and special offers are homogeneous across all customers, which is a serious issue. Engaging with customers in a manner that they won't be receptive to is tantamount to wasting your advertising budget. After all, you don't want your company to spend its limited budget sending diaper coupons to college students or advertising gaming consoles to elderly women.

consumer population, customer, customer segmentation, (15 more...)

Industry: Education > Educational Setting > Higher Education (0.57)

Technology:

Information Technology > Data Science (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.32)

Machine Learning and Deep Learning A-Z: Hands-On Python

#artificialintelligence, Nov-11-2021, 02:15:10 GMT

Learn Machine Learning with Hands-On Examples. Topics covered: What is Machine Learning?; Machine Learning Terminology; Evaluation Metrics for Python machine learning and Python deep learning; What are Classification vs Regression?; Evaluating Performance: Classification Error Metrics; Evaluating Performance: Regression Error Metrics; Cross Validation and Bias Variance Trade-Off; Use matplotlib and seaborn for data visualizations; Machine Learning with SciKit Learn; Linear Regression Algorithm; Logistic Regression Algorithm; K Nearest Neighbors Algorithm; Decision Trees and Random Forest Algorithm; Support Vector Machine Algorithm; Unsupervised Learning; K Means Clustering Algorithm; Hierarchical Clustering Algorithm; Principal Component Analysis (PCA); Recommender System Algorithm. Keywords: Python, Python machine learning and deep learning, machine learning A-Z, deep learning A-Z.
Machine learning is constantly being applied to new industries and new problems. Whether you're a marketer, video game designer, or programmer ... Machine learning describes systems that make predictions using a model trained on real-world data. Machine learning is being applied to virtually every field today. That includes medical diagnoses, facial recognition, weather forecasts, image processing ... It's possible to use machine learning without coding, but building new systems generally requires code.
What is the best language for machine learning? Python is the most used language in machine learning. Engineers writing machine learning systems often use Jupyter Notebooks and Python together. Machine learning is generally divided between supervised machine learning and unsupervised machine learning. Python instructors on Udemy specialize in everything from software development to data analysis, and are known for their effective, friendly instruction ...
What are the limitations of Python? Python is a widely used, general-purpose programming language, but it has some limitations.

machine learning, programming language, python, (9 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.70)

Industry:

Information Technology (1.00)
Education > Educational Setting > Online (0.91)
Leisure & Entertainment > Games > Computer Games (0.69)
Education > Educational Technology > Educational Software > Computer Based Training (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.91)

Hierarchical clustering by aggregating representatives in sub-minimum-spanning-trees

Xie, Wen-Bo, Liu, Zhen, Srivastava, Jaideep

arXiv.org Artificial Intelligence, Nov-11-2021

One of the main challenges in hierarchical clustering is how to appropriately identify the representative points in the lower level of the cluster tree, which are then used as the roots in the higher level of the cluster tree for further aggregation. However, conventional hierarchical clustering approaches select the "representative" points with simple tricks, and the selected points may not be representative enough. As a result, the constructed cluster tree suffers from poor robustness and weak reliability. To address this issue, we propose a novel hierarchical clustering algorithm that, while building the clustering dendrogram, effectively detects the representative point by scoring the reciprocal nearest data points in each sub-minimum-spanning-tree. Extensive experiments on UCI datasets show that the proposed algorithm is more accurate than other benchmarks. Moreover, according to our analysis, the proposed algorithm has O(n log n) time complexity and O(log n) space complexity, indicating that it scales to massive data with lower time and storage consumption.
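To make the notion of "reciprocal nearest data points" concrete, here is a minimal sketch that finds pairs of points that are each other's nearest neighbor. It is only an illustration of that one concept under Euclidean distance; the helper names are hypothetical, the brute-force search is quadratic, and the scoring and sub-minimum-spanning-tree steps of the paper are not reproduced.

```python
import math


def nearest_neighbor(points, i):
    """Index of the point closest to points[i] (Euclidean), or None if alone."""
    best_j, best_d = None, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def reciprocal_pairs(points):
    """Sorted list of index pairs (i, j), i < j, where i and j are each
    other's nearest neighbor."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return sorted(
        (i, j)
        for i, j in enumerate(nn)
        if j is not None and i < j and nn[j] == i
    )


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    print(reciprocal_pairs(pts))
```

In this toy input the last point has a nearest neighbor, but that neighbor prefers another point, so it does not form a reciprocal pair.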

algorithm, iteration, node, (14 more...)

2111.06968

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > California > Orange County > Irvine (0.14)
Asia > China > Sichuan Province > Chengdu (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Personalized multi-faceted trust modeling to determine trust links in social media and its potential for misinformation management

Parmentier, Alexandre, Cohen, Robin, Ma, Xueguang, Sahu, Gaurav, Chen, Queenie

arXiv.org Artificial Intelligence, Nov-11-2021

In this paper, we present an approach for predicting trust links between peers in social media, one that is grounded in the artificial intelligence area of multiagent trust modeling. In particular, we propose a data-driven multi-faceted trust modeling approach that incorporates many distinct features for a comprehensive analysis. We focus on demonstrating how clustering of similar users enables a critical new functionality: supporting more personalized, and thus more accurate, predictions for users. We evaluate the proposed framework on a trust-aware item recommendation task, in the context of a large Yelp dataset. We then discuss how improving the detection of trusted relationships in social media can assist in supporting online users in their battle against the spread of misinformation and rumours, within a social networking environment which has recently exploded in popularity. We conclude with a reflection on a particularly vulnerable user base, older adults, in order to illustrate the value of reasoning about groups of users, looking to some future directions for integrating known preferences with insights gained through data analysis.

agent, indicator, prediction, (15 more...)

2111.0644

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > Illinois (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Media > News (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
(2 more...)

Deep Attention-guided Graph Clustering with Dual Self-supervision

Peng, Zhihao, Liu, Hui, Jia, Yuheng, Hou, Junhui

arXiv.org Artificial Intelligence, Nov-10-2021

Existing deep embedding clustering works only consider the deepest layer to learn a feature embedding and thus fail to make good use of the available discriminative information from cluster assignments, which limits their performance. To this end, we propose a novel method, namely deep attention-guided graph clustering with dual self-supervision (DAGC). Specifically, DAGC first utilizes a heterogeneity-wise fusion module to adaptively integrate the features of an auto-encoder and a graph convolutional network in each layer and then uses a scale-wise fusion module to dynamically concatenate the multi-scale features in different layers. Such modules are capable of learning a discriminative feature embedding via an attention-based mechanism. In addition, we design a distribution-wise fusion module that leverages cluster assignments to acquire clustering results directly. To better explore the discriminative information from the cluster assignments, we develop a dual self-supervision solution consisting of a soft self-supervision strategy with a triplet Kullback-Leibler divergence loss and a hard self-supervision strategy with a pseudo supervision loss. Extensive experiments validate that our method consistently outperforms state-of-the-art methods on six benchmark datasets. In particular, our method improves the ARI by more than 18.14% over the best baseline.

assignment, information, module, (14 more...)

2111.05548

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(12 more...)

Genre: Research Report > Promising Solution (0.68)

Industry: Government (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Customer Segmentation With Clustering

#artificialintelligence, Nov-9-2021, 00:15:06 GMT

consumer population, customer, customer segmentation, (15 more...)

Industry: Education > Educational Setting > Higher Education (0.57)

Technology:

Information Technology > Data Science (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.32)

Clustering of longitudinal data: A tutorial on a variety of approaches

Teuling, Niek Den, Pauws, Steffen, Heuvel, Edwin van den

arXiv.org Machine Learning, Nov-9-2021

During the past two decades, methods for identifying groups with different trends in longitudinal data have become of increasing interest across many areas of research. To support researchers, we summarize the guidance from the literature regarding longitudinal clustering. Moreover, we present a selection of methods for longitudinal clustering, including group-based trajectory modeling (GBTM), growth mixture modeling (GMM), and longitudinal k-means (KML). The methods are introduced at a basic level, and strengths, limitations, and model extensions are listed. Following the recent developments in data collection, attention is given to the applicability of these methods to intensive longitudinal data (ILD). We demonstrate the application of the methods on a synthetic dataset using packages available in R.
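As a rough illustration of the idea behind longitudinal k-means, the sketch below runs plain k-means on trajectories treated as fixed-length vectors of repeated measurements. The tutorial itself uses R packages; this Python version is only a sketch under assumptions: equal-length trajectories with no missing values, Euclidean distance, and a naive initialization from the first k trajectories. The function name is hypothetical.

```python
import math


def kmeans_trajectories(trajs, k, n_iter=20):
    """Cluster equal-length trajectories with plain k-means.

    Assumptions (for illustration only): Euclidean distance between whole
    trajectories and initial centers taken from the first k trajectories.
    Returns (labels, centers).
    """
    centers = [list(t) for t in trajs[:k]]
    labels = [0] * len(trajs)
    for _ in range(n_iter):
        # Assignment step: each trajectory goes to its closest center.
        labels = [
            min(range(k), key=lambda c: math.dist(t, centers[c]))
            for t in trajs
        ]
        # Update step: each center becomes the pointwise mean of its members.
        for c in range(k):
            members = [t for t, lab in zip(trajs, labels) if lab == c]
            if members:
                centers[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    trajectories = [
        [1.0, 2.0, 3.0, 4.0],   # increasing trend
        [5.0, 4.0, 3.0, 2.0],   # decreasing trend
        [1.1, 2.1, 3.2, 3.9],   # increasing trend
        [5.2, 4.1, 2.9, 2.0],   # decreasing trend
    ]
    labels, centers = kmeans_trajectories(trajectories, k=2)
    print(labels)
```

Each resulting center is itself a trajectory, so it can be read as the average trend of its group.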

longitudinal data, modeling, trajectory, (16 more...)

2111.05469

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Austria > Vienna (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
(11 more...)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Epidemiology (0.67)
(2 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(3 more...)

Learning Numerical Action Models from Noisy Input Data

Segura-Muros, José Á., Fernández-Olivares, Juan, Pérez, Raúl

arXiv.org Artificial Intelligence, Nov-9-2021

This paper presents the PlanMiner-N algorithm, a domain learning technique based on the PlanMiner domain learning algorithm. The algorithm presented here improves the learning capabilities of PlanMiner when using noisy data as input. The PlanMiner algorithm is able to infer arithmetic and logical expressions to learn numerical planning domains from the input data, but it was designed to work under situations of incompleteness, which makes it unreliable when facing noisy input data. In this paper, we propose a series of enhancements to the learning process of PlanMiner to expand its capabilities to learn from noisy data. These methods preprocess the input data by detecting and filtering noise, and study the learned action models to find erroneous preconditions/effects in them. The methods proposed in this paper were tested using a set of domains from the International Planning Competition (IPC). The results obtained indicate that PlanMiner-N greatly improves the performance of PlanMiner when facing noisy input data.

algorithm, noise, planminer-n, (16 more...)

2111.04997

Country:

South America > Peru > Callao Department > Callao (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > Spain > Andalusia (0.04)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)

Genre:

Workflow (0.68)
Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
