Clustering


Spectral learning of multivariate extremes

arXiv.org Machine Learning

We propose a spectral clustering algorithm for analyzing the dependence structure of multivariate extremes. More specifically, we focus on the asymptotic dependence of multivariate extremes characterized by the angular or spectral measure in extreme value theory. Our work studies the theoretical performance of spectral clustering based on a random $k$-nearest neighbor graph constructed from an extremal sample, i.e., the angular part of random vectors for which the radius exceeds a large threshold. In particular, we derive the asymptotic distribution of extremes arising from a linear factor model and prove that, under certain conditions, spectral clustering can consistently identify the clusters of extremes arising in this model. Leveraging this result, we propose a simple consistent estimation strategy for learning the angular measure. Our theoretical findings are complemented with numerical experiments illustrating the finite sample performance of our methods.
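
The pipeline described in the abstract can be sketched in NumPy. It keeps the observations whose radius exceeds a high threshold, projects them onto the unit sphere, builds a k-nearest-neighbor graph on those angles, and clusters the graph spectrally. This is a generic illustration under stated assumptions, not the paper's estimator. Several choices are assumptions made here:

- the Euclidean norm as the radius;
- a "largest n_extremes radii" rule for the threshold;
- a symmetrized kNN graph;
- the unnormalized Laplacian;
- a deterministic k-means step.

All function names are hypothetical.

```python
import numpy as np


def extremal_angles(X, n_extremes):
    """Angular parts X_i / ||X_i|| of the n_extremes observations with the largest radius."""
    radii = np.linalg.norm(X, axis=1)            # radius: Euclidean norm (an assumption)
    top = np.argsort(radii)[-n_extremes:]        # exceedances of the (n_extremes+1)-th largest radius
    return X[top] / radii[top, None]             # project the extremes onto the unit sphere


def knn_adjacency(A, n_neighbors):
    """Symmetric k-nearest-neighbor adjacency matrix on the rows of A."""
    D = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :n_neighbors]
    W = np.zeros_like(D)
    W[np.repeat(np.arange(len(A)), n_neighbors), nn.ravel()] = 1.0
    return np.maximum(W, W.T)                    # edge if either point is among the other's neighbors


def spectral_clusters(W, n_clusters, n_iter=50):
    """Unnormalized spectral clustering: Laplacian eigenvectors followed by k-means."""
    L = np.diag(W.sum(axis=1)) - W               # graph Laplacian L = D - W
    _, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    U = vecs[:, :n_clusters]                     # spectral embedding of each node
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        centers = np.array([U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(n_clusters)])
    return labels
```

A typical call chains the three steps: `spectral_clusters(knn_adjacency(extremal_angles(X, 100), 5), 2)`. The farthest-point start keeps the sketch deterministic. Library implementations such as scikit-learn's `SpectralClustering` use randomized k-means with restarts instead.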


MACHINE LEARNING FOR BEGINNERS

#artificialintelligence

Hi everyone, I'm reposting all of my old blogs, as my account got hacked. This blog was originally published on March 6, 2019. Within the field of machine learning, there are two main types of tasks: supervised and unsupervised. The main difference between the two is that supervised learning is done using ground truth; in other words, we have prior knowledge of what the output values for our samples should be. Therefore, the goal of supervised learning is to learn a function that, given a sample of data and desired outputs, best approximates the relationship between input and output observable in the data.
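
To make the supervised setting concrete, here is a minimal sketch that is not from the original post. It learns a function from inputs with known outputs using ordinary least squares. The straight-line model and the study-hours data are made-up assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b to labelled pairs (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept puts the line through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(a, b, x):
    """Apply the learned function to a new, unlabelled input."""
    return a * x + b


# Hypothetical inputs (hours studied) with known outputs (exam scores): the "ground truth".
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 55.0, 61.0, 64.0]
a, b = fit_line(hours, scores)
prediction = predict(a, b, 5.0)
```

The known scores play the role of ground truth. An unsupervised method would receive only `hours` and would have to find structure without them.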


Customer Segmentation With Clustering

#artificialintelligence

Let's say that you work with the sales and marketing team to reach your company's pre-set goals. While your company is doing well in terms of generating revenue and retaining customers, you cannot help but think that it can do better. As things stand, the advertisements, promotions, and special offers are homogeneous across all customers, which is a serious issue. Engaging customers in ways they won't be receptive to is tantamount to wasting your advertising budget. After all, you don't want your company to spend its limited budget sending diaper coupons to college students or advertising gaming consoles to elderly women.


Machine Learning and Deep Learning A-Z: Hands-On Python

#artificialintelligence

Learn machine learning with hands-on examples. Topics covered: what machine learning is; machine learning terminology; evaluation metrics for machine learning and deep learning in Python; classification vs. regression; evaluating performance with classification error metrics and regression error metrics; cross-validation and the bias-variance trade-off; data visualization with matplotlib and seaborn; machine learning with scikit-learn; the linear regression, logistic regression, k-nearest neighbors, decision tree and random forest, and support vector machine algorithms; unsupervised learning with the k-means and hierarchical clustering algorithms; principal component analysis (PCA); and recommender system algorithms.

Machine learning is constantly being applied to new industries and new problems. Whether you're a marketer, video game designer, or programmer… Machine learning describes systems that make predictions using a model trained on real-world data. Machine learning is being applied to virtually every field today, including medical diagnosis, facial recognition, weather forecasting, and image processing. It's possible to use machine learning without coding, but building new systems generally requires code.

What is the best language for machine learning? Python is the most widely used language in machine learning, and engineers writing machine learning systems often use Jupyter Notebooks and Python together. Machine learning is generally divided into supervised and unsupervised machine learning. Python instructors on Udemy specialize in everything from software development to data analysis, and are known for their effective, friendly instruction.

What are the limitations of Python? Python is a widely used, general-purpose programming language, but it has some limitations.
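
The syllabus lists classification and regression error metrics. As a hedged sketch that is not taken from the course, the standard definitions can be written in plain Python. The function names are hypothetical, and libraries such as scikit-learn provide equivalent, more complete functions.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """Mean absolute error, mean squared error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}
```

For example, `classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])` has two true positives, one false positive and one false negative. All four scores are therefore about 0.667.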


Hierarchical clustering by aggregating representatives in sub-minimum-spanning-trees

arXiv.org Artificial Intelligence

One of the main challenges for hierarchical clustering is how to identify appropriate representative points at the lower levels of the cluster tree, which then serve as the roots at higher levels for further aggregation. Conventional hierarchical clustering approaches, however, rely on simple heuristics to select "representative" points that may not be representative enough, so the resulting cluster tree suffers from poor robustness and weak reliability. To address this issue, we propose a novel hierarchical clustering algorithm that, while building the clustering dendrogram, effectively detects representative points by scoring the reciprocal nearest data points in each sub-minimum-spanning-tree. Extensive experiments on UCI datasets show that the proposed algorithm is more accurate than other benchmarks. Moreover, our analysis shows that the proposed algorithm has O(n log n) time complexity and O(log n) space complexity, indicating that it scales to massive data with lower time and storage costs.
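
The two ingredients named in the abstract, a minimum spanning tree and reciprocal nearest points, can be sketched as follows. The sketch does not reproduce the paper's scoring rule, which the abstract does not give. It uses an O(n^2) Prim's algorithm rather than the paper's O(n log n) procedure. The function names are hypothetical. It also relies on a standard fact: when all distances are distinct, each point's nearest-neighbor edge belongs to the MST, so reciprocal nearest pairs can be read off the tree.

```python
import math


def prim_mst(points):
    """Minimum spanning tree of Euclidean points via O(n^2) Prim; returns (i, j, weight) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def reciprocal_nearest_pairs(edges, n):
    """Pairs (i, j) that are each other's nearest neighbor.

    With distinct distances, each point's nearest-neighbor edge is its lightest
    incident MST edge, so only the tree edges need to be inspected.
    """
    lightest = [None] * n      # (weight, neighbor) of the lightest MST edge at each vertex
    for i, j, w in edges:
        for a, b in ((i, j), (j, i)):
            if lightest[a] is None or w < lightest[a][0]:
                lightest[a] = (w, b)
    return sorted((min(i, j), max(i, j)) for i, j, _ in edges
                  if lightest[i][1] == j and lightest[j][1] == i)
```

On the points `[(0, 0), (1, 0), (5, 0), (5.5, 0), (20, 0)]`, the reciprocal pairs are `(0, 1)` and `(2, 3)`. These are natural candidates to merge first in a bottom-up tree. How a representative is then chosen from each pair is the part the paper's scoring addresses.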


Personalized multi-faceted trust modeling to determine trust links in social media and its potential for misinformation management

arXiv.org Artificial Intelligence

In this paper, we present an approach for predicting trust links between peers in social media, one that is grounded in the artificial intelligence area of multiagent trust modeling. In particular, we propose a data-driven, multi-faceted trust model that incorporates many distinct features for a comprehensive analysis. We focus on demonstrating how clustering similar users enables a critical new functionality: supporting more personalized, and thus more accurate, predictions for users. Using a trust-aware item recommendation task as an illustration, we evaluate the proposed framework on a large Yelp dataset. We then discuss how improving the detection of trusted relationships in social media can help online users in their battle against the spread of misinformation and rumours within a social networking environment that has recently exploded in popularity. We conclude with a reflection on a particularly vulnerable user base, older adults, to illustrate the value of reasoning about groups of users, and we look to future directions for integrating known preferences with insights gained through data analysis.
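
The abstract does not specify the trust model, so the following is only a hypothetical sketch of how clustering can personalize predictions. Users are assumed to be already grouped into clusters. A separate logistic-regression trust predictor is then fitted per cluster instead of one global model. The cluster ids, the single feature (read here as a rating-similarity score between user and peer) and all function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; the last weight is the bias."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_trust(w, x):
    """Probability that the user trusts the peer described by feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_per_cluster(samples):
    """samples: (cluster_id, features, trusted) triples; fits one model per user cluster."""
    grouped = {}
    for cluster, x, trusted in samples:
        X, y = grouped.setdefault(cluster, ([], []))
        X.append(x)
        y.append(trusted)
    return {cluster: train_logistic(X, y) for cluster, (X, y) in grouped.items()}
```

Suppose two clusters react to the same feature in opposite ways. A single global model would average the two behaviors away, while per-cluster models keep both. This is the kind of gain from personalization the abstract describes.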


Deep Attention-guided Graph Clustering with Dual Self-supervision

arXiv.org Artificial Intelligence

Existing deep embedding clustering methods consider only the deepest layer when learning a feature embedding and thus fail to make full use of the discriminative information available from cluster assignments, which limits their performance. To this end, we propose a novel method, deep attention-guided graph clustering with dual self-supervision (DAGC). Specifically, DAGC first uses a heterogeneity-wise fusion module to adaptively integrate the features of an auto-encoder and a graph convolutional network in each layer, and then uses a scale-wise fusion module to dynamically concatenate the multi-scale features from different layers. These modules learn a discriminative feature embedding via an attention-based mechanism. In addition, we design a distribution-wise fusion module that leverages cluster assignments to obtain clustering results directly. To better exploit the discriminative information in the cluster assignments, we develop a dual self-supervision solution consisting of a soft self-supervision strategy with a triplet Kullback-Leibler divergence loss and a hard self-supervision strategy with a pseudo-supervision loss. Extensive experiments show that our method consistently outperforms state-of-the-art methods on six benchmark datasets. In particular, our method improves the ARI by more than 18.14% over the best baseline.
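
The abstract does not define the "triplet Kullback-Leibler divergence loss". The sketch below therefore shows only the standard DEC-style building blocks that soft self-supervision in deep clustering is commonly built from:

- a Student's t soft assignment Q;
- a sharpened target distribution P;
- the divergence KL(P || Q).

This is not DAGC's loss. The embeddings and centers in the usage example are toy assumptions.

```python
import math


def soft_assign(Z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of embedding z_i to cluster center mu_j."""
    Q = []
    for z in Z:
        sims = [(1 + sum((a - b) ** 2 for a, b in zip(z, mu)) / alpha) ** (-(alpha + 1) / 2)
                for mu in centers]
        total = sum(sims)
        Q.append([s / total for s in sims])
    return Q


def target_distribution(Q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    f = [sum(row[j] for row in Q) for j in range(len(Q[0]))]
    P = []
    for row in Q:
        w = [q * q / fj for q, fj in zip(row, f)]
        total = sum(w)
        P.append([x / total for x in w])
    return P


def kl_divergence(P, Q):
    """KL(P || Q) summed over samples and clusters."""
    return sum(p * math.log(p / q)
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow) if p > 0)


# Toy 1-D embeddings and two cluster centers (hypothetical values).
Z = [[0.0], [0.2], [1.0], [0.9]]
Q = soft_assign(Z, [[0.0], [1.0]])
P = target_distribution(Q)
loss = kl_divergence(P, Q)
```

Squaring q_ij emphasizes confident assignments, so P is sharper than Q. Minimizing KL(P || Q) then pushes the embedding toward more confident clusters.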


Clustering of longitudinal data: A tutorial on a variety of approaches

arXiv.org Machine Learning

During the past two decades, methods for identifying groups with different trends in longitudinal data have attracted increasing interest across many areas of research. To support researchers, we summarize the guidance from the literature regarding longitudinal clustering. Moreover, we present a selection of methods for longitudinal clustering, including group-based trajectory modeling (GBTM), growth mixture modeling (GMM), and longitudinal k-means (KML). The methods are introduced at a basic level, and strengths, limitations, and model extensions are listed. Following recent developments in data collection, attention is given to the applicability of these methods to intensive longitudinal data (ILD). We demonstrate the application of the methods on a synthetic dataset using packages available in R.
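
Longitudinal k-means treats each subject's trajectory (measurements at the same time points) as one vector and runs k-means on those vectors. The tutorial itself uses R packages. The following is only a Python sketch of the core idea. It assumes complete, equally spaced trajectories and uses a deterministic farthest-point start. The R implementation adds further features, such as handling of missing values and repeated starts, that are not reproduced here. The function name is hypothetical.

```python
import math


def kml(trajectories, k, n_iter=100):
    """k-means on trajectories, each a list of measurements at the same time points."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [list(trajectories[0])]
    while len(centers) < k:
        far = max(trajectories, key=lambda t: min(math.dist(t, c) for c in centers))
        centers.append(list(far))
    for _ in range(n_iter):
        # Assign each trajectory to the nearest mean trajectory (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(t, centers[j])) for t in trajectories]
        # Recompute each cluster's mean trajectory time point by time point.
        new_centers = []
        for j in range(k):
            members = [t for t, lab in zip(trajectories, labels) if lab == j]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: three rising and three flat trajectories over four waves.
data = [[1, 2, 3, 4], [1, 2.5, 3, 4.5], [0.5, 2, 3.5, 4],
        [2, 2, 2, 2], [2.2, 2.1, 2, 2.1], [1.9, 2, 2.1, 1.9]]
labels, centers = kml(data, 2)
```

The returned `centers` are the mean trajectories of each group, which is what plots of longitudinal clusters usually show.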


Learning Numerical Action Models from Noisy Input Data

arXiv.org Artificial Intelligence

This paper presents the PlanMiner-N algorithm, a domain learning technique based on the PlanMiner domain learning algorithm. The algorithm presented here improves the learning capabilities of PlanMiner when the input data is noisy. PlanMiner can infer arithmetic and logical expressions to learn numerical planning domains from the input data, but it was designed to work with incomplete data, which makes it unreliable when facing noisy input. In this paper, we propose a series of enhancements to PlanMiner's learning process that extend it to learn from noisy data. These methods preprocess the input data by detecting and filtering noise, and they analyze the learned action models to find erroneous preconditions and effects. The proposed methods were tested on a set of domains from the International Planning Competition (IPC). The results indicate that PlanMiner-N greatly improves on the performance of PlanMiner when facing noisy input data.
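
The abstract names two noise-handling steps but does not describe how PlanMiner-N performs them. The following hypothetical sketch illustrates both ideas with generic techniques that are not the paper's.

- Numeric readings are filtered with a median/MAD modified z-score. The 0.6745 factor and 3.5 threshold are a common rule of thumb.
- A candidate precondition is kept only if it holds in a minimum fraction of the states observed before an action.

The function names, thresholds and predicate strings are assumptions.

```python
import statistics


def filter_numeric_noise(values, threshold=3.5):
    """Drop readings whose modified z-score 0.6745 * |x - median| / MAD exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)            # no spread estimate; leave the data untouched
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]


def robust_preconditions(pre_states, min_support=0.9):
    """Keep predicates true in at least min_support of the states observed before an action."""
    counts = {}
    for state in pre_states:
        for predicate in state:
            counts[predicate] = counts.get(predicate, 0) + 1
    return sorted(p for p, c in counts.items() if c / len(pre_states) >= min_support)


# Hypothetical fuel-level readings with one corrupted value.
readings = [10.0, 10.5, 9.8, 10.2, 55.0, 10.1]
clean = filter_numeric_noise(readings)
```

A support threshold below 1.0 tolerates a few corrupted observations. A strict intersection of the observed states would drop a genuine precondition as soon as one noisy state omitted it.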