AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Intelligent Optimization of Diversified Community Prevention of COVID-19 using Traditional Chinese Medicine

Zheng, Yu-Jun, Yu, Si-Lan, Yang, Jun-Chao, Gan, Tie-Er, Song, Qin, Yang, Jun, Karatas, Mumtaz

arXiv.org Artificial Intelligence, Jul-27-2020

Traditional Chinese medicine (TCM) has played an important role in the prevention and control of the novel coronavirus pneumonia (COVID-19), and community prevention has become the most essential part of reducing the risk of spread and protecting populations. However, most communities use a uniform TCM prevention program for all residents, which violates the TCM principle of "treatment based on syndrome differentiation" and limits the effectiveness of prevention. In this paper, we propose an intelligent optimization method to develop diversified TCM prevention programs for community residents. First, we use a fuzzy clustering method to divide the population based on both modern medicine and TCM health characteristics; we then use an interactive optimization method, in which TCM experts develop different TCM prevention programs for different clusters, and a heuristic algorithm is used to optimize the programs under the resource constraints. We demonstrate the computational efficiency of the proposed method and report its successful application to TCM-based prevention of COVID-19 in 12 communities in Zhejiang province, China, during the peak of the pandemic.
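The abstract does not say which fuzzy clustering algorithm is used. As an illustration only, the sketch below implements standard fuzzy c-means (an assumed stand-in, not necessarily the authors' choice), in which each resident's feature vector receives graded memberships that sum to 1 across clusters. The function name `fuzzy_c_means`, the fuzzifier `m=2`, and the toy data are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Hypothetical minimal fuzzy c-means. Returns (centers, U), where
    U[i, k] is the membership of sample i in cluster k; rows of U sum to 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m                                      # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]    # weighted cluster means
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                         # avoid division by zero
        # FCM update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy "residents" described by two health features (hypothetical data).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)   # hard assignment, if one is needed
```

Unlike hard k-means, the membership matrix `U` lets a resident who sits between two groups contribute to both, which is one reason a fuzzy method may suit a population partition like the one described.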

artificial intelligence, evolutionary algorithm, machine learning

arXiv:2007.13926

Country:

Asia > China > Zhejiang Province > Hangzhou (0.06)
North America > United States > New York (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Graph Neural Network Based Coarse-Grained Mapping Prediction

Li, Zhiheng, Wellawatte, Geemi P., Chakraborty, Maghesree, Gandhi, Heta A., Xu, Chenliang, White, Andrew D.

arXiv.org Machine Learning, Jul-27-2020

The selection of coarse-grained (CG) mapping operators is a critical step for CG molecular dynamics (MD) simulation. What is optimal for this choice is still an open question, and there is a need for theory. The current state-of-the-art method is to have experts select mapping operators manually. In this work, we demonstrate an automated approach by viewing this problem as supervised learning, in which we seek to reproduce the mapping operators produced by experts. We present a graph neural network based CG mapping predictor called Deep Supervised Graph Partitioning Model (DSGPM) that treats mapping operators as a graph segmentation problem. DSGPM is trained on a novel dataset, Human-annotated Mappings (HAM), consisting of 1,206 molecules with expert-annotated mapping operators. HAM can be used to facilitate further research in this area. Our model uses a novel metric learning objective to produce high-quality atomic features that are used in spectral clustering. The results show that DSGPM outperforms state-of-the-art methods in the field of graph segmentation. Finally, we find that predicted CG mapping operators indeed result in good CG MD models when used in simulation.
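DSGPM passes learned per-atom features to spectral clustering. The sketch below shows only that final step, using normalized spectral clustering in the Ng-Jordan-Weiss style on a Gaussian affinity; it does not reproduce the paper's metric learning or graph neural network. The affinity kernel, `sigma`, the farthest-point seeding, and the toy feature matrix are all assumptions.

```python
import numpy as np

def spectral_clustering(F, k, sigma=1.0, n_iter=50):
    """Hypothetical sketch: cluster rows of F (e.g. per-atom feature
    vectors) into k groups via a normalized spectral embedding."""
    # Gaussian affinity between feature vectors; no self-loops.
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalization D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order: keep the top-k eigenvectors.
    _, vecs = np.linalg.eigh(M)
    Y = vecs[:, -k:]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)   # row-normalize
    # Farthest-point seeding, then Lloyd's k-means on the embedding.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Toy atomic features for two hypothetical CG beads of three atoms each.
F = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels = spectral_clustering(F, k=2)
```

In a mapping setting, atoms sharing a label would be grouped into the same CG bead; the quality of that grouping depends almost entirely on how well the features separate, which is what the paper's metric learning objective targets.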

artificial intelligence, machine learning, mapping

arXiv:2007.04921

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Bounded Fuzzy Possibilistic Method of Critical Objects Processing in Machine Learning

Yazdani, Hossein

arXiv.org Artificial Intelligence, Jul-26-2020

Unsatisfactory accuracy of learning methods is mostly caused by omitting the influence of important parameters such as membership assignments, type of data objects, and distance or similarity functions. The proposed method, called the Bounded Fuzzy Possibilistic Method (BFPM), addresses issues that previous clustering and classification methods have not sufficiently considered in their membership assignments. In fuzzy methods, an object's memberships must sum to 1; hence, any data object can obtain full membership in at most one cluster or class. Possibilistic methods relax this condition, but they can accept a result in which an object obtains membership from only one cluster, which prevents analysis of the object's movement. BFPM differs from previous fuzzy and possibilistic approaches by removing these restrictions, and it provides a flexible search space for analyzing objects' movements. Data objects are also considered fundamental keys in learning methods, and knowing the exact type of each object provides a suitable environment for learning algorithms. The thesis introduces a new type of object, called critical, and categorizes data objects into two categories: structural-based and behavioural-based. Critical objects are considered causes of misclassification and misassignment in learning procedures. The thesis also proposes new methodologies for studying the behaviour of critical objects, with the aim of evaluating objects' movements (mutation) from one cluster or class to another. It also introduces a new type of feature, called dominant, which is considered one of the causes of misclassification and misassignment. Finally, the thesis proposes new sets of similarity functions, called Weighted Feature Distance (WFD) and Prioritized Weighted Feature Distance (PWFD).
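The contrast between fuzzy and possibilistic memberships can be made concrete with the two textbook update formulas: fuzzy c-means (memberships of one object sum to 1) and possibilistic c-means in the Krishnapuram-Keller form (each cluster scored independently). This is a sketch of those standard methods, not of BFPM, whose exact constraints are not given in the abstract. The scale parameter `eta` and the distances are hypothetical values.

```python
def fuzzy_memberships(dists, m=2.0):
    """Fuzzy c-means memberships of one object, given its distances to each
    cluster center. They always sum to 1, so full membership (1.0) in one
    cluster forces 0 everywhere else. Assumes all distances are > 0."""
    inv = [d ** (-2.0 / (m - 1)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

def possibilistic_memberships(dists, eta, m=2.0):
    """Possibilistic c-means typicalities: u = 1 / (1 + (d^2 / eta)^(1/(m-1))).
    Each cluster is scored on its own, so the values need not sum to 1."""
    return [1.0 / (1.0 + (d * d / e) ** (1.0 / (m - 1))) for d, e in zip(dists, eta)]

# An object at distance 1 from both centers.
fz = fuzzy_memberships([1.0, 1.0])                              # [0.5, 0.5]
pc = possibilistic_memberships([1.0, 1.0], eta=[1.0, 1.0])      # [0.5, 0.5]
# An outlier at distance 10 from both centers.
fz_far = fuzzy_memberships([10.0, 10.0])                        # still [0.5, 0.5]
pc_far = possibilistic_memberships([10.0, 10.0], eta=[1.0, 1.0])  # both 1/101
```

The outlier case shows the limitation the abstract alludes to: under the fuzzy constraint a distant object still looks like a half-member of each cluster, whereas possibilistic scores reveal that it belongs strongly to neither.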

artificial intelligence, evolutionary algorithm, machine learning

arXiv:2007.13077

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Massachusetts > Middlesex County > Billerica (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Dimensionality Reduction for $k$-means Clustering

Charalambides, Neophytos

arXiv.org Machine Learning, Jul-26-2020

With modern developments, large high-dimensional datasets have become a necessity; by their nature, such datasets cause many machine learning algorithms to overfit, so it is crucial to reduce the complexity of the algorithms that process them. The authors of [BDM09] were among the first to address clustering in such datasets with provably accurate approximation results, by proposing a simple pre-processing step for the k-means clustering algorithm, also known as Lloyd's method [Llo82], which is probably the most widely used and popular clustering algorithm.
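The idea of "reduce dimension first, then run Lloyd's method" can be sketched as below. The reduction step here is a scaled Gaussian random projection, which approximately preserves pairwise Euclidean distances; this is an assumed, commonly used choice, and the specific construction analysed in [BDM09] may differ. The function names, the target dimension `r`, the farthest-point seeding, and the synthetic data are hypothetical.

```python
import numpy as np

def random_project(X, r, seed=0):
    """Map d-dimensional rows of X to r dimensions with a Gaussian matrix
    scaled by 1/sqrt(r), so squared distances are preserved in expectation."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], r)) / np.sqrt(r)
    return X @ R

def lloyd_kmeans(X, k, n_iter=100):
    """Lloyd's method: alternate nearest-center assignment and mean update.
    Uses deterministic farthest-point seeding (an assumption of this sketch)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two synthetic clusters in 1000 dimensions, clustered after projecting to 20.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 1000)),
               rng.normal(5.0, 1.0, (50, 1000))])
Z = random_project(X, r=20)
labels, _ = lloyd_kmeans(Z, k=2)
```

Each Lloyd iteration now costs O(nkr) instead of O(nkd), which is the practical payoff of the pre-processing step; the theoretical question the cited work addresses is how small `r` can be while keeping the k-means objective provably close to the original.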

algorithm, artificial intelligence, machine learning

arXiv:2007.13185

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Deep Embedded Multi-view Clustering with Collaborative Training

Xu, Jie, Ren, Yazhou, Li, Guofeng, Pan, Lili, Zhu, Ce, Xu, Zenglin

arXiv.org Machine Learning, Jul-26-2020

Multi-view clustering, which exploits information from multiple views, has recently attracted increasing attention. However, existing multi-view clustering methods either have high computational and space complexity or lack representation capability. To address these issues, in this paper we propose deep embedded multi-view clustering with collaborative training (DEMVC). First, the embedded representations of multiple views are learned individually by deep autoencoders. Then, both the consensus and the complementarity of multiple views are taken into account, and a novel collaborative training scheme is proposed. Concretely, the feature representations and cluster assignments of all views are learned collaboratively. A new consistency strategy for initializing cluster centers is further developed to improve multi-view clustering performance with collaborative training. Experimental results on several popular multi-view datasets show that DEMVC achieves significant improvements over state-of-the-art methods.

artificial intelligence, machine learning, multiple view

arXiv:2007.13067

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Overview of Clustering Algorithms

#artificialintelligence, Jul-25-2020, 07:05:50 GMT

Clustering is an unsupervised technique in which similar data points are grouped together to form a cluster. A cluster is said to be good if the intra-cluster similarity (between data points within the same cluster) is high and the inter-cluster similarity (between data points in different clusters) is low. Clustering can also be viewed as a data compression technique, in which the data points of a cluster are treated as a single group. Clustering is also called data segmentation because it partitions the data so that each group of similar data points forms a cluster. Classification algorithms, by contrast, are supervised techniques for distinguishing between known groups and assigning data points to them.
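The "good cluster" criterion above can be checked numerically. The sketch below uses Euclidean distance as the (inverse) notion of similarity, an assumption since the post does not fix a measure, and reports the mean intra-cluster and mean inter-cluster distances for a given labelling; a good clustering has a small first value and a large second one. The function name and toy points are hypothetical.

```python
import math
from itertools import combinations

def intra_inter_distances(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance)
    over all pairs of points, using Euclidean distance."""
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        (intra if lp == lq else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = intra_inter_distances(points, [0, 0, 1, 1])   # (1.0, about 7.09)
bad = intra_inter_distances(points, [0, 1, 0, 1])    # (about 7.07, about 4.05)
```

Widely used scores such as the silhouette coefficient refine the same idea per point instead of averaging over all pairs.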

artificial intelligence, clustering, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.74)

Joint Featurewise Weighting and Lobal Structure Learning for Multi-view Subspace Clustering

Lin, Shi-Xun, Zhong, Guo, Shu, Ting

arXiv.org Machine Learning, Jul-24-2020

Multi-view clustering integrates multiple feature sets, which reveal distinct aspects of the data and provide complementary information to each other, to improve clustering performance. Effectively exploiting complementary information across multiple views remains challenging because the original data often contain noise and are highly redundant. Moreover, most existing multi-view clustering methods aim only to explore the consistency of all views while ignoring the local structure of each view. However, it is necessary to take the local structure of each view into consideration, because different views may present different geometric structures while admitting the same cluster structure. To address these issues, we propose a novel multi-view subspace clustering method that simultaneously assigns weights to different features and captures local information of the data in view-specific self-representation feature spaces. In particular, a common cluster structure regularization is adopted to guarantee consistency among different views. An efficient algorithm based on an augmented Lagrangian multiplier is also developed to solve the associated optimization problem. Experiments conducted on several benchmark datasets demonstrate that the proposed method achieves state-of-the-art performance. We provide the Matlab code at https://github.com/Ekin102003/JFLMSC.
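Subspace clustering methods of this kind build on self-representation: each sample is written as a combination of the other samples, X ≈ XZ, and |Z| + |Z|ᵀ serves as an affinity matrix for spectral clustering. The paper's multi-view, feature-weighted, locally regularized objective is not reproduced here. As a single-view illustration only, the sketch assumes the ridge-regularized least-squares variant, whose minimizer has the closed form Z = (XᵀX + λI)⁻¹XᵀX. The function names, `lam`, and the toy data are hypothetical.

```python
import numpy as np

def self_representation(X, lam=0.1):
    """X holds one sample per column (d x n). Solves
    min_Z ||X - X Z||_F^2 + lam * ||Z||_F^2 in closed form:
    Z = (X^T X + lam I)^{-1} X^T X."""
    G = X.T @ X
    n = G.shape[0]
    return np.linalg.solve(G + lam * np.eye(n), G)

def affinity(Z):
    """Symmetric, nonnegative affinity built from the coefficients."""
    return np.abs(Z) + np.abs(Z).T

# Columns 0-2 lie in span(e1, e2); columns 3-5 lie in span(e3, e4).
X = np.array([[1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 0.0, 2.0],
              [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]])
Z = self_representation(X, lam=0.1)
A = affinity(Z)   # block-diagonal here; it would be passed to spectral clustering
```

Because samples from independent subspaces have zero inner products in this toy case, the affinity has no weight between the two groups, which is exactly the block structure spectral clustering needs. Real data rarely separates this cleanly, which is where per-view feature weighting and local-structure terms like those in the abstract come in.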

artificial intelligence, machine learning, subspace

arXiv:2007.12829

Country:

Asia > Macao (0.04)
Asia > China (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.88)

Scaling Graph Clustering with Distributed Sketches

Priest, Benjamin W., Dunton, Alec, Sanders, Geoffrey

arXiv.org Machine Learning, Jul-24-2020

The unsupervised learning of community structure, in particular the partitioning of vertices into clusters or communities, is a canonical and well-studied problem in exploratory graph analysis. However, as with most graph analyses, the introduction of immense scale presents challenges to traditional methods. Spectral clustering in distributed memory, for example, requires hundreds of expensive bulk-synchronous communication rounds to compute an embedding of vertices onto a few eigenvectors of a matrix associated with the graph. Furthermore, the whole computation may need to be repeated if even a small percentage of the underlying graph's edges change. We present a method inspired by spectral clustering in which we instead use matrix sketches derived from random dimension-reducing projections. We show that our method produces embeddings that yield performant clustering results given a fully dynamic stochastic block model stream, using both the fast Johnson-Lindenstrauss and CountSketch transforms. We also discuss the effects of stochastic block model parameters on the required dimensionality of the subsequent embeddings, and show how random projections could significantly improve the performance of graph clustering in distributed memory.
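A CountSketch embedding makes the "fully dynamic" part cheap. The single-machine sketch below assumes the sketched matrix is the plain adjacency matrix A and maintains S = A·C, where C is an n × r CountSketch matrix that sends column j to one random bucket h(j) with a random sign s(j). Each edge insertion or deletion then touches only two entries of S, so nothing has to be recomputed from scratch. Distribution, normalization, and the final clustering of the rows of S are omitted, and the class name and parameters are hypothetical; the paper's exact pipeline may differ.

```python
import random

class CountSketchEmbedding:
    """Hypothetical sketch: maintain S = A @ C for an undirected graph's
    adjacency matrix A, where C is an n x r CountSketch matrix
    (C[j, bucket[j]] = sign[j], all other entries 0)."""

    def __init__(self, n, r, seed=0):
        rng = random.Random(seed)
        self.bucket = [rng.randrange(r) for _ in range(n)]    # h(j)
        self.sign = [rng.choice((-1, 1)) for _ in range(n)]   # s(j)
        self.S = [[0.0] * r for _ in range(n)]                # row i = embedding of vertex i

    def update_edge(self, u, v, weight=1.0):
        """Insert (weight > 0) or delete (weight < 0) edge {u, v} in O(1):
        A[u, v] changes, so row u gains s(v) * weight in bucket h(v),
        and symmetrically for row v."""
        self.S[u][self.bucket[v]] += self.sign[v] * weight
        self.S[v][self.bucket[u]] += self.sign[u] * weight

# Two triangles (communities), streamed edge by edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
sk = CountSketchEmbedding(n=6, r=3, seed=1)
for u, v in edges:
    sk.update_edge(u, v)
sk.update_edge(2, 3)          # a cross-community edge arrives...
sk.update_edge(2, 3, -1.0)    # ...and is later deleted
```

The rows of `sk.S` would then be clustered, for example with k-means, in place of the eigenvector embedding; by linearity, the sketch after any sequence of updates equals the sketch of the final graph.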

artificial intelligence, data mining, machine learning

arXiv:2007.12669

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Approximately Optimal Binning for the Piecewise Constant Approximation of the Normalized Unexplained Variance (nUV) Dissimilarity Measure

Fazekas, Attila, Kovács, György

arXiv.org Machine Learning, Jul-24-2020

The recently introduced Matching by Tone Mapping (MTM) dissimilarity measure enables template matching under smooth non-linear distortions and has a well-established mathematical background. MTM operates by binning the template, but the ideal binning for a particular problem is an open question. By pointing out an important analogy between the well-known mutual information (MI) and MTM, we introduce the term "normalized unexplained variance" (nUV) for MTM to emphasize its relevance and applicability beyond image processing. We then provide theoretical results on the optimal binning technique for the nUV measure and propose algorithms to find approximate solutions. The theoretical findings are supported by numerical experiments. Using the proposed binning techniques yields a statistically significant 4-13% increase in AUC scores, which lets us conclude that the proposed binning techniques have the potential to improve the performance of the nUV measure in real applications.
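The nUV measure can be read directly off its name: bin the template p, approximate the window w by a piecewise-constant function of those bins (the mean of w within each bin is the best such fit), and report the unexplained variance normalized by the total variance of w. The sketch below uses equal-width bins, an assumption for illustration only; choosing better bins is precisely the paper's contribution. The function name `nuv` and the toy signals are hypothetical.

```python
def nuv(p, w, n_bins=4):
    """Normalized unexplained variance of w given template p:
    residual sum of squares of the best piecewise-constant fit of w on
    equal-width bins of p, divided by the total sum of squares of w."""
    if len(p) != len(w) or not p:
        raise ValueError("p and w must be non-empty and the same length")
    lo, hi = min(p), max(p)
    width = (hi - lo) / n_bins or 1.0          # constant p: everything in bin 0
    bins = [min(int((x - lo) / width), n_bins - 1) for x in p]
    # Per-bin mean of w (the least-squares constant for each bin).
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for b, y in zip(bins, w):
        sums[b] += y
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    mean_w = sum(w) / len(w)
    tss = sum((y - mean_w) ** 2 for y in w)
    if tss == 0:
        raise ValueError("w is constant, so nUV is undefined")
    rss = sum((y - means[b]) ** 2 for b, y in zip(bins, w))
    return rss / tss

# w is a smooth non-linear (squared) distortion of the template p.
p = [0, 1, 2, 3, 4, 5, 6, 7]
w = [x * x for x in p]
coarse = nuv(p, w, n_bins=2)   # 658 / 2226, about 0.296
fine = nuv(p, w, n_bins=8)     # 0.0: one value per bin, nothing left unexplained
```

The example also shows why binning matters: too few bins leave a monotone distortion partly "unexplained", while with very many bins almost any window fits perfectly, so the optimal binning balances the two.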

artificial intelligence, distortion, machine learning

arXiv:2007.12463

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)

Genre: Research Report > Experimental Study (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Cluster Analysis

#artificialintelligence, Jul-23-2020, 08:25:05 GMT

We are familiar with most supervised learning methods, for example linear regression, logistic regression, decision trees, SVMs, and so on, where each input has an associated output/label. When we have a problem with inputs but no associated outputs/labels, the learning is known as unsupervised learning. One mechanism we may use in this context is cluster analysis, or clustering. Definition 1: Cluster analysis is a multivariate statistical technique. It groups observations on the basis of some of the features or variables that describe them. Definition 2: Cluster analysis is a very useful technique in which the observations in a data set are divided into different groups.
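Definition 1 can be illustrated with a small agglomerative procedure that groups observations by the Euclidean distance between their feature vectors. The post does not name an algorithm; single linkage (merge the two clusters whose closest members are nearest) is an assumed choice, and this naive version is only suitable for tiny data sets. The function name and observations are hypothetical.

```python
import math

def single_linkage(points, k):
    """Repeatedly merge the two clusters with the smallest minimum pairwise
    Euclidean distance until k clusters remain. Returns lists of indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]          # b > a, so index a stays valid
    return clusters

# Five observations described by two features each.
obs = [(1.0, 2.0), (1.5, 1.8), (8.0, 8.0), (8.5, 7.5), (1.2, 2.2)]
groups = single_linkage(obs, k=2)   # [[0, 4, 1], [2, 3]]
```

No labels are used anywhere: the groups emerge solely from the distances between observations, which is what distinguishes this from the supervised methods listed above.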

artificial intelligence, euclidean distance, machine learning

Country:

Oceania > Australia (0.06)
North America > United States (0.06)
North America > Canada (0.06)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.58)
