Goto

Collaborating Authors

 Clustering


Intelligent Optimization of Diversified Community Prevention of COVID-19 using Traditional Chinese Medicine

arXiv.org Artificial Intelligence

Traditional Chinese medicine (TCM) has played an important role in the prevention and control of the novel coronavirus pneumonia (COVID-19), and community prevention has become the most essential part in reducing the spread risk and protecting populations. However, most communities use a uniform TCM prevention program for all residents, which violates the "treatment based on syndrome differentiation" principle of TCM and limits the effectiveness of prevention. In this paper, we propose an intelligent optimization method to develop diversified TCM prevention programs for community residents. First, we use a fuzzy clustering method to divide the population based on both modern medicine and TCM health characteristics; we then use an interactive optimization method, in which TCM experts develop different TCM prevention programs for different clusters, and a heuristic algorithm is used to optimize the programs under the resource constraints. We demonstrate the computational efficiency of the proposed method and report its successful application to TCM-based prevention of COVID-19 in 12 communities in Zhejiang province, China, during the peak of the pandemic.


Graph Neural Network Based Coarse-Grained Mapping Prediction

arXiv.org Machine Learning

The selection of coarse-grained (CG) mapping operators is a critical step for CG molecular dynamics (MD) simulation. It is still an open question about what is optimal for this choice and there is a need for theory. The current state-of-the art method is mapping operators manually selected by experts. In this work, we demonstrate an automated approach by viewing this problem as supervised learning where we seek to reproduce the mapping operators produced by experts. We present a graph neural network based CG mapping predictor called DEEP SUPERVISED GRAPH PARTITIONING MODEL(DSGPM) that treats mapping operators as a graph segmentation problem. DSGPM is trained on a novel dataset, Human-annotated Mappings (HAM), consisting of 1,206 molecules with expert annotated mapping operators. HAM can be used to facilitate further research in this area. Our model uses a novel metric learning objective to produce high-quality atomic features that are used in spectral clustering. The results show that the DSGPM outperforms state-of-the-art methods in the field of graph segmentation. Finally, we find that predicted CG mapping operators indeed result in good CG MD models when used in simulation.


Bounded Fuzzy Possibilistic Method of Critical Objects Processing in Machine Learning

arXiv.org Artificial Intelligence

Unsatisfying accuracy of learning methods is mostly caused by omitting the influence of important parameters such as membership assignments, type of data objects, and distance or similarity functions. The proposed method, called Bounded Fuzzy Possibilistic Method (BFPM) addresses different issues that previous clustering or classification methods have not sufficiently considered in their membership assignments. In fuzzy methods, the object's memberships should sum to 1. Hence, any data object may obtain full membership in at most one cluster or class. Possibilistic methods relax this condition, but the method can be satisfied with the results even if just an arbitrary object obtains the membership from just one cluster, which prevents the objects' movement analysis. Whereas, BFPM differs from previous fuzzy and possibilistic approaches by removing these restrictions. Furthermore, BFPM provides the flexible search space for objects' movement analysis. Data objects are also considered as fundamental keys in learning methods, and knowing the exact type of objects results in providing a suitable environment for learning algorithms. The Thesis introduces a new type of object, called critical, as well as categorizing data objects into two different categories: structural-based and behavioural-based. Critical objects are considered as causes of miss-classification and miss-assignment in learning procedures. The Thesis also proposes new methodologies to study the behaviour of critical objects with the aim of evaluating objects' movements (mutation) from one cluster or class to another. The Thesis also introduces a new type of feature, called dominant, that is considered as one of the causes of miss-classification and miss-assignments. Then the Thesis proposes new sets of similarity functions, called Weighted Feature Distance (WFD) and Prioritized Weighted Feature Distance (PWFD).


Dimensionality Reduction for $k$-means Clustering

arXiv.org Machine Learning

Along with modern developments and the necessity of large high-dimensional datasets, which due to their nature result in overfitting of many machine learning algorithms, it is crucial that one reduces the complexity of the algorithms involving these datasets. The authors of [BDM09] are of the first to address the issue of clustering in such datasets with provably accurate approximation results, by proposing a simple pre-processing step to the " k-means" clustering algorithm; also known as Lloyd's method [Llo82] -- probably the most widely used and popular clustering algorithm.


Deep Embedded Multi-view Clustering with Collaborative Training

arXiv.org Machine Learning

Multi-view clustering has attracted increasing attentions recently by utilizing information from multiple views. However, existing multi-view clustering methods are either with high computation and space complexities, or lack of representation capability. To address these issues, we propose deep embedded multi-view clustering with collaborative training (DEMVC) in this paper. Firstly, the embedded representations of multiple views are learned individually by deep autoencoders. Then, both consensus and complementary of multiple views are taken into account and a novel collaborative training scheme is proposed. Concretely, the feature representations and cluster assignments of all views are learned collaboratively. A new consistency strategy for cluster centers initialization is further developed to improve the multi-view clustering performance with collaborative training. Experimental results on several popular multi-view datasets show that DEMVC achieves significant improvements over state-of-the-art methods.


Overview of Clustering Algorithms

#artificialintelligence

Clustering is an unsupervised technique in which the set of similar data points is grouped together to form a cluster. A Cluster is said to be good if the intra-cluster (the data points within the same cluster) similarity is high and the inter-cluster (the data points outside the cluster) similarity is low. Clustering could also be viewed as a Data Compression technique in which the data points of a cluster can be treated as a group. Clustering is also called Data Segmentation because it partitions the data such that a group of similar data points forms a cluster. Classification Algorithms are good techniques to distinguish between groups and classify.


Joint Featurewise Weighting and Lobal Structure Learning for Multi-view Subspace Clustering

arXiv.org Machine Learning

Multi-view clustering integrates multiple feature sets, which reveal distinct aspects of the data and provide complementary information to each other, to improve the clustering performance. It remains challenging to effectively exploit complementary information across multiple views since the original data often contain noise and are highly redundant. Moreover, most existing multi-view clustering methods only aim to explore the consistency of all views while ignoring the local structure of each view. However, it is necessary to take the local structure of each view into consideration, because different views would present different geometric structures while admitting the same cluster structure. To address the above issues, we propose a novel multi-view subspace clustering method via simultaneously assigning weights for different features and capturing local information of data in view-specific self-representation feature spaces. Especially, a common cluster structure regularization is adopted to guarantee consistency among different views. An efficient algorithm based on an augmented Lagrangian multiplier is also developed to solve the associated optimization problem. Experiments conducted on several benchmark datasets demonstrate that the proposed method achieves state-of-the-art performance. We provide the Matlab code on https://github.com/Ekin102003/JFLMSC.


Scaling Graph Clustering with Distributed Sketches

arXiv.org Machine Learning

The unsupervised learning of community structure, in particular the partitioning vertices into clusters or communities, is a canonical and well-studied problem in exploratory graph analysis. However, like most graph analyses the introduction of immense scale presents challenges to traditional methods. Spectral clustering in distributed memory, for example, requires hundreds of expensive bulk-synchronous communication rounds to compute an embedding of vertices to a few eigenvectors of a graph associated matrix. Furthermore, the whole computation may need to be repeated if the underlying graph changes some low percentage of edge updates. We present a method inspired by spectral clustering where we instead use matrix sketches derived from random dimension-reducing projections. We show that our method produces embeddings that yield performant clustering results given a fully-dynamic stochastic block model stream using both the fast Johnson-Lindenstrauss and CountSketch transforms. We also discuss the effects of stochastic block model parameters upon the required dimensionality of the subsequent embeddings, and show how random projections could significantly improve the performance of graph clustering in distributed memory.


Approximately Optimal Binning for the Piecewise Constant Approximation of the Normalized Unexplained Variance (nUV) Dissimilarity Measure

arXiv.org Machine Learning

The recently introduced Matching by Tone Mapping (MTM) dissimilarity measure enables template matching under smooth non-linear distortions and also has a well-established mathematical background. MTM operates by binning the template, but the ideal binning for a particular problem is an open question. By pointing out an important analogy between the well known mutual information (MI) and MTM, we introduce the term "normalized unexplained variance" (nUV) for MTM to emphasize its relevance and applicability beyond image processing. Then, we provide theoretical results on the optimal binning technique for the nUV measure and propose algorithms to find approximate solutions. The theoretical findings are supported by numerical experiments. Using the proposed techniques for binning shows 4-13% increase in terms of AUC scores with statistical significance, enabling us to conclude that the proposed binning techniques have the potential to improve the performance of the nUV measure in real applications.


Cluster Analysis

#artificialintelligence

We are familiar with most of the supervised learning methods, for example, linear regression, logistic regression, decision trees, SVM so on… where for an input we have an associated output/label. When we have a problem in which we have input but no associated output/label such kind of learning is known as unsupervised learning. One mechanism that we may use in this context is cluster analysis or clustering. Definition 1: Cluster analysis is a multivariate statistical technique. It group's observations on the basis some of their features or variables they are described by.! Definition 2: Cluster analysis observations in a data set can be divided into different groups and is very useful.