AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)
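As a minimal sketch of one way to produce such groups, the following implements the Lloyd-style k-means iteration in plain Python. The names `kmeans`, `assign`, and `sq_dist`, the caller-supplied starting centroids, and the sample points are assumptions made for illustration; k-means is only one of many clustering methods and is not singled out by the definition above.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical toy data: two well-separated groups in the plane.
labels, centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
print(labels, centers)
```

Passing the starting centroids explicitly keeps the sketch deterministic; practical implementations usually choose them randomly or with a seeding heuristic.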

Machine Learning Performance Analysis to Predict Stroke Based on Imbalanced Medical Dataset

Jing, Yuru

arXiv.org Artificial Intelligence, Nov-14-2022

Cerebral stroke, the second leading cause of death worldwide, has been a primary public health concern over the last few years. With the help of machine learning techniques, early detection of stroke warning signs becomes feasible, which can help prevent a stroke or reduce its severity. Medical datasets, however, are frequently imbalanced in their class labels, which tends to yield poor predictions for minority classes. In this paper, the potential risk factors for stroke are investigated. Moreover, four distinct approaches are applied to improve the classification of the minority class in the imbalanced stroke dataset, and their performance is compared: the ensemble weight voting classifier, the Synthetic Minority Over-sampling Technique (SMOTE), Principal Component Analysis with K-Means Clustering (PCA-Kmeans), and Focal Loss with the Deep Neural Network (DNN). The results show that SMOTE and PCA-Kmeans with DNN-Focal Loss work best for a severely imbalanced dataset of limited size (e.g., the Stroke dataset), outperforming Kaggle's work by a factor of 2-4.
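To make the focal-loss idea concrete, here is a small sketch of the commonly used binary formulation, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), written in plain Python. The function names and the defaults `gamma=2.0` and `alpha=0.25` are assumptions for illustration; the abstract does not state which variant or hyperparameters the paper uses.

```python
import math

EPS = 1e-12  # Keeps log() away from zero.


def binary_cross_entropy(p, y):
    """Cross-entropy for one example; p is the predicted probability of class 1."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -math.log(p if y == 1 else 1.0 - p)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example; easy examples are down-weighted by (1 - p_t)**gamma."""
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p              # Probability given to the true class.
    alpha_t = alpha if y == 1 else 1.0 - alpha  # Class-balancing weight.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct positive contributes far less than a badly missed one.
print(binary_focal_loss(0.9, 1), binary_focal_loss(0.1, 1))
```

With `gamma=0` the modulating factor disappears and the loss reduces to class-weighted cross-entropy, which is a convenient sanity check.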

artificial intelligence, deep learning, machine learning, (17 more...)

2211.07652

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Switzerland (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.96)
Health & Medicine > Therapeutic Area > Neurology (0.68)
Health & Medicine > Consumer Health (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

A Dataset and Baseline Approach for Identifying Usage States from Non-Intrusive Power Sensing With MiDAS IoT-based Sensors

Muppasani, Bharath, Anand, Cheyyur Jaya, Appajigowda, Chinmayi, Srivastava, Biplav, Johri, Lokesh

arXiv.org Artificial Intelligence, Nov-14-2022

The growth in the deployment of Internet of Things (IoT) sensors across different industries has opened several opportunities for the economy. One of them is the collection of IoT data that companies can use to build smarter solutions. Authors in (Rajapaksha and Bergmeir 2022) focused on providing rule based explanations for a particular forecast, considering the global forecasting model as a black-box model trained across multivariate ...

artificial intelligence, machine learning, sensor, (16 more...)

2209.00987

Country:

North America > United States > South Carolina > Richland County > Columbia (0.14)
Asia > India (0.09)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)

Genre: Research Report (0.64)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Automated Cancer Subtyping via Vector Quantization Mutual Information Maximization

Chen, Zheng, Zhu, Lingwei, Yang, Ziwei, Matsubara, Takashi

arXiv.org Artificial Intelligence, Nov-14-2022

Cancer subtyping is crucial for understanding the nature of tumors and providing suitable therapy. However, existing labelling methods are medically controversial, and have driven the process of subtyping away from teaching signals. Moreover, cancer genetic expression profiles are high-dimensional, scarce, and have complicated dependence, thereby posing a serious challenge to existing subtyping models for outputting sensible clustering. In this study, we propose a novel clustering method for exploiting genetic expression profiles and distinguishing subtypes in an unsupervised manner. The proposed method adaptively learns categorical correspondence from latent representations of expression profiles to the subtypes output by the model. By maximizing the problem-agnostic mutual information between input expression profiles and output subtypes, our method can automatically decide a suitable number of subtypes. Through experiments, we demonstrate that our proposed method can refine existing controversial labels, and, by further medical analysis, this refinement is proven to have a high correlation with cancer survival rates.
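For readers unfamiliar with the quantity being maximized, the sketch below computes the empirical mutual information between two discrete label sequences in plain Python. This is only an illustration of the definition I(X;Y) = sum over (x, y) of p(x,y) log(p(x,y) / (p(x) p(y))); the function name `mutual_information` and the toy labels are assumptions, and the paper's estimator for continuous expression profiles is not described in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two equal-length label sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # Counts of co-occurring (x, y) pairs.
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical labels: identical partitions share log(2) nats, independent ones share none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```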

artificial intelligence, data mining, machine learning, (19 more...)

2206.10801

Country:

North America > Canada > Alberta (0.14)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Out-of-Dynamics Imitation Learning from Multimodal Demonstrations

Qiu, Yiwen, Wu, Jialong, Cao, Zhangjie, Long, Mingsheng

arXiv.org Artificial Intelligence, Nov-13-2022

Existing imitation learning works mainly assume that the demonstrator who collects demonstrations shares the same dynamics as the imitator. However, the assumption limits the usage of imitation learning, especially when collecting demonstrations for the imitator is difficult. In this paper, we study out-of-dynamics imitation learning (OOD-IL), which relaxes the assumption so that the demonstrator and the imitator have the same state spaces but could have different action spaces and dynamics. OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge: some demonstrations cannot be achieved by the imitator due to the different dynamics. Prior works try to filter out such demonstrations by feasibility measurements, but ignore the fact that the demonstrations exhibit a multimodal distribution, since different demonstrators may take different policies in different dynamics. We develop a better transferability measurement to tackle this newly emerged challenge. We first design a novel sequence-based contrastive clustering algorithm to cluster demonstrations from the same mode, avoiding the mutual interference of demonstrations from different modes, and then learn the transferability of each demonstration with an adversarial-learning-based algorithm in each cluster. Experimental results on several MuJoCo environments, a driving environment, and a simulated robot environment show that the proposed transferability measurement more accurately finds and down-weights non-transferable demonstrations and outperforms prior works on the final imitation learning performance. We show the videos of our experimental results on our website.

artificial intelligence, demonstration, machine learning, (16 more...)

2211.06839

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Online Correlation Clustering for Dynamic Complete Signed Graphs

Shakiba, Ali

arXiv.org Artificial Intelligence, Nov-13-2022

In the correlation clustering problem for complete signed graphs, the input is a complete signed graph with edges weighted as $+1$ (a recommendation to put the pair of vertices in the same cluster) or $-1$ (a recommendation to put the pair in separate clusters), and the goal is to cluster the set of vertices such that the number of disagreements with these recommendations is minimized. In this paper, we consider the problem of correlation clustering for dynamic complete signed graphs where (1) a vertex can be added or deleted, and (2) the sign of an edge can be flipped. The proposed online scheme uses the offline approximation algorithm in [CALM+21] for correlation clustering. To the best of the author's knowledge, this is the first online algorithm for dynamic graphs that allows a full set of graph editing operations. The proposed approach is rigorously analyzed and compared with a baseline method, which runs the original offline algorithm at each time step. Our results show that the dynamic operations have local effects on the neighboring vertices, and we exploit this locality to reduce the dependency of the running time from the sum of the degrees of all vertices in $G_t$ (the graph after applying the graph edit operation at time step $t$), as in the baseline, to the sum of the degrees of the changing vertices (e.g., the two endpoints of an edge) and the number of clusters in the previous time step. Moreover, the required working memory is reduced to the square of the sum of the degrees of the modified edge endpoints rather than the total number of vertices in the graph.
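The objective described above can be illustrated with a short sketch that counts disagreements for a given clustering of a complete signed graph. The function name `count_disagreements`, the dictionary-based graph representation, and the four-vertex example are assumptions for illustration; this evaluates the cost of a clustering and is not the online algorithm or the [CALM+21] approximation algorithm.

```python
from itertools import combinations


def count_disagreements(sign, cluster_of):
    """Count pairs whose sign contradicts the clustering.

    sign maps frozenset({u, v}) to +1 or -1 for every pair of vertices;
    cluster_of maps each vertex to its cluster id.
    """
    disagreements = 0
    for u, v in combinations(sorted(cluster_of), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        s = sign[frozenset((u, v))]
        # A +1 pair split apart, or a -1 pair placed together, is a disagreement.
        if (s == 1 and not same_cluster) or (s == -1 and same_cluster):
            disagreements += 1
    return disagreements


# Hypothetical graph on vertices a, b, c, d: only a-b and b-c are positive.
vertices = ["a", "b", "c", "d"]
positive = {frozenset(("a", "b")), frozenset(("b", "c"))}
sign = {frozenset(pair): (1 if frozenset(pair) in positive else -1)
        for pair in combinations(vertices, 2)}

print(count_disagreements(sign, {"a": 0, "b": 0, "c": 1, "d": 2}))
```

Enumerating all pairs costs time quadratic in the number of vertices, which is the kind of global dependence the locality argument in the abstract aims to avoid.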

artificial intelligence, machine learning, vertex, (18 more...)

2211.07

Country: Asia > Middle East > Iran (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Salvatore (Sal) Magnone on LinkedIn: Breaking it Down: K-Means Clustering

#artificialintelligence, Nov-12-2022, 14:55:32 GMT

We are pleased to announce that, thanks to the acquisition of Latham BioPharm Group, we are accelerating in the Life Sciences sector with the creation of a Life Science Consulting division. Latham BioPharm Group, founded in 1996 by Peter Latham, is one of the top Life Science Consulting specialty firms supporting Non-Dilutive Funding, Product Development, and Strategic decision-making for a wide range of sectors and clients. "Pete and I have agreed on a global goal, supported by significant financial resources, that will allow us to grow organically at more than 20% per year...." says Matthieu Courtecuisse, Founder and CEO of Sia Partners.

k-means clustering, linkedin, salvatore, (2 more...)

Technology:

Information Technology > Communications > Social Media (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

A Pipeline for Business Intelligence and Data-Driven Root Cause Analysis on Categorical Data

Thakar, Shubham, Kalbande, Dhananjay

arXiv.org Artificial Intelligence, Nov-12-2022

Business intelligence (BI) is any knowledge derived from existing data that may be strategically applied within a business. Data mining is a technique or method for extracting BI from data using statistical data modeling. Finding relationships or correlations between the various data items that have been collected can be used to boost business performance or at the very least better comprehend what is going on. Root cause analysis (RCA) is discovering the root causes of problems or events to identify appropriate solutions. RCA can show why an event occurred and this can help in avoiding occurrences of an issue in the future. This paper proposes a new clustering + association rule mining pipeline for getting business insights from data. The results of this pipeline are in the form of association rules having consequents, antecedents, and various metrics to evaluate these rules. The results of this pipeline can help in anchoring important business decisions and can also be used by data scientists for updating existing models or while developing new ones. The occurrence of any event is explained by its antecedents in the generated rules. Hence this output can also help in data-driven root cause analysis.
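The rule metrics mentioned above can be sketched in a few lines of plain Python. The function name `rule_metrics`, the choice of support, confidence, and lift as the metrics, and the toy records are assumptions for illustration; the abstract does not say which metrics or mining algorithm the pipeline uses.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    if not transactions:
        raise ValueError("transactions must be non-empty")
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    records = [frozenset(t) for t in transactions]
    n = len(records)
    n_a = sum(antecedent <= r for r in records)                  # Records containing the antecedent.
    n_c = sum(consequent <= r for r in records)                  # Records containing the consequent.
    n_both = sum((antecedent | consequent) <= r for r in records)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical categorical records, one set of attribute=value items per event.
records = [
    {"region=west", "late=yes"},
    {"region=west", "late=yes"},
    {"region=west", "late=no"},
    {"region=east", "late=no"},
]
print(rule_metrics(records, {"region=west"}, {"late=yes"}))
```

A lift above 1 indicates that the antecedent co-occurs with the consequent more often than independence would predict, which is what makes a rule a candidate explanation in a root cause analysis.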

data mining, machine learning, springer nature 2021, (14 more...)

2211.06717

Country:

North America > United States > Tennessee (0.04)
Europe > France (0.04)
Asia > India > Maharashtra > Mumbai (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)

Inv-SENnet: Invariant Self Expression Network for clustering under biased data

Singh, Ashutosh, Singh, Ashish, Masoomi, Aria, Imbiriba, Tales, Learned-Miller, Erik, Erdogmus, Deniz

arXiv.org Artificial Intelligence, Nov-12-2022

Subspace clustering algorithms are used for understanding the cluster structure that explains the dataset well. These methods are extensively used for data-exploration tasks in various areas of the natural sciences. However, most of these methods fail to handle unwanted biases in datasets. For datasets where a data sample represents multiple attributes, naively applying any clustering approach can result in undesired output. To this end, we propose a novel framework for jointly removing unwanted attributes (biases) while learning to cluster data points in individual subspaces. Assuming we have information about the bias, we regularize the clustering method by adversarially learning to minimize the mutual information between the data and the unwanted attributes. Our experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach.

artificial intelligence, dataset, machine learning, (17 more...)

2211.0678

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)

Doubly Inhomogeneous Reinforcement Learning

Hu, Liyuan, Li, Mengbing, Shi, Chengchun, Wu, Zhenke, Fryzlewicz, Piotr

arXiv.org Artificial Intelligence, Nov-12-2022

This paper studies reinforcement learning (RL) in doubly inhomogeneous environments under temporal non-stationarity and subject heterogeneity. In a number of applications, it is commonplace to encounter datasets generated by system dynamics that may change over time and population, challenging high-quality sequential decision making. Nonetheless, most existing RL solutions require either temporal stationarity or subject homogeneity, which would result in sub-optimal policies if both assumptions were violated. To address both challenges simultaneously, we propose an original algorithm to determine the "best data chunks" that display similar dynamics over time and across individuals for policy learning, which alternates between most recent change point detection and cluster identification. Our method is general, and works with a wide range of clustering and change point detection algorithms. It is multiply robust in the sense that it takes multiple initial estimators as input and only requires one of them to be consistent. Moreover, by borrowing information over time and population, it allows us to detect weaker signals and has better convergence properties when compared to applying the clustering algorithm per time or the change point detection algorithm per subject. Empirically, we demonstrate the usefulness of our method through extensive simulations and a real data application.

change point, machine learning, reinforcement learning, (18 more...)

2211.03983

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Transportation (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Non-parametric Clustering of Multivariate Populations with Arbitrary Sizes

Bakam, Yves Ismaël Ngounou, Pommeret, Denys

arXiv.org Machine Learning, Nov-11-2022

We propose a clustering procedure to group K populations into subgroups with the same dependence structure. The method is adapted to paired populations and can be used with panel data. It relies on the differences between orthogonal projection coefficients of the K density copulas estimated from the K populations. Each cluster is then constituted by populations having significantly similar dependence structures. A recent test statistic from Ngounou-Bakam and Pommeret (2022) is used to construct such clusters automatically. The procedure is data-driven and depends on the asymptotic level of the test. We illustrate our clustering algorithm via numerical studies and through two real datasets: a panel of financial datasets and an insurance dataset of losses and allocated loss adjustment expenses.

algorithm, artificial intelligence, machine learning, (18 more...)

2211.06338

Country: Europe > France (0.15)

Genre: Research Report (0.50)

Industry:

Information Technology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Banking & Finance > Trading (0.94)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
