AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

12 Machine Learning Books You Should Read in 2023 - Machine Learning Techniques

#artificialintelligence, Dec-7-2022, 05:26:30 GMT

This complements the list that I posted earlier under the title "Math for Machine Learning: 14 Must-Read Books", available here. Many of the following books have a free PDF version, their own website and GitHub repository, and usually you can purchase the print version. Some are self-published, with the PDF version regularly updated, and even

algorithm, learning, machine learning, (12 more...)

Genre:

Summary/Review (1.00)
Instructional Material > Course Syllabus & Notes (0.99)

Industry: Education (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.30)

On the Global Solution of Soft k-Means

Nie, Feiping, Chen, Hong, Wang, Rong, Li, Xuelong

arXiv.org Artificial Intelligence, Dec-7-2022

This paper presents an algorithm to solve the Soft k-Means problem globally. Unlike Fuzzy c-Means, Soft k-Means (SkM) has a matrix factorization-type objective and has been shown to have a close relation with the popular probability decomposition-type clustering methods, e.g., Left Stochastic Clustering (LSC). Though some work has been done on solving the Soft k-Means problem, existing methods usually use an alternating minimization scheme or the projected gradient descent method, which cannot guarantee global optimality because SkM is non-convex. In this paper, we present a sufficient condition for a feasible solution of the Soft k-Means problem to be globally optimal and show that the output of the proposed algorithm satisfies it. Moreover, for the Soft k-Means problem, we provide interesting discussions on stability, solution non-uniqueness, and the connection with LSC. Then, a new model, named Minimal Volume Soft k-Means (MVSkM), is proposed to address the solution non-uniqueness issue. Finally, experimental results support our theoretical results.

artificial intelligence, machine learning, soft k-means problem, (17 more...)

2212.03589

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

A parallelizable model-based approach for marginal and multivariate clustering

de Carvalho, Miguel, Venturini, Gabriel Martos, Svetlošák, Andrej

arXiv.org Artificial Intelligence, Dec-7-2022

Context and Motivation. Clustering is an unsupervised learning approach for the task of partitioning data into meaningful subsets. The huge literature on cluster analysis is difficult to survey in a few sentences, but a concise description of well-known approaches is offered by Hastie et al. (2009), Everitt et al. (2011), and King (2014). Examples of mainstream methods for clustering data include model-based (Bouveyron et al., 2019), similarity-based (MacQueen, 1967; Kaufman and Rousseeuw, 1987), and hierarchical clustering (Hastie et al., 2009, Section 14.3). In this paper we propose a novel model-based approach for cluster analysis that lies at the interface of model-based clustering (i.e., via mixture models) and similarity-based clustering (i.e., via K-means and K-medoids). The proposed approach aims to benefit from the flexibility and soundness of model-based clustering, while attempting to mitigate Pitfalls 1 and 2 below. Model-based clustering is a fast-evolving and intradisciplinary research topic, as can be seen from the recent Handbook on Mixture Analysis (Fruhwirth-Schnatter et al., 2019) as well as the survey papers of Melnykov and Maitra (2010), McNicholas (2016), Gormley et al. (2023), and the references therein.

artificial intelligence, machine learning, parallelizable model-based approach, (17 more...)

2212.04009

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Robust Point Cloud Segmentation with Noisy Annotations

Ye, Shuquan, Chen, Dongdong, Han, Songfang, Liao, Jing

arXiv.org Artificial Intelligence, Dec-6-2022

Point cloud segmentation is a fundamental task in 3D. Despite recent progress on point cloud segmentation with the power of deep networks, current learning methods based on the clean-label assumption may fail with noisy labels. Yet class labels are often wrong at both the instance level and the boundary level in real-world datasets. In this work, we take the lead in solving instance-level label noise by proposing a Point Noise-Adaptive Learning (PNAL) framework. Compared to noise-robust methods on image tasks, our framework is noise-rate blind, to cope with the spatially variant noise rate specific to point clouds. Specifically, we propose a point-wise confidence selection to obtain reliable labels from the historical predictions of each point. A cluster-wise label correction is proposed with a voting strategy to generate the best possible label by considering the neighbor correlations. To handle boundary-level label noise, we also propose a variant, "PNAL-boundary", with a progressive boundary label cleaning strategy. Extensive experiments demonstrate its effectiveness on both synthetic and real-world noisy datasets. Even with $60\%$ symmetric noise and high-level boundary noise, our framework significantly outperforms its baselines, and is comparable to the upper bound trained on completely clean data. Moreover, we cleaned the popular real-world dataset ScanNetV2 for rigorous experiments. Our code and data are available at https://github.com/pleaseconnectwifi/PNAL.
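
To make the idea of cluster-wise voting concrete, here is a minimal sketch of majority-vote label correction. It is not the PNAL implementation: the function name `cluster_vote`, the precomputed `cluster_ids`, and the toy labels are all assumptions made for illustration, and the abstract does not say how the clusters or the votes are actually computed.

```python
from collections import Counter

def cluster_vote(cluster_ids, labels):
    """Replace every label with the most common label inside the point's cluster.

    Hypothetical helper: assumes each point already carries a cluster id.
    """
    votes = {}
    for cid, lab in zip(cluster_ids, labels):
        votes.setdefault(cid, Counter())[lab] += 1
    # Majority label per cluster.
    winner = {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}
    return [winner[cid] for cid in cluster_ids]

# Toy data: two clusters, each containing one mislabeled point.
cluster_ids = [0, 0, 0, 0, 1, 1, 1]
noisy = ["chair", "chair", "table", "chair", "floor", "wall", "floor"]
corrected = cluster_vote(cluster_ids, noisy)
print(corrected)
```

In this toy case the single "table" and "wall" labels are overruled by their neighbors in the same cluster. A plain majority vote like this would also overwrite correct minority labels, so it only helps when clusters are mostly pure.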

artificial intelligence, machine learning, noise, (17 more...)

2212.03242

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Artificial Intelligence Security Competition (AISC)

Dong, Yinpeng, Chen, Peng, Deng, Senyou, L, Lianji, Sun, Yi, Zhao, Hanyu, Li, Jiaxing, Tan, Yunteng, Liu, Xinyu, Dong, Yangyi, Xu, Enhui, Xu, Jincai, Xu, Shu, Fu, Xuelin, Sun, Changfeng, Han, Haoliang, Zhang, Xuchong, Chen, Shen, Sun, Zhimin, Cao, Junyi, Yao, Taiping, Ding, Shouhong, Wu, Yu, Lin, Jian, Wu, Tianpeng, Wang, Ye, Fu, Yu, Feng, Lin, Gao, Kangkang, Liu, Zeyu, Pang, Yuanzhe, Duan, Chengqi, Zhou, Huipeng, Wang, Yajie, Zhao, Yuhang, Wu, Shangbo, Lyu, Haoran, Lin, Zhiyu, Gao, Yifei, Li, Shuang, Wang, Haonan, Sang, Jitao, Ma, Chen, Zheng, Junhao, Li, Yijia, Shen, Chao, Lin, Chenhao, Cui, Zhichao, Liu, Guoshuai, Shi, Huafeng, Hu, Kun, Zhang, Mengxin

arXiv.org Artificial Intelligence, Dec-6-2022

The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of these three tracks and the solutions of the top-ranking teams in each track.

artificial intelligence, deep learning, machine learning, (17 more...)

2212.03412

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

A machine learning approach to support decision in insider trading detection

Mazzarisi, Piero, Ravagnani, Adele, Deriu, Paola, Lillo, Fabrizio, Medda, Francesca, Russo, Antonio

arXiv.org Artificial Intelligence, Dec-6-2022

Identifying market abuse from data on investors' trading activity is very challenging because of both the data volume and the low signal-to-noise ratio. Here we propose two complementary unsupervised machine learning methods to support market surveillance aimed at identifying potential insider trading activities. The first one uses clustering to identify, in the vicinity of a price-sensitive event such as a takeover bid, discontinuities in the trading activity of an investor with respect to his/her own past trading history and to the present trading activity of his/her peers. The second unsupervised approach aims at identifying (small) groups of investors that act coherently around price-sensitive events, pointing to potential insider rings, i.e., groups of synchronised traders displaying strong directional trading in a rewarding position in the period before the price-sensitive event. As a case study, we apply our methods to investor-resolved data of Italian stocks around takeover bids.

artificial intelligence, investor, machine learning, (18 more...)

2212.05912

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Greater London > London > City of London (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Density-Based Clustering: DBSCAN vs. HDBSCAN

#artificialintelligence, Dec-5-2022, 20:45:35 GMT

Cluster analysis is a domain of data science concerned with grouping similar objects into distinct subgroups. While there are different families of clustering algorithms, the most widely known is K-Means. This is a centroid-based algorithm, meaning that objects in the data are clustered by being assigned to the nearest centroid. However, a major pitfall of K-Means is its inability to detect outliers, or noisy data points: every point is assigned to some cluster, so outliers end up classified incorrectly. Furthermore, K-Means has an intrinsic preference for globular clusters and does not work very well on data made up of arbitrarily shaped clusters.
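
The nearest-centroid rule and its effect on outliers can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the toy points, and the hand-picked starting centroids are all hypothetical, and the loop is a plain Lloyd-style iteration with a fixed iteration count.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd-style k-means on a list of coordinate tuples."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: every point goes to its nearest centroid.
        labels = [nearest_centroid(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),        # tight group A
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),  # tight group B
          (30.0, 30.0)]                              # isolated outlier
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)        # the outlier receives a regular cluster label
print(centroids[1])  # and drags the centroid of group B toward itself
```

Because the assignment step has no notion of noise, the isolated point is placed in group B and shifts that group's centroid away from the dense region. Density-based methods such as DBSCAN and HDBSCAN instead allow points to be left unassigned as noise.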

algorithm, density-based clustering, hdbscan, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

HyperEF: Spectral Hypergraph Coarsening by Effective-Resistance Clustering

Aghdaei, Ali, Feng, Zhuo

arXiv.org Artificial Intelligence, Dec-3-2022

This paper introduces a scalable algorithmic framework (HyperEF) for spectral coarsening (decomposition) of large-scale hypergraphs by exploiting hyperedge effective resistances. Motivated by the latest theoretical framework for low-resistance-diameter decomposition of simple graphs, HyperEF aims at decomposing large hypergraphs into multiple node clusters with only a few inter-cluster hyperedges. The key component in HyperEF is a nearly-linear time algorithm for estimating hyperedge effective resistances, which allows incorporating the latest diffusion-based non-linear quadratic operators defined on hypergraphs. To achieve good runtime scalability, HyperEF searches within the Krylov subspace (or approximate eigensubspace) to identify nearly-optimal vectors for approximating the hyperedge effective resistances. In addition, a node weight propagation scheme for multilevel spectral hypergraph decomposition is introduced to achieve even greater node coarsening ratios. When compared with state-of-the-art hypergraph partitioning (clustering) methods, extensive experimental results on real-world VLSI designs show that HyperEF can more effectively coarsen (decompose) hypergraphs without losing key structural (spectral) properties of the original hypergraphs, while achieving over $70\times$ runtime speedups over hMetis and $20\times$ speedups over HyperSF.

artificial intelligence, hypergraph, machine learning, (18 more...)

2210.14813

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe (0.04)

Genre: Research Report (1.00)

Industry: Semiconductors & Electronics (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Clustering individuals based on multivariate EMA time-series data

Ntekouli, Mandani, Spanakis, Gerasimos, Waldorp, Lourens, Roefs, Anne

arXiv.org Artificial Intelligence, Dec-2-2022

In the field of psychopathology, Ecological Momentary Assessment (EMA) methodological advancements have offered new opportunities to collect time-intensive, repeated and intra-individual measurements. This way, a large amount of data has become available, providing the means for further exploring mental disorders. Consequently, advanced machine learning (ML) methods are needed to understand data characteristics and uncover hidden and meaningful relationships regarding the underlying complex psychological processes. Among other uses, ML facilitates the identification of similar patterns in data of different individuals through clustering. This paper focuses on clustering multivariate time-series (MTS) data of individuals into several groups. Since clustering is an unsupervised problem, it is challenging to assess whether the resulting grouping is successful. Thus, we investigate different clustering methods based on different distance measures and assess them for the stability and quality of the derived clusters. These clustering steps are illustrated on a real-world EMA dataset, including 33 individuals and 15 variables. Through evaluation, the results of kernel-based clustering methods appear promising to identify meaningful groups in the data. So, efficient representations of EMA data play an important role in clustering.

artificial intelligence, kernel, machine learning, (19 more...)

2212.01159

Country:

Europe > Netherlands > Limburg > Maastricht (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Clustering through Feature Space Sequence Discovery and Analysis

Guobin, Shi

arXiv.org Artificial Intelligence, Dec-2-2022

Identifying high-dimensional data patterns without a priori knowledge is an important task in data science. This paper proposes a simple and efficient nonparametric algorithm, Data Convert to Sequence Analysis (DCSA), which dynamically explores each point in the feature space without repetition to find a directed Hamiltonian path. Based on change point analysis theory, the sequence corresponding to the path is cut into several fragments to achieve clustering. Experiments on real-world datasets from different fields, with dimensions ranging from 4 to 20531, confirm that the method is robust and that its results are visually interpretable.
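
The general idea of turning points into a sequence and then cutting that sequence can be illustrated with a rough sketch. This is not the DCSA algorithm from the paper: the greedy nearest-neighbor traversal, the "cut at the longest steps" rule (a crude stand-in for change point analysis), and the names `greedy_path` and `cut_sequence` are all assumptions chosen to keep the example short.

```python
import math

def greedy_path(points, start=0):
    """Visit every point exactly once, always moving to the nearest unvisited point."""
    order = [start]
    unvisited = set(range(len(points))) - {start}
    while unvisited:
        current = points[order[-1]]
        # Break distance ties by index so the path is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(current, points[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def cut_sequence(points, order, n_clusters):
    """Split the path at its (n_clusters - 1) longest steps and label each fragment."""
    steps = [math.dist(points[a], points[b]) for a, b in zip(order, order[1:])]
    # Positions of the longest steps; a cut is placed right after each of them.
    longest = sorted(range(len(steps)), key=lambda i: steps[i], reverse=True)
    cut_after = set(longest[:n_clusters - 1])
    labels = [0] * len(points)
    cluster = 0
    for pos, idx in enumerate(order):
        labels[idx] = cluster
        if pos in cut_after:
            cluster += 1
    return labels

points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 5.2), (0.1, 0.3), (9.0, 0.0)]
order = greedy_path(points)
labels = cut_sequence(points, order, n_clusters=3)
print(order)
print(labels)
```

On this toy data the path first exhausts the points near the origin, jumps to the pair near (5, 5), and ends at the isolated point, so the two long jumps become the cut positions. Unlike this sketch, which needs `n_clusters` up front, a change point analysis would be expected to decide where to cut from the sequence itself.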

algorithm, artificial intelligence, machine learning, (16 more...)

2212.00996

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.06)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Orange County > Irvine (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
