AITopics

2310.17712

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Michigan (0.04)

Genre:

Overview (0.67)
Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Khalafaoui, Yasser, Matei, Basarab, Grozavu, Nistor, Lovisetto, Martino

Joint Multi-View Collaborative Clustering

arXiv.org Artificial IntelligenceOct-25-2023

Data is increasingly being collected from multiple sources and described by multiple views. These multi-view data provide richer information than traditional single-view data. Fusing the former for specific tasks is an essential component of multi-view clustering. Since the goal of multi-view clustering algorithms is to discover the common latent structure shared by multiple views, the majority of proposed solutions overlook the advantages of incorporating knowledge derived from horizontal collaboration between multi-view data and the final consensus. To fill this gap, we propose the Joint Multi-View Collaborative Clustering (JMVCC) solution, which involves the generation of basic partitions using Non-negative Matrix Factorization (NMF) and the horizontal collaboration principle, followed by the fusion of these local partitions using ensemble clustering. Furthermore, we propose a weighting method to reduce the risk of negative collaboration (i.e., views with low quality) during the generation and fusion of local partitions. The experimental results, which were obtained using a variety of data sets, demonstrate that JMVCC outperforms other multi-view clustering algorithms and is robust to noisy views.

information, local partition, partition, (12 more...)

doi: 10.1109/IJCNN54540.2023.10192014

2311.12859

Country:

North America > United States > California (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Malki, Nabil El, Cugny, Robin, Teste, Olivier, Ravat, Franck

DECWA : Density-Based Clustering using Wasserstein Distance

arXiv.org Artificial IntelligenceOct-25-2023

Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Among these methods, state-of-the-art density-based clustering methods have proven to be effective for arbitrary-shaped clusters. Despite their encouraging results, they suffer to find low-density clusters, near clusters with similar densities, and high-dimensional data. Our proposals are a new characterization of clusters and a new clustering algorithm based on spatial density and probabilistic approach. First of all, sub-clusters are built using spatial density represented as probability density function ($p.d.f$) of pairwise distances between points. A method is then proposed to agglomerate similar sub-clusters by using both their density ($p.d.f$) and their spatial distance. The key idea we propose is to use the Wasserstein metric, a powerful tool to measure the distance between $p.d.f$ of sub-clusters. We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.

decwa, density-based clustering, wasserstein distance

doi: 10.1145/3340531.3412125

2310.16552

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Dong, Jing, Yoshida, Yuichi

A Batch-to-Online Transformation under Random-Order Model

arXiv.org Machine LearningOct-25-2023

We introduce a transformation framework that can be utilized to develop online algorithms with low $\epsilon$-approximate regret in the random-order model from offline approximation algorithms. We first give a general reduction theorem that transforms an offline approximation algorithm with low average sensitivity to an online algorithm with low $\epsilon$-approximate regret. We then demonstrate that offline approximation algorithms can be transformed into a low-sensitivity version using a coreset construction method. To showcase the versatility of our approach, we apply it to various problems, including online $(k,z)$-clustering, online matrix approximation, and online regression, and successfully achieve polylogarithmic $\epsilon$-approximate regret for each problem. Moreover, we show that in all three cases, our algorithm also enjoys low inconsistency, which may be desired in some online applications.

artificial intelligence, data mining, machine learning, (19 more...)

2306.07163

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Machine LearningOct-25-2023

Hierarchical clustering with OWA-based linkages, the Lance-Williams formula, and dendrogram inversions

Gagolewski, Marek, Cena, Anna, James, Simon, Beliakov, Gleb

Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide some conditions for the weight generators to guarantee the resulting dendrograms to be free from unaesthetic inversions.

artificial intelligence, linkage, machine learning, (14 more...)

doi: 10.1016/j.fss.2023.108740

2303.05683

Country:

Europe > Poland > Masovia Province > Warsaw (0.05)
Oceania > Australia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine LearningOct-25-2023

A framework for benchmarking clustering algorithms

Gagolewski, Marek

The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Also, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. In order to overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark dataset collections referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive datasets explorer, the documentation of the Python API, a description of the ways to interact with the framework from other programming languages such as R or MATLAB, and other details are all provided at .

artificial intelligence, dataset, machine learning, (16 more...)

doi: 10.1016/j.softx.2022.101270

2209.09493

Country:

Europe > Poland > Masovia Province > Warsaw (0.05)
Oceania > Australia (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre:

Research Report (0.65)
Instructional Material > Course Syllabus & Notes (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Tzortzis, Alexandros-Menelaos, Pelekis, Sotiris, Spiliotis, Evangelos, Mouzakitis, Spiros, Psarras, John, Askounis, Dimitris

Transfer learning for day-ahead load forecasting: a case study on European national electricity demand time series

arXiv.org Artificial IntelligenceOct-24-2023

Short-term load forecasting (STLF) is crucial for the daily operation of power grids. However, the non-linearity, non-stationarity, and randomness characterizing electricity demand time series renders STLF a challenging task. Various forecasting approaches have been proposed for improving STLF, including neural network (NN) models which are trained using data from multiple electricity demand series that may not necessary include the target series. In the present study, we investigate the performance of this special case of STLF, called transfer learning (TL), by considering a set of 27 time series that represent the national day-ahead electricity demand of indicative European countries. We employ a popular and easy-to-implement NN model and perform a clustering analysis to identify similar patterns among the series and assist TL. In this context, two different TL approaches, with and without the clustering step, are compiled and compared against each other as well as a typical NN training setup. Our results demonstrate that TL can outperform the conventional approach, especially when clustering techniques are considered.

forecasting, load forecasting, time sery, (12 more...)

2310.15555

Country:

Europe > Norway (0.14)
Europe > Sweden (0.14)
Europe > Denmark (0.14)
(29 more...)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Hozumi, Yuta, Wei, Guo-Wei

Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization

arXiv.org Machine LearningOct-24-2023

Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).

artificial intelligence, bioinformatics, machine learning, (18 more...)

2310.15744

Country:

North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)
North America > United States > Missouri > Greene County > Springfield (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial IntelligenceOct-23-2023

Privacy-Preserving Federated Deep Clustering based on GAN

Yan, Jie, Liu, Jing, Qi, Ji, Zhang, Zhong-Yuan

Federated clustering (FC) is an essential extension of centralized clustering designed for the federated setting, wherein the challenge lies in constructing a global similarity measure without the need to share private data. Conventional approaches to FC typically adopt extensions of centralized methods, like K-means and fuzzy c-means. However, these methods are susceptible to non-independent-and-identically-distributed (non-IID) data among clients, leading to suboptimal performance, particularly with high-dimensional data. In this paper, we present a novel approach to address these limitations by proposing a Privacy-Preserving Federated Deep Clustering based on Generative Adversarial Networks (GANs). Each client trains a local generative adversarial network (GAN) locally and uploads the synthetic data to the server. The server applies a deep clustering network on the synthetic data to establish $k$ cluster centroids, which are then downloaded to the clients for cluster assignment. Theoretical analysis demonstrates that the GAN-generated samples, shared among clients, inherently uphold certain privacy guarantees, safeguarding the confidentiality of individual data. Furthermore, extensive experimental evaluations showcase the effectiveness and utility of our proposed method in achieving accurate and privacy-preserving federated clustering.

centroid, dataset, ppfc-gan, (12 more...)

2211.16965

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Data Science > Data Mining > Big Data (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Manerikar, Ankit, Kak, Avinash C.

Self-Supervised One-Shot Learning for Automatic Segmentation of StyleGAN Images

arXiv.org Artificial IntelligenceOct-23-2023

We propose a framework for the automatic one-shot segmentation of synthetic images generated by a StyleGAN. Our framework is based on the observation that the multi-scale hidden features in the GAN generator hold useful semantic information that can be utilized for automatic on-the-fly segmentation of the generated images. Using these features, our framework learns to segment synthetic images using a self-supervised contrastive clustering algorithm that projects the hidden features into a compact space for per-pixel classification. This contrastive learner is based on using a novel data augmentation strategy and a pixel-wise swapped prediction loss that leads to faster learning of the feature vectors for one-shot segmentation. We have tested our implementation on five standard benchmarks to yield a segmentation performance that not only outperforms the semi-supervised baselines by an average wIoU margin of 1.02 % but also improves the inference speeds by a factor of 4.5. Finally, we also show the results of using the proposed one-shot learner in implementing BagGAN, a framework for producing annotated synthetic baggage X-ray scans for threat detection. This framework was trained and tested on the PIDRay baggage benchmark to yield a performance comparable to its baseline segmenter based on manual annotations.

dataset, learning, segmentation, (16 more...)

2303.05639

Country: North America > United States > Indiana (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)