AITopics

doi: 10.1109/ACCESS.2023.3247564

2209.04213

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Berlin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(10 more...)

Genre:

Overview (1.00)
Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lattimer, Barrett, Lattimer, Alan

Creating Compact Regions of Social Determinants of Health

arXiv.org Artificial Intelligence, Sep-23-2022

Regionalization is the act of breaking a dataset into contiguous homogeneous regions that are heterogeneous from each other. Many different algorithms exist for performing regionalization; however, using these algorithms on large real-world data sets has only become feasible in terms of compute power in recent years. Very few studies have compared different regionalization methods, and those that do lack analysis of memory, scalability, geographic metrics, and large-scale real-world applications. This study compares state-of-the-art regionalization methods, namely Agglomerative Clustering, SKATER, REDCAP, AZP, and Max-P-Regions, using real-world social determinants of health (SDOH) data. The scale of real-world SDOH data, up to 1 million data points in this study, not only allows the algorithms to be compared over different data sets but also provides a stress test for each individual regionalization algorithm, most of which have never been run at such scales previously. We use several new geographic metrics to compare algorithms and also perform a comparative memory analysis. The prevailing regionalization method is then compared with unconstrained K-Means clustering on their ability to separate real health data in Virginia and Washington DC.
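
To make the contiguity requirement concrete, here is a minimal sketch of greedy, contiguity-constrained agglomerative merging in plain Python. It is not SKATER, REDCAP, AZP, Max-P-Regions, or the authors' implementation; the function name `regionalize`, the mean-difference merge criterion, and the six-unit toy data are all assumptions made for illustration.

```python
from itertools import combinations


def regionalize(values, adjacency, n_regions):
    """Greedily merge neighbouring regions until n_regions remain.

    values:    dict mapping each spatial unit to one numeric attribute
    adjacency: set of frozenset({u, v}) pairs of units that share a border
    Only regions that touch may merge, so every region stays contiguous.
    (Hypothetical helper; the merge rule is a simple mean difference.)
    """
    regions = {unit: {unit} for unit in values}  # region id -> member units

    def mean(members):
        return sum(values[m] for m in members) / len(members)

    def touching(a, b):
        return any(frozenset((x, y)) in adjacency
                   for x in regions[a] for y in regions[b])

    while len(regions) > n_regions:
        candidates = [
            (abs(mean(regions[a]) - mean(regions[b])), a, b)
            for a, b in combinations(sorted(regions), 2)
            if touching(a, b)
        ]
        if not candidates:  # no adjacent pair left to merge
            break
        _, a, b = min(candidates)  # most similar pair of neighbours
        regions[a] |= regions.pop(b)
    return sorted(sorted(members) for members in regions.values())


# Six units along a line: 0-1-2-3-4-5. Units 0, 1, 4 and 5 have similar
# values but are separated by the high-valued units 2 and 3.
values = {0: 1.0, 1: 1.2, 2: 5.0, 3: 5.1, 4: 1.1, 5: 0.9}
adjacency = {frozenset((i, i + 1)) for i in range(5)}

regions = regionalize(values, adjacency, n_regions=3)
print(regions)  # [[0, 1], [2, 3], [4, 5]]
```

An unconstrained method such as K-Means with two clusters would be free to group units 0, 1, 4 and 5 together because their values are close; the adjacency check keeps them in separate regions because they do not touch.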

artificial intelligence, data mining, machine learning, (18 more...)

2209.11836

Country:

North America > United States > District of Columbia > Washington (0.27)
North America > United States > Virginia > Montgomery County > Blacksburg (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence, Sep-23-2022

Library transfer between distinct Laser-Induced Breakdown Spectroscopy systems with shared standards

Vrábel, J., Képeš, E., Nedělník, P., Buday, J., Cempírek, J., Pořízka, P., Kaiser, J.

The mutual incompatibility of distinct spectroscopic systems is among the most limiting factors in Laser-Induced Breakdown Spectroscopy (LIBS). The cost of setting up a new LIBS system is increased because extensive calibration is required. Solving the problem would enable inter-laboratory reference measurements and shared spectral libraries, which are fundamental for other spectroscopic techniques. In this work, we study a simplified version of this challenge where LIBS systems differ only in the spectrometers and collection optics used but share all other parts of the apparatus, and collect spectra simultaneously from the same plasma plume. Extensive datasets measured as hyperspectral images of heterogeneous specimens are used to train machine learning models that can transfer spectra between systems. The transfer is realized by a pipeline that consists of a variational autoencoder (VAE) and a fully-connected artificial neural network (ANN). In the first step, we use the VAE to obtain a latent representation of the spectra measured on the Primary system. In the second step, we use the ANN to map spectra from the Secondary system to the corresponding locations in the latent space. Finally, Secondary system spectra are reconstructed from the latent space to the space of the Primary system. The transfer is evaluated by several figures of merit (Euclidean and cosine distances, both spatially resolved; k-means clustering of transferred spectra). The methodology is compared to several baseline approaches.

artificial intelligence, deep learning, machine learning, (20 more...)

2209.07637

Country:

Europe > Czechia > South Moravian Region > Brno (0.05)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence, Sep-22-2022, 11:01:49 GMT

Clustering Algorithms in Machine Learning

Machine Learning problems deal with a great deal of data and depend heavily on the algorithms that are used to train the model. There are various approaches and algorithms to train a machine learning model based on the problem at hand. Supervised and unsupervised learning are the two most prominent of these approaches. An important real-life problem of marketing a product or service to a specific target audience can be easily resolved with the help of a form of unsupervised learning known as Clustering. This article will explain clustering algorithms along with real-life problems and examples.

algorithm, clustering, machine learning, (16 more...)

#artificialintelligence

Country: Asia > India (0.05)

Industry: Education (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Kuncheva, Ludmila, Williams, Francis, Hennessey, Samuel

A Bibliographic View on Constrained Clustering

A keyword search for constrained clustering on Web of Science returned just under 3,000 documents. We ran automatic analyses of those, and compiled our own bibliography of 183 papers which we analysed in more detail based on their topic and experimental study, if any. This paper presents general trends of the area and its sub-topics by Pareto analysis, using citation count and year of publication. We list available software and analyse the experimental sections of our reference collection. We found a notable lack of large comparison experiments. Among the topics we reviewed, application studies were most abundant recently, alongside deep learning, active learning and ensemble learning.

artificial intelligence, data mining, machine learning, (16 more...)

2209.11125

Country:

Europe > United Kingdom (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report > Experimental Study (0.66)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)

Out-of-Distribution Detection Without Class Labels

Cohen, Niv, Abutbul, Ron, Hoshen, Yedid

Out-of-distribution detection seeks to identify novelties, samples that deviate from the norm. The task has been found to be quite challenging, particularly in the case where the normal data distribution consists of multiple semantic classes (e.g., multiple object categories). To overcome this challenge, current approaches require manual labeling of the normal images provided during training. In this work, we tackle multi-class novelty detection without class labels. Our simple but effective solution consists of two stages: we first discover "pseudo-class" labels using unsupervised clustering. Then using these pseudo-class labels, we are able to use standard supervised out-of-distribution detection methods. We verify the performance of our method by a favorable comparison to the state-of-the-art, and provide extensive analysis and ablations.

data mining, machine learning, natural language, (16 more...)

2112.07662

Country:

North America > United States > California (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Hedjam, Rachid, Abdesselam, Abdelhamid, Rahiche, Abderrahmane, Cheriet, Mohamed

Non-Negative Matrix Factorization with Scale Data Structure Preservation

Low-rank matrix factorization (MF) is a hot topic in many research problems such as feature extraction and dimensionality reduction Vidal et al. [2005], subspace segmentation Liu et al. [2010], data clustering Favaro et al. [2011], and image processing and computer vision Peng et al. [2012], to mention a few. The key idea behind MF is that there is a latent data structure embedded in the high-dimensional observed data which, once discovered, provides better capacity for learning. Formally, MF techniques aim to decompose an observed high-dimensional data matrix into its constituent lower-dimensional factor matrices (in general two). One of the factor matrices represents the lower-dimensional space and the other represents the spread of the latent data in that space. MF has been widely used as a unified technique for dimensionality reduction, clustering, and matrix completion. There are several variants of MF in the literature, including basic MF (BMF), non-negative MF (NMF) and orthogonal NMF (ONMF). BMF methods are those described using traditional matrix decompositions such as principal component analysis (PCA), vector quantization (VQ) and singular value decomposition (SVD).
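
In symbols, the two-factor decomposition described above can be written as follows. The notation (X, W, H, m, n, k) is assumed here for illustration and is not taken from the paper, and the NMF objective shown is the standard Frobenius-norm form rather than the structure-preserving variant named in the title.

```latex
X \approx W H, \qquad
X \in \mathbb{R}^{m \times n},\;
W \in \mathbb{R}^{m \times k},\;
H \in \mathbb{R}^{k \times n},\;
k \ll \min(m, n)

\text{NMF:}\quad \min_{W \ge 0,\; H \ge 0} \; \lVert X - W H \rVert_F^2
```

Under this convention the columns of W span the lower-dimensional space and each column of H gives the coordinates of one sample in that space. Orthogonal NMF variants typically add an orthogonality constraint on one factor, for example H H^T = I.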

artificial intelligence, data mining, machine learning, (16 more...)

2209.10881

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Oman > Muscat Governorate > Muscat (0.04)
North America > United States > Wisconsin (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

High-order Multi-view Clustering for Generic Data

Pan, Erlin, Kang, Zhao

Graph-based multi-view clustering has achieved better performance than most non-graph approaches. However, in many real-world scenarios, the graph structure of the data is not given or the quality of the initial graph is poor. Additionally, existing methods largely neglect the high-order neighborhood information that characterizes complex intrinsic interactions. To tackle these problems, we introduce an approach called high-order multi-view clustering (HMvC) to explore the topological structure information of generic data. Firstly, graph filtering is applied to encode structure information, which unifies the processing of attributed graph data and non-graph data in a single framework. Secondly, up to infinity-order intrinsic relationships are exploited to enrich the learned graph. Thirdly, to explore the consistent and complementary information of various views, an adaptive graph fusion mechanism is proposed to achieve a consensus graph. Comprehensive experimental results on both non-graph and attributed graph data show the superior performance of our method with respect to various state-of-the-art techniques, including some deep learning methods.

artificial intelligence, graph, machine learning, (15 more...)

2209.10838

Country:

Asia > China > Sichuan Province > Chengdu (0.04)
North America > United States > Illinois > Jackson County > Carbondale (0.04)
Asia > Macao (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

arXiv.org Artificial Intelligence, Sep-21-2022

Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces

Vahidian, Saeed, Morafah, Mahdi, Wang, Weijia, Kungurtsev, Vyacheslav, Chen, Chen, Shah, Mubarak, Lin, Bill

Clustered federated learning (FL) has been shown to produce promising results by grouping clients into clusters. This is especially effective in scenarios where separate groups of clients have significant differences in the distributions of their local data. Existing clustered FL algorithms are essentially trying to group together clients with similar distributions so that clients in the same cluster can leverage each other's data to better perform federated learning. However, prior clustered FL algorithms attempt to learn these distribution similarities indirectly during training, which can be quite time consuming as many rounds of federated learning may be required until the formation of clusters is stabilized. In this paper, we propose a new approach to federated learning that directly aims to efficiently identify distribution similarities among clients by analyzing the principal angles between the client data subspaces. Each client applies a truncated singular value decomposition (SVD) step on its local data in a single-shot manner to derive a small set of principal vectors, which provides a signature that succinctly captures the main characteristics of the underlying distribution. This small set of principal vectors is provided to the server so that the server can directly identify distribution similarities among the clients to form clusters. This is achieved by comparing the similarities of the principal angles between the client data subspaces spanned by those principal vectors. The approach provides a simple, yet effective clustered FL framework that addresses a broad range of data heterogeneity issues beyond simpler forms of Non-IIDness like label skews. Our clustered FL approach also enables convergence guarantees for non-convex objectives. Our code is available at https://github.com/MMorafah/PACFL.
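
The signature-and-compare idea in this abstract can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the PACFL code from the linked repository: the helper names `client_signature` and `principal_angles_deg`, the features-by-samples orientation of each data matrix, the synthetic clients, and the use of the summed angles as a dissimilarity are all hypothetical choices.

```python
import numpy as np


def client_signature(data, p):
    """Top-p left singular vectors of a (features x samples) data matrix.

    Assumption: each column is one sample, so the returned orthonormal
    columns span the dominant subspace of the client's data.
    """
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    return u[:, :p]


def principal_angles_deg(u1, u2):
    """Principal angles (in degrees) between span(u1) and span(u2).

    For orthonormal bases, the singular values of u1.T @ u2 are the
    cosines of the principal angles.
    """
    cosines = np.linalg.svd(u1.T @ u2, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))


rng = np.random.default_rng(0)
n_features, n_samples, p = 20, 50, 2
basis_1 = rng.standard_normal((n_features, p))
basis_2 = rng.standard_normal((n_features, p))


def make_client(basis):
    """Synthetic client: samples near the subspace spanned by `basis`."""
    coeffs = rng.standard_normal((p, n_samples))
    noise = 0.01 * rng.standard_normal((n_features, n_samples))
    return basis @ coeffs + noise


# Clients A and B share one underlying subspace; client C uses another.
clients = {
    "A": make_client(basis_1),
    "B": make_client(basis_1),
    "C": make_client(basis_2),
}
signatures = {name: client_signature(x, p) for name, x in clients.items()}

# Server side: pairwise dissimilarity as the sum of principal angles.
names = sorted(signatures)
dissimilarity = np.zeros((len(names), len(names)))
for i, a in enumerate(names):
    for j, b in enumerate(names):
        angles = principal_angles_deg(signatures[a], signatures[b])
        dissimilarity[i, j] = angles.sum()

print(np.round(dissimilarity, 1))
```

Only the small signature matrices leave each client in this sketch; the server could then feed the dissimilarity matrix to any standard clustering routine to group clients A and B apart from C.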

artificial intelligence, dataset, machine learning, (17 more...)

2209.10526

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Aghaieabiane, Niloofar, Koutis, Ioannis

SGC: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks

arXiv.org Artificial Intelligence, Sep-21-2022

A widely used approach for extracting information from gene expression data employs the construction of a gene co-expression network and the subsequent application of algorithms that discover network structure. In particular, a common goal is the computational discovery of gene clusters, commonly called modules. When applied to a novel gene expression dataset, the quality of the computed modules can be evaluated automatically using Gene Ontology enrichment, a method that measures the frequencies of Gene Ontology terms in the computed modules and evaluates their statistical likelihood. In this work we propose SGC, a novel pipeline for gene clustering based on relatively recent seminal work in the mathematics of spectral network theory. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. Unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules. Compared with well-known existing frameworks, we show that SGC results in higher enrichment in real data. In particular, across 12 real gene expression datasets, SGC outperforms them in all except one.

artificial intelligence, bioinformatics, machine learning, (16 more...)

2209.10545

Country:

North America > United States > New Jersey > Essex County > Newark (0.04)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > France (0.04)

Genre:

Research Report > Experimental Study (0.50)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)