AITopics

2309.13544

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)
North America > United States > Virginia (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.65)

Industry: Media > Music (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.30)

Abbas, Mohamed, El-Zoghobi, Adel, Shoukry, Amin

DenMune: Density peak based clustering using mutual nearest neighbors

arXiv.org Artificial IntelligenceSep-23-2023

Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other, even in two dimensions. A novel clustering algorithm, DenMune is presented to meet this challenge. It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user, besides obeying the mutual nearest neighbor consistency principle. The algorithm is stable for a wide range of values of K. Moreover, it is able to automatically detect and remove noise from the clustering process as well as detecting the target clusters. It produces robust results on various low and high-dimensional datasets relative to several known state-of-the-art clustering algorithms.

algorithm, dataset, denmune, (16 more...)

doi: 10.1016/j.patcog.2020.107589

2309.1342

Country:

Africa > Middle East > Egypt > Alexandria Governorate > Alexandria (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Finland > North Karelia > Joensuu (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Boulin, Alexis, Di Bernardino, Elena, Laloë, Thomas, Toulemonde, Gwladys

High-dimensional variable clustering based on sub-asymptotic maxima of a weakly dependent random process

arXiv.org Machine LearningSep-23-2023

We propose a new class of models for variable clustering called Asymptotic Independent block (AI-block) models, which defines population-level clusters based on the independence of the maxima of a multivariate stationary mixing random process among clusters. This class of models is identifiable, meaning that there exists a maximal element with a partial order between partitions, allowing for statistical inference. We also present an algorithm for recovering the clusters of variables without specifying the number of clusters \emph{a priori}. Our work provides some theoretical insights into the consistency of our algorithm, demonstrating that under certain conditions it can effectively identify clusters in the data with a computational complexity that is polynomial in the dimension. This implies that groups can be learned nonparametrically in which block maxima of a dependent process are only sub-asymptotic. To further illustrate the significance of our work, we applied our method to neuroscience and environmental real-datasets. These applications highlight the potential and versatility of the proposed approach.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Machine Learning

2302.00934

Country:

Europe > Northern Europe (0.04)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science (0.92)

Chalaki, Mahdi Abdollah, Maroufi, Daniyal, Robati, Mahdi, Karimi, Mohammad Javad, Sadighi, Ali

An Intelligent Approach to Detecting Novel Fault Classes for Centrifugal Pumps Based on Deep CNNs and Unsupervised Methods

Despite the recent success in data-driven fault diagnosis of rotating machines, there are still remaining challenges in this field. Among the issues to be addressed, is the lack of information about variety of faults the system may encounter in the field. In this paper, we assume a partial knowledge of the system faults and use the corresponding data to train a convolutional neural network. A combination of t-SNE method and clustering techniques is then employed to detect novel faults. Upon detection, the network is augmented using the new data. Finally, a test setup is used to validate this two-stage methodology on a centrifugal pump and experimental results show high accuracy in detecting novel faults.

artificial intelligence, machine learning, neural network, (15 more...)

doi: 10.1109/icspis54653.2021.9729350

2309.12765

Country: Asia > Middle East > Iran (0.18)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Oil & Gas > Upstream (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

Des-q: a quantum algorithm to construct and efficiently retrain decision trees for regression and binary classification

Kumar, Niraj, Yalovetzky, Romina, Li, Changhao, Minssen, Pierre, Pistoia, Marco

Decision trees are widely used in machine learning due to their simplicity in construction and interpretability. However, as data sizes grow, traditional methods for constructing and retraining decision trees become increasingly slow, scaling polynomially with the number of training examples. In this work, we introduce a novel quantum algorithm, named Des-q, for constructing and retraining decision trees in regression and binary classification tasks. Assuming the data stream produces small increments of new training examples, we demonstrate that our Des-q algorithm significantly reduces the time required for tree retraining, achieving a poly-logarithmic time complexity in the number of training examples, even accounting for the time needed to load the new examples into quantum-accessible memory. Our approach involves building a decision tree algorithm to perform k-piecewise linear tree splits at each internal node. These splits simultaneously generate multiple hyperplanes, dividing the feature space into k distinct regions. To determine the k suitable anchor points for these splits, we develop an efficient quantum-supervised clustering method, building upon the q-means algorithm of Kerenidis et al. Des-q first efficiently estimates each feature weight using a novel quantum technique to estimate the Pearson correlation. Subsequently, we employ weighted distance estimation to cluster the training examples in k disjoint regions and then proceed to expand the tree using the same procedure. We benchmark the performance of the simulated version of our algorithm against the state-of-the-art classical decision tree for regression and binary classification on multiple data sets with numerical features. Further, we showcase that the proposed algorithm exhibits similar performance to the state-of-the-art decision tree while significantly speeding up the periodic tree retraining.

algorithm, decision tree, node, (17 more...)

2309.09976

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Banking & Finance (1.00)
Telecommunications (0.67)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Mistry, Jimit, Arzeno, Natalia M.

Document Understanding for Healthcare Referrals

Reliance on scanned documents and fax communication for healthcare referrals leads to high administrative costs and errors that may affect patient care. In this work we propose a hybrid model leveraging LayoutLMv3 along with domain-specific rules to identify key patient, physician, and exam-related entities in faxed referral documents. We explore some of the challenges in applying a document understanding model to referrals, which have formats varying by medical practice, and evaluate model performance using MUC-5 metrics to obtain appropriate metrics for the practical use case. Our analysis shows the addition of domain-specific rules to the transformer model yields greatly increased precision and F1 scores, suggesting a hybrid model trained on a curated dataset can increase efficiency in referral management.

annotation, entity type, prediction, (17 more...)

2309.13184

Country: North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

Multi-level Map Construction for Dynamic Scenes

Hu, Xinggang

In dynamic scenes, both localization and mapping in visual SLAM face significant challenges. In recent years, numerous outstanding research works have proposed effective solutions for the localization problem. However, there has been a scarcity of excellent works focusing on constructing long-term consistent maps in dynamic scenes, which severely hampers map applications. To address this issue, we have designed a multi-level map construction system tailored for dynamic scenes. In this system, we employ multi-object tracking algorithms, DBSCAN clustering algorithm, and depth information to rectify the results of object detection, accurately extract static point clouds, and construct dense point cloud maps and octree maps. We propose a plane map construction algorithm specialized for dynamic scenes, involving the extraction, filtering, data association, and fusion optimization of planes in dynamic environments, thus creating a plane map. Additionally, we introduce an object map construction algorithm targeted at dynamic scenes, which includes object parameterization, data association, and update optimization. Extensive experiments on public datasets and real-world scenarios validate the accuracy of the multi-level maps constructed in this study and the robustness of the proposed algorithms. Furthermore, we demonstrate the practical application prospects of our algorithms by utilizing the constructed object maps for dynamic object tracking.

algorithm, point cloud, point cloud map, (15 more...)

2308.04

Country:

Asia > China > Liaoning Province > Dalian (0.04)
Europe > Greece > Ionian Islands > Corfu (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Schindler, Dominik J., Barahona, Mauricio

Persistent Homology of the Multiscale Clustering Filtration

arXiv.org Artificial IntelligenceSep-21-2023

In many applications in data clustering, it is desirable to find not just a single partition into clusters but a sequence of partitions describing the data at different scales, or levels of coarseness. A natural problem then is to analyse and compare the (not necessarily hierarchical) sequences of partitions that underpin such multiscale descriptions of data. Here, we introduce a filtration of abstract simplicial complexes, denoted the Multiscale Clustering Filtration (MCF), which encodes arbitrary patterns of cluster assignments across scales, and we prove that the MCF produces stable persistence diagrams. We then show that the zero-dimensional persistent homology of the MCF measures the degree of hierarchy in the sequence of partitions, and that the higher-dimensional persistent homology tracks the emergence and resolution of conflicts between cluster assignments across the sequence of partitions. To broaden the theoretical foundations of the MCF, we also provide an equivalent construction via a nerve complex filtration, and we show that in the hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an ultrametric space. We briefly illustrate how the MCF can serve to characterise multiscale clustering structures in numerical experiments on synthetic data.

conflict, partition, sequence, (15 more...)

2305.04281

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

arXiv.org Artificial IntelligenceSep-21-2023

Mean Shift Mask Transformer for Unseen Object Instance Segmentation

Lu, Yangxiao, Chen, Yuqiao, Ruozzi, Nicholas, Xiang, Yu

Segmenting unseen objects from images is a critical perception skill that a robot needs to acquire. In robot manipulation, it can facilitate a robot to grasp and manipulate unseen objects. Mean shift clustering is a widely used method for image segmentation tasks. However, the traditional mean shift clustering algorithm is not differentiable, making it difficult to integrate it into an end-to-end neural network training framework. In this work, we propose the Mean Shift Mask Transformer (MSMFormer), a new transformer architecture that simulates the von Mises-Fisher (vMF) mean shift clustering algorithm, allowing for the joint training and inference of both the feature extractor and the clustering. Its central component is a hypersphere attention mechanism, which updates object queries on a hypersphere. To illustrate the effectiveness of our method, we apply MSMFormer to unseen object instance segmentation. Our experiments show that MSMFormer achieves competitive performance compared to state-of-the-art methods for unseen object instance segmentation. The project page, appendix, video, and code are available at https://irvlutd.github.io/MSMFormer

decoder layer, mean shift, segmentation, (12 more...)

2211.11679

Country: North America > United States > Texas > Dallas County > Richardson (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Damstrup, Anne Sophie Riis, Madsen, Sofie Tosti, Coscia, Michele

Unsupervised Learning via Network-Aware Embeddings

arXiv.org Artificial IntelligenceSep-19-2023

Data clustering, the task of grouping observations according to their similarity, is a key component of unsupervised learning - with real world applications in diverse fields such as biology, medicine, and social science. Often in these fields the data comes with complex interdependencies between the dimensions of analysis, for instance the various characteristics and opinions people can have live on a complex social network. Current clustering methods are ill-suited to tackle this complexity: deep learning can approximate these dependencies, but not take their explicit map as the input of the analysis. In this paper, we aim at fixing this blind spot in the unsupervised learning literature. We can create network-aware embeddings by estimating the network distance between numeric node attributes via the generalized Euclidean distance. Differently from all methods in the literature that we know of, we do not cluster the nodes of the network, but rather its node attributes. In our experiments we show that having these network embeddings is always beneficial for the learning task; that our method scales to large networks; and that we can actually provide actionable insights in applications in a variety of fields such as marketing, economics, and political science. Our method is fully open source and data and code are available to reproduce all results in the paper. Finding patterns in unlabeled data - a task known as unsupervised learning - is useful when we need to build understanding from data Hastie et al. (2009). Unsupervised learning includes grouping observations into clusters according to some criterion represented by a quality or loss function Gan et al. (2020) - data clustering. Applications range from grouping of genes with related expression patterns in biology Ranade et al. (2001), finding patterns in tissue images in medicine Filipovych et al. (2011), or segment customers for marketing purposes. Popular data clustering algorithms include DBSCAN Ester et al. (1996), OPTICS Ankerst et al. (1999), k-Means, and more. Modern data clustering approaches rely on deep learning and specifically deep neural networks Aljalbout et al. (2018); Aggarwal et al. (2018); Pang et al. (2021); Ezugwu et al. (2022), or denoising with autoencoders Nawaz et al. (2022); Cai et al. (2022). However, these approaches work in (deformations of) Euclidean spaces - where dependencies between the dimensions of the analysis can be learned Mahalanobis (1936); Xie et al. (2016) -, but the problem to be tackled here is fundamentally non-Euclidean Bronstein et al. (2017). Graph Neural Networks (GNN) Scarselli et al. (2008); Wu et al. (2022); Zhou et al. (2020a) work in non-Euclidean settings, and they are the focus of this paper.

conference paper, dimension, node, (17 more...)

2309.10408

Country:

Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > Texas (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(11 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.93)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)