AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)
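As an illustration of the definition above, here is a minimal sketch of one common clustering method, k-means (Lloyd's algorithm), in plain Python. The function names `kmeans` and `sq_dist`, the toy points, and the fixed iteration count are assumptions made for this example; they do not come from the quoted source, and k-means is only one of many ways to make "more similar to each other" concrete.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Two visibly separated groups of 2-D points (hypothetical toy data).
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
labels, centroids = kmeans(points, k=2)
```

On this toy data the first three points end up sharing one label and the last three the other. Which group receives label 0 depends on the random initialisation.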

Online Learning for Mixture of Multivariate Hawkes Processes

Ghassemi, Mohsen, Dalmasso, Niccolò, Lamba, Simran, Potluru, Vamsi K., Shah, Sameena, Balch, Tucker, Veloso, Manuela

arXiv.org Artificial Intelligence, Aug-16-2022

Online learning of Hawkes processes has received increasing attention in the last couple of years, especially for modeling a network of actors. However, these works typically model only one of the following: the rich interaction between the events, the latent clusters of the actors, or the network structure between the actors. We propose to model the latent structure of the network of actors as well as their rich interaction across events for real-world settings of medical and financial applications.

data mining, hawke process, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3533271.3561771

2208.07961

Country: North America > United States > Massachusetts (0.04)

Genre:

Research Report (0.50)
Instructional Material > Online (0.47)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

A Survey on Incomplete Multi-view Clustering

Wen, Jie, Zhang, Zheng, Fei, Lunke, Zhang, Bob, Xu, Yong, Zhang, Zhao, Li, Jinxing

arXiv.org Artificial Intelligence, Aug-16-2022

Conventional multi-view clustering seeks to partition data into respective groups based on the assumption that all views are fully observed. However, in practical applications such as disease diagnosis, multimedia analysis, and recommender systems, it is common that not all views of the samples are available, which causes conventional multi-view clustering methods to fail. Clustering on such incomplete multi-view data is referred to as incomplete multi-view clustering. In view of its promising application prospects, research on incomplete multi-view clustering has made noticeable advances in recent years. However, there is no survey that summarizes the current progress and points out future research directions. To this end, we review the recent studies of incomplete multi-view clustering. Importantly, we provide some frameworks to unify the corresponding incomplete multi-view clustering methods, and make an in-depth comparative analysis of some representative methods from theoretical and experimental perspectives. Finally, some open problems in the incomplete multi-view clustering field are offered for researchers.

consensus representation, graph, representation, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TSMC.2022.3192635

2208.0804

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Singapore (0.04)
Asia > Macao (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Fast Heterogeneous Federated Learning with Hybrid Client Selection

Shen, Guangyuan, Gao, Dehong, Song, Duanxiao, Yang, Libin, Zhou, Xukai, Pan, Shirui, Lou, Wei, Zhou, Fang

arXiv.org Artificial Intelligence, Aug-16-2022

Client selection schemes are widely adopted in recent Federated Learning (FL) studies to address communication-efficiency problems. However, the large variance of the model updates aggregated from randomly selected, unrepresentative subsets directly slows FL convergence. We present a novel clustering-based client selection scheme that accelerates FL convergence through variance reduction. Simple yet effective schemes are designed to improve the clustering effect and control its fluctuation, thereby generating a client subset with a certain representativeness of sampling. Theoretically, we demonstrate the improvement of the proposed scheme in variance reduction. We also present a tighter convergence guarantee for the proposed method thanks to the variance reduction. Experimental results confirm the superior efficiency of our scheme compared to alternatives.
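The general idea behind clustering-based selection can be sketched as stratified sampling: pick one client per cluster and weight its update by the cluster's share of all clients. This is not the scheme proposed in the paper, whose details are not given in the abstract. The function names, the scalar toy "updates", and the assumption that cluster labels are already available are all hypothetical.

```python
import random
from collections import defaultdict


def pick_one_per_cluster(labels, rng):
    """Map each cluster label to (randomly chosen client index, cluster size)."""
    clusters = defaultdict(list)
    for client, label in enumerate(labels):
        clusters[label].append(client)
    return {
        label: (rng.choice(members), len(members))
        for label, members in clusters.items()
    }


def stratified_mean(updates, labels, rng):
    """Estimate the mean update from one sampled client per cluster,
    weighting each sample by its cluster's share of all clients."""
    picks = pick_one_per_cluster(labels, rng)
    n = len(labels)
    return sum(updates[client] * size / n for client, size in picks.values())


# Toy scalar "updates": clients in the same cluster send identical values.
updates = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0]
labels = [0, 0, 0, 0, 1, 1]  # assumed to come from some prior clustering step

estimate = stratified_mean(updates, labels, random.Random(0))
full_mean = sum(updates) / len(updates)
```

Because every cluster here is internally homogeneous, the two-client estimate matches the mean over all six clients. In general the estimator is unbiased and its variance comes only from the spread within clusters, which is why grouping similar clients before sampling can reduce variance compared with uniform random selection.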

gradient, hcsfed, selection, (14 more...)

arXiv.org Artificial Intelligence

2208.05135

Country:

North America > United States > Virginia (0.04)
Asia > China > Hong Kong (0.04)
Oceania > Australia (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Applied Unsupervised Learning with R: Uncover hidden relationships and patterns with k-means clustering, hierarchical clustering, and PCA

Malik, Alok, Tuckfield, Bradford

#artificialintelligence, Aug-15-2022, 10:40:31 GMT

Book listing on Amazon.com (ISBN 9781789956399).

applied unsupervised learning, relationship and pattern, tuckfield, (15 more...)

#artificialintelligence

Industry: Retail > Online (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Bertucci, Donald, Hamid, Md Montaser, Anand, Yashwanthi, Ruangrotsakun, Anita, Tabatabai, Delyar, Perez, Melissa, Kahng, Minsuk

arXiv.org Artificial Intelligence, Aug-15-2022

ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest at multiple levels of abstraction. Our case studies with widely-used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap.

dataset, dendromap, participant, (14 more...)

arXiv.org Artificial Intelligence

2205.06935

Country:

North America > United States > Oregon (0.04)
Asia (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Overview (0.93)
Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Robotic Inspection and Characterization of Subsurface Defects on Concrete Structures Using Impact Sounding

Hoxha, Ejup, Feng, Jinglun, Sanakov, Diar, Gjinofci, Ardian, Xiao, Jizhong

arXiv.org Artificial Intelligence, Aug-12-2022

Impact-sounding (IS) and impact-echo (IE) are well-developed non-destructive evaluation (NDE) methods that are widely used to inspect concrete structures and ensure their safety and sustainability. However, collecting IS and IE data along grid lines covering a large target area to characterize subsurface defects is tedious work. Moreover, the data processing is complicated and requires domain experts to interpret the results. To address these problems, we present a novel robotic inspection system named Impact-Rover that automates the data collection process, and we introduce data analytics software that visualizes the inspection results so that non-professionals can understand them. The system consists of three modules: 1) a robotic platform with vertical mobility to collect IS and IE data in hard-to-reach locations, 2) a vision-based positioning module that fuses the RGB-D camera, IMU and wheel encoder to estimate the 6-DOF pose of the robot, and 3) a data analytics software module that processes the IS data to generate defect maps. The Impact-Rover hosts both IE and IS devices on a sliding mechanism and can perform move-stop-sample operations to collect multiple IS and IE measurements at adjustable spacing. The robot takes samples much faster than manual data collection because it automatically takes multiple measurements along a straight line and records their locations. This paper focuses on reporting experimental results on IS. We calculate features and use unsupervised learning methods to analyze the data. By combining the pose generated by our vision-based localization module with the position of the head of the sliding mechanism, we can generate maps of possible defects. The results on concrete slabs demonstrate that our impact-sounding system can effectively reveal shallow defects.

artificial intelligence, defect, machine learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.12783/shm2021/36339

2208.06305

Country:

Europe > Kosovo > District of Pristina > Pristina (0.14)
Europe > Switzerland > Basel-City > Basel (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Materials > Construction Materials (0.85)
Construction & Engineering (0.71)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Scholastic: Graphical Human-AI Collaboration for Inductive and Interpretive Text Analysis

Hong, Matt-Heun, Marsh, Lauren A., Feuston, Jessica L., Ruppert, Janet, Brubaker, Jed R., Szafir, Danielle Albers

arXiv.org Artificial Intelligence, Aug-12-2022

Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge. Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algorithms potentially disrupting or driving interpretive scholarship. We take a human-centered design approach to addressing concerns around machine-assisted interpretive research to build Scholastic, which incorporates a machine-in-the-loop clustering algorithm to scaffold interpretive text analysis. As a scholar applies codes to documents and refines them, the resulting coding schema serves as structured metadata which constrains hierarchical document and word clusters inferred from the corpus. Interactive visualizations of these clusters can help scholars strategically sample documents further toward insights. Scholastic demonstrates how human-centered algorithm design and visualizations employing familiar metaphors can support inductive and interpretive research methodologies through interactive topic modeling and document clustering.

category, code and category, scholastic, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3526113.3545681

2208.06133

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Education (0.48)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Collective Obfuscation and Crowdsourcing

Laufer, Benjamin, Grupen, Niko A.

arXiv.org Artificial Intelligence, Aug-12-2022

Crowdsourcing technologies rely on groups of people to input information that may be critical for decision-making. This work examines obfuscation in the context of reporting technologies. We show that widespread use of reporting platforms comes with unique security and privacy implications, and introduce a threat model and corresponding taxonomy to outline some of the many attack vectors in this space. We then perform an empirical analysis of a dataset of call logs from a controversial, real-world reporting hotline and identify coordinated obfuscation strategies that are intended to hinder the platform's legitimacy. We propose a variety of statistical measures to quantify the strength of this obfuscation strategy with respect to the structural and semantic characteristics of the reporting attacks in our dataset.

non-spam report, platform, spam, (12 more...)

arXiv.org Artificial Intelligence

2208.06405

Country:

North America > United States > Texas (0.05)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

fairDMS: Rapid Model Training by Data and Model Reuse

Ali, Ahsan, Sharma, Hemant, Kettimuthu, Rajkumar, Kenesei, Peter, Trujillo, Dennis, Miceli, Antonino, Foster, Ian, Coffee, Ryan, Thayer, Jana, Liu, Zhengchun

arXiv.org Artificial Intelligence, Aug-11-2022

Extracting actionable information rapidly from data produced by instruments such as the Linac Coherent Light Source (LCLS-II) and Advanced Photon Source Upgrade (APS-U) is becoming ever more challenging due to high (up to TB/s) data rates. Conventional physics-based information retrieval methods are hard-pressed to detect interesting events fast enough to enable timely focusing on a rare event or correction of an error. Machine learning (ML) methods that learn cheap surrogate classifiers present a promising alternative, but can fail catastrophically when changes in instrument or sample result in degradation in ML performance. To overcome such difficulties, we present a new data storage and ML model training architecture designed to organize large volumes of data and models so that when model degradation is detected, prior models and/or data can be queried rapidly and a more suitable model retrieved and fine-tuned for new conditions. We show that our approach can achieve up to 100x data labelling speedup compared to the current state-of-the-art, 200x improvement in training speed, and 92x speedup in terms of end-to-end model updating time.

dataset, experiment, fairdm, (14 more...)

arXiv.org Artificial Intelligence

2204.09805

Country:

North America > United States > Illinois > Cook County > Lemont (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Energy (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Clustering Optimisation Method for Highly Connected Biological Data

Tjörnhammar, Richard

arXiv.org Artificial Intelligence, Aug-11-2022

Currently, data-driven discovery in biological sciences resides in finding segmentation strategies in multivariate data that produce sensible descriptions of the data. Clustering is but one of several approaches and sometimes falls short because of difficulties in assessing reasonable cutoffs or the number of clusters that need to be formed, or because an approach fails to preserve topological properties of the original system in its clustered form. In this work, we show how a simple metric for connectivity clustering evaluation leads to an optimised segmentation of biological data. The novelty of the work resides in the creation of a simple optimisation method for clustering crowded data. The resulting clustering approach only relies on metrics derived from the inherent properties of the clustering. The new method facilitates knowledge for optimised clustering, which is easy to implement. We discuss how the clustering optimisation strategy corresponds to the viable information content yielded by the final segmentation. We further elaborate on how the clustering results, in the optimal solution, correspond to prior knowledge of three different data sets.
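To make the cutoff problem mentioned in the abstract concrete, here is a minimal sketch of connectivity clustering: points closer than a chosen cutoff are linked, and clusters are the connected components of the resulting graph. It does not implement the paper's evaluation metric or optimisation method, which the abstract does not specify. The function name `connectivity_clusters`, the Euclidean distance, and the toy points are assumptions for illustration.

```python
from itertools import combinations
from math import dist


def connectivity_clusters(points, cutoff):
    """Label points by connected component of the graph that links
    every pair of points at Euclidean distance <= cutoff."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= cutoff:
            parent[find(i)] = find(j)  # merge the two components

    roots = [find(i) for i in range(len(points))]
    # Renumber roots as 0, 1, 2, ... in order of first appearance.
    relabel = {root: k for k, root in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


# Hypothetical toy data: a chain of three points and a separate pair.
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (5.0, 0.0), (5.0, 1.0)]
cluster_counts = {
    cutoff: len(set(connectivity_clusters(points, cutoff)))
    for cutoff in (0.5, 1.0, 10.0)
}
```

Sweeping the cutoff changes the segmentation from every point on its own, to a chain plus a pair, to one single cluster, so any method of this kind needs some criterion for choosing among those alternatives.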

correspond, dim, matrix, (14 more...)

arXiv.org Artificial Intelligence

2208.0472

Country:

North America > United States (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
