AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Machine Learning in R & Predictive Models

#artificialintelligenceMay-4-2021, 13:40:40 GMT

My course will be your complete guide to the theory and applications of supervised & unsupervised machine learning and predictive modeling using the R-programming language. Unlike other courses, it offers NOT ONLY the guided demonstrations of the R-scripts but also covers theoretical background that will allow you to FULLY UNDERSTAND & APPLY MACHINE LEARNING & PREDICTIVE MODELS (K-means, Random Forest, SVM, logistic regression, etc) in R (many R packages incl. This course also covers all the main aspects of practical and highly applied data science related to Machine Learning (classification & regressions) and unsupervised clustering techniques. Thus, if you take this course, you will save lots of time & money on other expensive materials in the R based Data Science and Machine Learning domain. In this age of big data, companies across the globe use R to analyze big volumes of data for business and research.

learning, machine learning, unsupervised machine learning, (7 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.96)

Add feedback

The Pursuit of Knowledge: Discovering and Localizing Novel Categories using Dual Memory

Rambhatla, Sai Saketh, Chellappa, Rama, Shrivastava, Abhinav

arXiv.org Artificial IntelligenceMay-4-2021

We tackle object category discovery, which is the problem of discovering and localizing novel objects in a large unlabeled dataset. While existing methods show results on datasets with less cluttered scenes and fewer object instances per image, we present our results on the challenging COCO dataset. Moreover, we argue that, rather than discovering new categories from scratch, discovery algorithms can benefit from identifying what is already known and focusing their attention on the unknown. We propose a method to use prior knowledge about certain object categories to discover new categories by leveraging two memory modules, namely Working and Semantic memory. We show the performance of our detector on the COCO minival dataset to demonstrate its in-the-wild capabilities.

computer vision, discovery, representation, (16 more...)

arXiv.org Artificial Intelligence

2105.01652

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
(9 more...)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)

Add feedback

Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques

Chrupała, Grzegorz

arXiv.org Artificial IntelligenceMay-1-2021

This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years. Such models are inspired by the observation that when children pick up a language, they rely on a wide range of indirect and noisy clues, crucially including signals from the visual modality co-occurring with spoken utterances. Several fields have made important contributions to this approach to modeling or mimicking the process of learning language: Machine Learning, Natural Language and Speech Processing, Computer Vision and Cognitive Science. The current paper brings together these contributions in order to provide a useful introduction and overview for practitioners in all these areas. We discuss the central research questions addressed, the timeline of developments, and the datasets which enabled much of this work. We then summarize the main modeling architectures and offer an exhaustive overview of the evaluation metrics and analysis techniques.

architecture, dataset, representation, (15 more...)

arXiv.org Artificial Intelligence

2104.13225

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > New York (0.04)
(6 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
(4 more...)

Add feedback

Learning for Detecting Norm Violation in Online Communities

Santos, Thiago Freitas dos, Osman, Nardine, Schorlemmer, Marco

arXiv.org Artificial IntelligenceApr-30-2021

In this paper, we focus on normative systems for online communities. The paper addresses the issue that arises when different community members interpret these norms in different ways, possibly leading to unexpected behavior in interactions, usually with norm violations that affect the individual and community experiences. To address this issue, we propose a framework capable of detecting norm violations and providing the violator with information about the features of their action that makes this action violate a norm. We build our framework using Machine Learning, with Logistic Model Trees as the classification algorithm. Since norm violations can be highly contextual, we train our model using data from the Wikipedia online community, namely data on Wikipedia edits. Our work is then evaluated with the Wikipedia use case where we focus on the norm that prohibits vandalism in Wikipedia edits.

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-16617-4_9

2104.14911

Country: Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.55)
Information Technology (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Seeing All From a Few: Nodes Selection Using Graph Pooling for Graph Clustering

Wang, Yiming, Chang, Dongxia, Fu, Zhiqian, Zhao, Yao

arXiv.org Artificial IntelligenceApr-30-2021

Graph clustering aiming to obtain a partition of data using the graph information, has received considerable attention in recent years. However, noisy edges and nodes in the graph may make the clustering results worse. In this paper, we propose a novel dual graph embedding network(DGEN) to improve the robustness of the graph clustering to the noisy nodes and edges. DGEN is designed as a two-step graph encoder connected by a graph pooling layer, which learns the graph embedding of the selected nodes. Based on the assumption that a node and its nearest neighbors should belong to the same cluster, we devise the neighbor cluster pooling(NCPool) to select the most informative subset of vertices based on the clustering assignments of nodes and their nearest neighbor. This can effectively alleviate the impact of the noise edge to the clustering. After obtaining the clustering assignments of the selected nodes, a classifier is trained using these selected nodes and the final clustering assignments for all the nodes can be obtained by this classifier. Experiments on three benchmark graph datasets demonstrate the superiority compared with several state-of-the-art algorithms.

graph, node, representation, (15 more...)

arXiv.org Artificial Intelligence

2105.0532

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Performance evaluation results of evolutionary clustering algorithm star for clustering heterogeneous datasets

Hassan, Bryar A., Rashid, TarikA., Mirjalili, Seyedali

arXiv.org Artificial IntelligenceApr-30-2021

This article presents the data used to evaluate the performance of evolutionary clustering algorithm star (ECA*) compared to five traditional and modern clustering algorithms. Two experimental methods are employed to examine the performance of ECA* against genetic algorithm for clustering++ (GENCLUST++), learning vector quantisation (LVQ) , expectation maximisation (EM) , K-means++ (KM++) and K-means (KM). These algorithms are applied to 32 heterogenous and multi-featured datasets to determine which one performs well on the three tests. For one, ther paper examines the efficiency of ECA* in contradiction of its corresponding algorithms using clustering evaluation measures. These validation criteria are objective function and cluster quality measures. For another, it suggests a performance rating framework to measurethe the performance sensitivity of these algorithms on varos dataset features (cluster dimensionality, number of clusters, cluster overlap, cluster shape and cluster structure). The contributions of these experiments are two-folds: (i) ECA* exceeds its counterpart aloriths in ability to find out the right cluster number; (ii) ECA* is less sensitive towards dataset features compared to its competitive techniques. Nonetheless, the results of the experiments performed demonstrate some limitations in the ECA*: (i) ECA* is not fully applied based on the premise that no prior knowledge exists; (ii) Adapting and utilising ECA* on several real applications has not been achieved yet.

algorithm, cluster quality measure, dataset, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.dib.2021.107044

2105.0281

Country:

Asia > Middle East > Iraq > Kurdistan Region > Sulaymaniyah Governorate > Sulaymaniyah (0.14)
Oceania > Australia (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > Middle East > Iraq > Erbil Governorate > Erbil (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Flattening Multiparameter Hierarchical Clustering Functors

Shiebler, Dan

arXiv.org Artificial IntelligenceApr-29-2021

We bring together topological data analysis, applied category theory, and machine learning to study multiparameter hierarchical clustering. We begin by introducing a procedure for flattening multiparameter hierarchical clusterings. We demonstrate that this procedure is a functor from a category of multiparameter hierarchical partitions to a category of binary integer programs. We also include empirical results demonstrating its effectiveness. Next, we introduce a Bayesian update algorithm for learning clustering parameters from data. We demonstrate that the composition of this algorithm with our flattening procedure satisfies a consistency property.

algorithm, binary integer program, functor, (12 more...)

arXiv.org Artificial Intelligence

2104.14734

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Extending Isolation Forest for Anomaly Detection in Big Data via K-Means

Laskar, Md Tahmid Rahman, Huang, Jimmy, Smetana, Vladan, Stewart, Chris, Pouw, Kees, An, Aijun, Chan, Stephen, Liu, Lei

arXiv.org Artificial IntelligenceApr-27-2021

Industrial Information Technology (IT) infrastructures are often vulnerable to cyberattacks. To ensure security to the computer systems in an industrial environment, it is required to build effective intrusion detection systems to monitor the cyber-physical systems (e.g., computer networks) in the industry for malicious activities. This paper aims to build such intrusion detection systems to protect the computer networks from cyberattacks. More specifically, we propose a novel unsupervised machine learning approach that combines the K-Means algorithm with the Isolation Forest for anomaly detection in industrial big data scenarios. Since our objective is to build the intrusion detection system for the big data scenario in the industrial domain, we utilize the Apache Spark framework to implement our proposed model which was trained in large network traffic data (about 123 million instances of network traffic) stored in Elasticsearch. Moreover, we evaluate our proposed model on the live streaming data and find that our proposed system can be used for real-time anomaly detection in the industrial setup. In addition, we address different challenges that we face while training our model on large datasets and explicitly describe how these issues were resolved. Based on our empirical evaluation in different use-cases for anomaly detection in real-world network traffic data, we observe that our proposed system is effective to detect anomalies in big data scenarios. Finally, we evaluate our proposed model on several academic datasets to compare with other models and find that it provides comparable performance with other state-of-the-art approaches.

dataset, detection, isolation forest, (11 more...)

arXiv.org Artificial Intelligence

2104.1319

Country:

North America > United States > Wisconsin (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > Canada > Ontario (0.04)
Asia (0.04)

Genre: Research Report > Promising Solution (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
(3 more...)

Add feedback

Phenotyping OSA: a time series analysis using fuzzy clustering and persistent homology

Loliencar, Prachi, Heo, Giseon

arXiv.org Machine LearningApr-27-2021

Sleep apnea is a disorder that has serious consequences for the pediatric population. There has been recent concern that traditional diagnosis of the disorder using the apnea-hypopnea index may be ineffective in capturing its multi-faceted outcomes. In this work, we take a first step in addressing this issue by phenotyping patients using a clustering analysis of airflow time series. This is approached in three ways: using feature-based fuzzy clustering in the time and frequency domains, and using persistent homology to study the signal from a topological perspective. The fuzzy clusters are analyzed in a novel manner using a Dirichlet regression analysis, while the topological approach leverages Takens embedding theorem to study the periodicity properties of the signals.

airflow signal, periodicity score, persistent homology, (14 more...)

arXiv.org Machine Learning

2104.13479

Country:

Europe > Austria > Vienna (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Sleep (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Bridging observation, theory and numerical simulation of the ocean using Machine Learning

Sonnewald, Maike, Lguensat, Redouane, Jones, Daniel C., Dueben, Peter D., Brajard, Julien, Balaji, Venkatramani

arXiv.org Machine LearningApr-26-2021

Progress within physical oceanography has been concurrent with the increasing sophistication of tools available for its study. The incorporation of machine learning (ML) techniques offers exciting possibilities for advancing the capacity and speed of established methods and also for making substantial and serendipitous discoveries. Beyond vast amounts of complex data ubiquitous in many modern scientific fields, the study of the ocean poses a combination of unique challenges that ML can help address. The observational data available is largely spatially sparse, limited to the surface, and with few time series spanning more than a handful of decades. Important timescales span seconds to millennia, with strong scale interactions and numerical modelling efforts complicated by details such as coastlines. This review covers the current scientific insight offered by applying ML and points to where there is imminent potential. We cover the main three branches of the field: observations, theory, and numerical modelling. Highlighting both challenges and opportunities, we discuss both the historical context and salient ML tools. We focus on the use of ML in situ sampling and satellite observations, and the extent to which ML applications can advance theoretical oceanographic exploration, as well as aid numerical simulations. Applications that are also covered include model error and bias correction and current and potential use within data assimilation. While not without risk, there is great interest in the potential benefits of oceanographic ML applications; this review caters to this interest within the research community.

application, deep learning, upstream oil & gas, (19 more...)

arXiv.org Machine Learning

2104.12506

Country:

Europe > France (0.28)
North America > United States > New York (0.14)
Atlantic Ocean (0.14)
(5 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Oil & Gas > Upstream (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback