AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Natural-Language Multi-Agent Simulations of Argumentative Opinion Dynamics

Betz, Gregor

arXiv.org Artificial IntelligenceApr-14-2021

This paper develops a natural-language agent-based model of argumentation (ABMA). Its artificial deliberative agents (ADAs) are constructed with the help of so-called neural language models recently developed in AI and computational linguistics. ADAs are equipped with a minimalist belief system and may generate and submit novel contributions to a conversation. The natural-language ABMA allows us to simulate collective deliberation in English, i.e. with arguments, reasons, and claims themselves -- rather than with their mathematical representations (as in formal models). This paper uses the natural-language ABMA to test the robustness of formal reason-balancing models of argumentation [Maes & Flache 2013, Singer et al. 2019]: First of all, as long as ADAs remain passive, confirmation bias and homophily updating trigger polarization, which is consistent with results from formal models. However, once ADAs start to actively generate new contributions, the evolution of a conservation is dominated by properties of the agents *as authors*. This suggests that the creation of new arguments, reasons, and claims critically affects a conversation and is of pivotal importance for understanding the dynamics of collective deliberation. The paper closes by pointing out further fruitful applications of the model and challenges for future research.

agent, language model, legalization, (15 more...)

arXiv.org Artificial Intelligence

2104.06737

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Europe > France (0.04)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (0.94)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Solving weakly supervised regression problem using low-rank manifold regularization

Berikov, Vladimir, Litvinenko, Alexander

arXiv.org Machine LearningApr-13-2021

We solve a weakly supervised regression problem. Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources. The solution process requires to optimize a certain objective function (the loss function), which combines manifold regularization and low-rank matrix decomposition techniques. These low-rank approximations allow us to speed up all matrix calculations and reduce storage requirements. This is especially crucial for large datasets. Ensemble clustering is used for obtaining the co-association matrix, which we consider as the similarity matrix. The utilization of these techniques allows us to increase the quality and stability of the solution. In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.

algorithm, ensemble, matrix, (15 more...)

arXiv.org Machine Learning

2104.06548

Country:

Asia > Russia > Siberian Federal District > Novosibirsk Oblast > Novosibirsk (0.05)
Europe > Russia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities

Zhu, Ye, Ting, Kai Ming, Carman, Mark, Angelova, Maia

arXiv.org Artificial IntelligenceApr-12-2021

The problem of inhomogeneous cluster densities has been a long-standing issue for distance-based and density-based algorithms in clustering and anomaly detection. These algorithms implicitly assume that all clusters have approximately the same density. As a result, they often exhibit a bias towards dense clusters in the presence of sparse clusters. Many remedies have been suggested; yet, we show that they are partial solutions which do not address the issue satisfactorily. To match the implicit assumption, we propose to transform a given dataset such that the transformed clusters have approximately the same density while all regions of locally low density become globally low density -- homogenising cluster density while preserving the cluster structure of the dataset. We show that this can be achieved by using a new multi-dimensional Cumulative Distribution Function in a transform-and-shift method. The method can be applied to every dataset, before the dataset is used in many existing algorithms to match their implicit assumption without algorithmic modification. We show that the proposed method performs better than existing remedies.

algorithm, cdf-ts, dataset, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.patcog.2021.107977

1810.02897

Country:

Oceania > Australia > Victoria (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.98)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.73)

Add feedback

Interpretable Methods for Identifying Product Variants

West, Rebecca, Jadda, Khalifeh Al, Ahsan, Unaiza, Qu, Huiming, Cui, Xiquan

arXiv.org Artificial IntelligenceApr-12-2021

For e-commerce companies with large product selections, the organization and grouping of products in meaningful ways is important for creating great customer shopping experiences and cultivating an authoritative brand image. One important way of grouping products is to identify a family of product variants, where the variants are mostly the same with slight and yet distinct differences (e.g. color or pack size). In this paper, we introduce a novel approach to identifying product variants. It combines both constrained clustering and tailored NLP techniques (e.g. extraction of product family name from unstructured product title and identification of products with similar model numbers) to achieve superior performance compared with an existing baseline using a vanilla classification approach. In addition, we design the algorithm to meet certain business criteria, including meeting high accuracy requirements on a wide range of categories (e.g. appliances, decor, tools, and building materials, etc.) as well as prioritizing the interpretability of the model to make it accessible and understandable to all business partners.

constraint, product variant, variant, (15 more...)

arXiv.org Artificial Intelligence

2104.05504

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.06)
North America > United States > Georgia > Fulton County > Atlanta (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report (0.84)
Overview (0.66)

Industry:

Materials > Construction Materials (0.54)
Information Technology > Services > e-Commerce Services (0.35)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Pose Invariant Person Re-Identification using Robust Pose-transformation GAN

Karmakar, Arnab, Mishra, Deepak

arXiv.org Artificial IntelligenceApr-11-2021

Person re-identification (re-ID) aims to retrieve a person's images from an image gallery, given a single instance of the person of interest. Despite several advancements, learning discriminative identity-sensitive and viewpoint invariant features for robust Person Re-identification is a major challenge owing to large pose variation of humans. This paper proposes a re-ID pipeline that utilizes the image generation capability of Generative Adversarial Networks combined with pose regression and feature fusion to achieve pose invariant feature learning. The objective is to model a given person under different viewpoints and large pose changes and extract the most discriminative features from all the appearances. The pose transformational GAN (pt-GAN) module is trained to generate a person's image in any given pose. In order to identify the most significant poses for discriminative feature extraction, a Pose Regression module is proposed. The given instance of the person is modelled in varying poses and these features are effectively combined through the Feature Fusion Network. The final re-ID model consisting of these 3 sub-blocks, alleviates the pose dependence in person re-ID and outperforms the state-of-the-art GAN based models for re-ID in 4 benchmark datasets. The proposed model is robust to occlusion, scale and illumination, thereby outperforms the state-of-the-art models in terms of improvement over baseline.

dataset, person re-identification, proceedings, (8 more...)

arXiv.org Artificial Intelligence

2105.0093

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Uncover Residential Energy Consumption Patterns Using Socioeconomic and Smart Meter Data

Tang, Wenjun, Wang, Hao, Lee, Xian-Long, Yang, Hong-Tzer

arXiv.org Artificial IntelligenceApr-11-2021

This paper models residential consumers' energy-consumption behavior by load patterns and distributions and reveals the relationship between consumers' load patterns and socioeconomic features by machine learning. We analyze the real-world smart meter data and extract load patterns using K-Medoids clustering, which is robust to outliers. We develop an analytical framework with feature selection and deep learning models to estimate the relationship between load patterns and socioeconomic features. Specifically, we use an entropy-based feature selection method to identify the critical socioeconomic characteristics that affect load patterns and benefit our method's interpretability. We further develop a customized deep neural network model to characterize the relationship between consumers' load patterns and selected socioeconomic features. Numerical studies validate our proposed framework using Pecan Street smart meter data and survey. We demonstrate that our framework can capture the relationship between load patterns and socioeconomic information and outperform benchmarks such as regression and single DNN models.

load pattern, load profile, socioeconomic factor, (14 more...)

arXiv.org Artificial Intelligence

2104.05154

Country:

Europe (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States (0.04)

Genre: Research Report (1.00)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Pseudo-supervised Deep Subspace Clustering

Lv, Juncheng, Kang, Zhao, Lu, Xiao, Xu, Zenglin

arXiv.org Artificial IntelligenceApr-8-2021

Auto-Encoder (AE)-based deep subspace clustering (DSC) methods have achieved impressive performance due to the powerful representation extracted using deep neural networks while prioritizing categorical separability. However, self-reconstruction loss of an AE ignores rich useful relation information and might lead to indiscriminative representation, which inevitably degrades the clustering performance. It is also challenging to learn high-level similarity without feeding semantic labels. Another unsolved problem facing DSC is the huge memory cost due to $n\times n$ similarity matrix, which is incurred by the self-expression layer between an encoder and decoder. To tackle these problems, we use pairwise similarity to weigh the reconstruction loss to capture local structure information, while a similarity is learned by the self-expression layer. Pseudo-graphs and pseudo-labels, which allow benefiting from uncertain knowledge acquired during network training, are further employed to supervise similarity learning. Joint learning and iterative training facilitate to obtain an overall optimal solution. Extensive experiments on benchmark datasets demonstrate the superiority of our approach. By combining with the $k$-nearest neighbors algorithm, we further show that our method can address the large-scale and out-of-sample problems.

dataset, representation, subspace, (16 more...)

arXiv.org Artificial Intelligence

2104.03531

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.66)

Add feedback

The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles

Salatino, Angelo A., Osborne, Francesco, Thanapalasingam, Thiviyan, Motta, Enrico

arXiv.org Artificial IntelligenceApr-2-2021

Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of re-search areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles yielding a significant improvement over alternative methods.

cso classifier, gold standard, module, (10 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-030-30760-8_26

2104.00948

Country:

North America > United States > Texas > Tarrant County > Fort Worth (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Streaming Social Event Detection and Evolution Discovery in Heterogeneous Information Networks

Peng, Hao, Li, Jianxin, Song, Yangqiu, Yang, Renyu, Ranjan, Rajiv, Yu, Philip S., He, Lifang

arXiv.org Artificial IntelligenceApr-1-2021

Events are happening in real-world and real-time, which can be planned and organized for occasions, such as social gatherings, festival celebrations, influential meetings or sports activities. Social media platforms generate a lot of real-time text information regarding public events with different topics. However, mining social events is challenging because events typically exhibit heterogeneous texture and metadata are often ambiguous. In this paper, we first design a novel event-based meta-schema to characterize the semantic relatedness of social events and then build an event-based heterogeneous information network (HIN) integrating information from external knowledge base. Second, we propose a novel Pairwise Popularity Graph Convolutional Network, named as PP-GCN, based on weighted meta-path instance similarity and textual semantic representation as inputs, to perform fine-grained social event categorization and learn the optimal weights of meta-paths in different tasks. Third, we propose a streaming social event detection and evolution discovery framework for HINs based on meta-path similarity search, historical information about meta-paths, and heterogeneous DBSCAN clustering method. Comprehensive experiments on real-world streaming social text data are conducted to compare various social event detection and evolution discovery algorithms. Experimental results demonstrate that our proposed framework outperforms other alternative social event detection and evolution discovery techniques.

discovery, event detection, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2104.00853

Country:

Europe > France > Île-de-France > Paris > Paris (0.14)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(7 more...)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Social Events (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

Add feedback

8 ways to jump-start your machine learning

#artificialintelligenceMar-31-2021, 14:16:55 GMT

From exploratory data analysis to automated machine learning, look to these techniques to get your data science project moving — and to build better models.

data analysis, hyperparameter, learning, (14 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.30)

Add feedback