AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs

Laenen, Steinar, Manghiuc, Bogdan-Adrian, Sun, He

arXiv.org Artificial IntelligenceJun-16-2023

Hierarchical clustering (HC) is the recursive partitioning of a dataset into increasingly smaller clusters via an effective binary tree representation, and has been employed as a standard package in data analysis with widespread applications in practice. Traditional HC algorithms are typically based on agglomerative heuristics and, due to the lack of a clear objective function, there was limited work on their analysis. Dasgupta [Das16] introduced a simple cost function for hierarchical clustering, and this work has inspired a number of algorithmic studies on hierarchical clustering. In this paper we study efficient hierarchical clustering for graphs with a clear structure of clusters.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2306.0995

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Correlation Clustering of Bird Sounds

Stein, David, Andres, Bjoern

arXiv.org Artificial IntelligenceJun-16-2023

Bird sound classification is the task of relating any sound recording to those species of bird that can be heard in the recording. Here, we study bird sound clustering, the task of deciding for any pair of sound recordings whether the same species of bird can be heard in both. We address this problem by first learning, from a training set, probabilities of pairs of recordings being related in this way, and then inferring a maximally probable partition of a test set by correlation clustering. We address the following questions: How accurate is this clustering, compared to a classification of the test set? How do the clusters thus inferred relate to the clusters obtained by classification? How accurate is this clustering when applied to recordings of bird species not heard during training? How effective is this clustering in separating, from bird sounds, environmental noise not heard during training?

artificial intelligence, classification, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.09906

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)

Add feedback

Error-Tolerant Exact Query Learning of Finite Set Partitions with Same-Cluster Oracle

DePavia, Adela Frances, del Campo, Olga Medrano Martín, Tani, Erasmo

arXiv.org Artificial IntelligenceJun-16-2023

This paper initiates the study of active learning for exact recovery of partitions exclusively through access to a same-cluster oracle in the presence of bounded adversarial error. We first highlight a novel connection between learning partitions and correlation clustering. Then we use this connection to build a R\'enyi-Ulam style analytical framework for this problem, and prove upper and lower bounds on its worst-case query complexity. Further, we bound the expected performance of a relevant randomized algorithm. Finally, we study the relationship between adaptivity and query complexity for this problem and related variants.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.13402

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > France (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Add feedback

Energy Trees: Regression and Classification With Structured and Mixed-Type Covariates

Giubilei, Riccardo, Padellini, Tullia, Brutti, Pierpaolo

arXiv.org Machine LearningJun-15-2023

The increasing complexity of data requires methods and models that can effectively handle intricate structures, as simplifying them would result in loss of information. While several analytical tools have been developed to work with complex data objects in their original form, these tools are typically limited to single-type variables. In this work, we propose energy trees as a regression and classification model capable of accommodating structured covariates of various types. Energy trees leverage energy statistics to extend the capabilities of conditional inference trees, from which they inherit sound statistical foundations, interpretability, scale invariance, and freedom from distributional assumptions. We specifically focus on functional and graph-structured covariates, while also highlighting the model's flexibility in integrating other variable types. Extensive simulation studies demonstrate the model's competitive performance in terms of variable selection and robustness to overfitting. Finally, we assess the model's predictive ability through two empirical analyses involving human biological data. Energy trees are implemented in the R package etree.

artificial intelligence, covariate, machine learning, (15 more...)

arXiv.org Machine Learning

2207.0443

Country:

Europe > Italy > Lazio > Rome (0.04)
Europe > United Kingdom (0.04)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

A Gromov--Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening

Chen, Yifan, Yao, Rentian, Yang, Yun, Chen, Jie

arXiv.org Artificial IntelligenceJun-15-2023

Graph coarsening is a technique for solving large-scale graph problems by working on a smaller version of the original graph, and possibly interpolating the results back to the original graph. It has a long history in scientific computing and has recently gained popularity in machine learning, particularly in methods that preserve the graph spectrum. This work studies graph coarsening from a different perspective, developing a theory for preserving graph distances and proposing a method to achieve this. The geometric approach is useful when working with a collection of graphs, such as in graph classification and regression. In this study, we consider a graph as an element on a metric space equipped with the Gromov--Wasserstein (GW) distance, and bound the difference between the distance of two graphs and their coarsened versions. Minimizing this difference can be done using the popular weighted kernel $K$-means method, which improves existing spectrum-preserving methods with the proper choice of the kernel. The study includes a set of experiments to support the theory and method, including approximating the GW distance, preserving the graph spectrum, classifying graphs using spectral information, and performing regression using graph convolutional networks. Code is available at https://github.com/ychen-stat-ml/GW-Graph-Coarsening .

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.08854

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Ohio > Hamilton County > Cincinnati (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(11 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback

Integrating machine learning paradigms and mixed-integer model predictive control for irrigation scheduling

Agyeman, Bernard T., Naouri, Mohamed, Appels, Willemijn, Liu, Jinfeng, Shah, Sirish L.

arXiv.org Artificial IntelligenceJun-14-2023

The agricultural sector currently faces significant challenges in water resource conservation and crop yield optimization, primarily due to concerns over freshwater scarcity. Traditional irrigation scheduling methods often prove inadequate in meeting the needs of large-scale irrigation systems. To address this issue, this paper proposes a predictive irrigation scheduler that leverages the three paradigms of machine learning to optimize irrigation schedules. The proposed scheduler employs the k-means clustering approach to divide the field into distinct irrigation management zones based on soil hydraulic parameters and topology information. Furthermore, a long short-term memory network is employed to develop dynamic models for each management zone, enabling accurate predictions of soil moisture dynamics. Formulated as a mixed-integer model predictive control problem, the scheduler aims to maximize water uptake while minimizing overall water consumption and irrigation costs. To tackle the mixed-integer optimization challenge, the proximal policy optimization algorithm is utilized to train a reinforcement learning agent responsible for making daily irrigation decisions. To evaluate the performance of the proposed scheduler, a 26.4-hectare field in Lethbridge, Canada, was chosen as a case study for the 2015 and 2022 growing seasons. The results demonstrate the superiority of the proposed scheduler compared to a traditional irrigation scheduling method in terms of water use efficiency and crop yield improvement for both growing seasons. Notably, the proposed scheduler achieved water savings ranging from 6.4% to 22.8%, along with yield increases ranging from 2.3% to 4.3%.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2306.08715

Country:

North America > Canada > Alberta > Census Division No. 2 > Lethbridge County > Lethbridge (0.34)
Asia > Japan (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Food & Agriculture > Agriculture (1.00)
Energy > Oil & Gas > Upstream (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Identification of Energy Management Configuration Concepts from a Set of Pareto-optimal Solutions

Lanfermann, Felix, Liu, Qiqi, Jin, Yaochu, Schmitt, Sebastian

arXiv.org Artificial IntelligenceJun-14-2023

Optimizing building configurations for an efficient use of energy is increasingly receiving attention by current research and several methods have been developed to address this task. Selecting a suitable configuration based on multiple conflicting objectives, such as initial investment cost, recurring cost, robustness with respect to uncertainty of grid operation is, however, a difficult multi-criteria decision making problem. Concept identification can facilitate a decision maker by sorting configuration options into semantically meaningful groups (concepts), further introducing constraints to meet trade-off expectations for a selection of objectives. In this study, for a set of 20000 Pareto-optimal building energy management configurations, resulting from a many-objective evolutionary optimization, multiple concept identification iterations are conducted to provide a basis for making an informed investment decision. In a series of subsequent analysis steps, it is shown how the choice of description spaces, i.e., the partitioning of the features into sets for which consistent and non-overlapping concepts are required, impacts the type of information that can be extracted and that different setups of description spaces illuminate several different aspects of the configuration data - an important aspect that has not been addressed in previous work.

artificial intelligence, description space, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.08318

Country:

Europe > Germany (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (0.70)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer

Park, Jaehyun, Im, Jaegyun, Hwang, Sanha, Lim, Mintaek, Ualibekova, Sabina, Kim, Sejin, Kim, Sundong

arXiv.org Artificial IntelligenceJun-13-2023

In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provides insights for AGI progression. Yet, our work reveals the need for advanced data collection tools, robust training datasets, and refined model structures. This study highlights potential improvements for Decision Transformers and propels future AGI research.

artificial intelligence, decision transformer, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.08204

Country: Asia > South Korea > Gwangju > Gwangju (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.46)
Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

DTW k-means clustering for fault detection in photovoltaic modules

Oviedo, Edgar Hernando Sepúlveda, Travé-Massuyès, Louise, Subias, Audine, Pavlov, Marko, Alonso, Corinne

arXiv.org Artificial IntelligenceJun-13-2023

The increase in the use of photovoltaic (PV) energy in the world has shown that the useful life and maintenance of a PV plant directly depend on theability to quickly detect severe faults on a PV plant. To solve this problem of detection, data based approaches have been proposed in the literature.However, these previous solutions consider only specific behavior of one or few faults. Most of these approaches can be qualified as supervised, requiring an enormous labelling effort (fault types clearly identified in each technology). In addition, most of them are validated in PV cells or one PV module. That is hardly applicable in large-scale PV plants considering their complexity. Alternatively, some unsupervised well-known approaches based on data try to detect anomalies but are not able to identify precisely the type of fault. The most performant of these methods do manage to efficiently group healthy panels and separate them from faulty panels. In that way, this article presents an unsupervised approach called DTW K-means. This approach takes advantages of both the dynamic time warping (DWT) metric and the Kmeans clustering algorithm as a data-driven approach. The results of this mixed method in a PV string are compared to diagnostic labels established by visual inspection of the panels.

artificial intelligence, current signal, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2306.08003

Country: Europe > France > Occitanie > Haute-Garonne > Toulouse (0.06)

Genre: Research Report (0.40)

Industry: Energy > Renewable > Solar (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Add feedback

UAV Trajectory and Multi-User Beamforming Optimization for Clustered Users Against Passive Eavesdropping Attacks With Unknown CSI

Abdalla, Aly Sabri, Behfarnia, Ali, Marojevic, Vuk

arXiv.org Artificial IntelligenceJun-13-2023

This paper tackles the fundamental passive eavesdropping problem in modern wireless communications in which the location and the channel state information (CSI) of the attackers are unknown. In this regard, we propose deploying an unmanned aerial vehicle (UAV) that serves as a mobile aerial relay (AR) to help ground base station (GBS) support a subset of vulnerable users. More precisely, our solution (1) clusters the single-antenna users in two groups to be either served by the GBS directly or via the AR, (2) employs optimal multi-user beamforming to the directly served users, and (3) optimizes the AR's 3D position, its multi-user beamforming matrix and transmit powers by combining closed-form solutions with machine learning techniques. Specifically, we design a plain beamforming and power optimization combined with a deep reinforcement learning (DRL) algorithm for an AR to optimize its trajectory for the security maximization of the served users. Numerical results show that the multi-user multiple input, single output (MU-MISO) system split between a GBS and an AR with optimized transmission parameters without knowledge of the eavesdropping channels achieves high secrecy capacities that scale well with increasing the number of users.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2306.06686

Country:

North America > United States > Tennessee (0.04)
North America > United States > Mississippi (0.04)
Europe > Spain (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback