AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Robust Metric Learning based on the Rescaled Hinge Loss

Al-Obaidi, Sumia Abdulhussien Razooqi, Zabihzadeh, Davood, Rasheed, Ali Salim, Monsefi, Reza

arXiv.org Machine Learning, Apr-26-2019

Distance/similarity learning is a fundamental problem in machine learning. For example, the kNN classifier and clustering methods are based on a distance/similarity measure. Metric learning algorithms enhance the efficiency of these methods by learning an optimal distance function from data. Most metric learning methods need training information in the form of pair or triplet sets. Nowadays, this training information is often obtained from the Internet via crowdsourcing methods. Therefore, this information may contain label noise or outliers, leading to poor performance of the learned metric. It is even possible that the learned metric functions perform worse than general metrics such as Euclidean distance. To address this challenge, this paper presents a new robust metric learning method based on the Rescaled Hinge loss. This loss function is a general case of the popular Hinge loss and was initially introduced in (Xu et al. 2017) to develop a new robust SVM algorithm. In this paper, we formulate the metric learning problem using the Rescaled Hinge loss function and then develop an efficient algorithm based on HQ (Half-Quadratic) to solve the problem. Experimental results on a variety of both real and synthetic datasets confirm that our new robust algorithm considerably outperforms state-of-the-art metric learning methods in the presence of label noise and outliers.
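To make the loss in the abstract above concrete, here is a minimal sketch of a rescaled hinge loss. It assumes the form usually attributed to Xu et al. (2017), beta * (1 - exp(-eta * hinge(z))) with beta = 1 / (1 - exp(-eta)). The abstract itself does not state the formula, so the function names, the parameter `eta`, and its default value are assumptions for illustration only.

```python
import math


def hinge(z):
    """Standard hinge loss on the margin z = y * f(x)."""
    return max(0.0, 1.0 - z)


def rescaled_hinge(z, eta=0.5):
    """Rescaled hinge loss (assumed form, see lead-in).

    eta > 0 controls how quickly the loss saturates; beta normalises
    the loss so that it equals 1 at z = 0, like the plain hinge loss.
    """
    beta = 1.0 / (1.0 - math.exp(-eta))
    return beta * (1.0 - math.exp(-eta * hinge(z)))
```

Under this assumed form the loss is bounded above by beta, so a badly mislabeled sample (very negative margin) contributes a capped penalty instead of a linearly growing one, and as eta approaches 0 the value approaches the plain hinge loss.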

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv:1904.11711

Country:

North America > United States (0.69)
Asia > Middle East > Iran (0.14)
Asia > Middle East > Iraq (0.14)

Genre: Research Report (1.00)

Industry: Education (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

The Mutex Watershed and its Objective: Efficient, Parameter-Free Image Partitioning

Wolf, Steffen, Bailoni, Alberto, Pape, Constantin, Rahaman, Nasim, Kreshuk, Anna, Köthe, Ullrich, Hamprecht, Fred A.

arXiv.org Machine Learning, Apr-25-2019

Image partitioning, or segmentation without semantics, is the task of decomposing an image into distinct segments or, equivalently, of detecting closed contours. Most prior work either requires seeds, one per segment; or a threshold; or formulates the task as multicut / correlation clustering, an NP-hard problem. Here, we propose a greedy algorithm for signed graph partitioning, the "Mutex Watershed". Unlike seeded watershed, the algorithm can accommodate not only attractive but also repulsive cues, allowing it to find a previously unspecified number of segments without the need for explicit seeds or a tunable threshold. We also prove that this simple algorithm solves to global optimality an objective function that is intimately related to the multicut / correlation clustering integer linear programming formulation. The algorithm is deterministic, very simple to implement, and has empirically linearithmic complexity. When presented with short-range attractive and long-range repulsive cues from a deep neural network, the Mutex Watershed gives the best results currently known for the competitive ISBI 2012 EM segmentation benchmark.
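The greedy signed-graph partitioning described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the edge format `(u, v, weight, attractive)`, the function name, and the union-find bookkeeping are all hypothetical. Edges are visited from strongest to weakest; an attractive edge merges two clusters unless a mutual-exclusion constraint separates them, and a repulsive edge between two distinct clusters adds such a constraint.

```python
def mutex_watershed(num_nodes, edges):
    """Greedy partitioning of a signed graph (illustrative sketch).

    edges: iterable of (u, v, weight, attractive); a larger weight is a
    stronger cue. Returns one cluster label (a root node id) per node.
    """
    parent = list(range(num_nodes))
    # mutex[r] holds the roots of clusters that cluster r must never join.
    mutex = [set() for _ in range(num_nodes)]

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # sorted() is stable, so ties keep their input order (deterministic).
    for u, v, _, attractive in sorted(edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # already in the same cluster
        if attractive:
            if rv in mutex[ru]:
                continue  # a stronger repulsive cue forbids this merge
            parent[rv] = ru
            # Re-point rv's constraints at the merged root ru.
            for r in mutex[rv]:
                mutex[r].discard(rv)
                mutex[r].add(ru)
            mutex[ru] |= mutex[rv]
            mutex[rv] = set()
        else:
            mutex[ru].add(rv)
            mutex[rv].add(ru)
    return [find(i) for i in range(num_nodes)]


# A 4-node chain with one long-range repulsive edge between its ends.
edges = [(0, 1, 0.9, True), (1, 2, 0.4, True),
         (2, 3, 0.8, True), (0, 3, 0.7, False)]
print(mutex_watershed(4, edges))  # → [0, 0, 2, 2]
```

No seeds or thresholds appear in the sketch: the number of clusters falls out of the order in which attractive and repulsive edges are processed. In the example, the repulsive edge (weight 0.7) is seen before the weak attractive edge (weight 0.4), so the chain splits into two segments.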

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv:1904.12654

Country: Europe > Germany (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Discrete Optimal Graph Clustering

Han, Yudong, Zhu, Lei, Cheng, Zhiyong, Li, Jingjing, Liu, Xiaobai

arXiv.org Machine Learning, Apr-25-2019

Graph-based clustering is one of the major clustering methods. Most such methods work in three separate steps: similarity graph construction, cluster label relaxation, and label discretization with k-means. This common practice has three disadvantages: 1) the predefined similarity graph is often fixed and may not be optimal for the subsequent clustering; 2) the relaxation of cluster labels may cause significant information loss; 3) label discretization may deviate from the real clustering result, since k-means is sensitive to the initialization of cluster centroids. To tackle these problems, in this paper, we propose an effective discrete optimal graph clustering (DOGC) framework. A structured similarity graph that is theoretically optimal for clustering performance is adaptively learned with the guidance of a reasonable rank constraint. Besides, to avoid the information loss, we explicitly enforce a discrete transformation on the intermediate continuous label, which yields a tractable optimization problem with a discrete solution. Further, to compensate for the unreliability of the learned labels and enhance the clustering accuracy, we design an adaptive robust module that learns a prediction function for unseen data based on the learned discrete cluster labels. Finally, an iterative optimization strategy with guaranteed convergence is developed to directly solve for the clustering results. Extensive experiments conducted on both real and synthetic datasets demonstrate the superiority of our proposed methods compared with several state-of-the-art clustering approaches.

artificial intelligence, graph, machine learning, (15 more...)

arXiv:1904.11266

Country:

Asia (0.69)
North America > United States (0.67)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Machine Learning Tips and Tricks for Power Line Communications

Tonello, Andrea M., Letizia, Nunzio A., Righini, Davide, Marcuzzi, Francesco

arXiv.org Machine Learning, Apr-24-2019

A great deal of attention has recently been given to Machine Learning (ML) techniques in many different application fields. This paper provides a vision of what ML can do in Power Line Communications (PLC). We first briefly describe classical formulations of ML, and distinguish deterministic problems from statistical problems with relevance to communications. We then discuss ML applications in PLC for each layer, namely, for characterization and modeling, for physical layer algorithms, and for media access control and networking algorithms. Finally, other applications of PLC that can benefit from the use of ML, such as grid diagnostics, are analyzed. Illustrative numerical examples are reported to validate the ideas and motivate future research endeavors in this stimulating signal/data processing field.

application, artificial intelligence, machine learning, (14 more...)

arXiv:1904.11949

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry:

Telecommunications (1.00)
Information Technology (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
(3 more...)

A Unified Framework for Structured Graph Learning via Spectral Constraints

Kumar, Sandeep, Ying, Jiaxi, Cardoso, José Vinícius de M., Palomar, Daniel

arXiv.org Machine Learning, Apr-22-2019

Graph learning from data represents a canonical problem that has received substantial attention in the literature. However, insufficient work has been done on incorporating prior structural knowledge into the learning of underlying graphical models from data. Learning a graph with a specific structure is essential for interpretability and identification of the relationships among data. Useful structured graphs include the multi-component graph, bipartite graph, connected graph, sparse graph, and regular graph. In general, structured graph learning is an NP-hard combinatorial problem; therefore, designing a general tractable optimization method is extremely challenging. In this paper, we introduce a unified graph learning framework lying at the integration of Gaussian graphical models and spectral graph theory. To impose a particular structure on a graph, we first show how to formulate the combinatorial constraints as an analytical property of the graph matrix. Then we develop an optimization framework that leverages graph learning with specific structures via spectral constraints on graph matrices. The proposed algorithms are provably convergent, computationally efficient, and practically amenable for numerous graph-based tasks. Extensive numerical experiments with both synthetic and real data sets illustrate the effectiveness of the proposed algorithms. The code for all the simulations is made available as an open source repository.

artificial intelligence, machine learning, optimization problem, (19 more...)

arXiv:1904.09792

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

TiK-means: $K$-means clustering for skewed groups

Berry, Nicholas S., Maitra, Ranjan

arXiv.org Machine Learning, Apr-21-2019

The $K$-means algorithm is extended to allow for partitioning of skewed groups. Our algorithm is called TiK-Means and contributes a $K$-means type algorithm that assigns observations to groups while estimating their skewness-transformation parameters. The resulting groups and transformation reveal general-structured clusters that can be explained by inverting the estimated transformation. Further, a modification of the jump statistic chooses the number of groups. Our algorithm is evaluated on simulated and real-life datasets and then applied to a long-standing astronomical dispute regarding the distinct kinds of gamma ray bursts.

algorithm, dataset, tik-means, (11 more...)

doi: 10.1002/sam.11416

arXiv:1904.09609

Country:

Europe > Austria > Vienna (0.14)
Europe > Italy > Sardinia (0.05)
Europe > Italy > Liguria (0.05)
(13 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Food & Agriculture (0.93)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

The Why's and how's of Machine Learning

#artificialintelligence, Apr-19-2019, 21:00:05 GMT

Knowledge is the output of learning through the inseparable combination of theory and practice. It's what remains in one's experience from all the data that got shaped into what we call information. This process can be observed throughout the different stages of our lives and is never limited to the academic journey. What I'm aiming to express is that machine learning is nothing but human logic tailored to more complex problems that require more computational capability. The last quote represents the natural knowledge-acquisition process, which, as you may notice, is similar to the CRISP-DM methodology that I detailed in a previous article and that is essential to succeeding in your data mining project. To define machine learning: it is a set of algorithms that are used in many operations, such as the data mining process, and that help you transform your raw data into knowledge, the layer that hides under the obvious information.

algorithm, artificial intelligence, machine learning, (17 more...)

Genre: Research Report (0.34)

Industry: Materials > Metals & Mining (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)

Top 10 Machine Learning Algorithms for Data Science

#artificialintelligence, Apr-19-2019, 20:55:46 GMT

For the majority of newcomers, machine learning algorithms may seem too boring and complicated a subject to master. Well, to some extent, this is true. In most cases, you stumble upon a few-page description of each algorithm and, yes, it's hard to find the time and energy to deal with each and every detail. However, if you truly, madly, deeply want to be an ML expert, you have to brush up on your knowledge of them; there is no other way. But relax, today I will try to simplify this task and explain the core principles of the 10 most common algorithms in simple words (each includes a brief description, guides, and useful links).

algorithm, artificial intelligence, machine learning, (16 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Identifying Points of Interest and Similar Individuals from Raw GPS Data

Andrade, Thiago, Gama, João

arXiv.org Machine Learning, Apr-19-2019

Smartphones and portable devices have become ubiquitous and part of everyone's life. Because of their portability, these devices are well suited to recording individuals' traces and life-logging, generating vast amounts of data at low cost. These data are emerging as a new source for studies of human mobility patterns, increasing the number of research projects and techniques that aim to analyze them and retrieve useful information. The aim of this paper is to explore raw GPS data from different individuals in a community and apply data mining algorithms to identify meaningful places in a region and describe users' profiles and their similarities. We evaluate the proposed method with a real-world dataset. The experimental results show that the steps performed to identify points of interest (POIs) and, further, the similarity between users are quite satisfactory, serving as a supplement for urban planning and social networks.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv:1904.09357

Country:

Europe > Portugal > Porto > Porto (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Hawaii (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Services (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Social Media (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

Optimal initialization of K-means using Particle Swarm Optimization

Pednekar, Ashutosh Mahesh

arXiv.org Machine Learning, Apr-19-2019

This paper proposes the use of an optimization algorithm, namely Particle Swarm Optimization (PSO), to choose the initial centroids in K-means and thereby obtain better accuracy. The vectorized notation of the candidate centroids can be thought of as entities in an optimization space, where the accuracy of K-means over a random subset of the data acts as a fitness measure. The resultant optimal vector can be used as the initial centroids for K-means.
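The idea in the abstract above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the paper's method: each particle is a flattened vector of k centroids, and the fitness used here is the within-cluster sum of squared errors on a random subset, swapped in as a label-free stand-in for the "accuracy" mentioned in the abstract. The function names, the subset size of 50, and the PSO coefficients `w`, `c1`, `c2` are all assumptions.

```python
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((p - c) ** 2 for p, c in zip(pt, ce))
                   for ce in centroids) for pt in points)


def pso_init(points, k, n_particles=10, n_iters=30, seed=0,
             w=0.7, c1=1.5, c2=1.5):
    """Search for initial centroids with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(points[0])
    sample = rng.sample(points, min(len(points), 50))  # fitness subset

    def unflatten(vec):
        return [tuple(vec[i * dim:(i + 1) * dim]) for i in range(k)]

    def fitness(vec):
        return sse(sample, unflatten(vec))  # lower is better (assumption)

    # Each particle starts as k randomly chosen data points, flattened.
    pos = [[x for pt in rng.sample(points, k) for x in pt]
           for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f < best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f < g_fit:
                    g_fit, g_pos = f, pos[i][:]
    return unflatten(g_pos), g_fit


def kmeans(points, centroids, n_iters=20):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(n_iters):
        groups = [[] for _ in centroids]
        for pt in points:
            j = min(range(len(centroids)),
                    key=lambda j: sum((p - c) ** 2
                                      for p, c in zip(pt, centroids[j])))
            groups[j].append(pt)
        # An empty cluster keeps its previous centroid.
        centroids = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids
```

A typical call would be `init, _ = pso_init(points, k)` followed by `kmeans(points, init)`, where `points` is a list of equal-length numeric tuples. If labeled data is available, the fitness function could instead score clustering accuracy, which is closer to what the abstract describes.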

artificial intelligence, evolutionary algorithm, machine learning, (14 more...)

arXiv:1904.09098

Country: Asia > India (0.15)

Genre: Research Report (0.40)

Industry: Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)
