AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Balanced Order Batching with Task-Oriented Graph Clustering

Duan, Lu, Hu, Haoyuan, Wu, Zili, Li, Guozheng, Zhang, Xinhang, Gong, Yu, Xu, Yinghui

arXiv.org Artificial IntelligenceAug-19-2020

Balanced order batching problem (BOBP) arises from the process of warehouse picking in Cainiao, the largest logistics platform in China. Batching orders together in the picking process to form a single picking route, reduces travel distance. The reason for its importance is that order picking is a labor intensive process and, by using good batching methods, substantial savings can be obtained. The BOBP is a NP-hard combinational optimization problem and designing a good problem-specific heuristic under the quasi-real-time system response requirement is non-trivial. In this paper, rather than designing heuristics, we propose an end-to-end learning and optimization framework named Balanced Task-orientated Graph Clustering Network (BTOGCN) to solve the BOBP by reducing it to balanced graph clustering optimization problem. In BTOGCN, a task-oriented estimator network is introduced to guide the type-aware heterogeneous graph clustering networks to find a better clustering result related to the BOBP objective. Through comprehensive experiments on single-graph and multi-graphs, we show: 1) our balanced task-oriented graph clustering network can directly utilize the guidance of target signal and outperforms the two-stage deep embedding and deep clustering method; 2) our method obtains an average 4.57m and 0.13m picking distance ("m" is the abbreviation of the meter (the SI base unit of length)) reduction than the expert-designed algorithm on single and multi-graph set and has a good generalization ability to apply in practical scenario.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3394486.3403355

2008.09018

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.82)

Industry: Consumer Products & Services > Travel (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Segmenting Bank Customers via RFM Model and Unsupervised Machine Learning

Aliyev, Musadig, Ahmadov, Elvin, Gadirli, Habil, Mammadova, Arzu, Alasgarov, Emin

arXiv.org Machine LearningAug-19-2020

In recent years, one of the major challenges for financial institutions is the retention of their customers using new methodologies of reliable and profitable segmentation. In the field of banking, the approach of offering all of the services to all the existing customers at the same time does not always work. However, being aware of what to sell, when to sell and whom to sell makes a huge difference in the conversion rate of the customers responding to new services and buying new products. In this paper, we used RFM technique and various clustering algorithms applied to the real customer data of one of the largest private banks of Azerbaijan.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2008.08662

Country: Asia > Azerbaijan > Baku Economic Region > Baku (0.05)

Genre: Research Report (0.50)

Industry: Banking & Finance (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Machine Learning for Reliability Engineering and Safety Applications: Review of Current Status and Future Opportunities

Xu, Zhaoyi, Saleh, Joseph Homer

arXiv.org Machine LearningAug-18-2020

Machine learning (ML) pervades an increasing number of academic disciplines and industries. Its impact is profound, and several fields have been fundamentally altered by it, autonomy and computer vision for example; reliability engineering and safety will undoubtedly follow suit. There is already a large but fragmented literature on ML for reliability and safety applications, and it can be overwhelming to navigate and integrate into a coherent whole. In this work, we facilitate this task by providing a synthesis of, and a roadmap to this ever-expanding analytical landscape and highlighting its major landmarks and pathways. We first provide an overview of the different ML categories and sub-categories or tasks, and we note several of the corresponding models and algorithms. We then look back and review the use of ML in reliability and safety applications. We examine several publications in each category/sub-category, and we include a short discussion on the use of Deep Learning to highlight its growing popularity and distinctive advantages. Finally, we look ahead and outline several promising future opportunities for leveraging ML in service of advancing reliability and safety considerations. Overall, we argue that ML is capable of providing novel insights and opportunities to solve important challenges in reliability and safety applications. It is also capable of teasing out more accurate insights from accident datasets than with traditional analysis tools, and this in turn can lead to better informed decision-making and more effective accident prevention.

application, renewable energy, upstream oil & gas, (19 more...)

arXiv.org Machine Learning

2008.08221

Country:

Asia (0.28)
North America > United States > Wisconsin (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Transportation (1.00)
Information Technology (1.00)
Health & Medicine > Consumer Health (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Differentially Private Clustering: Tight Approximation Ratios

Ghazi, Badih, Kumar, Ravi, Manurangsi, Pasin

arXiv.org Machine LearningAug-18-2020

We study the task of differentially private clustering. For several basic clustering problems, including Euclidean DensestBall, 1-Cluster, k-means, and k-median, we give efficient differentially private algorithms that achieve essentially the same approximation ratios as those that can be obtained by any non-private algorithm, while incurring only small additive errors. This improves upon existing efficient algorithms that only achieve some large constant approximation factors. Our results also imply an improved algorithm for the Sample and Aggregate privacy framework. Furthermore, we show that one of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2008.08007

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Security & Privacy (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

Gradient-based Learning Methods Extended to Smooth Manifolds Applied to Automated Clustering

Koudounas, Alkis, Fiori, Simone

Journal of Artificial Intelligence ResearchAug-17-2020

Grassmann manifold based sparse spectral clustering is a classification technique that consists in learning a latent representation of data, formed by a subspace basis, which is sparse. In order to learn a latent representation, spectral clustering is formulated in terms of a loss minimization problem over a smooth manifold known as Grassmannian. Such minimization problem cannot be tackled by one of traditional gradient-based learning algorithms, which are only suitable to perform optimization in absence of constraints among parameters. It is, therefore, necessary to develop specific optimization/learning algorithms that are able to look for a local minimum of a loss function under smooth constraints in an efficient way. Such need calls for manifold optimization methods. In this paper, we extend classical gradient-based learning algorithms on at parameter spaces (from classical gradient descent to adaptive momentum) to curved spaces (smooth manifolds) by means of tools from manifold calculus. We compare clustering performances of these methods and known methods from the scientific literature. The obtained results confirm that the proposed learning algorithms prove lighter in computational complexity than existing ones without detriment in clustering efficacy.

algorithm, artificial intelligence, machine learning, (18 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.12192

AI Access Foundation

12192

Journal of Artificial Intelligence Research

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Exploring the weather impact on bike sharing usage through a clustering analysis

Quach, Jessica, Malekian, Reza

arXiv.org Machine LearningAug-17-2020

Bike sharing systems (BSS) have been a popular traveling service for years and are used worldwide. It is attractive for cities and users who wants to promote healthier lifestyles; to reduce air pollution and greenhouse gas emission as well as improve traffic. One major challenge to docked bike sharing system is redistributing bikes and balancing dock stations. Some studies propose models that can help forecasting bike usage; strategies for rebalancing bike distribution; establish patterns or how to identify patterns. Other studies propose to extend the approach by including weather data. This study aims to extend upon these proposals and opportunities to explore how and in what magnitude weather impacts bike usage. Bike usage data and weather data are gathered for the city of Washington D.C. and are analyzed using k-means clustering algorithm. K-means managed to identify three clusters that correspond to bike usage depending on weather conditions. The results show that the weather impact on bike usage was noticeable between clusters. It showed that temperature followed by precipitation weighted the most, out of five weather variables.

artificial intelligence, machine learning, usage, (15 more...)

arXiv.org Machine Learning

2008.07249

Country:

North America > United States > District of Columbia > Washington (0.34)
Europe > Sweden > Skåne County > Malmö (0.05)
North America > Canada > Ontario > Toronto (0.04)
(7 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Law (0.68)
Government > Regional Government > North America Government > United States Government (0.46)
Transportation (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Add feedback

DBSCAN Clustering Algorithm in Machine Learning - KDnuggets

#artificialintelligenceAug-14-2020, 18:02:57 GMT

In 2014, the DBSCAN algorithm was awarded the test of time award (an award given to algorithms which have received substantial attention in theory and practice) at the leading data mining conference, ACM SIGKDD. Clustering analysis is an unsupervised learning method that separates the data points into several specific bunches or groups, such that the data points in the same groups have similar properties and data points in different groups have different properties in some sense. It comprises of many different methods based on different distance measures. Centrally, all clustering methods use the same approach i.e. first we calculate similarities and then we use it to cluster the data points into groups or batches. Here we will focus on the Density-based spatial clustering of applications with noise (DBSCAN) clustering method. If you are unfamiliar with the clustering algorithms, I advise you to read the Introduction to Image Segmentation with K-Means clustering.

algorithm, artificial intelligence, machine learning, (13 more...)

#artificialintelligence

Genre: Personal > Honors (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Segmentation analysis and the recovery of queuing parameters via the Wasserstein distance: a study of administrative data for patients with chronic obstructive pulmonary disease

Wilde, Henry, Knight, Vincent, Gillard, Jonathan, Smith, Kendal

arXiv.org Machine LearningAug-14-2020

However, many such methods rely heavily on detailed data about both the healthcare system and its population which may limit research where sophisticated data pipelines are not yet in place. This work demonstrates a method of overcoming this, using routinely gathered, administrative hospital data to build a clustering that feeds into a multi-class queuing model, allowing for better understanding of the healthcare population and the system with which they interact. Specifically, this work examines records of patient spells from the National Health Service (NHS) Wales Cwm Taf Morgannwg University Health Board (UHB) presenting chronic obstructive pulmonary disease (COPD). COPD is a condition of particular interest to population health research, and to Cwm Taf Morgannwg UHB, as it is known to often present as a comorbidity in patients [15], increasing the complexity of treatments among those with the condition. Moreover, an internal report by NHS Wales found the Cwm Taf Morgannwg UHB had the highest prevalence of the condition across all the Welsh health boards. This work draws upon several overlapping sources within mathematical research, and this work contributes to the literature in three ways: to theoretical queuing research by the estimation of missing queuing parameters with the Wasserstein distance; to operational healthcare research through the weaving together of the combination of methods used in this work despite data constraints; and to public health research by adding to the growing body of mathematical and operational work around a condition that is vital to understand operationally, socially and medically. The remainder of the paper is structured as follows: Section 1 provides a literature review, and an overview of the dataset and its clustering; Section 2 describes the queuing model used and the estimation of its parameters; Section 3 presents several what-if scenarios with insight provided by the model parameterisation and the clustering; Section 4 concludes the paper. Although the data is confidential and may not be published, a synthetic analogue has been archived [43] along with all the source code used in this paper [40].

artificial intelligence, machine learning, system time, (17 more...)

arXiv.org Machine Learning

2008.04295

Country:

Europe > United Kingdom > Wales > Glamorganshire (0.65)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Government Relations & Public Policy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms

Schubert, Erich, Rousseeuw, Peter J.

arXiv.org Artificial IntelligenceAug-12-2020

Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm Partitioning Around Medoids (PAM), also simply referred to as k-medoids clustering. In Euclidean geometry the mean-as used in k-means-is a good estimator for the cluster center, but this does not exist for arbitrary dissimilarities. PAM uses the medoid instead, the object with the smallest dissimilarity to all others in the cluster. This notion of centrality can be used with any (dis-)similarity, and thus is of high relevance to many domains and applications. A key issue with PAM is its high run time cost. We propose modifications to the PAM algorithm that achieve an O(k)-fold speedup in the second ("SWAP") phase of the algorithm, but will still find the same results as the original PAM algorithm. If we relax the choice of swaps performed (while retaining comparable quality), we can further accelerate the algorithm by eagerly performing additional swaps in each iteration. With the substantially faster SWAP, we can now explore faster initialization strategies, because (i) the classic ("BUILD") initialization now becomes the bottleneck, and (ii) our swap is fast enough to compensate for worse starting conditions. We also show how the CLARA and CLARANS algorithms benefit from the proposed modifications. While we do not study the parallelization of our approach in this work, it can easily be combined with earlier approaches to use PAM and CLARA on big data (some of which use PAM as a subroutine, hence can immediately benefit from these improvements), where the performance with high k becomes increasingly important. In experiments on real data with k=100,200, we observed a 458x respectively 1191x speedup compared to the original PAM SWAP algorithm, making PAM applicable to larger data sets, and in particular to higher k.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2008.05171

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States (0.04)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

An Entropy Equation for Energy

Greer, Kieran

arXiv.org Artificial IntelligenceAug-11-2020

This paper describes an entropy equation, but one that should be used for measuring energy and not information. In relation to the human brain therefore, both of these quantities can be used to represent the stored information. The human brain makes use of energy efficiency to form its structures, which is likely to be linked to the neuron wiring. This energy efficiency can also be used as the basis for a clustering algorithm, which is described in a different paper. This paper is more of a discussion about global properties, where the rules used for the clustering algorithm can also create the entropy equation E = (mean * variance). This states that work is done through the energy released by the 'change' in entropy. The equation is so simplistic and generic that it can offer arguments for completely different domains, where the journey ends with a discussion about global energy properties in physics and beyond. A comparison with Einstein's relativity equation is made and also the audacious suggestion that a black hole has zero-energy inside.

artificial intelligence, equation, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2007.03286

Country:

North America > United States > New York (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.65)

Industry: Energy (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)

Add feedback