AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

A Case-Based Reasoning and Clustering Framework for the Development of Intelligent Agents in Simulation Systems

Lucca, Marcos R. B. (Federal University of Santa Maria) | Junior, Alcides G. Lopes ( Federal University of Rio Grande do Sul ) | Freitas, Edison P. ( Federal University of Rio Grande do Sul ) | Silva, Luis A. L. ( Federal University of Santa Maria )

AAAI ConferencesMay-17-2018

Artificial Intelligence (AI) techniques are essential to the modeling of realistic behaviors for agents in simulation systems. Although Case-Based Reasoning (CBR) and Clustering techniques are being explored in the implementation of such agents in computer games, these techniques are still under-used in the implementation of simulation systems. This work approaches this gap by proposing a new CBR and clustering framework in which clustering algorithms and clustering evaluation techniques are explored in both the construction of adjusted similarity functions and the organization of sub-case bases, which are indexing components to the efficient retrieval of relevant cases from case bases so as to support the solution of new simulation problems. To evaluate this framework, a case-based algorithm was implemented to simulate the choice of military supplies to be used in artillery battery missions in virtual tactical simulations.

artificial intelligence, case-based reasoning and clustering framework, machine learning, (2 more...)

AAAI Conferences

The Thirty-First International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Add feedback

WSCAN-TFP: Weighted SCAN Clustering Algorithm for Team Formation Problem in Social Network

Selvarajah, Kalyani (University of Windsor) | Bhullar, Amangel (University of Windsor) | Kobti, Ziad (University of Windsor) | Kargar, Mehdi (University of Windsor)

AAAI ConferencesMay-17-2018

In this paper, we provide a novel approach for the Team Formation Problem (TFP) in social networks. With a given social network of experts and communication cost between them, we address the problem of finding a team with a set of required skills necessary to complete a project. An expert of this network is treated as a node and possesses a given set of skills. The basic idea of the Structural Clustering Algorithm for Networks (SCAN) is to detect clusters, hubs, and outliers in networks. To employ SCAN on TFP, we first find the pool of experts with required skills. Then we search for highly connected (core) expert among all experts network. We expand the cluster from core to neighborhood nodes, and it goes from densely connected to loosely connected nodes within a threshold range of communication cost. We solve this TFP by identifying experts while minimizing communication cost for the project with specific skills. We then measure the communication cost with the sum of a distance function. An enhanced variant of SCAN is the weighted structural clustering algorithm (WSCAN) which is implemented in this paper to solve the TFP with minimum communication cost. Our result with WSCAN performed approximately equal to Greedy algorithms while slightly worse than other Genetic, Cultural and Exact Algorithms. The run-time of WSCAN however, was better compared to the others.

data mining, machine learning, weighted scan clustering algorithm, (5 more...)

AAAI Conferences

The Thirty-First International Flairs Conference

Genre: Research Report (0.53)

Industry: Information Technology > Services (0.80)

Technology:

Information Technology > Data Science > Data Mining (0.80)
Information Technology > Communications > Social Media (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.80)

Add feedback

Convex Programming Based Spectral Clustering

Mizutani, Tomohiko

arXiv.org Machine LearningMay-11-2018

Clustering is a fundamental task in data analysis, and spectral clustering has been recognized as a promising approach to it. Given a graph describing the relationship between data, spectral clustering explores the underlying cluster structure in two stages. The first stage embeds the nodes of the graph into real space, and the second stage groups the embedded nodes into several clusters. The use of the $k$-means method in the grouping stage is currently standard practice. We present a spectral clustering algorithm that uses convex programming in the grouping stage, and study how well it works. The concept behind the algorithm design lies in the following observation. The nodes with the largest degree in each cluster may be found by computing an enclosing ellipsoid for embedded nodes in real space, and the clusters may be identified by using those nodes. We show that the observations are valid, and the algorithm returns clusters to provide the conductance of graph, if the gap assumption, introduced by Peng el al. at COLT 2015, is satisfied. We also give an experimental assessment of the algorithm's performance.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1805.04246

Country: Asia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Survey and cross-benchmark comparison of remaining time prediction methods in business process monitoring

Verenich, Ilya, Dumas, Marlon, La Rosa, Marcello, Maggi, Fabrizio, Teinemaa, Irene

arXiv.org Artificial IntelligenceMay-10-2018

Predictive business process monitoring methods exploit historical process execution logs to generate predictions about running instances (called cases) of a business process, such as the prediction of the outcome, next activity or remaining cycle time of a given process case. These insights could be used to support operational managers in taking remedial actions as business processes unfold, e.g. shifting resources from one case onto another to ensure this latter is completed on time. A number of methods to tackle the remaining cycle time prediction problem have been proposed in the literature. However, due to differences in their experimental setup, choice of datasets, evaluation measures and baselines, the relative merits of each method remain unclear. This article presents a systematic literature review and taxonomy of methods for remaining time prediction in the context of business processes, as well as a cross-benchmark comparison of 16 such methods based on 16 real-life datasets originating from different industry domains.

data mining, machine learning, prediction, (20 more...)

arXiv.org Artificial Intelligence

1805.02896

Country:

Europe (1.00)
North America > United States (0.92)
North America > Canada (0.67)
Oceania > Australia > Queensland (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Workflow (0.92)

Industry:

Health & Medicine (0.68)
Materials > Metals & Mining (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(4 more...)

Add feedback

An Unsupervised Clustering-Based Short-Term Solar Forecasting Methodology Using Multi-Model Machine Learning Blending

Feng, Cong, Cui, Mingjian, Hodge, Bri-Mathias, Lu, Siyuan, Hamann, Hendrik F., Zhang, Jie

arXiv.org Machine LearningMay-10-2018

Abstract--Solar forecasting accuracy is affected by weather conditions, and weather awareness forecasting models are expected to improve the performance. However, it may not be available and reliable to classify different forecasting tasks by using only meteorological weather categorization. In this paper, an unsupervised clustering-based (UC-based) solar forecasting methodology is developed for short-term (1-hour-ahead) global horizontal irradiance (GHI) forecasting. This methodology consists of three parts: GHI time series unsupervised clustering, pattern recognition, and UC-based forecasting. The daily GHI time series is first clustered by an Optimized Cross-validated ClUsteRing (OCCUR) method, which determines the optimal number of clusters and best clustering results. Then, support vector machine pattern recognition (SVM-PR) is adopted to recognize the category of a certain day using the first few hours data in the forecasting stage. GHI forecasts are generated by the most suitable models in different clusters, which are built by a two-layer Machine learning based Multi-Model (M3) forecasting framework. The developed UC-based methodology is validated by using 1-year of data with six solar features. Numerical results show that (i) UC-based models outperform non-UC (all-in-one) models with the same M3 architecture by approximately 20%; (ii) M3-based models also outperform the single-algorithm machine learning (SAML) models by approximately 20%.

artificial intelligence, forecasting, machine learning, (14 more...)

arXiv.org Machine Learning

1805.04193

Country: North America > United States (1.00)

Genre: Research Report (0.84)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

The 10 Mining Techniques Data Scientists Need For Their Own Toolbox

@machinelearnbotMay-8-2018, 14:06:04 GMT

At their core, data scientists have a math and statistics background. Out of this math background, they're creating advanced analytics. Just like their software engineering counterparts, data scientists will have to interact with the business side. This includes understanding the domain enough to make insights. Data scientists are often tasked with analyzing data to help the business, and this requires a level of business acumen. Finally, their results need to be given to the business in an understandable fashion. This requires the ability to verbally and visually communicate complex results and observations in a way that the business can understand and act on them. Thus, it'll be extremely valuable for any aspiring data scientists to learn data mining -- the process where one structures the raw data and formulate or recognize the various patterns in the data through the mathematical and computational algorithms. This helps generate new information and unlock various insights. Here is a simple list of reasons on why you should study Data Mining?

artificial intelligence, data mining, machine learning, (22 more...)

@machinelearnbot

Industry:

Marketing (1.00)
Information Technology > Services (1.00)
Media (0.68)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Finding Frequent Entities in Continuous Data

Alet, Ferran, Chitnis, Rohan, Kaelbling, Leslie P., Lozano-Perez, Tomas

arXiv.org Machine LearningMay-8-2018

In many applications that involve processing high-dimensional data, it is important to identify a small set of entities that account for a significant fraction of detections. Rather than formalize this as a clustering problem, in which all detections must be grouped into hard or soft categories, we formalize it as an instance of the frequent items or heavy hitters problem, which finds groups of tightly clustered objects that have a high density in the feature space. We show that the heavy hitters formulation generates solutions that are more accurate and effective than the clustering formulation. In addition, we present a novel online algorithm for heavy hitters, called HAC, which addresses problems in continuous space, and demonstrate its effectiveness on real video and household domains.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

1805.02874

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (0.92)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

Semi-Orthogonal Non-Negative Matrix Factorization

Li, Jack Yutong, Zhu, Ruoqing, Qu, Annie, Ye, Han, Sun, Zhankun

arXiv.org Machine LearningMay-6-2018

Non-negative Matrix Factorization (NMF) is a popular clustering and dimension reduction method by decomposing a non-negative matrix into the product of two lower dimension matrices composed of basis vectors. In this paper, we propose a semi-orthogonal NMF method that enforces one of the matrices to be orthogonal with mixed signs, thereby guarantees the rank of the factorization. Our method preserves strict orthogonality by implementing the Cayley transformation to force the solution path to be exactly on the Stiefel manifold, as opposed to the approximated orthogonality solutions in existing literature. We apply a line search update scheme along with an SVD-based initialization which produces a rapid convergence of the algorithm compared to other existing approaches. In addition, we present formulations of our method to incorporate both continuous and binary design matrices. Through various simulation studies, we show that our model has an advantage over other NMF variations regarding the accuracy of the factorization, rate of convergence, and the degree of orthogonality while being computationally competitive. We also apply our method to a text-mining data on classifying triage notes, and show the effectiveness of our model in reducing classification error compared to the conventional bag-of-words model and other alternative matrix factorization approaches.

factorization, matrix, matrix factorization, (15 more...)

arXiv.org Machine Learning

1805.02306

Country:

North America > United States > Illinois (0.04)
North America > Canada > Alberta (0.04)
Asia > Vietnam > Khánh Hòa Province > Nha Trang (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Branching embedding: A heuristic dimensionality reduction algorithm based on hierarchical clustering

Oku, Makito

arXiv.org Machine LearningMay-6-2018

This paper proposes a new dimensionality reduction algorithm named branching embedding (BE). It converts a dendrogram to a two-dimensional scatter plot, and visualizes the inherent structures of the original high-dimensional data. Since the conversion part is not computationally demanding, the BE algorithm would be beneficial for the case where hierarchical clustering is already performed. Numerical experiments revealed that the outputs of the algorithm moderately preserve the original hierarchical structures.

artificial intelligence, dendrogram, machine learning, (14 more...)

arXiv.org Machine Learning

1805.02161

Country: Asia > Japan (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.63)

Add feedback

Clustering With Pairwise Relationships: A Generative Approach

Yu, Yen-Yun, Elhabian, Shireen Y., Whitaker, Ross T.

arXiv.org Machine LearningMay-6-2018

Semi-supervised learning (SSL) has become important in current data analysis applications, where the amount of unlabeled data is growing exponentially and user input remains limited by logistics and expense. Constrained clustering, as a subclass of SSL, makes use of user input in the form of relationships between data points (e.g., pairs of data points belonging to the same class or different classes) and can remarkably improve the performance of unsupervised clustering in order to reflect user-defined knowledge of the relationships between particular data points. Existing algorithms incorporate such user input, heuristically, as either hard constraints or soft penalties, which are separate from any generative or statistical aspect of the clustering model; this results in formulations that are suboptimal and not sufficiently general. In this paper, we propose a principled, generative approach to probabilistically model, without ad hoc penalties, the joint distribution given by user-defined pairwise relations. The proposed model accounts for general underlying distributions without assuming a specific form and relies on expectation-maximization for model fitting. For distributions in a standard form, the proposed approach results in a closed-form solution for updated parameters.

artificial intelligence, machine learning, relation, (17 more...)

arXiv.org Machine Learning

1805.02285

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback