AITopics

Løkse, Sigurd, Bianchi, Filippo Maria, Salberg, Arnt-Børre, Jenssen, Robert

Spectral Clustering using PCKID - A Probabilistic Cluster Kernel for Incomplete Data

arXiv.org Machine LearningFeb-23-2017

In this paper, we propose PCKID, a novel, robust, kernel function for spectral clustering, specifically designed to handle incomplete data. By combining posterior distributions of Gaussian Mixture Models for incomplete data on different scales, we are able to learn a kernel for incomplete data that does not depend on any critical hyperparameters, unlike the commonly used RBF kernel. To evaluate our method, we perform experiments on two real datasets. PCKID outperforms the baseline methods for all fractions of missing values and in some cases outperforms the baseline methods with up to 25 percentage points.

data mining, machine learning, spectral, (15 more...)

arXiv.org Machine Learning

1702.0719

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

#artificialintelligenceFeb-22-2017, 23:46:12 GMT

K-Means & Other Clustering Algorithms: A Quick Intro with Python

Clustering is the grouping of objects together so that objects belonging in the same group (cluster) are more similar to each other than those in other groups (clusters). In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset. For the clustering problem, we will use the famous Zachary's Karate Club dataset. The story behind the data set is quite simple: There was a Karate Club that had an administrator "John A" and an instructor "Mr. Then a conflict arose between them, causing the students (Nodes) to split into two groups.

algorithm, artificial intelligence, machine learning, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Ignatov, Dmitry I., Watson, Bruce W.

Towards a Unified Taxonomy of Biclustering Methods

arXiv.org Machine LearningFeb-17-2017

Being an unsupervised machine learning and data mining technique, biclustering and its multimodal extensions are becoming popular tools for analysing object-attribute data in different domains. Apart from conventional clustering techniques, biclustering is searching for homogeneous groups of objects while keeping their common description, e.g., in binary setting, their shared attributes. In bioinformatics, biclustering is used to find genes, which are active in a subset of situations, thus being candidates for biomarkers. However, the authors of those biclustering techniques that are popular in gene expression analysis, may overlook the existing methods. For instance, BiMax algorithm is aimed at finding biclusters, which are well-known for decades as formal concepts. Moreover, even if bioinformatics classify the biclustering methods according to reasonable domain-driven criteria, their classification taxonomies may be different from survey to survey and not full as well. So, in this paper we propose to use concept lattices as a tool for taxonomy building (in the biclustering domain) and attribute exploration as means for cross-domain taxonomy completion.

machine learning, object-oriented architecture, pattern recognition, (19 more...)

arXiv.org Machine Learning

1702.05376

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Yang, Jie, Shebalov, Sergey, Klabjan, Diego

Semi-supervised Learning for Discrete Choice Models

arXiv.org Machine LearningFeb-16-2017

We introduce a semi-supervised discrete choice model to calibrate discrete choice models when relatively few requests have both choice sets and stated preferences but the majority only have the choice sets. Two classic semi-supervised learning algorithms, the expectation maximization algorithm and the cluster-and-label algorithm, have been adapted to our choice modeling problem setting. We also develop two new algorithms based on the cluster-and-label algorithm. The new algorithms use the Bayesian Information Criterion to evaluate a clustering setting to automatically adjust the number of clusters. Two computational studies including a hotel booking case and a large-scale airline itinerary shopping case are presented to evaluate the prediction accuracy and computational effort of the proposed algorithms. Algorithmic recommendations are rendered under various scenarios.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1702.05137

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Consumer Products & Services > Travel (1.00)
Transportation > Passenger (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Simultaneous Clustering and Ensemble

Tao, Zhiqiang (Northeastern University) | Liu, Hongfu (Northeastern University) | Fu, Yun (Northeastern University)

Ensemble Clustering (EC) has gained a great deal of attention throughout the fields of data mining and machine learning, since it emerged as an effective and robust clustering framework. Typically, EC methods try to fuse multiple basic partitions (BPs) into a consensus one, of which each BP is obtained by performing traditional clustering method on the same dataset. One promising direction for ensemble clustering is to derive pairwise similarity from BPs, and then transform it as a graph partition problem. However, these graph based methods may suffer from an information loss when computing the similarity between data points, because they only utilize the categorical data provided by multiple BPs, yet neglect rich information from raw features. This problem can badly undermine the underlying cluster structure in the original feature space, and thus degrade the clustering performance. In light of this, we propose a novel Simultaneous Clustering and Ensemble (SCE) framework to alleviate such detrimental effect, which employs the similarity matrix from raw features to enhance the co-association matrix summarized by multiple BPs. Two neat closed-form solutions given by eigenvalue decomposition are provided for SCE. Experiments conducted on 16 real-world datasets demonstrate the effectiveness of the proposed SCE over the traditional clustering and state-of-the-art ensemble clustering methods. Moreover, several impact factors that may affect our method are also explored extensively.

artificial intelligence, ensemble, machine learning, (18 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.47)

Genre: Research Report > Experimental Study (0.46)

Industry: Energy > Oil & Gas (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A General Clustering Agreement Index: For Comparing Disjoint and Overlapping Clusters

Rabbany, Reihaneh (University of Alberta) | Zaïane, Osmar R. (University of Alberta)

A clustering agreement index quantifies the similarity between two given clusterings. It is most commonly used to compare the results obtained from different clustering algorithms against the ground-truth clustering in the benchmark datasets. In this paper, we present a general Clustering Agreement Index (CAI) for comparing disjoint and overlapping clusterings. CAI is generic and introduces a family of clustering agreement indexes. In particular, the two widely used indexes of Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI), are special cases of the CAI. Our index, therefore, provides overlapping extensions for both these commonly used indexes, whereas their original formulations are only defined for disjoint cases. Lastly, unlike previous indexes, CAI is flexible and can be adapted to incorporate the structure of the data, which is important when comparing clusters in networks, a.k.a communities.

agreement index, extension, overlap, (14 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Zhang, Nevin L. (The Hong Kong University of Science and Technology) | Poon, Leonard K. M. (The Education University of Hong Kong)

Latent Tree Analysis

Latent tree analysis seeks to model the correlations amonga set of random variables using a tree of latent variables. It was proposed as an improvement to latent class analysis—a method widely used in social sciences and medicine to identify homogeneous subgroups in a population. It provides new and fruitful perspectives on a number of machine learningareas, including cluster analysis, topic detection, and deep probabilistic modeling. This paper gives an overview of the research on latent tree analysis and various ways it is used inpractice.

latent variable, partition, tree model, (16 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.05)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Overview (0.88)
Instructional Material (0.68)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Shen, Xiaobo (Nanjing University of Science and Technology) | Liu, Weiwei (University of Technology Sydney) | Tsang, Ivor (University of Technology Sydney) | Shen, Fumin (University of Electronic Science and Technology of China) | Sun, Quan-Sen (Nanjing University of Science and Technology)

Compressed K-Means for Large-Scale Clustering

Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.

binary code, dataset, large-scale dataset, (16 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Robust Manifold Matrix Factorization for Joint Clustering and Feature Extraction

Zhang, Lefei (Wuhan University) | Zhang, Qian (Alibaba Group) | Du, Bo (Wuhan University) | Tao, Dacheng (University of Technology Sydney) | You, Jane (The Hong Kong Polytechnic University)

Low-rank matrix approximation has been widely used for data subspace clustering and feature representation in many computer vision and pattern recognition applications. However, in order to enhance the discriminability, most of the matrix approximation based feature extraction algorithms usually generate the cluster labels by certain clustering algorithm (e.g., the kmeans) and then perform the matrix approximation guided by such label information. In addition, the noises and outliers in the dataset with large reconstruction errors will easily dominate the objective function by the conventional ℓ 2 -norm based squared residue minimization. In this paper, we propose a novel clustering and feature extraction algorithm based on an unified low-rank matrix factorization framework, which suggests that the observed data matrix can be approximated by the production of projection matrix and low dimensional representation, among which the low-dimensional representation can be approximated by the cluster indicator and latent feature matrix simultaneously. Furthermore, we have proposed using the ℓ 2,1 -norm and integrating the manifold regularization to further promote the proposed model. A novel Augmented Lagrangian Method (ALM) based procedure is designed to effectively and efficiently seek the optimal solution of the problem. The experimental results in both clustering and feature extraction perspectives demonstrate the superior performance of the proposed method.

dataset, factorization, matrix factorization, (15 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.05)
Asia > China > Hubei Province > Wuhan (0.05)
Asia > China > Hong Kong (0.05)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining > Feature Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)