AITopics

1210.4184

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Yu, Yao-Liang, Schuurmans, Dale

Rank/Norm Regularization with Closed-Form Solutions: Application to Subspace Clustering

arXiv.org Machine LearningOct-9-2012

When data is sampled from an unknown subspace, principal component analysis (PCA) provides an effective way to estimate the subspace and hence reduce the dimension of the data. At the heart of PCA is the Eckart-Young-Mirsky theorem, which characterizes the best rank k approximation of a matrix. In this paper, we prove a generalization of the Eckart-Young-Mirsky theorem under all unitarily invariant norms. Using this result, we obtain closed-form solutions for a set of rank/norm regularized problems, and derive closed-form solutions for a general class of subspace clustering problems (where data is modelled by unions of unknown subspaces). From these results we obtain new theoretical insights and promising experimental results.

artificial intelligence, machine learning, unitarily invariant norm, (16 more...)

1202.3772

Country: North America > Canada > Alberta (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Eaton, Eric, desJardins, Marie, Jacob, Sara

Multi-view constrained clustering with an incomplete mapping between views

arXiv.org Artificial IntelligenceOct-9-2012

Multi-view learning algorithms typically assume a complete bipartite mapping between the different views in order to exchange information during the learning process. However, many applications provide only a partial mapping between the views, creating a challenge for current methods. To address this problem, we propose a multi-view algorithm based on constrained clustering that can operate with an incomplete mapping. Given a set of pairwise constraints in each view, our approach propagates these constraints using a local similarity measure to those instances that can be mapped to the other views, allowing the propagated constraints to be transferred across views via the partial mapping. It uses co-EM to iteratively estimate the propagation within each view based on the current clustering model, transfer the constraints across views, and then update the clustering model. By alternating the learning process between views, this approach produces a unified clustering model that is consistent with all views. We show that this approach significantly improves clustering performance over several other methods for transferring constraints and allows multi-view clustering to be reliably applied when given a limited mapping between the views. Our evaluation reveals that the propagated constraints have high precision with respect to the true clusters in the data, explaining their benefit to clustering performance in both single- and multi-view learning scenarios.

artificial intelligence, constraint, machine learning, (14 more...)

doi: 10.1007/s10115-012-0577-7

1210.264

Country:

North America > United States > Maryland (0.46)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.96)

Mahrsi, Mohamed Khalil El, Rossi, Fabrice

Modularity-Based Clustering for Network-Constrained Trajectories

arXiv.org Machine LearningOct-5-2012

We present a novel clustering approach for moving object trajectories that are constrained by an underlying road network. The approach builds a similarity graph based on these trajectories then uses modularity-optimization hiearchical graph clustering to regroup trajectories with similar profiles. Our experimental study shows the superiority of the proposed approach over classic hierarchical clustering and gives a brief insight to visualization of the clustering results.

artificial intelligence, machine learning, trajectory, (16 more...)

1205.2172

Country:

Europe > France (0.15)
North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (0.54)
Transportation > Ground > Road (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Mahrsi, Mohamed Khalil El, Rossi, Fabrice

Graph-Based Approaches to Clustering Network-Constrained Trajectory Data

arXiv.org Machine LearningOct-2-2012

Even though clustering trajectory data attracted considerable attention in the last few years, most of prior work assumed that moving objects can move freely in an euclidean space and did not consider the eventual presence of an underlying road network and its influence on evaluating the similarity between trajectories. In this paper, we present two approaches to clustering network-constrained trajectory data. The first approach discovers clusters of trajectories that traveled along the same parts of the road network. The second approach is segment-oriented and aims to group together road segments based on trajectories that they have in common. Both approaches use a graph model to depict the interactions between observations w.r.t. their similarity and cluster this similarity graph using a community detection algorithm. We also present experimental results obtained on synthetic data to showcase our propositions.

data mining, machine learning, trajectory, (17 more...)

1210.0762

Country:

North America > United States (0.30)
Europe (0.28)

Genre: Research Report (0.40)

Industry:

Transportation > Ground > Road (0.97)
Transportation > Infrastructure & Services (0.83)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Kim, Seyoung, Xing, Eric P.

Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping

arXiv.org Machine LearningSep-28-2012

We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical structure over the responses, such as a hierarchical clustering tree with leaf nodes for responses and internal nodes for clusters of related responses at multiple granularity, and we seek to leverage this structure to recover covariates relevant to each hierarchically-defined cluster of responses. We propose a tree-guided group lasso, or tree lasso, for estimating such structured sparsity under multi-response regression by employing a novel penalty function constructed from the tree. We describe a systematic weighting scheme for the overlapping groups in the tree-penalty such that each regression coefficient is penalized in a balanced manner despite the inhomogeneous multiplicity of group memberships of the regression coefficients due to overlaps among groups. For efficient optimization, we employ a smoothing proximal gradient method that was originally developed for a general class of structured-sparsity-inducing penalties. Using simulated and yeast data sets, we demonstrate that our method shows a superior performance in terms of both prediction errors and recovery of true sparsity patterns, compared to other methods for learning a multivariate-response regression.

artificial intelligence, machine learning, regression coefficient, (15 more...)

doi: 10.1214/12-AOAS549

0909.1373

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

arXiv.org Machine LearningSep-21-2012

On Constrained Spectral Clustering and Its Applications

Wang, Xiang, Qian, Buyue, Davidson, Ian

Constrained clustering has been well-studied for algorithms such as $K$-means and hierarchical clustering. However, how to satisfy many constraints in these algorithmic settings has been shown to be intractable. One alternative to encode many constraints is to use spectral clustering, which remains a developing area. In this paper, we propose a flexible framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link and Cannot-Link constraints by modifying the graph Laplacian or constraining the underlying eigenspace, we present a more natural and principled formulation, which explicitly encodes the constraints as part of a constrained optimization problem. Our method offers several practical advantages: it can encode the degree of belief in Must-Link and Cannot-Link constraints; it guarantees to lower-bound how well the given constraints are satisfied using a user-specified threshold; it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and encoding the constraints explicitly, much of the existing analysis of unconstrained spectral clustering techniques remains valid for our formulation. We validate the effectiveness of our approach by empirical results on both artificial and real datasets. We also demonstrate an innovative use of encoding large number of constraints: transfer learning via constraints.

artificial intelligence, constraint, machine learning, (17 more...)

doi: 10.1007/s10618-012-0291-9

1201.5338

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Eynard, Davide, Glashoff, Klaus, Bronstein, Michael M., Bronstein, Alexander M.

Multimodal diffusion geometry by joint diagonalization of Laplacians

arXiv.org Artificial IntelligenceSep-12-2012

We construct an extension of diffusion geometry to multiple modalities through joint approximate diagonalization of Laplacian matrices. This naturally extends classical data analysis tools based on spectral geometry, such as diffusion maps and spectral clustering. We provide several synthetic and real examples of manifold learning, retrieval, and clustering demonstrating that the joint diffusion geometry frequently better captures the inherent structure of multi-modal data. We also show that many previous attempts to construct multimodal spectral clustering can be seen as particular cases of joint approximate diagonalization of the Laplacians.

artificial intelligence, laplacian, machine learning, (18 more...)

1209.2295

Country:

Europe > Germany (0.28)
Asia > Middle East (0.28)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Machine LearningSep-4-2012

A Split-Merge Framework for Comparing Clusterings

Xiang, Qiaoliang, Mao, Qi, Chai, Kian Ming, Chieu, Hai Leong, Tsang, Ivor, Zhao, Zhendong

Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. In order to satisfy consistency in the component, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework gives measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.

artificial intelligence, machine learning, similarity measure, (15 more...)

1206.6475

Country: Europe (0.46)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

De Vries, Christopher M., Geva, Shlomo, Trotman, Andrew

Document Clustering Evaluation: Divergence from a Random Baseline

arXiv.org Artificial IntelligenceAug-29-2012

Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures cluster quality measures are performing work that prevents ineffective clusterings from giving high scores to clusterings that provide no useful result. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters to categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation. This paper describes its use in the context of document clustering evaluation.

category, data mining, machine learning, (19 more...)

1208.5654

Country: Oceania > Australia (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)