Constrained Coclustering for Textual Documents

Jul-15-2010–AAAI Conferences

In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to several representative constrained clustering and co-clustering approaches, our approach is shown to be more effective for high-dimensional, sparse text data.
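The two experimental settings above differ only in where the constraints come from. The sketch below illustrates one plausible way to derive them. It is not the authors' procedure: it assumes constraints take the pairwise must-link / cannot-link form common in constrained clustering, and the function names and the `min_shared` threshold are hypothetical.

```python
from itertools import combinations


def constraints_from_labels(labels):
    """Derive pairwise constraints from partial category labels.

    `labels` maps an item index (document or word) to its category label;
    unlabeled items are simply absent from the mapping.
    Returns (must_link, cannot_link) as sets of (i, j) pairs with i < j.
    """
    must_link, cannot_link = set(), set()
    for i, j in combinations(sorted(labels), 2):
        if labels[i] == labels[j]:
            must_link.add((i, j))      # same label: should share a cluster
        else:
            cannot_link.add((i, j))    # different labels: should be separated
    return must_link, cannot_link


def must_links_from_entities(doc_entities, min_shared=2):
    """Derive document must-link pairs from shared named entities.

    `doc_entities` is a list of sets, one set of entity strings per document.
    Assumption: two documents are linked when they share at least
    `min_shared` entities (a hypothetical threshold, not from the paper).
    """
    must_link = set()
    for i, j in combinations(range(len(doc_entities)), 2):
        if len(doc_entities[i] & doc_entities[j]) >= min_shared:
            must_link.add((i, j))
    return must_link


# Semi-supervised setting: documents 0, 1, and 3 carry human labels.
ml, cl = constraints_from_labels({0: "sports", 1: "sports", 3: "politics"})

# Unsupervised setting: constraints come from extracted entities only.
entity_ml = must_links_from_entities(
    [{"Obama", "Senate"}, {"Obama", "Senate", "Chicago"}, {"Lakers"}]
)
```

The same `constraints_from_labels` helper can be applied to labeled words to produce word-side constraints, which is what makes the constraint set two-sided in this sketch.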

constraint, machine learning, natural language

Country:
- North America > United States
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - California > Santa Clara County
    - San Jose (0.04)
    - Palo Alto (0.04)
- Asia
  - Middle East > Jordan (0.05)
  - China > Beijing
    - Beijing (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning
    - Statistical Learning > Clustering (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.54)
