Dissimilarity Kernels for Paraphrase Identification

Lintean, Mihai (University of Memphis) | Rus, Vasile (University of Memphis)

May-18-2011–AAAI Conferences

We present a novel solution to the problem of paraphrase identification based on lexical dissimilarity kernels. Lexical kernels in conjunction with Support Vector Machines are preferred over other learning methods, e.g., decision trees, because they can handle a large number of features. Dissimilarity-based kernels emphasize the dissimilarities among text fragments and are therefore appropriate for text similarity tasks characterized by high lexical overlap. We conducted experiments with our kernels on the Microsoft Research (MSR) Paraphrase Corpus, a standard data set for assessing approaches to paraphrase identification. Our accuracy results, obtained using 10-fold cross-validation over the entire corpus, are competitive and robust when compared to state-of-the-art single-model approaches. We also report competitive results on the test portion of the MSR Paraphrase Corpus, which is the standard way to report results on this corpus.
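To make the idea of a dissimilarity-based lexical kernel concrete, the following is a minimal sketch and not the authors' actual formulation, which the abstract does not specify. It assumes whitespace tokenization, represents each sentence pair by the tokens that appear in one sentence but not the other, and takes the kernel between two pairs to be the dot product of those token counts. The function names `tokenize`, `dissimilarity_features`, and `dissimilarity_kernel` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def dissimilarity_features(sent_a, sent_b):
    # Keep only the tokens that the two sentences do NOT share.
    # Counter subtraction drops zero and negative counts, so each term
    # holds the surplus tokens of one sentence over the other.
    a = Counter(tokenize(sent_a))
    b = Counter(tokenize(sent_b))
    return (a - b) + (b - a)


def dissimilarity_kernel(pair_x, pair_y):
    # Dot product of the two dissimilarity bags. Because it is an inner
    # product of explicit feature vectors, the resulting Gram matrix is
    # positive semidefinite.
    fx = dissimilarity_features(*pair_x)
    fy = dissimilarity_features(*pair_y)
    return sum(count * fy[token] for token, count in fx.items())


pair_1 = ("the cat sat on the mat", "the cat sat on the rug")
pair_2 = ("a dog slept on the mat", "a dog slept on the sofa")
print(dissimilarity_kernel(pair_1, pair_2))
```

Under these assumptions, shared words contribute nothing to the kernel value, so two sentence pairs look alike only when they differ in the same words. A Gram matrix built from such a function could be handed to any SVM implementation that accepts a precomputed kernel.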


Genre:
- Research Report > Promising Solution (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (1.00)
  - Machine Learning > Statistical Learning
    - Support Vector Machines (0.89)
