FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric
Chen, Maximillian, Chen, Caitlyn, Yu, Xiao, Yu, Zhou
–arXiv.org Artificial Intelligence
Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM's improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be more persuasive, and that syntax is predictive of authorship attribution in the Australian High Court Judgment corpus.
arXiv.org Artificial Intelligence
Jan-27-2023
- Country:
- Asia (0.67)
- Europe (1.00)
- North America > United States
- Minnesota (0.28)
- Genre:
- Research Report > New Finding (0.69)
- Technology: