Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations

Sihao Chen, Hongming Zhang, Tong Chen, Ben Zhou, Wenhao Yu, Dian Yu, Baolin Peng, Hongwei Wang, Dan Roth, Dong Yu

arXiv.org Artificial Intelligence 

We introduce the sub-sentence encoder, a contrastively learned contextual embedding model for fine-grained semantic representation of text. In contrast to the standard practice with sentence embeddings, where the meaning of an entire text sequence is encoded into a single fixed-length vector, the sub-sentence encoder learns to produce distinct contextual embeddings for different atomic propositions, i.e., the atomic units of meaning expressed within a text sequence. The sub-sentence embeddings are contrastively learned to recognize (inferred) semantic equivalence between propositions across different text sequences. Our experiments show the effectiveness of sub-sentence encoders in applications such as retrieving supporting facts for fine-grained text attribution and recognizing conditional semantic similarity between texts. In practice, we demonstrate that sub-sentence encoders incur the same inference cost and space complexity as sentence encoders.
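To make the idea concrete, below is a minimal sketch, not the authors' released implementation, of one way proposition-level embeddings could be derived and trained: token states from a shared backbone are mean-pooled over character spans marking each atomic proposition, and aligned proposition pairs are trained with an InfoNCE-style contrastive loss using in-batch negatives. The backbone choice ("bert-base-uncased"), the span format, and the function names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Illustrative sketch only; the paper's architecture and objective may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def proposition_embeddings(text: str, spans: list[tuple[int, int]]) -> torch.Tensor:
    """Encode `text` once, then mean-pool token states over each character
    span that marks an atomic proposition, yielding one vector per proposition."""
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]        # (seq_len, 2) char offsets; model rejects this key
    hidden = encoder(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    vecs = []
    for start, end in spans:
        # Select tokens fully inside the span; (0, 0) offsets are special tokens.
        mask = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > 0)
        vecs.append(hidden[mask].mean(dim=0))
    return F.normalize(torch.stack(vecs), dim=-1)

def contrastive_loss(a: torch.Tensor, b: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """InfoNCE over aligned proposition pairs: a[i] should match b[i]
    and repel every b[j], j != i (in-batch negatives)."""
    logits = a @ b.T / tau
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)
```

In this sketch the backbone runs once per text regardless of how many proposition vectors are pooled from it, which is consistent with the abstract's claim that inference cost and space complexity stay at the level of a sentence encoder.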
