A model and package for German ColBERT

Dang, Thuong, Chen, Qiqi

arXiv.org Artificial Intelligence 

The original ColBERT model was proposed by Khattab and Zaharia [8], introducing the MaxSim scoring function based on token-level interactions. The model was trained using a softmax cross-entropy loss over triplets derived from the MS MARCO Ranking [1] and TREC Complex Answer Retrieval (TREC CAR) [5] datasets, leveraging the English BERT model [4] as its backbone encoder. The ColBERT MaxSim score can be interpreted as a substitute for the BM25 score used in full-text search; consequently, there are similarities between the ColBERT retrieval method and BM25-based full-text search. This will be discussed in detail in Section 2. ColBERT is flexible and can be used either as a first-stage retrieval method or as a reranker. The ColBERT score is computed at the level of token similarities and can therefore be applied in contexts where keyword similarities are significant. A ColBERT model was also trained for Japanese [3], where the author also discussed different strategies for choosing hard negatives using the multilingual e5 embedding model and BM25.
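
As a point of reference for the token-level interaction described above, the following is a minimal sketch of MaxSim scoring, not the authors' implementation: it assumes L2-normalized query and document token embeddings Q and D (the numpy usage, shapes, and function name are illustrative assumptions), and for each query token it takes the maximum similarity over document tokens, then sums over query tokens.

    # Minimal MaxSim sketch (illustrative; assumes L2-normalized embeddings)
    import numpy as np

    def maxsim_score(Q: np.ndarray, D: np.ndarray) -> float:
        # Q: (num_query_tokens, dim), D: (num_doc_tokens, dim)
        sim = Q @ D.T                      # token-level cosine similarities
        # best-matching document token per query token, summed over the query
        return float(sim.max(axis=1).sum())

    # toy usage with random, normalized embeddings
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 128));  Q /= np.linalg.norm(Q, axis=1, keepdims=True)
    D = rng.normal(size=(20, 128)); D /= np.linalg.norm(D, axis=1, keepdims=True)
    print(maxsim_score(Q, D))

Because the score aggregates per-token maxima rather than a single pooled vector similarity, it behaves somewhat like a soft keyword-matching signal, which is why it can be compared to BM25 as noted above.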
