PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search
Pham, Thang M., Yoon, Seunghyun, Bui, Trung, Nguyen, Anh
arXiv.org Artificial Intelligence
While contextualized word embeddings have become a de facto standard, learning contextualized phrase embeddings is less explored and is hindered by the lack of a human-annotated benchmark that tests machine understanding of phrase semantics given a context sentence or paragraph (instead of phrases alone). To fill this gap, we propose PiC, a dataset of ~28K noun phrases accompanied by their contextual Wikipedia pages, and a suite of three tasks for training and evaluating phrase embeddings. Training on PiC improves ranking models' accuracy and remarkably pushes span-selection (SS) models (i.e., models that predict the start and end index of the target phrase) to near-human accuracy, which is 95% Exact Match (EM) on semantic search given a query phrase and a passage. Interestingly, we find evidence that this impressive performance arises because the SS models learn to better capture the common meaning of a phrase regardless of its actual context. SotA models perform poorly at distinguishing two senses of the same phrase in two contexts (~60% EM) and at estimating the similarity between two different phrases in the same context (~70% EM).
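To make the Exact Match (EM) figures concrete, here is a minimal sketch of how EM could be computed for span selection. It assumes each prediction and each gold answer is a `(start, end)` index pair and that a prediction counts only when both indices agree. The function names `exact_match` and `em_score` and the sample spans are hypothetical and are not taken from the PiC codebase; SQuAD-style evaluation often compares normalized answer strings instead.

```python
from typing import Sequence, Tuple

Span = Tuple[int, int]  # (start_index, end_index), an assumed representation


def exact_match(pred: Span, gold: Span) -> int:
    """Return 1 if both the start and end indices match, else 0."""
    return int(tuple(pred) == tuple(gold))


def em_score(preds: Sequence[Span], golds: Sequence[Span]) -> float:
    """Percentage of examples whose predicted span equals the gold span."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)


# Illustrative data only: three of the four predicted spans match exactly.
preds = [(3, 5), (10, 12), (0, 1), (7, 9)]
golds = [(3, 5), (10, 13), (0, 1), (7, 9)]
print(em_score(preds, golds))  # 75.0
```

Under this definition a span that is off by a single token scores zero, which makes EM a strict metric.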
Feb-2-2023
- Country:
- Asia
- Europe
- Czechia > Prague (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- France > Occitanie
- Haute-Garonne > Toulouse (0.04)
- Sweden (0.04)
- Ukraine
- Crimea > Sevastopol (0.04)
- Poltava Oblast > Poltava (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Norway (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Austria
- Indian Ocean (0.04)
- North America
- Dominican Republic (0.04)
- United States
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Texas > Travis County
- Austin (0.04)
- Oceania > Australia (0.68)
- Genre:
- Research Report (0.63)
- Industry:
- Government
- Military (0.67)
- Regional Government > Oceania Government
- Australia Government (0.45)
- Health & Medicine (1.00)
- Law (0.67)
- Law Enforcement & Public Safety
- Crime Prevention & Enforcement (1.00)
- Terrorism (0.92)
- Leisure & Entertainment (0.92)
- Media (1.00)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Text Processing (0.68)
- Communications (0.90)
- Information Management > Search (0.84)