Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space
Klubička, Filip; Nedumpozhimana, Vasudevan; Kelleher, John D.
arXiv.org Artificial Intelligence
The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings, using a structural probing method. We repurpose an existing English verbal multi-word expression (MWE) dataset to suit the probing framework and perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings. Our experiments indicate that both encode some idiomatic information to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm, leaving this an open question. We also identify some limitations of the dataset used and highlight important directions for future work on improving its suitability for a probing analysis.
Apr-27-2023
- Country:
  - Asia
    - China > Hong Kong (0.04)
    - Japan > Kyūshū & Okinawa
      - Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
    - Middle East
  - Europe
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - Bulgaria > Varna Province
      - Varna (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Germany > Berlin (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Slovenia (0.04)
    - Spain > Valencian Community
      - Valencia Province > Valencia (0.04)
  - North America > United States
    - California
      - Los Angeles County > Los Angeles (0.14)
      - Santa Clara County > Palo Alto (0.04)
    - Colorado > Denver County
      - Denver (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - New Mexico > Santa Fe County
      - Santa Fe (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Washington > King County
      - Seattle (0.04)
  - Oceania > Australia
  - South America > Colombia
    - Meta Department > Villavicencio (0.04)
- Genre:
  - Research Report > New Finding (1.00)