Understanding Higher-Order Correlations Among Semantic Components in Embeddings
Oyama, Momose; Yamagiwa, Hiroaki; Shimodaira, Hidetoshi
arXiv.org Artificial Intelligence
Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that when the higher-order correlation between two components is large, it indicates a strong semantic association between them, along with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights ...
Figure 1: Heatmap visualization of 300-dimensional SGNS embeddings transformed by PCA and ICA, with axes sorted by variance and skewness, respectively.
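The quantification and tree construction described in the abstract can be illustrated with a small sketch. The abstract does not state the exact formula, so this example assumes one plausible definition: the higher-order correlation of two standardized components is the sample mean of the product of their squares, which is close to 1 when the components are independent. The toy data, the function names, and the use of Kruskal's algorithm are all illustrative assumptions, not the authors' implementation. Components 0 and 1 share a random scale factor, so they are linearly uncorrelated yet dependent.

```python
import random
from itertools import combinations


def standardize(xs):
    """Shift and scale a list of samples to zero mean and unit variance."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs]


def higher_order_corr(a, b):
    """Sample estimate of E[a^2 b^2] (assumed definition, see lead-in)."""
    return sum((x * x) * (y * y) for x, y in zip(a, b)) / len(a)


def maximum_spanning_tree(n_nodes, weighted_edges):
    """Kruskal's algorithm over (weight, i, j) edges, heaviest first."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    tree = []
    for w, i, j in sorted(weighted_edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two separate subtrees
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


def toy_components(n_samples, seed=0):
    """Hypothetical stand-in for ICA output: four components, where
    components 0 and 1 share a random scale and are therefore dependent."""
    rng = random.Random(seed)
    cols = [[] for _ in range(4)]
    for _ in range(n_samples):
        scale = rng.choice([0.5, 2.0])
        cols[0].append(scale * rng.gauss(0, 1))
        cols[1].append(scale * rng.gauss(0, 1))
        cols[2].append(rng.gauss(0, 1))
        cols[3].append(rng.gauss(0, 1))
    return [standardize(c) for c in cols]


components = toy_components(20000)
edges = [
    (higher_order_corr(components[i], components[j]), i, j)
    for i, j in combinations(range(4), 2)
]
tree = maximum_spanning_tree(4, edges)
for i, j, w in tree:
    print(f"component {i} -- component {j}: {w:.2f}")
```

In this toy setup the heaviest edge of the tree links components 0 and 1, the only dependent pair, while the remaining edges have weights near 1. On real embeddings the components would come from an ICA implementation rather than from synthetic samples.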
Oct-9-2024
- Country:
- Africa
- Asia
- China > Beijing
- Beijing (0.04)
- India (0.04)
- Indonesia (0.04)
- Japan > Honshū
- Kansai > Kyoto Prefecture
- Kyoto (0.04)
- Kantō > Tokyo Metropolis Prefecture
- Tokyo (0.04)
- Malaysia (0.04)
- Middle East
- Iran (0.04)
- Iraq (0.04)
- Israel (0.04)
- Jordan (0.04)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Russia (0.04)
- Singapore (0.04)
- Europe
- Portugal (0.04)
- Ireland (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Sweden (0.04)
- Croatia (0.04)
- Belgium (0.04)
- Greece (0.04)
- Italy (0.04)
- Serbia (0.04)
- Norway (0.04)
- United Kingdom > England
- Greater London > London > Wimbledon (0.04)
- Germany (0.04)
- Poland (0.04)
- Spain > Aragón (0.04)
- Austria (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- North America
- Canada
- United States
- Alabama (0.04)
- California (0.04)
- Kentucky (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Mississippi (0.04)
- Missouri (0.04)
- Texas > Smith County
- Tyler (0.04)
- Oceania > Australia (0.04)
- Genre:
- Research Report (0.82)
- Industry:
- Government (0.68)
- Health & Medicine
- Pharmaceuticals & Biotechnology (0.94)
- Therapeutic Area (1.00)
- Law (0.68)
- Leisure & Entertainment > Sports (1.00)
- Technology: