Understanding Higher-Order Correlations Among Semantic Components in Embeddings

Oyama, Momose, Yamagiwa, Hiroaki, Shimodaira, Hidetoshi

Oct-9-2024–arXiv.org Artificial Intelligence

Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that when the higher-order correlation between two components is large, it indicates a strong semantic association between them, along with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights ...

Figure 1: Heatmap visualization of 300-dimensional SGNS embeddings transformed by PCA and ICA, with axes sorted by variance and skewness, respectively.
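The abstract describes measuring residual dependence between estimated components with higher-order correlations and summarizing it with a maximum spanning tree. The following is a minimal sketch of that idea, not the authors' implementation. This excerpt does not give the exact definition, so the sketch assumes the higher-order correlation of components i and j is the sample mean of z_i^2 * z_j^2 for standardized components z. The function names `higher_order_correlation` and `maximum_spanning_tree` are hypothetical, and the data are synthetic stand-ins for ICA-transformed embeddings.

```python
import numpy as np


def higher_order_correlation(S):
    """Return the matrix of E[z_i^2 * z_j^2] for standardized columns of S.

    S has shape (n_samples, n_components). This particular statistic is an
    assumption made for illustration; independent components give values near 1.
    """
    # Standardize each component (column) to zero mean and unit variance.
    Z = (S - S.mean(axis=0)) / S.std(axis=0)
    Q = Z ** 2
    # Entry (i, j) is the sample mean of z_i^2 * z_j^2.
    return Q.T @ Q / Z.shape[0]


def maximum_spanning_tree(W):
    """Prim's algorithm on a dense symmetric weight matrix W.

    Returns a list of (i, j, weight) edges, grown from node 0.
    """
    d = W.shape[0]
    in_tree = np.zeros(d, dtype=bool)
    in_tree[0] = True
    best_w = W[0].astype(float)         # best known weight linking each node to the tree
    best_from = np.zeros(d, dtype=int)  # tree node that provides that weight
    edges = []
    for _ in range(d - 1):
        # Pick the heaviest edge leaving the current tree.
        candidates = np.where(in_tree, -np.inf, best_w)
        j = int(np.argmax(candidates))
        i = int(best_from[j])
        edges.append((i, j, float(W[i, j])))
        in_tree[j] = True
        # Nodes still outside the tree may now attach via j with a heavier edge.
        improved = ~in_tree & (W[j] > best_w)
        best_w[improved] = W[j, improved]
        best_from[improved] = j
    return edges


# Synthetic example: components 0 and 1 share a random scale, so they are
# uncorrelated yet not independent; components 2 and 3 are independent.
rng = np.random.default_rng(0)
n = 20000
scale = rng.uniform(0.2, 1.8, size=n)
S = rng.standard_normal((n, 4))
S[:, 0] *= scale
S[:, 1] *= scale

H = higher_order_correlation(S)
tree = maximum_spanning_tree(H)
print(np.round(H, 2))
print(tree)
```

In this toy setup the ordinary correlation between components 0 and 1 is close to zero, while their higher-order correlation is clearly above that of the independent pairs, so the tree links them directly. For real embeddings, the matrix `S` would be replaced by the component values produced by an ICA routine.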

axis, component value, higher-order correlation

Country:
- Oceania > Australia (0.04)
- North America
  - United States
    - Missouri (0.04)
    - Mississippi (0.04)
    - Kentucky (0.04)
    - California (0.04)
    - Alabama (0.04)
    - Texas > Smith County
      - Tyler (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
  - Canada
    - Quebec (0.04)
    - Ontario (0.04)
- Europe
  - Germany (0.04)
  - Austria (0.04)
  - Greece (0.04)
  - Belgium (0.04)
  - Italy (0.04)
  - Spain > Aragón (0.04)
  - Norway (0.04)
  - Serbia (0.04)
  - Croatia (0.04)
  - Poland (0.04)
  - Sweden (0.04)
  - Ireland (0.04)
  - Portugal (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - United Kingdom > England
    - Greater London > London > Wimbledon (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
- Asia
  - India (0.04)
  - Singapore (0.04)
  - Russia (0.04)
  - Malaysia (0.04)
  - Indonesia (0.04)
  - Middle East
    - Israel (0.04)
    - Iran (0.04)
    - Jordan (0.04)
    - Iraq (0.04)
    - Republic of Türkiye > Istanbul Province
      - Istanbul (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture
      - Tokyo (0.04)
    - Kansai > Kyoto Prefecture
      - Kyoto (0.04)
  - China > Beijing
    - Beijing (0.04)
- Africa
  - Namibia (0.04)
  - Ethiopia (0.04)
  - Eritrea (0.04)

Genre:
- Research Report (0.82)

Industry:
- Leisure & Entertainment > Sports (1.00)
- Government (0.68)
- Law (0.68)
- Health & Medicine
  - Therapeutic Area (1.00)
  - Pharmaceuticals & Biotechnology (0.94)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Text Processing (1.00)
