Individuation in Neural Models with and without Visual Grounding

Tikhonov, Alexey, Bylinina, Lisa, Yamshchikov, Ivan P.

Sep-27-2024–arXiv.org Artificial Intelligence

We show differences between the language-and-vision model CLIP and two text-only models, FastText and SBERT, in how they encode individuation information. We study the latent representations that CLIP provides for substrates, granular aggregates, and various numbers of objects. We demonstrate that CLIP embeddings capture quantitative differences in individuation better than the embeddings of models trained on text-only data. Moreover, the individuation hierarchy we deduce from the CLIP embeddings agrees with the hierarchies proposed in linguistics and cognitive science.

artificial intelligence, individuation, natural language

Country:
- North America > United States
  - Illinois > Cook County > Chicago (0.05)
- Europe
  - Netherlands (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Germany
    - Berlin (0.04)
    - Bavaria > Lower Franconia
      - Würzburg (0.04)
- Africa > Middle East
  - Egypt > Cairo Governorate > Cairo (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Representation & Reasoning (0.93)
  - Cognitive Science (0.89)
  - Vision (0.70)
