AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Neural Information Processing Systems Jun-18-2026, 07:24:26 GMT

6ebb92aad3a4fe7aae230b0e63c2ef35-Paper-Conference.pdf

Recent advances in multimodal models have raised questions about whether vision-and-language models (VLMs) integrate cross-modal information in ways that reflect human cognition. One well-studied test case in this domain is the bouba-kiki effect, where humans reliably associate pseudowords like 'bouba' with round shapes and 'kiki' with jagged ones. Given the mixed evidence found in prior studies for this effect in VLMs, we present a comprehensive re-evaluation focused on two variants of CLIP, ResNet and Vision Transformer (ViT), given their centrality in many state-of-the-art VLMs. We apply two complementary methods closely modelled after human experiments: a prompt-based evaluation that uses probabilities as a measure of model preference, and we use Grad-CAM as a novel approach to interpret visual attention in shape-word matching tasks. Our findings show that these model variants do not consistently exhibit the bouba-kiki effect. While ResNet shows a preference for round shapes, overall performance across both model variants lacks the expected associations. Moreover, direct comparison with prior human data on the same task shows that the models' responses fall markedly short of the robust, modality-integrated behaviour characteristic of human cognition. These results contribute to the ongoing debate about the extent to which VLMs truly understand cross-modal concepts, highlighting limitations in their internal representations and alignment with human intuitions.

large language model, machine learning, natural language, (19 more...)

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

Neural Information Processing Systems Apr-30-2026, 08:25:43 GMT

or Sound Symbolism in Vision and Language Models

Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well-demonstrated with regard to cross-modal associations between language and the visual domain. In this work, we address the question of whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion. Using zero-shot knowledge probing to investigate the inherent knowledge of these models, we find strong evidence that they do show this pattern, paralleling the well-known kiki-bouba effect in psycholinguistics. Our work provides a novel method for demonstrating sound symbolism and understanding its nature using computational tools. Our code will be made publicly available.

artificial intelligence, machine learning, natural language, (18 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Neural Information Processing Systems Feb-18-2026, 00:40:58 GMT

or Sound Symbolism in Vision and Language Models

"That which we call a rose by any other name would smell as sweet."

artificial intelligence, machine learning, natural language, (18 more...)

Country:

North America > United States (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Africa > Namibia (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Neural Information Processing Systems Feb-12-2026, 00:47:18 GMT

439539557e9ba0d04055773ff1f3241c-Paper-Datasets_and_Benchmarks_Track.pdf

large language model, machine learning, natural language, (19 more...)

Country: North America > United States > Iowa (0.04)

Genre: Research Report (0.93)

Industry: Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

arXiv.org Artificial Intelligence Dec-2-2025

Slovak Conceptual Dictionary

Blšták, Miroslav

When solving tasks in the field of natural language processing, we sometimes need dictionary tools, such as lexicons, word form dictionaries or knowledge bases. However, the availability of dictionary data is insufficient in many languages, especially in the case of low-resource languages. In this article, we introduce a new conceptual dictionary for the Slovak language as the first linguistic tool of this kind. Since Slovak is a language with limited linguistic resources, and no machine-readable linguistic data sources with a sufficiently large volume of data are currently available, many tasks that require automated processing of Slovak text achieve weaker results than in other languages or are almost impossible to solve.

artificial intelligence, natural language, text processing, (17 more...)

2512.00579

Country: Europe > Austria (0.28)

Genre: Research Report (0.50)

Industry: Education (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

arXiv.org Artificial Intelligence Nov-13-2025

Evaluating DisCoCirc in Translation Tasks & its Limitations: A Comparative Study Between Bengali & English

Moon, Nazmoon Falgunee

In [4], the authors present the DisCoCirc (Distributed Compositional Circuits) formalism for the English language, a grammar-based framework derived from the production rules that incorporates circuit-like representations in order to give a precise categorical theoretical structure to the language. In this paper, we extend this approach to develop a similar framework for Bengali and apply it to translation tasks between English and Bengali. A central focus of our work lies in reassessing the effectiveness of DisCoCirc in reducing language bureaucracy. Unlike the result suggested in [5], our findings indicate that although it works well for a large part of the language, it still faces limitations due to the structural variation of the two languages. We discuss the possible methods that might handle these shortcomings and show that, in practice, DisCoCirc still struggles even with relatively simple sentences. This divergence from prior claims not only highlights the framework's constraints in translation but also suggests scope for future improvement. Apart from our primary focus on English-Bengali translation, we also take a short detour to examine English conjunctions, following [1], showing a connection between conjunctions and Boolean logic.

artificial intelligence, natural language, text circuit, (19 more...)

2511.08601

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Pekkanen, Matti, Verdoja, Francesco, Kyrki, Ville

QuASH: Using Natural-Language Heuristics to Query Visual-Language Robotic Maps

arXiv.org Artificial Intelligence Oct-17-2025

Embeddings from Visual-Language Models are increasingly utilized to represent semantics in robotic maps, offering an open-vocabulary scene understanding that surpasses traditional, limited labels. Embeddings enable on-demand querying by comparing embedded user text prompts to map embeddings via a similarity metric. The key challenge in performing the task indicated in a query is that the robot must determine the parts of the environment relevant to the query. This paper proposes a solution to this challenge. We leverage natural-language synonyms and antonyms associated with the query within the embedding space, applying heuristics to estimate the language space relevant to the query, and use that to train a classifier to partition the environment into matches and non-matches. We evaluate our method through extensive experiments, querying both maps and standard image benchmarks. The results demonstrate increased queryability of maps and images. Our querying technique is agnostic to the representation and encoder used, and requires limited training.
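The embedding comparison mentioned in the abstract above can be illustrated with a minimal sketch. It shows only the generic baseline step (cosine similarity between a query embedding and per-cell map embeddings, followed by a fixed threshold), not the QuASH method itself, which according to the abstract trains a classifier from synonym and antonym heuristics. The names `query_map`, `toy_map`, and the threshold of 0.5 are hypothetical, and the three-dimensional vectors are toy values rather than real encoder outputs.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query_map(query_embedding, map_embeddings, threshold=0.5):
    """Return (cell_id, score) pairs at or above the threshold, best match first.

    The fixed threshold is an illustrative assumption, not taken from the paper.
    """
    scored = [
        (cell_id, cosine_similarity(query_embedding, embedding))
        for cell_id, embedding in map_embeddings.items()
    ]
    matches = [(cell_id, score) for cell_id, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Toy data: three map cells with made-up 3-dimensional embeddings.
toy_map = {
    "cell_a": [1.0, 0.0, 0.0],
    "cell_b": [0.6, 0.8, 0.0],
    "cell_c": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.0, 0.0]

results = query_map(toy_query, toy_map)
for cell_id, score in results:
    print(cell_id, round(score, 3))
```

A fixed threshold like this is exactly the part that is hard to choose in practice, which is the difficulty the abstract describes as determining which parts of the environment are relevant to the query.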

artificial intelligence, large language model, natural language, (17 more...)

2510.14546

Country:

Europe (1.00)
North America > United States (0.46)
Asia > China (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)

Kouwenhoven, Tom, Shahrasbi, Kiana, Verhoef, Tessa

Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect

arXiv.org Artificial Intelligence Oct-16-2025

Recent advances in multimodal models have raised questions about whether vision-and-language models (VLMs) integrate cross-modal information in ways that reflect human cognition. One well-studied test case in this domain is the bouba-kiki effect, where humans reliably associate pseudowords like 'bouba' with round shapes and 'kiki' with jagged ones. Given the mixed evidence found in prior studies for this effect in VLMs, we present a comprehensive re-evaluation focused on two variants of CLIP, ResNet and Vision Transformer (ViT), given their centrality in many state-of-the-art VLMs. We apply two complementary methods closely modelled after human experiments: a prompt-based evaluation that uses probabilities as a measure of model preference, and we use Grad-CAM as a novel approach to interpret visual attention in shape-word matching tasks. Our findings show that these model variants do not consistently exhibit the bouba-kiki effect. While ResNet shows a preference for round shapes, overall performance across both model variants lacks the expected associations. Moreover, direct comparison with prior human data on the same task shows that the models' responses fall markedly short of the robust, modality-integrated behaviour characteristic of human cognition. These results contribute to the ongoing debate about the extent to which VLMs truly understand cross-modal concepts, highlighting limitations in their internal representations and alignment with human intuitions.
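As a rough illustration of how probabilities can serve as a measure of model preference in a shape-word matching task, the sketch below turns image-text similarity scores into a probability distribution over candidate pseudowords with a softmax. This is an assumption about the general technique, not the authors' evaluation code: the function `label_preference`, the similarity values, and the scale factor of 100 are all hypothetical, and no real model is called.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_preference(similarities, labels, scale=100.0):
    """Map image-text similarities to probabilities over the labels.

    `scale` is an assumed temperature-like factor applied before the softmax.
    """
    probabilities = softmax([scale * s for s in similarities])
    return dict(zip(labels, probabilities))

# Hypothetical similarities between one round shape and two text prompts.
prefs = label_preference([0.21, 0.19], ["bouba", "kiki"])
for label, probability in prefs.items():
    print(label, round(probability, 3))
```

Under this reading, a model would exhibit the effect for a round shape if the probability assigned to 'bouba' is reliably higher than that assigned to 'kiki' across many shapes and prompts.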

cross-modal association, machine learning, natural language, (19 more...)

2507.10013

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Bouguettaya, Ayoub, Stuart, Elizabeth M.

Machine learning methods fail to provide cohesive atheoretical construction of personality traits from semantic embeddings

arXiv.org Artificial Intelligence Oct-14-2025

Here, we test this hypothesis using novel machine learning methods to create a bottom-up, atheoretical model of personality from the same trait-descriptive adjective list that led to the dominant, contemporary model of personality (the Big Five). We then assess the descriptive utility of this machine learning method (which yields lexical clusters) by comparing it with the established Big Five personality model in how well each describes online conversations (on Reddit forums). Our analysis of 1 million online comments shows that the Big Five model provides a much more powerful and interpretable description of these communities and the differences between them. Specifically, the dimensions of Agreeableness, Conscientiousness, and Neuroticism effectively distinguish Reddit communities. In contrast, our lexical clusters do not provide meaningful distinctions and fail to describe the spread. Validation against the International Personality Item Pool confirmed the Big Five model's superior psychometric coherence, and our machine learning methods notably failed to recover the trait of Extraversion. These results affirm the robustness of the Big Five, while also showing that the semantic structure of personality likely depends on social context. Our findings suggest that while machine learning can help with understanding and explaining human behavior, especially by checking the ecological validity of existing theories, machine learning methods may not be able to replace established psychological theories.

artificial intelligence, machine learning, personality, (16 more...)

2510.09739

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)