Goto

Collaborating Authors

 prototypicality


A Comprehensive Evaluation of Semantic Relation Knowledge of Pretrained Language Models and Humans

Cao, Zhihan, Yamada, Hiroaki, Teufel, Simone, Tokunaga, Takenobu

arXiv.org Artificial Intelligence

Recently, much work has concerned itself with the enigma of what exactly PLMs (pretrained language models) learn about different aspects of language, and how they learn it. One stream of this type of research investigates the knowledge that PLMs have about semantic relations. However, many aspects of semantic relations were left unexplored. Only one relation was considered, namely hypernymy. Furthermore, previous work did not measure humans' performance on the same task as that solved by the PLMs. This means that at this point in time, there is only an incomplete view of models' semantic relation knowledge. To address this gap, we introduce a comprehensive evaluation framework covering five relations beyond hypernymy, namely hyponymy, holonymy, meronymy, antonymy, and synonymy. We use six metrics (two newly introduced here) for recently untreated aspects of semantic relation knowledge, namely soundness, completeness, symmetry, asymmetry, prototypicality, and distinguishability and fairly compare humans and models on the same task. Our extensive experiments involve 16 PLMs, eight masked and eight causal language models. Up to now only masked language models had been tested although causal and masked language models treat context differently. Our results reveal a significant knowledge gap between humans and models for almost all semantic relations. Antonymy is the outlier relation where all models perform reasonably well. In general, masked language models perform significantly better than causal language models. Nonetheless, both masked and causal language models are likely to confuse non-antonymy relations with antonymy.


The correlation between nativelike selection and prototypicality: a multilingual onomasiological case study using semantic embedding

Zhang, Huasheng

arXiv.org Artificial Intelligence

In native speakers' lexical choices, a concept can be more readily expressed by one expression over another grammatical one, a phenomenon known as nativelike selection (NLS). In previous research, arbitrary chunks such as collocations have been considered crucial for this phenomenon. However, this study examines the possibility of analyzing the semantic motivation and deducibility behind some NLSs by exploring the correlation between NLS and prototypicality, specifically the onomasiological hypothesis of Grondelaers and Geeraerts (2003, Towards a pragmatic model of cognitive onomasiology. In Hubert Cuyckens, Ren\'e Dirven & John R. Taylor (eds.), Cognitive approaches to lexical semantics, 67-92. Berlin: De Gruyter Mouton). They hypothesized that "[a] referent is more readily named by a lexical item if it is a salient member of the category denoted by that item". To provide a preliminary investigation of this important but rarely explored phenomenon, a series of innovative methods and procedures, including the use of semantic embedding and interlingual comparisons, is designed. Specifically, potential NLSs are efficiently discovered through an automatic exploratory analysis using topic modeling techniques, and then confirmed by manual inspection through frame semantics. Finally, to account for the NLS in question, cluster analysis and behavioral profile analysis are conducted to uncover a language-specific prototype for the Chinese verb shang 'harm', providing supporting evidence for the correlation between NLS and prototypicality.


More Distinctively Black and Feminine Faces Lead to Increased Stereotyping in Vision-Language Models

Lee, Messi H. J., Montgomery, Jacob M., Lai, Calvin K.

arXiv.org Artificial Intelligence

Vision Language Models (VLMs), exemplified by GPT-4V, adeptly integrate text and vision modalities. This integration enhances Large Language Models' ability to mimic human perception, allowing them to process image inputs. Despite VLMs' advanced capabilities, however, there is a concern that VLMs inherit biases of both modalities in ways that make biases more pervasive and difficult to mitigate. Our study explores how VLMs perpetuate homogeneity bias and trait associations with regards to race and gender. When prompted to write stories based on images of human faces, GPT-4V describes subordinate racial and gender groups with greater homogeneity than dominant groups and relies on distinct, yet generally positive, stereotypes. Importantly, VLM stereotyping is driven by visual cues rather than group membership alone such that faces that are rated as more prototypically Black and feminine are subject to greater stereotyping. These findings suggest that VLMs may associate subtle visual cues related to racial and gender groups with stereotypes in ways that could be challenging to mitigate. We explore the underlying reasons behind this behavior and discuss its implications and emphasize the importance of addressing these biases as VLMs come to mirror human perception.


Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI

Seedat, Nabeel, Imrie, Fergus, van der Schaar, Mihaela

arXiv.org Artificial Intelligence

Characterizing samples that are difficult to learn from is crucial to developing highly performant ML models. This has led to numerous Hardness Characterization Methods (HCMs) that aim to identify "hard" samples. However, there is a lack of consensus regarding the definition and evaluation of "hardness". Unfortunately, current HCMs have only been evaluated on specific types of hardness and often only qualitatively or with respect to downstream performance, overlooking the fundamental quantitative identification task. We address this gap by presenting a fine-grained taxonomy of hardness types. Additionally, we propose the Hardness Characterization Analysis Toolkit (H-CAT), which supports comprehensive and quantitative benchmarking of HCMs across the hardness taxonomy and can easily be extended to new HCMs, hardness types, and datasets. We use H-CAT to evaluate 13 different HCMs across 8 hardness types. This comprehensive evaluation encompassing over 14K setups uncovers strengths and weaknesses of different HCMs, leading to practical tips to guide HCM selection and future development. Our findings highlight the need for more comprehensive HCM evaluation, while we hope our hardness taxonomy and toolkit will advance the principled evaluation and uptake of data-centric AI methods.


Unsupervised Feature Learning with Emergent Data-Driven Prototypicality

Guo, Yunhui, Zhang, Youren, Chen, Yubei, Yu, Stella X.

arXiv.org Artificial Intelligence

Given an image set without any labels, our goal is to train a model that maps each image to a point in a feature space such that, not only proximity indicates visual similarity, but where it is located directly encodes how prototypical the image is according to the dataset. Our key insight is to perform unsupervised feature learning in hyperbolic instead of Euclidean space, where the distance between points still reflect image similarity, and yet we gain additional capacity for representing prototypicality with the location of the point: The closer it is to the origin, the more prototypical it is. The latter property is simply emergent from optimizing the usual metric learning objective: The image similar to many training instances is best placed at the center of corresponding points in Euclidean space, but closer to the origin in hyperbolic space. We propose an unsupervised feature learning algorithm in Hyperbolic space with sphere pACKing. HACK first generates uniformly packed particles in the Poincar\'e ball of hyperbolic space and then assigns each image uniquely to each particle. Images after congealing are regarded more typical of the dataset it belongs to. With our feature mapper simply trained to spread out training instances in hyperbolic space, we observe that images move closer to the origin with congealing, validating our idea of unsupervised prototypicality discovery. We demonstrate that our data-driven prototypicality provides an easy and superior unsupervised instance selection to reduce sample complexity, increase model generalization with atypical instances and robustness with typical ones.


Are all training examples equally valuable?

Lapedriza, Agata, Pirsiavash, Hamed, Bylinskii, Zoya, Torralba, Antonio

arXiv.org Machine Learning

When learning a new concept, not all training examples may prove equally useful for training: some may have higher or lower training value than others. The goal of this paper is to bring to the attention of the vision community the following considerations: (1) some examples are better than others for training detectors or classifiers, and (2) in the presence of better examples, some examples may negatively impact performance and removing them may be beneficial. In this paper, we propose an approach for measuring the training value of an example, and use it for ranking and greedily sorting examples. We test our methods on different vision tasks, models, datasets and classifiers. Our experiments show that the performance of current state-of-the-art detectors and classifiers can be improved when training on a subset, rather than the whole training set.