
Collaborating Authors

 Kühnberger, Kai-Uwe


Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks

arXiv.org Artificial Intelligence

Investigating deep learning language models has always been a significant research area due to the "black box" nature of most advanced models. With the recent advancements in pre-trained language models based on transformers and their increasing integration into daily life, addressing this issue has become more pressing. In order to achieve an explainable AI model, it is essential to comprehend the procedural steps involved and compare them with human thought processes. Thus, in this paper, we use simple, well-understood non-language tasks to explore these models' inner workings. Specifically, we apply a pre-trained language model to constrained arithmetic problems with hierarchical structure and analyze its attention weights and hidden states. The investigation reveals promising results, with the model addressing hierarchical problems in a moderately structured manner, similar to human problem-solving strategies. Additionally, by inspecting the attention weights layer by layer, we uncover an unconventional finding: layer 10, rather than the model's final layer, is the optimal layer to unfreeze for the least parameter-intensive approach to fine-tuning the model. We support these findings with entropy analysis and token embedding similarity analysis. The attention analysis allows us to hypothesize that the model can generalize to longer sequences in the ListOps dataset, a conclusion later confirmed through testing on sequences longer than those in the training set. Lastly, by utilizing a straightforward task in which the model predicts the winner of a Tic-Tac-Toe game, we identify limitations of attention analysis, particularly its inability to capture 2D patterns.
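
As a rough illustration of the kind of analysis described above, the sketch below pulls per-layer attention weights from a pre-trained transformer, computes a per-layer attention-entropy summary, and unfreezes a single block for parameter-efficient fine-tuning. GPT-2 via Hugging Face Transformers, the toy ListOps-style input, and the entropy summary are stand-ins; the abstract does not specify the paper's exact model, preprocessing, or layer indexing convention.

```python
# Hedged sketch: layer-wise attention entropy and single-layer unfreezing.
# GPT-2 stands in for "a pre-trained language model"; the paper's exact
# checkpoint and data pipeline are not specified in the abstract above.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)

# Example ListOps-style input; the real preprocessing is an assumption here.
inputs = tokenizer("[MAX 2 9 [MIN 4 7 ] 0 ]", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq, seq).
for layer_idx, attn in enumerate(outputs.attentions):
    probs = attn.clamp_min(1e-12)
    entropy = -(probs * probs.log()).sum(dim=-1)  # entropy of each attention row
    print(f"layer {layer_idx}: mean attention entropy = {entropy.mean():.3f}")

# Parameter-efficient fine-tuning: freeze everything, then unfreeze one block
# (index 10 here; whether the paper counts layers from 0 or 1 is not stated).
for p in model.parameters():
    p.requires_grad = False
for p in model.h[10].parameters():  # GPT-2 transformer blocks live in model.h
    p.requires_grad = True
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```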


Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI

arXiv.org Artificial Intelligence

Pre-trained language models have recently emerged as a powerful tool for fine-tuning on a variety of language tasks. Ideally, when models are pre-trained on large amounts of data, they are expected to gain implicit knowledge. In this paper, we investigate the ability of pre-trained language models to generalize to different non-language tasks. In particular, we test them on tasks from different domains such as computer vision, reasoning on hierarchical data, and protein fold prediction. The four pre-trained models that we used, T5, BART, BERT, and GPT-2, achieve outstanding results. They all have similar performance, and they outperform transformers trained from scratch by a large margin. For instance, pre-trained language models perform better on the ListOps dataset, with an average accuracy of 58.7%, compared to transformers trained from scratch, which have an average accuracy of 29.0%. The significant improvement demonstrated across three types of datasets suggests that pre-training on language helps the models acquire general knowledge, bringing us a step closer to general AI. We also show that reducing the number of parameters in pre-trained language models does not have a great impact, as the performance drops only slightly when using T5-Small instead of T5-Base. In fact, when using only 2% of the parameters, we achieved a great improvement compared to training from scratch. Finally, in contrast to prior work, we find that using pre-trained embeddings for the input layer is necessary to achieve the desired results.
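
The sketch below illustrates the general recipe implied above: keep the pre-trained input embeddings and backbone, train only a small fraction of the parameters, and attach a task head for a non-language classification problem. BERT, the LayerNorm-only unfreezing choice, and the toy ListOps example are illustrative assumptions; the abstract does not state which parameters make up the roughly 2% that are trained.

```python
# Hedged sketch: fine-tuning a small fraction of a pre-trained LM on a
# non-language classification task. Which parameters form the ~2% in the
# paper is not spelled out in the abstract; unfreezing LayerNorms plus a
# task head is one common choice and is used here only as an illustration.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_CLASSES = 10  # e.g. ListOps answers are digits 0-9

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
backbone = AutoModel.from_pretrained("bert-base-uncased")

# Freeze the backbone, then re-enable only the LayerNorm parameters.
for p in backbone.parameters():
    p.requires_grad = False
for module in backbone.modules():
    if isinstance(module, nn.LayerNorm):
        for p in module.parameters():
            p.requires_grad = True

head = nn.Linear(backbone.config.hidden_size, NUM_CLASSES)  # always trained

def forward(texts):
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    hidden = backbone(**enc).last_hidden_state  # (batch, seq, hidden)
    return head(hidden[:, 0])                   # classify from the [CLS] position

params = list(head.parameters()) + [p for p in backbone.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(params, lr=1e-4)

# One toy training step on a single ListOps-style example.
logits = forward(["[MAX 2 9 [MIN 4 7 ] 0 ]"])
loss = nn.functional.cross_entropy(logits, torch.tensor([9]))
loss.backward()
optimizer.step()
```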


Generalizing Psychological Similarity Spaces to Unseen Stimuli

arXiv.org Machine Learning

The cognitive framework of conceptual spaces proposes to represent concepts as regions in psychological similarity spaces. These similarity spaces are typically obtained through multidimensional scaling (MDS), which converts human dissimilarity ratings for a fixed set of stimuli into a spatial representation. One can distinguish metric MDS (which assumes that the dissimilarity ratings are interval or ratio scaled) from nonmetric MDS (which only assumes an ordinal scale). In our first study, we show that despite its additional assumptions, metric MDS does not necessarily yield better solutions than nonmetric MDS. In this chapter, we furthermore propose to learn a mapping from raw stimuli into the similarity space using artificial neural networks (ANNs) in order to generalize the similarity space to unseen inputs. In our second study, we show that a linear regression from the activation vectors of a convolutional ANN to similarity spaces obtained by MDS can be successful and that the results are sensitive to the number of dimensions of the similarity space. The research presented in this paper is an updated, corrected, and significantly extended version of research reported in [Bechberger and Kypridemou, 2018].

From the introduction: The cognitive framework of conceptual spaces [Gärdenfors, 2000] proposes a geometric representation of conceptual structures: instances are represented as points and concepts are represented as regions in psychological similarity spaces. Based on this representation, one can explain a range of cognitive phenomena. In principle, there are three ways of obtaining the dimensions of a conceptual space. If the domain of interest is well understood, one can manually define the dimensions and thus the overall similarity space. A second approach is based on machine learning algorithms for dimensionality reduction. For instance, unsupervised artificial neural networks (ANNs) such as autoencoders or self-organizing maps can be used to find a compressed representation for a given set of input stimuli. This task is typically solved by optimizing a mathematical error function, which may not be satisfactory from a psychological point of view. A third way of obtaining the dimensions of a conceptual space is based on dissimilarity ratings obtained from human subjects. The technique of multidimensional scaling (MDS) takes as input these pairwise dissimilarities as well as the desired number t of dimensions. It then represents each stimulus as a point in a t-dimensional space in such a way that the distances between points in this space reflect the dissimilarities of the corresponding stimuli.
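
The two-step pipeline described above (nonmetric MDS on dissimilarity ratings, then a linear map from network activations into the resulting space) can be sketched roughly as follows. The dissimilarity matrix and the "CNN activations" are random placeholders, and the dimensionality and scikit-learn settings are assumptions rather than the study's actual configuration.

```python
# Hedged sketch of the two-step pipeline: (1) nonmetric MDS on human
# dissimilarity ratings, (2) a linear regression from ANN activation vectors
# into the resulting similarity space. Random arrays stand in for the real
# ratings and convolutional-network features.
import numpy as np
from sklearn.manifold import MDS
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_stimuli, n_dims, n_features = 60, 4, 512

# Symmetric dissimilarity matrix with zero diagonal (placeholder for ratings).
d = rng.random((n_stimuli, n_stimuli))
dissimilarities = (d + d.T) / 2
np.fill_diagonal(dissimilarities, 0.0)

# Step 1: nonmetric MDS into a t-dimensional similarity space (t = 4 here).
mds = MDS(n_components=n_dims, dissimilarity="precomputed", metric=False,
          random_state=0)
space = mds.fit_transform(dissimilarities)  # (n_stimuli, n_dims)

# Step 2: map ANN activations to MDS coordinates with linear regression.
features = rng.standard_normal((n_stimuli, n_features))  # stand-in CNN activations
train, test = slice(0, 50), slice(50, 60)
reg = LinearRegression().fit(features[train], space[train])
predicted = reg.predict(features[test])  # coordinates for "unseen" stimuli
print("held-out MSE:", np.mean((predicted - space[test]) ** 2))
```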


Formalized Conceptual Spaces with a Geometric Representation of Correlations

arXiv.org Artificial Intelligence

The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a similarity space and concepts are represented by convex regions in this space. After pointing out a problem with the convexity requirement, we propose a formalization of conceptual spaces based on fuzzy star-shaped sets. Our formalization uses a parametric definition of concepts and extends the original framework by adding means to represent correlations between different domains in a geometric way. Moreover, we define various operations for our formalization, both for creating new concepts from old ones and for measuring relations between concepts. We present an illustrative toy example and sketch a research project on concept formation that is based on both our formalization and its implementation.
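
The formalization itself is mathematical rather than computational, but a minimal sketch can convey the idea of a parametrically defined fuzzy concept: a crisp core region plus a membership function that decays with weighted distance from the core. The box-shaped core, the Manhattan metric, and all constants below are illustrative assumptions, not the paper's definitions.

```python
# Hedged illustration of a parametrically defined fuzzy concept: a box-shaped
# core plus an exponentially decaying membership function. The core shape,
# metric, and numbers are placeholders chosen for readability.
import numpy as np

class FuzzyConcept:
    def __init__(self, core_min, core_max, mu0=1.0, c=2.0, weights=None):
        self.core_min = np.asarray(core_min, dtype=float)  # lower corner of the core box
        self.core_max = np.asarray(core_max, dtype=float)  # upper corner of the core box
        self.mu0 = mu0  # maximal membership, attained inside the core
        self.c = c      # sensitivity: how fast membership decays outside the core
        self.weights = (np.ones_like(self.core_min) if weights is None
                        else np.asarray(weights, dtype=float))

    def membership(self, x):
        """mu(x) = mu0 * exp(-c * weighted distance from x to the core box)."""
        x = np.asarray(x, dtype=float)
        nearest = np.clip(x, self.core_min, self.core_max)      # closest core point
        distance = np.sum(self.weights * np.abs(x - nearest))   # weighted Manhattan distance
        return self.mu0 * np.exp(-self.c * distance)

# Toy two-dimensional concept: fully a member inside the core box,
# gradually less so outside of it.
apple = FuzzyConcept(core_min=[0.4, 0.4], core_max=[0.6, 0.7])
print(apple.membership([0.5, 0.5]))  # 1.0 (inside the core)
print(apple.membership([0.9, 0.5]))  # < 1.0 (outside the core)
```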


A Comprehensive Implementation of Conceptual Spaces

arXiv.org Artificial Intelligence

The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points and concepts are represented by regions in a (potentially) high-dimensional space. Based on our recent formalization, we present a comprehensive implementation of the conceptual spaces framework that is not only capable of representing concepts with inter-domain correlations, but also offers a variety of operations on these concepts.


Formal Ways for Measuring Relations between Concepts in Conceptual Spaces

arXiv.org Artificial Intelligence

The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a high-dimensional space and concepts are represented by regions in this space. In this article, we extend our recent mathematical formalization of this framework by providing quantitative mathematical definitions for measuring relations between concepts: We develop formal ways for computing concept size, subsethood, implication, similarity, and betweenness. This considerably increases the representational capabilities of our formalization and makes it the most thorough and comprehensive formalization of conceptual spaces developed so far.
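
As a point-level illustration of two of the measures listed above, the sketch below computes similarity as an exponentially decaying function of distance and betweenness via the triangle equality. The paper defines these notions for entire fuzzy concepts, which is considerably more involved; points, the Euclidean metric, and the constants here are simplifying assumptions.

```python
# Hedged, point-level sketch of two relation measures from the conceptual
# spaces literature: similarity as exp(-c * distance) and betweenness via the
# triangle equality d(x, z) + d(z, y) = d(x, y). The paper's definitions apply
# to whole (fuzzy) concepts; points and the Euclidean metric are simplifications.
import math

def distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def similarity(x, y, c=1.0):
    """Similarity decays exponentially with distance (Shepard-style)."""
    return math.exp(-c * distance(x, y))

def betweenness(x, z, y, tol=1e-9):
    """1.0 if z lies exactly on the segment between x and y, else < 1.0."""
    detour = distance(x, z) + distance(z, y)
    direct = distance(x, y)
    return direct / detour if detour > tol else 1.0

x, y = (0.0, 0.0), (2.0, 2.0)
print(similarity(x, y))               # ~0.059
print(betweenness(x, (1.0, 1.0), y))  # 1.0 -> exactly between x and y
print(betweenness(x, (2.0, 0.0), y))  # < 1.0 -> a detour
```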


Neural-Symbolic Learning and Reasoning: Contributions and Challenges

AAAI Conferences

The goal of neural-symbolic computation is to integrate robust connectionist learning and sound symbolic reasoning. With the recent advances in connectionist learning, in particular deep neural networks, forms of representation learning have emerged. However, such representations have not become useful for reasoning. Results from neural-symbolic computation have been shown to offer powerful alternatives for knowledge representation, learning, and reasoning in neural computation. This paper recalls the main contributions and discusses key challenges for neural-symbolic integration which have been identified at a recent Dagstuhl seminar.


Report on the Sixth Conference on Artificial General Intelligence

AI Magazine

Motivated by the original idea of artificial intelligence in the 1950s and 1960s, there has been a revival of research in general intelligence in recent years. The annual AGI conference series, which is the major event in this area, has been held in cooperation with AAAI since 2008. The sixth conference on AGI was held at Peking University, Beijing, from July 31 to August 3, 2013. AGI-13 was collocated with the International Joint Conference on Artificial Intelligence (IJCAI 2013), the major international AI conference. This was the first time an AGI conference took place in Asia.