Collaborating Authors

 Schulz, Alexander


Conceptualizing Uncertainty

arXiv.org Artificial Intelligence

While advances in deep learning in recent years have led to impressive performance in many domains, such models are not always reliable, particularly when it comes to generalizing to new environments or withstanding adversarial attacks. To improve on that, numerous methods have been developed in the field of explainable artificial intelligence (xAI) [5] to provide insights into model behavior and facilitate actionable modifications. However, the majority of methods focus on explaining model predictions, which can help understand misclassifications but do not explicitly address predictive uncertainty (see Figure 1). Understanding uncertainty is crucial for detecting potential model weaknesses, particularly in dynamic environments. Since uncertainty quantification is useful in various applications, including active learning [20], classification with rejects [17], adversarial example detection [26], and reinforcement learning [24], a significant body of work aims to improve the quantification of predictive uncertainty using Bayesian deep learning (BDL) and approximations thereof [15,9,14]. In contrast, the literature on understanding the sources of uncertainty for a given model via explanations is limited, focusing on methods for feature attribution [28,27] (see section 2.4 for more related work).
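
For readers unfamiliar with BDL approximations, the snippet below is a minimal, illustrative sketch of one common approximation, Monte Carlo dropout, which estimates predictive uncertainty by aggregating several stochastic forward passes. It is not code from the paper; the function name and the entropy-based decomposition shown are generic illustrations.

```python
import torch
import torch.nn.functional as F

def mc_dropout_uncertainty(model, x, n_samples=30):
    """Estimate predictive uncertainty via Monte Carlo dropout.

    Keeps dropout stochastic at inference time and aggregates the class
    probabilities of several forward passes.
    """
    model.train()  # keep dropout layers active
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )  # shape: (n_samples, batch, classes)
    mean_probs = probs.mean(dim=0)
    # Total uncertainty: entropy of the averaged predictive distribution.
    total = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    # Aleatoric part: expected entropy of the individual passes.
    aleatoric = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean(dim=0)
    # Epistemic part (mutual information): total minus aleatoric.
    return mean_probs, total, total - aleatoric
```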


Intelligent Learning Rate Distribution to reduce Catastrophic Forgetting in Transformers

arXiv.org Artificial Intelligence

Pretraining language models on large text corpora is a common practice in natural language processing. Fine-tuning of these models is then performed to achieve the best results on a variety of tasks. In this paper, we investigate the problem of catastrophic forgetting in transformer neural networks and question the common practice of fine-tuning with a flat learning rate for the entire network in this context. We perform a hyperparameter optimization process to find learning rate distributions that are better than a flat learning rate. We combine the learning rate distributions thus found and show that they generalize, leading to better performance with respect to the problem of catastrophic forgetting.
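
To make the idea of a non-flat learning rate concrete, here is a hedged sketch that assigns each encoder layer of a Hugging Face BERT model its own learning rate via PyTorch parameter groups. The simple exponential layer-wise decay is only a stand-in for the distribution the paper finds through hyperparameter optimization; the checkpoint name and values are illustrative, not taken from the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Load a pretrained encoder for fine-tuning (checkpoint name is illustrative).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

base_lr, decay = 2e-5, 0.9                 # placeholder values
n_layers = model.config.num_hidden_layers

groups = []
for name, param in model.named_parameters():
    if "encoder.layer." in name:
        idx = int(name.split("encoder.layer.")[1].split(".")[0])
        lr = base_lr * decay ** (n_layers - 1 - idx)   # lower layers change less
    else:
        lr = base_lr                                   # embeddings, pooler, head
    groups.append({"params": [param], "lr": lr})

optimizer = torch.optim.AdamW(groups)
```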


Debiasing Sentence Embedders through Contrastive Word Pairs

arXiv.org Artificial Intelligence

In recent years, various sentence embedders have been an integral part of the success of current machine learning approaches to Natural Language Processing (NLP). Unfortunately, multiple sources have shown that these embedding methods learn the bias inherent in the datasets they are trained on. A variety of approaches to remove biases from embeddings exist in the literature; most of them are applicable to word embeddings and, in fewer cases, to sentence embeddings. It is problematic that most debiasing approaches are directly transferred from word embeddings and therefore fail to take into account the nonlinear nature of sentence embedders and the embeddings they produce. It has been shown in the literature that bias information is still present if sentence embeddings are debiased using such methods. In this contribution, we explore an approach to remove linear and nonlinear bias information for NLP solutions, without impacting downstream performance. We compare our approach to common debiasing methods on classical bias metrics and on bias metrics which take nonlinear information into account.
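
For orientation, the sketch below shows only the classical linear step of such pipelines: estimating a bias subspace from contrastive word pairs and projecting it out of the embeddings. It is an assumption-laden illustration (the `embed` callable is a placeholder for any sentence embedder), not the paper's method, which additionally targets nonlinear bias information.

```python
import numpy as np

def bias_subspace(embed, word_pairs, k=1):
    """Estimate a linear bias subspace from contrastive word pairs.

    `embed` is assumed to map a list of strings to an (n, d) array,
    e.g. any sentence embedder; it is not part of the paper's code.
    """
    diffs = [embed([a])[0] - embed([b])[0] for a, b in word_pairs]
    # Principal directions of the pair differences span the bias subspace.
    _, _, vt = np.linalg.svd(np.stack(diffs) - np.mean(diffs, axis=0))
    return vt[:k]                              # (k, d), orthonormal rows

def remove_linear_bias(vectors, subspace):
    """Project embeddings onto the orthogonal complement of the subspace."""
    return vectors - vectors @ subspace.T @ subspace
```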


Targeted Visualization of the Backbone of Encoder LLMs

arXiv.org Artificial Intelligence

Attention-based Large Language Models (LLMs) are the state of the art in natural language processing (NLP). The two most common architectures are encoders, such as BERT, and decoders, like the GPT models. Despite the success of encoder models, on which we focus in this work, they also bear several risks, including issues with bias or their susceptibility to adversarial attacks, signifying the necessity of explainable AI to detect such issues. While various local explainability methods exist that focus on the prediction of single inputs, global methods based on dimensionality reduction for classification inspection, which have emerged in other domains and go further than simply using t-SNE in the embedding space, are not widespread in NLP. To reduce this gap, we investigate the application of DeepView, a method for visualizing a part of the decision function together with a data set in two dimensions, to the NLP domain. While in previous work DeepView has been used to inspect deep image classification models, we demonstrate how to apply it to BERT-based NLP classifiers and investigate its usability in this domain, including settings with adversarially perturbed input samples and pre-trained, fine-tuned, and multi-task models.
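
As an illustration of what feeding an encoder classifier into such a visualization involves, the sketch below extracts per-sample [CLS] embeddings and a probability function from a fine-tuned BERT model; these are the two ingredients a decision-function visualizer typically consumes. The checkpoint name is only an example, and the actual DeepView interface may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "textattack/bert-base-uncased-SST-2"   # illustrative checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

def embed(texts):
    """Return the final-layer [CLS] representations of the encoder."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch, output_hidden_states=True)
    return out.hidden_states[-1][:, 0].numpy()

def predict_proba(texts):
    """Class probabilities, as needed by decision-function visualizers."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return torch.softmax(logits, dim=-1).numpy()
```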


Semantic Properties of cosine based bias scores for word embeddings

arXiv.org Artificial Intelligence

In the domain of Natural Language Processing (NLP), many works have investigated social biases in terms of associations in the embedding space. Early works [1, 2] introduced methods to measure and mitigate social biases based on cosine similarity in word embeddings. With NLP research progressing to large language models and contextualized embeddings, doubts have been raised whether these methods are still suitable for fairness evaluation [3], and other works criticize that, for instance, the Word Embedding Association Test (WEAT) [2] fails to detect some kinds of biases [4, 5]. Overall, there exists a great number of bias measures in the literature, which do not necessarily detect the same biases [6, 4, 5]. In general, researchers are questioning the usability of model-intrinsic bias measures, such as cosine based methods [7, 8, 9]. A few papers compare the performance of different bias scores [10, 11], and other works evaluate experimental setups for bias measurement [12]. However, to our knowledge, only two works investigate the properties of intrinsic bias scores on a theoretical level [5, 13]. To further close this gap, we evaluate the semantic properties of cosine based bias scores, focusing on bias quantification as opposed to bias detection. We make the following contributions: (i) We formalize the properties of trustworthiness and comparability as requirements for cosine based bias scores.
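
For reference, the WEAT [2] effect size, one of the cosine based scores whose properties are analyzed, can be sketched as follows; this reproduces the standard definition from the literature, not code released with the paper.

```python
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean cosine similarity to attribute set A minus set B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target word sets X, Y and attribute sets A, B."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```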


BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines

arXiv.org Artificial Intelligence

Recent developments in transfer learning have boosted the advancements in natural language processing tasks. The performance is, however, dependent on high-quality, manually annotated training data. Especially in the biomedical domain, it has been shown that one training corpus is not enough to learn generic models that are able to efficiently predict on new data. Therefore, in order to be used in real-world applications, state-of-the-art models need the ability of lifelong learning, improving performance as soon as new data are available without re-training the whole model from scratch. We present WEAVER, a simple yet efficient post-processing method that infuses old knowledge into the new model, thereby reducing catastrophic forgetting. We show that applying WEAVER in a sequential manner results in similar word embedding distributions as a combined training on all data at once, while being computationally more efficient. Because no data sharing is required, the presented method is also easily applicable to federated learning settings and can, for example, be beneficial for mining electronic health records from different clinics.
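
Weight averaging as a post-processing step is easy to illustrate: the sketch below blends two checkpoints of the same architecture parameter-wise. The fixed convex combination and the paths shown are only placeholders; the exact weighting used by WEAVER is not reproduced here.

```python
import torch

def weight_average(old_state, new_state, alpha=0.5):
    """Blend two checkpoints of the same architecture parameter-wise.

    `alpha` is an illustrative knob, not WEAVER's actual weighting scheme.
    """
    averaged = {}
    for key, new_param in new_state.items():
        if key in old_state and new_param.is_floating_point():
            averaged[key] = alpha * old_state[key] + (1.0 - alpha) * new_param
        else:
            averaged[key] = new_param  # buffers / non-float entries stay as-is
    return averaged

# Sketch of usage after fine-tuning on a new corpus (paths are hypothetical):
# old = torch.load("model_corpus1.pt")
# new = torch.load("model_corpus2.pt")
# model.load_state_dict(weight_average(old, new, alpha=0.5))
```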


"Why Here and Not There?" -- Diverse Contrasting Explanations of Dimensionality Reduction

arXiv.org Artificial Intelligence

Transparency of machine learning (ML) based systems, applied in the real world, is nowadays a widely accepted requirement; the importance of transparency was also recognized by policy makers and therefore made its way into legal regulations like the EU's GDPR [1]. A popular way of achieving transparency is by means of explanations [2], which then gave rise to the field of eXplainable AI (XAI) [3], [4]. Although a lot of different explanation methodologies for ML based systems have been developed [2], [4], it is important to realize that it is still somewhat unclear what exactly makes up a good explanation [5], [6]. Some approaches [14], [15] aim to infer global feature importance for a given data projection. Another work [16] estimates feature importance locally for a vicinity around a projected data point, using locally linear models. A recent paper [17] proposes to use local feature importance explanations by computing a local linear approximation for each reduced dimension, extracting feature importances from the weight vectors. Further, saliency map approaches such as layer-wise relevance propagation (LRP) [18] could in principle be applied to a parametric dimensionality reduction mapping in order to obtain locally relevant features. However, these approaches do not provide contrasting explanations, in which we are interested here.
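
To make the local-surrogate idea from the cited related work concrete, here is a minimal sketch that fits a linear model of a dimensionality reduction mapping in the vicinity of a point and reads feature importances from its weights. It illustrates the approaches discussed above, not the contrasting explanations proposed in this paper, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def local_feature_importance(X, Z, x0, radius=1.0):
    """Local linear surrogate of a DR mapping around a data point x0.

    X: (n, d) original data, Z: (n, 2) their low-dimensional projections.
    The absolute weights of the per-dimension linear fits serve as local
    feature importances, in the spirit of the cited surrogate approaches.
    """
    mask = np.linalg.norm(X - x0, axis=1) <= radius   # neighborhood of x0
    reg = LinearRegression().fit(X[mask], Z[mask])    # coef_: (2, d)
    return np.abs(reg.coef_).sum(axis=0)              # importance per input feature
```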


Reservoir Memory Machines as Neural Computers

arXiv.org Machine Learning

Differentiable neural computers extend artificial neural networks with an explicit memory without interference, thus enabling the model to perform classic computation tasks such as graph traversal. However, such models are difficult to train, requiring long training times and large datasets. In this work, we achieve some of the computational capabilities of differentiable neural computers with a model that can be trained extremely efficiently, namely an echo state network with an explicit memory without interference. This extension raises the computational power of echo state networks from strictly less than finite state machines to strictly more than finite state machines. Further, we demonstrate experimentally that our model performs comparably to its fully-trained deep version on several typical benchmark tasks for differentiable neural computers.
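
For readers unfamiliar with reservoir computing, the following is a minimal vanilla echo state network (fixed random reservoir, ridge-regression readout). It is a generic baseline for illustration only and does not include the explicit memory extension that the paper adds.

```python
import numpy as np

class EchoStateNetwork:
    """Minimal echo state network: only the linear readout is trained."""

    def __init__(self, n_in, n_res=200, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale the recurrent weights to the desired spectral radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.W_out = None

    def _states(self, U):
        """Run the input sequence U (T, n_in) through the fixed reservoir."""
        h = np.zeros(self.W.shape[0])
        H = []
        for u in U:
            h = np.tanh(self.W_in @ u + self.W @ h)
            H.append(h)
        return np.array(H)

    def fit(self, U, Y, ridge=1e-6):
        H = self._states(U)
        # Ridge-regression readout: the only trained part of the model.
        self.W_out = np.linalg.solve(
            H.T @ H + ridge * np.eye(H.shape[1]), H.T @ Y
        ).T
        return self

    def predict(self, U):
        return self._states(U) @ self.W_out.T
```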


DeepView: Visualizing the behavior of deep neural networks in a part of the data space

arXiv.org Machine Learning

Machine learning models using deep architectures have become increasingly powerful and successful. However, they have also become increasingly complex, more difficult to comprehend, and easier to fool. So far, mostly methods that investigate the decision of the model for a single given input datum have been proposed. In this paper, we propose to visualize a part of the decision function of a deep neural network together with a part of the data set in two dimensions with discriminative dimensionality reduction. This enables us to inspect how different properties of the data are treated by the model, such as multimodality, label noise, or biased data. Further, the presented approach is complementary to the mentioned interpretation methods from the literature and hence might be even more useful in combination with those.
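
The final plotting step behind such a visualization can be sketched as below for points that are already two-dimensional. DeepView's actual contribution, discriminative dimensionality reduction plus an inverse mapping so that grid points can be evaluated by the original high-dimensional model, is not reproduced here, and all function names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_regions_2d(predict_proba, Z, labels, steps=200):
    """Overlay a classifier's class regions on a 2-D embedding Z (n, 2).

    `predict_proba` is assumed to map 2-D grid points to class probabilities;
    in DeepView this role is played by the inverse DR mapping composed with
    the original deep model.
    """
    x = np.linspace(Z[:, 0].min(), Z[:, 0].max(), steps)
    y = np.linspace(Z[:, 1].min(), Z[:, 1].max(), steps)
    xx, yy = np.meshgrid(x, y)
    grid = np.c_[xx.ravel(), yy.ravel()]
    classes = predict_proba(grid).argmax(axis=1).reshape(xx.shape)
    plt.contourf(xx, yy, classes, alpha=0.3)        # decision regions
    plt.scatter(Z[:, 0], Z[:, 1], c=labels, s=10)   # projected data points
    plt.show()
```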