
 Stollenwerk, Felix


The Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions

arXiv.org Artificial Intelligence

A recent paper proposes Dynamic Tanh (DyT) as a drop-in replacement for layer normalization (LN). Although the method is empirically well-motivated and appealing from a practical point of view, it lacks a theoretical foundation. In this work, we shed light on the mathematical relationship between layer normalization and dynamic activation functions. In particular, we derive DyT from LN and show that a well-defined approximation is needed to do so. By dropping said approximation, an alternative activation function is obtained, which we call Dynamic Inverse Square Root Unit (DyISRU). DyISRU is the exact counterpart of layer normalization, and we demonstrate numerically that it indeed resembles LN more accurately than DyT does.
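For intuition, here is a minimal numerical sketch of the functions being compared: standard layer normalization, Dynamic Tanh, and an ISRU-style dynamic activation shown as a stand-in for DyISRU. The scale parameters and the exact DyISRU parameterization are illustrative assumptions, not the paper's definitions.

    # Illustrative sketch (not the paper's code): the three element-wise maps
    # compared above, applied to one random activation vector.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)

    def layer_norm(x, eps=1e-5):
        # Standard layer normalization, without the learnable affine transform.
        return (x - x.mean()) / np.sqrt(x.var() + eps)

    def dyt(x, alpha=1.0):
        # Dynamic Tanh: element-wise tanh with a scalar scale alpha.
        return np.tanh(alpha * x)

    def dyisru(x, alpha=1.0):
        # ISRU-style dynamic activation, used here as a stand-in for DyISRU;
        # the exact form derived in the paper may be parameterized differently.
        return x / np.sqrt(alpha + x ** 2)

    print(layer_norm(x))
    print(dyt(x))
    print(dyisru(x))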


Better Embeddings with Coupled Adam

arXiv.org Artificial Intelligence

Despite their remarkable capabilities, LLMs learn word representations that exhibit the undesirable yet poorly understood feature of anisotropy. In this paper, we argue that the second moment in Adam is a cause of anisotropic embeddings, and suggest a modified optimizer called Coupled Adam to mitigate the problem. Our experiments demonstrate that Coupled Adam significantly improves the quality of embeddings, while also leading to better upstream and downstream performance on large enough datasets.
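A rough sketch of the idea follows, assuming that "coupling" amounts to sharing one second-moment estimate across all embedding vectors instead of keeping a per-parameter estimate; the paper's exact update rule may differ in its details.

    # Sketch only, not the authors' implementation: an Adam step for an
    # embedding matrix with an optional "coupled" second moment.
    import numpy as np

    def adam_embedding_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
                            eps=1e-8, coupled=False):
        # p, g, m, v: embedding matrix, its gradient, and the Adam moment
        # estimates, all of shape (vocab_size, embedding_dim).
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        v_eff = v
        if coupled:
            # Share (couple) the second-moment estimate across the vocabulary,
            # so every embedding vector sees the same effective learning rate.
            v_eff = np.broadcast_to(v.mean(axis=0, keepdims=True), v.shape)
        m_hat = m / (1 - b1 ** t)
        v_hat = v_eff / (1 - b2 ** t)
        p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
        return p, m, v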


nerblackbox: A High-level Library for Named Entity Recognition in Python

arXiv.org Artificial Intelligence

We present nerblackbox, a Python library that facilitates the use of state-of-the-art transformer-based models for named entity recognition. It provides simple-to-use yet powerful methods to access data and models from a wide range of sources, for fully automated model training and evaluation as well as versatile model inference. While many technical challenges are solved and hidden from the user by default, nerblackbox also offers fine-grained control and a rich set of customizable features. It is thus targeted both at application-oriented developers and at machine learning experts and researchers.
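For context, the snippet below shows roughly the kind of low-level workflow such a library hides. It uses plain Hugging Face transformers and an arbitrary public NER checkpoint; it is not nerblackbox's own API.

    # Baseline NER inference with Hugging Face transformers, shown only to
    # illustrate what a high-level library can abstract away; this is not
    # nerblackbox's API.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="dslim/bert-base-NER",      # arbitrary public NER checkpoint
        aggregation_strategy="simple",    # merge word pieces into entity spans
    )
    print(ner("Felix Stollenwerk works on NLP in Stockholm."))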


Text Annotation Handbook: A Practical Guide for Machine Learning Projects

arXiv.org Artificial Intelligence

This handbook is a hands-on guide on how to approach text annotation tasks. It provides a gentle introduction to the topic, an overview of theoretical concepts as well as practical advice. The topics covered are mostly technical, but business, ethical and regulatory issues are also touched upon. The focus lies on readability and conciseness rather than completeness and scientific rigor. Experience with annotation and knowledge of machine learning are useful but not required. The document may serve as a primer or reference book for a wide range of professions such as team leaders, project managers, IT architects, software developers and machine learning engineers.


Annotated Job Ads with Named Entity Recognition

arXiv.org Artificial Intelligence

We have trained a named entity recognition (NER) model that screens Swedish job ads for different kinds of useful information (e.g. skills required of a job seeker). It was obtained by fine-tuning KB-BERT. The biggest challenge we faced was the creation of a labelled dataset, which required manual annotation. This paper gives an overview of the methods we employed to make the annotation process more efficient and to ensure high-quality data. We also report on the performance of the resulting model.
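A minimal sketch of the underlying setup, assuming standard Hugging Face token classification with the public KB-BERT checkpoint and a hypothetical label set; this is not the authors' training code, and fine-tuning on the annotated job ads is omitted.

    # Sketch: loading KB-BERT for token classification. The labels are
    # hypothetical, and the model is not yet fine-tuned for this task.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    labels = ["O", "B-SKILL", "I-SKILL"]  # hypothetical skill labels
    tokenizer = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "KB/bert-base-swedish-cased", num_labels=len(labels)
    )

    # One forward pass over a Swedish job-ad snippet; fine-tuning on the
    # manually annotated dataset would follow the usual recipe (omitted here).
    enc = tokenizer("Vi söker en utvecklare med erfarenhet av Python.",
                    return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    print(logits.argmax(dim=-1))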


GPT-SW3: An Autoregressive Language Model for the Nordic Languages

arXiv.org Artificial Intelligence

There is a growing interest in building and applying Large Language Models (LLMs) for languages other than English. This interest has been fuelled partly by the unprecedented popularity of ChatGPT. ... We have faced all of these challenges in our work on developing the first native LLM for the Nordic (or, more accurately, North Germanic) languages. The LLM, which we call GPT-SW3, is a continuation of our previous Swedish-only model (Ekgren et al., 2022), and is a collection of ...


Training and Evaluation of a Multilingual Tokenizer for GPT-SW3

arXiv.org Artificial Intelligence

Generative language models are pre-trained on large amounts of raw text data. Virtually all language model architectures require the text data to be tokenized, which means that a text string is split into a sequence of tokens and subsequently mapped to a sequence of integers (Figure 1: text preprocessing for language models, simplified). The first step is referred to as tokenization, although sometimes both steps are embraced by the same term. Note that a special character is used to represent whitespace in the tokenized example (more on this in Sec. 3). Modern subword tokenizers are designed such that frequently used words are not decomposed, while rare words are split into meaningful tokens.
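To make the two preprocessing steps concrete, here is a small example using the GPT-2 tokenizer as a readily available stand-in (not the GPT-SW3 tokenizer trained in the paper):

    # Sketch of the two steps described above: string -> tokens -> integers.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")

    text = "Tokenization of uncommon words"
    tokens = tok.tokenize(text)              # step 1: tokenization
    ids = tok.convert_tokens_to_ids(tokens)  # step 2: mapping to integers
    print(tokens)
    print(ids)
    # A frequent word such as "of" typically stays intact, while a rarer word
    # like "Tokenization" is split into several subword tokens.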