The Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions
Stollenwerk, Felix
A recent paper proposes Dynamic Tanh (DyT) as a drop-in replacement for layer normalization (LN). Although the method is empirically well-motivated and appealing from a practical point of view, it lacks a theoretical foundation. In this work, we shed light on the mathematical relationship between layer normalization and dynamic activation functions. In particular, we derive DyT from LN and show that a well-defined approximation is needed to do so. By dropping said approximation, an alternative activation function is obtained, which we call Dynamic Inverse Square Root Unit (DyISRU). DyISRU is the exact counterpart of layer normalization, and we demonstrate numerically that it indeed resembles LN more accurately than DyT does.
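As a rough numerical illustration of the functions discussed in the abstract, the sketch below contrasts layer normalization with an element-wise Dynamic Tanh and an ISRU-style activation. The DyISRU parameterization shown is an assumption based on the standard ISRU form, not necessarily the one derived in the paper.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Layer normalization of a single feature vector: zero mean, unit variance.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def dyt(x, alpha=1.0):
    # Dynamic Tanh (DyT): element-wise squashing with a scale parameter alpha.
    return np.tanh(alpha * x)

def dyisru_like(x, alpha=1.0):
    # ISRU-style dynamic activation, x / sqrt(alpha + x^2); the exact DyISRU
    # parameterization in the paper may differ (assumption).
    return x / np.sqrt(alpha + x ** 2)

x = np.random.randn(512)
for f in (layer_norm, dyt, dyisru_like):
    print(f.__name__, f(x)[:3])
```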
Better Embeddings with Coupled Adam
Stollenwerk, Felix, Stollenwerk, Tobias
Despite their remarkable capabilities, LLMs learn word representations that exhibit the undesirable yet poorly understood feature of anisotropy. In this paper, we argue that the second moment in Adam is a cause of anisotropic embeddings, and suggest a modified optimizer called Coupled Adam to mitigate the problem. Our experiments demonstrate that Coupled Adam significantly improves the quality of embeddings, while also leading to better upstream and downstream performance on large enough datasets.
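The sketch below illustrates the general idea in NumPy: a single Adam-style update for an embedding matrix in which the second-moment estimate can optionally be shared ("coupled") across the vocabulary dimension. The exact coupling rule used here is an assumption for illustration only and is not taken from the paper.

```python
import numpy as np

def adam_embedding_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
                        eps=1e-8, couple=False):
    # param, grad, m, v: arrays of shape (vocab_size, embed_dim).
    m = b1 * m + (1 - b1) * grad         # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2    # second-moment estimate
    # Assumed form of the coupling: use one shared second moment per embedding
    # dimension, averaged over the vocabulary, instead of a per-parameter one.
    v_used = v.mean(axis=0, keepdims=True) if couple else v
    m_hat = m / (1 - b1 ** t)
    v_hat = v_used / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```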
nerblackbox: A High-level Library for Named Entity Recognition in Python
Stollenwerk, Felix
We present nerblackbox, a Python library to facilitate the use of state-of-the-art transformer-based models for named entity recognition. It provides simple-to-use yet powerful methods to access data and models from a wide range of sources, for fully automated model training and evaluation, as well as versatile model inference. While many technical challenges are solved and hidden from the user by default, nerblackbox also offers fine-grained control and a rich set of customizable features. It is thus targeted both at application-oriented developers and at machine learning experts and researchers.
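The abstract does not spell out the library's own API, so the snippet below does not use nerblackbox itself; it merely illustrates the kind of transformer-based NER inference the library wraps, using the Hugging Face transformers pipeline with an arbitrary public model.

```python
from transformers import pipeline

# Example only: a generic transformer-based NER pipeline (not nerblackbox's API).
ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",      # arbitrary public NER model (assumption)
    aggregation_strategy="simple",    # merge subword pieces into whole entities
)
print(ner("Felix works at AI Sweden in Stockholm."))
```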
Text Annotation Handbook: A Practical Guide for Machine Learning Projects
Stollenwerk, Felix, Öhman, Joey, Petrelli, Danila, Wallerö, Emma, Olsson, Fredrik, Bengtsson, Camilla, Horndahl, Andreas, Gandler, Gabriela Zarzar
This handbook is a hands-on guide on how to approach text annotation tasks. It provides a gentle introduction to the topic, an overview of theoretical concepts as well as practical advice. The topics covered are mostly technical, but business, ethical and regulatory issues are also touched upon. The focus lies on readability and conciseness rather than completeness and scientific rigor. Experience with annotation and knowledge of machine learning are useful but not required. The document may serve as a primer or reference book for a wide range of professions such as team leaders, project managers, IT architects, software developers and machine learning engineers.
Annotated Job Ads with Named Entity Recognition
Stollenwerk, Felix, Fastlund, Niklas, Nyqvist, Anna, Öhman, Joey
We have trained a named entity recognition (NER) model that screens Swedish job ads for different kinds of useful information (e.g. skills required of a job seeker). It was obtained by fine-tuning KB-BERT. The biggest challenge we faced was the creation of a labelled dataset, which required manual annotation. This paper gives an overview of the methods we employed to make the annotation process more efficient and to ensure high-quality data. We also report on the performance of the resulting model.
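A minimal sketch of the model setup described above, assuming KB-BERT is loaded from the Hugging Face hub as KB/bert-base-swedish-cased; the label set is hypothetical and the paper's actual annotation scheme is not reproduced.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical BIO label set; the paper's actual entity types may differ.
labels = ["O", "B-SKILL", "I-SKILL", "B-OCCUPATION", "I-OCCUPATION"]

tokenizer = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "KB/bert-base-swedish-cased",
    num_labels=len(labels),
)
# Fine-tuning would then run on the manually annotated job-ad dataset,
# e.g. with the transformers Trainer on token-level (BIO) annotations.
```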
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ekgren, Ariel, Gyllensten, Amaru Cuba, Stollenwerk, Felix, Öhman, Joey, Isbister, Tim, Gogoulou, Evangelia, Carlsson, Fredrik, Heiman, Alice, Casademont, Judit, Sahlgren, Magnus
There is a growing interest in building and applying Large Language Models (LLMs) for languages other than English. This interest has been fuelled partly by the unprecedented popularity of ChatGPT. We have faced all of these challenges in our work on developing the first native LLM for the Nordic (or, more accurately, North Germanic) languages. The LLM, which we call GPT-SW3, is a continuation of our previous Swedish-only model (Ekgren et al., 2022), and is a collection of ...
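For context, a minimal usage sketch of such a model with the transformers library is shown below; the checkpoint name is an assumption based on the publicly released GPT-SW3 models and may require accepting the models' access terms.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AI-Sweden-Models/gpt-sw3-126m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Arbitrary Swedish prompt, continued autoregressively by the model.
inputs = tokenizer("Träd är fina för att", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(outputs[0]))
```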
Training and Evaluation of a Multilingual Tokenizer for GPT-SW3
Stollenwerk, Felix
Generative language models are pre-trained on large amounts of raw text data. Virtually all language model architectures require the text data to be tokenized, which means that a text string is split into a sequence of tokens and subsequently mapped to a sequence of integers (see Figure 1).
Figure 1: Text preprocessing for language models (simplified).
The first step is referred to as tokenization, although sometimes both the first and the second step are covered by the same term. Note that the special character appearing in the example of Figure 1 represents whitespace (more on this in Sec. 3). Modern subword tokenizers are designed such that frequently used words are not decomposed, while rare words are split into meaningful tokens.
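The two preprocessing steps and the whitespace marker can be illustrated with an arbitrary public SentencePiece-based tokenizer; this is an example choice, not the tokenizer trained in the paper.

```python
from transformers import AutoTokenizer

# Arbitrary public SentencePiece-based tokenizer, used for illustration only.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

text = "Tokenization splits uncommon words into subwords."
tokens = tokenizer.tokenize(text)               # step 1: text -> tokens (note the ▁ whitespace marker)
ids = tokenizer.convert_tokens_to_ids(tokens)   # step 2: tokens -> integers
print(tokens)
print(ids)
```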