Heumann, Christian
Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan
Schöffel, Matthias, Wiedner, Marinus, Arias, Esteban Garces, Ruppert, Paula, Heumann, Christian, Aßenmacher, Matthias
Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing, yet their effectiveness in handling historical languages remains largely unexplored. This study examines the performance of open-source LLMs in part-of-speech (POS) tagging for Old Occitan, a historical language characterized by non-standardized orthography and significant diachronic variation. Through comparative analysis of two distinct corpora (hagiographical and medical texts), we evaluate how current models handle the inherent challenges of processing a low-resource historical language. Our findings demonstrate critical limitations in LLM performance when confronted with extreme orthographic and syntactic variability. We provide detailed error analysis and specific recommendations for improving model performance in historical language processing. This research advances our understanding of LLM capabilities in challenging linguistic contexts while offering practical insights for both computational linguistics and historical language studies.
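As a hedged illustration of the task setup (not the paper's actual pipeline), the sketch below prompts an open-source LLM for POS tags via Hugging Face Transformers; the checkpoint name and the placeholder sentence are assumptions.

```python
from transformers import pipeline

# Assumed instruction-tuned checkpoint; the paper's models and prompts differ.
tagger = pipeline("text-generation",
                  model="mistralai/Mistral-7B-Instruct-v0.2")

sentence = "..."  # an Old Occitan sentence from the corpus (placeholder)
prompt = ("Assign a part-of-speech (UPOS) tag to every token of this "
          "Old Occitan sentence, as token/TAG pairs:\n" + sentence)

out = tagger(prompt, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"])
```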
Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation
Arias, Esteban Garces, Li, Meimingwei, Heumann, Christian, Aßenmacher, Matthias
Decoding strategies for generative large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Guided by specific hyperparameters, these strategies aim to transform the raw probability distributions produced by language models into coherent, fluent text. In this study, we undertake a large-scale empirical assessment of a range of decoding methods, open-source LLMs, textual domains, and evaluation protocols to determine how hyperparameter choices shape the outputs. Our experiments include both factual (e.g., news) and creative (e.g., fiction) domains, and incorporate a broad suite of automatic evaluation metrics alongside human judgments. Through extensive sensitivity analyses, we distill practical recommendations for selecting and tuning hyperparameters, noting that optimal configurations vary across models and tasks. By synthesizing these insights, this study provides actionable guidance for refining decoding strategies, enabling researchers and practitioners to achieve higher-quality, more reliable, and context-appropriate text generation outcomes.
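A minimal sketch of where such hyperparameters enter in practice, using Hugging Face's generate() with GPT-2 as a stand-in model; the settings are illustrative, not the tuned configurations from the study.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The scientists discovered", return_tensors="pt")

# Each entry pairs a decoding strategy with its governing hyperparameters
configs = {
    "greedy":      dict(do_sample=False),
    "temperature": dict(do_sample=True, temperature=0.7),
    "top-k":       dict(do_sample=True, top_k=50),
    "nucleus":     dict(do_sample=True, top_p=0.95),
}

torch.manual_seed(0)
for name, kwargs in configs.items():
    ids = model.generate(**inputs, max_new_tokens=30,
                         pad_token_id=tok.eos_token_id, **kwargs)
    print(name, ":", tok.decode(ids[0], skip_special_tokens=True))
```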
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework
Arias, Esteban Garces, Blocher, Hannah, Rodemann, Julian, Li, Meimingwei, Heumann, Christian, Aßenmacher, Matthias
Open-ended text generation has become a prominent task in natural language processing due to the rise of powerful (large) language models. However, evaluating the quality of these models and the employed decoding strategies remains challenging because of trade-offs among widely used metrics such as coherence, diversity, and perplexity. Decoding methods often excel in some metrics while underperforming in others, complicating the establishment of a clear ranking. In this paper, we present novel ranking strategies within this multicriteria framework. Specifically, we employ benchmarking approaches based on partial orderings and present a new summary metric designed to balance existing automatic indicators, providing a more holistic evaluation of text generation quality. Furthermore, we discuss the alignment of these approaches with human judgments. Our experiments demonstrate that the proposed methods offer a robust way to compare decoding strategies, exhibit similarities with human preferences, and serve as valuable tools in guiding model selection for open-ended text generation tasks. Finally, we suggest future directions for improving evaluation methodologies in text generation. Our codebase, datasets, and models are publicly available.
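As a toy illustration of a partial-ordering comparison, the snippet below applies Pareto dominance over two metrics; the scores are made up, and the paper's benchmarking and summary metric are more involved.

```python
# Strategy A dominates B if it is at least as good on every metric
# and strictly better on at least one (here: coherence, diversity).
scores = {
    "greedy":  (0.82, 0.31),
    "top-k":   (0.78, 0.55),
    "nucleus": (0.80, 0.60),
}

def dominates(a, b):
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

non_dominated = [s for s, v in scores.items()
                 if not any(dominates(w, v)
                            for t, w in scores.items() if t != s)]
print("Pareto-optimal strategies:", non_dominated)  # ['greedy', 'nucleus']
```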
Introducing sgboost: A Practical Guide and Implementation of sparse-group boosting in R
Obster, Fabian, Heumann, Christian
This paper introduces the sgboost package in R, which implements sparse-group boosting for modeling high-dimensional data with natural groupings in covariates. Sparse-group boosting offers a flexible approach for both group and individual variable selection, reducing overfitting and enhancing model interpretability. The package uses regularization techniques based on the degrees of freedom of individual and group base-learners, and is designed to be used in conjunction with the mboost package. Through comparisons with existing methods and demonstration of its unique functionalities, this paper provides a practical guide on utilizing sparse-group boosting in R, accompanied by code examples to facilitate its application in various research domains. Overall, this paper serves as a valuable resource for researchers and practitioners seeking to use sparse-group boosting for efficient and interpretable high-dimensional data analysis.
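sgboost itself is an R package; as a language-neutral sketch of the underlying idea, the following Python toy lets individual-feature and group base-learners compete in each boosting step, with a crude degrees-of-freedom penalty standing in for the package's regularization. It is a conceptual illustration, not the sgboost/mboost algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 6))
y = 2 * X[:, 0] + X[:, 3] + X[:, 4] + rng.normal(size=n)

# Candidate base-learners: every single feature plus two feature groups
candidates = [[0], [1], [2], [3], [4], [5], [0, 1, 2], [3, 4, 5]]
coef, nu = np.zeros(6), 0.1                     # coefficients, learning rate

for _ in range(300):
    r = y - X @ coef                            # current residual
    best_g, best_b, best_crit = None, None, np.inf
    for g in candidates:
        b, *_ = np.linalg.lstsq(X[:, g], r, rcond=None)
        sse = np.sum((r - X[:, g] @ b) ** 2)
        crit = sse * (1 + 2 * len(g) / n)       # crude df penalty
        if crit < best_crit:
            best_g, best_b, best_crit = g, b, crit
    coef[best_g] += nu * best_b                 # small step on the winner

print(np.round(coef, 2))  # roughly [2, 0, 0, 1, 1, 0]: x0 alone, {x3,x4,x5} as a group
```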
Variational Approach for Efficient KL Divergence Estimation in Dirichlet Mixture Models
Pal, Samyajoy, Heumann, Christian
This study tackles the efficient estimation of Kullback-Leibler (KL) Divergence in Dirichlet Mixture Models (DMM), crucial for clustering compositional data. Despite the significance of DMMs, obtaining an analytically tractable solution for KL Divergence has proven elusive. Past approaches relied on computationally demanding Monte Carlo methods, motivating our introduction of a novel variational approach. Our method offers a closed-form solution, significantly enhancing computational efficiency for swift model comparisons and robust estimation evaluations. Validation using real and simulated data showcases its superior efficiency and accuracy over traditional Monte Carlo-based methods, opening new avenues for rapid exploration of diverse DMM models and advancing statistical analyses of compositional data.
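For context, here is a minimal Monte Carlo estimator of KL divergence between two Dirichlet mixtures, the kind of computationally demanding baseline the proposed variational approach replaces; the mixture parameters are illustrative.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import dirichlet

def mixture_logpdf(x, weights, alphas):
    # log of a weighted sum of Dirichlet densities
    return logsumexp([np.log(w) + dirichlet.logpdf(x, a)
                      for w, a in zip(weights, alphas)])

# Illustrative two-component mixtures on the 3-simplex
p = ([0.6, 0.4], [np.array([2.0, 3.0, 4.0]), np.array([5.0, 1.0, 1.0])])
q = ([0.5, 0.5], [np.array([3.0, 3.0, 3.0]), np.array([4.0, 2.0, 1.0])])

rng = np.random.default_rng(0)
n = 20_000
ks = rng.choice(2, size=n, p=p[0])                     # sample components
xs = [dirichlet.rvs(p[1][k], random_state=rng)[0] for k in ks]

kl = np.mean([mixture_logpdf(x, *p) - mixture_logpdf(x, *q) for x in xs])
print(f"Monte Carlo estimate of KL(p || q): {kl:.4f}")
```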
Position Paper: Bridging the Gap Between Machine Learning and Sensitivity Analysis
Scholbeck, Christian A., Moosbauer, Julia, Casalicchio, Giuseppe, Gupta, Hoshin, Bischl, Bernd, Heumann, Christian
We argue that interpretations of machine learning (ML) models or the model-building process can be seen as a form of sensitivity analysis (SA), a general methodology used to explain complex systems in many fields such as environmental modeling, engineering, or economics. We address both researchers and practitioners, calling attention to the benefits of a unified SA-based view of explanations in ML and the necessity to fully credit related work. We bridge the gap between both fields by formally describing (a) how the ML process is a system suitable for SA, (b) how existing ML interpretation methods relate to this perspective, and (c) how other SA techniques could be applied to ML.
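A minimal example of this SA view: one-at-a-time (OAT) perturbation, one of the simplest SA techniques, applied to the prediction function of a fitted ML model. The data and model here are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(random_state=0).fit(X, y)

# OAT: perturb one input at a time and record the prediction response
h = 0.5
base = model.predict(X)
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] += h
    shift = np.abs(model.predict(Xp) - base).mean()
    print(f"feature {j}: mean |prediction change| = {shift:.3f}")
```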
fmeffects: An R Package for Forward Marginal Effects
Löwe, Holger, Scholbeck, Christian A., Heumann, Christian, Bischl, Bernd, Casalicchio, Giuseppe
Forward marginal effects (FMEs) (Scholbeck et al., 2022) provide simple yet accurate local model-agnostic explanations in terms of forward differences in prediction. They address questions of the form: If we change x by an amount h, what is the change in the predicted outcome ŷ? For instance, given a medical study where a model is trained to predict a patient's disease risk, FMEs can tell us each patient's individual change in predicted risk due to losing 5 kg in body weight. FMEs thus provide actionable and comprehensible advice for stakeholders, including ones without expertise in machine learning. If the change in predicted risk is substantial enough, doctors may recommend a tailored exercise and nutrition regimen.
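A hedged Python sketch of the FME computation itself, FME(x, h) = f(x + h) - f(x) per observation; the fmeffects package is R, and the data, model, and features here are invented.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(loc=[80, 50], scale=[15, 10], size=(300, 2))  # weight, age
risk = (X[:, 0] - 75) / 30 + rng.normal(scale=0.2, size=300)
y = (risk > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

h = np.array([-5.0, 0.0])                 # step: lose 5 kg, age unchanged
fme = (model.predict_proba(X + h)[:, 1]
       - model.predict_proba(X)[:, 1])    # per-patient change in predicted risk
print("mean FME of -5 kg:", round(fme.mean(), 3))
```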
How Prevalent is Gender Bias in ChatGPT? Exploring German and English ChatGPT Responses
Urchs, Stefanie, Thurner, Veronika, Aßenmacher, Matthias, Heumann, Christian, Thiemichen, Stephanie
With the introduction of ChatGPT, OpenAI made large language models (LLMs) accessible to users with limited IT expertise. However, users with no background in natural language processing (NLP) might lack a proper understanding of LLMs and thus of their inherent limitations, and will therefore take the systems' output at face value. In this paper, we systematically analyse prompts and the generated responses to identify possible problematic issues, with a special focus on gender biases that users need to be aware of when processing the system's output. We explore how ChatGPT reacts in English and German when prompted to answer from a female, male, or neutral perspective. In an in-depth investigation, we examine selected prompts and analyse to what extent responses differ if the system is prompted several times in an identical way. On this basis, we show that ChatGPT is indeed useful for helping non-IT users draft texts for their daily work. However, it is absolutely crucial to thoroughly check the system's responses for biases as well as for syntactic and grammatical mistakes.
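A sketch of the kind of systematic prompt variation described, posing the same task from different perspectives in both languages; the prompts and model name are illustrative, and an OPENAI_API_KEY is assumed.

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()
perspectives = {"female":  "Answer from a female perspective: ",
                "male":    "Answer from a male perspective: ",
                "neutral": ""}
tasks = {"en": "Write a short cover letter for a software engineering job.",
         "de": "Schreibe ein kurzes Anschreiben für eine Stelle als "
               "Softwareentwickler."}

for lang, task in tasks.items():
    for label, prefix in perspectives.items():
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",   # illustrative model choice
            messages=[{"role": "user", "content": prefix + task}],
        )
        print(lang, label, resp.choices[0].message.content[:80])
```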
A tailored Handwritten-Text-Recognition System for Medieval Latin
Koch, Philipp, Nuñez, Gilary Vera, Arias, Esteban Garces, Heumann, Christian, Schöffel, Matthias, Häberlin, Alexander, Aßenmacher, Matthias
The Bavarian Academy of Sciences and Humanities aims to digitize its Medieval Latin Dictionary. The dictionary is based on record cards referring to lemmas in medieval Latin, a low-resource language. A crucial step of the digitization process is the Handwritten Text Recognition (HTR) of the handwritten lemmas found on these record cards. In our work, we introduce an end-to-end pipeline, tailored to the medieval Latin dictionary, for locating, extracting, and transcribing the lemmas. We employ two state-of-the-art (SOTA) image segmentation models to prepare the initial data set for the HTR task. Furthermore, we experiment with different transformer-based models and conduct a set of experiments to explore the capabilities of different combinations of vision encoders with a GPT-2 decoder. Additionally, we apply extensive data augmentation, resulting in a highly competitive model. The best-performing setup achieved a Character Error Rate (CER) of 0.015, which is even superior to the commercial Google Cloud Vision model, and shows more stable performance.
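A hedged sketch of wiring a vision encoder to a GPT-2 decoder with Hugging Face Transformers, in the spirit of the combinations explored; the checkpoints are generic stand-ins, and the model would still require fine-tuning on the record-card lemmas before it transcribes anything useful.

```python
from PIL import Image
from transformers import (VisionEncoderDecoderModel, ViTImageProcessor,
                          GPT2TokenizerFast)

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "gpt2")
processor = ViTImageProcessor.from_pretrained(
    "google/vit-base-patch16-224-in21k")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# GPT-2 has no dedicated start/pad tokens, so reuse BOS/EOS
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.eos_token_id

image = Image.open("lemma_crop.png").convert("RGB")   # a segmented lemma image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
ids = model.generate(pixel_values=pixel_values, max_new_tokens=16)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```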
Classifying multilingual party manifestos: Domain transfer across country, time, and genre
Aßenmacher, Matthias, Sauter, Nadja, Heumann, Christian
Annotation costs for large corpora remain one of the main bottlenecks in empirical social science research. On the one hand, domain transfer allows annotated data sets and trained models to be re-used. On the other hand, it is not clear how well domain transfer works and how reliable the results are for transfer across different dimensions. We explore the potential of domain transfer across geographical locations, languages, time, and genre in a large-scale database of political manifestos. First, we show the strong within-domain classification performance of fine-tuned transformer models. Second, we vary the test set across the aforementioned dimensions to test the fine-tuned models' robustness and transferability. For switching genres, we use an external corpus of transcribed speeches from New Zealand politicians, while for the other three dimensions, custom splits of the Manifesto database are used. While BERT achieves the best scores in the initial experiments across dimensions, DistilBERT proves to be competitive at a lower computational expense and is thus used for further experiments across time and country. The results of the additional analysis show that (Distil)BERT can be applied to future data with similar performance. Moreover, we observe (partly) notable differences between the political manifestos of different countries of origin, even if these countries share a language or a cultural background.
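A minimal sketch of the transfer-evaluation pattern: one classifier scored on test splits from different domains. Fine-tuning is omitted, and the checkpoint and data are placeholders, not the study's Manifesto splits.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint; the study fine-tunes on Manifesto-database
# splits first, which is omitted here.
name = "distilbert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=8)
model.eval()

def accuracy(texts, labels):
    enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        preds = model(**enc).logits.argmax(dim=-1)
    return (preds == torch.tensor(labels)).float().mean().item()

# One in-domain and two transfer test sets (placeholder examples)
splits = {
    "in-domain":     (["Wir fordern mehr Investitionen in Bildung."], [1]),
    "other-country": (["We will invest heavily in public education."], [1]),
    "speeches":      (["Mr Speaker, education funding must rise."], [1]),
}
for split, (texts, labels) in splits.items():
    print(split, round(accuracy(texts, labels), 3))
```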