Leonhard, David
Controlled Randomness Improves the Performance of Transformer Models
Deußer, Tobias, Zhao, Cong, Krämer, Wolfgang, Leonhard, David, Bauckhage, Christian, Sifa, Rafet
The emergence of pre-trained transformer models brought a massive breakthrough to the field of natural language processing. During pre-training, such transformer models learn generic language representations with strong generalization capabilities by applying a self-supervised learning approach to large text corpora. These pre-trained language models can then be fine-tuned on various downstream tasks without being trained from scratch, significantly reducing training costs compared to traditional training methods while achieving excellent performance. Models like BERT (Devlin et al., 2019), ELECTRA (Clark et al., 2020), or T5 (Raffel et al., 2020) have achieved remarkable results on several language processing tasks, and the most recent developments of even larger language models, made prominent by GPT-3 (Brown et al., 2020) and GPT-4 (OpenAI, 2023) but not limited to these two
Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models
Hillebrand, Lars, Berger, Armin, Deußer, Tobias, Dilmaghani, Tim, Khaled, Mohamed, Kliem, Bernd, Loitz, Rüdiger, Pielka, Maren, Leonhard, David, Bauckhage, Christian, Sifa, Rafet
Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions that recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial environments. Hence, we present ZeroShotALI, a novel recommender system that leverages a state-of-the-art large language model (LLM) in conjunction with a domain-specifically optimized transformer-based text-matching solution. We find that a two-step approach, first retrieving the best-matching document sections per legal requirement with a custom BERT-based model and then filtering these selections with an LLM, yields significant performance improvements over existing approaches.
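The two-step approach described in this abstract can be illustrated with a minimal sketch. All names here are hypothetical, and simple placeholders stand in for the actual models: a bag-of-words cosine similarity replaces the custom BERT-based matcher, and a keyword check replaces the LLM filtering call.

```python
# Hedged sketch of a retrieve-then-filter recommender (illustrative only;
# the scoring functions are placeholders, not the authors' models).
from collections import Counter
from math import sqrt


def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity, standing in for a BERT-based matcher."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(requirement: str, sections: list[str], k: int = 3) -> list[str]:
    """Step 1: retrieve the k best-matching document sections per legal requirement."""
    return sorted(sections, key=lambda s: bow_cosine(requirement, s), reverse=True)[:k]


def llm_filter(requirement: str, candidates: list[str]) -> list[str]:
    """Step 2: filter the candidates with an LLM. Here a simple keyword
    overlap check stands in for the actual LLM call."""
    tokens = requirement.lower().split()
    return [s for s in candidates if any(t in s.lower() for t in tokens)]


sections = [
    "revenue recognition follows IFRS 15",
    "the board met four times this year",
    "goodwill impairment testing per IAS 36",
]
requirement = "revenue recognition disclosure"
matches = llm_filter(requirement, retrieve_top_k(requirement, sections, k=2))
print(matches)  # -> ['revenue recognition follows IFRS 15']
```

The design point is that the cheap retrieval step narrows the candidate set, so the expensive LLM call only has to judge a handful of sections per requirement.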
sustain.AI: a Recommender System to analyze Sustainability Reports
Hillebrand, Lars, Pielka, Maren, Leonhard, David, Deußer, Tobias, Dilmaghani, Tim, Kliem, Bernd, Loitz, Rüdiger, Morad, Milad, Temath, Christian, Bell, Thiago, Stenzel, Robin, Sifa, Rafet
We present sustain.AI, an intelligent, context-aware recommender system that assists auditors, financial investors, and the general public in efficiently analyzing companies' sustainability reports. The tool leverages an end-to-end trainable architecture that couples a BERT-based encoding module with a multi-label classification head to match relevant text passages from sustainability reports to their respective regulations from the Global Reporting Initiative (GRI) standards. We evaluate our model on two novel German sustainability reporting data sets and consistently achieve a significantly higher recommendation performance compared to multiple strong baselines. Furthermore, sustain.AI is publicly available.
Figure 1: A screenshot of the sustain.AI recommender tool.