Sifa, Rafet
Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training
Deen, Mohammad Majd Saad Al, Pielka, Maren, Hees, Jörn, Abdou, Bouthaina Soulef, Sifa, Rafet
This paper addresses the classification of Arabic text data in the field of Natural Language Processing (NLP), with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is considered a resource-poor language, meaning that few annotated data sets are available, which in turn limits the availability of NLP methods. To overcome this limitation, we create a dedicated data set from publicly available resources. Subsequently, transformer-based machine learning models are trained and evaluated. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation for this task in Arabic, as well as the first application of multi-task pre-training in this context.
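A minimal sketch of how such a language-specific encoder could be applied to sentence-pair classification with the Hugging Face transformers library; the checkpoint name and label order are assumptions for illustration, not the exact setup of the paper.

```python
# Sketch only: NLI-style inference with a language-specific encoder.
# Checkpoint name and label set are assumptions, not the paper's configuration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "aubmindlab/bert-base-arabertv2"  # assumed AraBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

premise = "..."     # Arabic premise sentence
hypothesis = "..."  # Arabic hypothesis sentence

# Premise and hypothesis are encoded as a single sequence pair.
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
label = ["entailment", "neutral", "contradiction"][logits.argmax(-1).item()]  # assumed label order
```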
Word Sense Disambiguation as a Game of Neurosymbolic Darts
Dong, Tiansi, Sifa, Rafet
Word Sense Disambiguation (WSD) is one of the hardest tasks in natural language understanding and knowledge engineering. The glass ceiling of an 80% F1 score has recently been reached through supervised deep learning, enriched by a variety of knowledge graphs. Here, we propose a novel neurosymbolic methodology that is able to push the F1 score above 90%. The core of our methodology is a neurosymbolic sense embedding, in terms of a configuration of nested balls in n-dimensional space. The centre point of a ball well preserves the word embedding, which partially fixes the location of the ball. Inclusion relations among balls precisely encode symbolic hypernym relations among senses and enable simple logical deduction among sense embeddings, which could not be realised before. We train a Transformer to learn the mapping from a contextualized word embedding to its sense ball embedding, much like playing the game of darts (a game of shooting darts into a dartboard). A series of experiments is conducted using pre-trained n-ball embeddings, which cover around 70% of the training data and 75% of the test data in the benchmark WSD corpus. The F1 scores in these experiments range from 90.1% to 100.0% across all six groups of test data sets (each group containing four test sets with different sizes of n-ball embeddings). Our novel neurosymbolic methodology has the potential to break the ceiling of deep-learning approaches for WSD. Limitations and extensions of our current work are listed.
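The following toy example illustrates the geometric idea, under our own assumptions rather than the authors' implementation: each sense is a ball (centre, radius), ball containment encodes the hypernym relation, and disambiguation amounts to checking which sense ball the mapped contextual embedding ("the dart") lands in.

```python
# Illustrative sketch (not the authors' code): sense embeddings as n-dimensional
# balls (centre, radius). Containment of ball A in ball B encodes "A is a hyponym
# of B"; a contextual embedding mapped into the same space is assigned to the most
# specific ball that contains it.
import numpy as np

def contains(outer, inner):
    """Ball containment: ||c_out - c_in|| + r_in <= r_out."""
    (c_out, r_out), (c_in, r_in) = outer, inner
    return np.linalg.norm(c_out - c_in) + r_in <= r_out

def hit(ball, point):
    """A dart lands inside a ball if its distance to the centre is at most the radius."""
    c, r = ball
    return np.linalg.norm(c - point) <= r

# Toy 2-d example: a specific sense nested inside its hypernym sense.
organization = (np.array([0.0, 0.0]), 5.0)
bank_sense   = (np.array([1.0, 1.0]), 1.0)
assert contains(organization, bank_sense)              # hypernym relation holds

dart = np.array([1.2, 0.8])                            # mapped contextual embedding
print(hit(bank_sense, dart), hit(organization, dart))  # True True -> pick the more specific sense
```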
sustain.AI: a Recommender System to analyze Sustainability Reports
Hillebrand, Lars, Pielka, Maren, Leonhard, David, Deußer, Tobias, Dilmaghani, Tim, Kliem, Bernd, Loitz, Rüdiger, Morad, Milad, Temath, Christian, Bell, Thiago, Stenzel, Robin, Sifa, Rafet
We present sustain.AI, an intelligent, context-aware recommender system that assists auditors, financial investors, and the general public in efficiently analyzing companies' sustainability reports. The tool leverages an end-to-end trainable architecture that couples a BERT-based encoding module with a multi-label classification head to match relevant text passages from sustainability reports to their respective regulatory requirements from the Global Reporting Initiative (GRI) standards. We evaluate our model on two novel German sustainability reporting data sets and consistently achieve a significantly higher recommendation performance compared to multiple strong baselines. Furthermore, sustain.AI is publicly available.
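A hedged sketch of the general architecture described above, a BERT-based encoder with a multi-label head that scores a report passage against GRI indicators; the checkpoint name and number of labels are placeholders, not the released model.

```python
# Sketch of the general architecture, not the published sustain.AI model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_GRI_LABELS = 90  # assumption: one output per GRI indicator
model_name = "bert-base-german-cased"  # assumed German encoder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=NUM_GRI_LABELS,
    problem_type="multi_label_classification",  # per-label BCE loss during training
)

passage = "Unser Unternehmen reduzierte die CO2-Emissionen um 12 % gegenüber dem Vorjahr."
inputs = tokenizer(passage, return_tensors="pt", truncation=True)
with torch.no_grad():
    scores = torch.sigmoid(model(**inputs).logits)[0]
top_indicators = scores.topk(3).indices.tolist()  # recommend the best-matching GRI indicators
```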
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Ramamurthy, Rajkumar, Ammanabrolu, Prithviraj, Brantley, Kianté, Hessel, Jack, Sifa, Rafet, Bauckhage, Christian, Hajishirzi, Hannaneh, Choi, Yejin
We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If we view text generation as a sequential decision-making problem, reinforcement learning (RL) appears to be a natural conceptual framework. However, using RL for LM-based generation faces empirical challenges, including training instability due to the combinatorial action space, as well as a lack of open-source libraries and benchmarks customized for LM alignment. Thus, a question arises in the research community: is RL a practical paradigm for NLP? To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL. The library consists of on-policy RL algorithms that can be used to train any encoder or encoder-decoder LM in the HuggingFace library (Wolf et al. 2020) with an arbitrary reward function. Next, we present the GRUE (General Reinforced-language Understanding Evaluation) benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions which capture automated measures of human preference. GRUE is the first leaderboard-style evaluation of RL algorithms for NLP tasks. Finally, we introduce an easy-to-use, performant RL algorithm, NLPO (Natural Language Policy Optimization), that learns to effectively reduce the combinatorial action space in language generation. We show 1) that RL techniques are generally better than supervised methods at aligning LMs to human preferences; and 2) that NLPO exhibits greater stability and performance than previous policy gradient methods (e.g., PPO (Schulman et al. 2017)), based on both automatic and human evaluations.
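As a rough illustration of the reward-driven setup (not the RL4LMs API), the snippet below shows the general shape of scoring a sampled generation with a task reward and applying a simple REINFORCE-style update; the model, prompt, and reward function are placeholders.

```python
# Illustration only, not the RL4LMs API: generations are supervised by a reward
# function rather than a target string. Model name and reward are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-5)

def reward_fn(text: str) -> float:
    """Placeholder automated preference measure (e.g. sentiment, factuality)."""
    return float("good" in text)

prompt = tokenizer("The movie was", return_tensors="pt")
sample = policy.generate(**prompt, do_sample=True, max_new_tokens=20,
                         pad_token_id=tokenizer.eos_token_id)
reward = reward_fn(tokenizer.decode(sample[0], skip_special_tokens=True))

# REINFORCE-style update without a baseline: weight the sequence NLL by the reward,
# so rewarded samples become more likely under the policy.
out = policy(sample, labels=sample)   # out.loss is the mean negative log-likelihood
loss = reward * out.loss
loss.backward()
optimizer.step()
```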
Towards Linguistically Informed Multi-Objective Pre-Training for Natural Language Inference
Pielka, Maren, Schmidt, Svetlana, Pucknat, Lisa, Sifa, Rafet
We introduce a linguistically enhanced combination of pre-training methods for transformers. The pre-training objectives include POS tagging, synset prediction based on semantic knowledge graphs, and parent prediction based on dependency parse trees. Our approach achieves competitive results on the Natural Language Inference task compared to the state of the art. For smaller models in particular, the method yields a significant performance boost, emphasizing that intelligent pre-training can make up for fewer parameters and help build more efficient models. Combining POS tagging and synset prediction yields the overall best results.
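A minimal sketch, under our own assumptions, of how several token-level objectives can share one encoder and be combined into a single pre-training loss; the head sizes and encoder checkpoint are illustrative.

```python
# Sketch of multi-objective pre-training: one shared encoder, one token-level head
# per linguistic objective, trained on the sum of the individual losses.
import torch.nn as nn
from transformers import AutoModel

class MultiObjectivePretrainer(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased",
                 n_pos=17, n_synsets=5000, max_parent_offset=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.pos_head = nn.Linear(hidden, n_pos)                  # POS tagging
        self.synset_head = nn.Linear(hidden, n_synsets)           # synset prediction
        self.parent_head = nn.Linear(hidden, max_parent_offset)   # dependency parent position
        self.loss = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, input_ids, attention_mask, pos_labels, synset_labels, parent_labels):
        h = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # CrossEntropyLoss expects (batch, classes, seq_len), hence the transposes.
        return (self.loss(self.pos_head(h).transpose(1, 2), pos_labels)
                + self.loss(self.synset_head(h).transpose(1, 2), synset_labels)
                + self.loss(self.parent_head(h).transpose(1, 2), parent_labels))
```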
Zero-Shot Text Matching for Automated Auditing using Sentence Transformers
Biesner, David, Pielka, Maren, Ramamurthy, Rajkumar, Dilmaghani, Tim, Kliem, Bernd, Loitz, Rüdiger, Sifa, Rafet
Natural language processing methods have several applications in automated auditing, including document or passage classification, information retrieval, and question answering. However, training such models requires a large amount of annotated data, which is scarce in industrial settings. At the same time, techniques like zero-shot and unsupervised learning allow models pre-trained on general domain data to be applied to unseen domains. In this work, we study the efficiency of unsupervised text matching using Sentence-BERT, a transformer-based model, by applying it to the semantic similarity of financial passages. Experimental results show that this model is robust across in-domain and out-of-domain documents.
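The zero-shot matching setup can be sketched with the sentence-transformers library as below; the checkpoint and example passages are placeholders rather than the paper's data.

```python
# Sketch of zero-shot semantic matching; checkpoint and texts are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed pre-trained checkpoint

queries = ["Revenue recognition for long-term construction contracts"]
passages = [
    "Revenues from long-term contracts are recognised according to the stage of completion.",
    "The company leases several office buildings under operating leases.",
]

q_emb = model.encode(queries, convert_to_tensor=True)
p_emb = model.encode(passages, convert_to_tensor=True)
scores = util.cos_sim(q_emb, p_emb)   # cosine similarity matrix, no fine-tuning needed
best = scores.argmax(dim=1)           # most similar passage per query
```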
KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents
Deußer, Tobias, Ali, Syed Musharraf, Hillebrand, Lars, Nurchalifah, Desiana, Jacob, Basil, Bauckhage, Christian, Sifa, Rafet
We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four accompanying baselines for benchmarking potential future research. Additionally, we propose a new way of measuring the success of said extraction process by incorporating a word-level weighting scheme into the conventional F1 score to better model the inherently fuzzy borders of the entity pairs of a relation in this domain.
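The idea of a word-level weighted F1 can be illustrated as follows; the uniform weighting used here is a placeholder and not the exact scheme of the proposed metric.

```python
# Hedged illustration of a word-level weighted F1: each word in a gold or predicted
# entity pair carries a weight, and precision/recall are computed over the weighted
# word overlap. The weighting (uniform) is a placeholder, not the paper's metric.
def weighted_f1(predicted_words, gold_words, weight=lambda w: 1.0):
    pred, gold = set(predicted_words), set(gold_words)
    overlap = sum(weight(w) for w in pred & gold)
    p_total = sum(weight(w) for w in pred) or 1e-12
    g_total = sum(weight(w) for w in gold) or 1e-12
    precision, recall = overlap / p_total, overlap / g_total
    return 2 * precision * recall / (precision + recall or 1e-12)

# A partially correct extraction still earns partial credit:
print(weighted_f1(["net", "revenue", "2020"], ["net", "revenue"]))  # ~0.8
```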
Generative Deep Learning Techniques for Password Generation
Biesner, David, Cvejoski, Kostadin, Georgiev, Bogdan, Sifa, Rafet, Krupicka, Erik
Password guessing approaches via deep learning have recently been investigated, with significant breakthroughs in their ability to generate novel, realistic password candidates. In the present work we study a broad collection of deep learning and probabilistic models in light of password guessing: attention-based deep neural networks, autoencoding mechanisms, and generative adversarial networks. We provide novel generative deep-learning models in terms of variational autoencoders exhibiting state-of-the-art sampling performance and yielding additional latent-space features such as interpolations and targeted sampling. Lastly, we perform a thorough empirical analysis in a unified controlled framework over well-known datasets (RockYou, LinkedIn, Youku, Zomato, Pwnd). Our results not only identify the most promising schemes driven by deep neural networks, but also illustrate the strengths of each approach in terms of generation variability and sample uniqueness.
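A conceptual sketch, heavily simplified, of how a trained character-level VAE decoder could be used for sampling and latent-space interpolation; the network and dimensions are placeholders, not the models evaluated in the paper.

```python
# Our own simplification: with a character-level VAE, candidates are drawn by
# sampling latent vectors from the prior and decoding them; interpolating between
# two latent codes yields related candidates.
import torch
import torch.nn as nn

LATENT_DIM, VOCAB, MAX_LEN = 64, 100, 12

decoder = nn.Sequential(                      # placeholder decoder network
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, MAX_LEN * VOCAB),
)

def decode(z):
    logits = decoder(z).view(-1, MAX_LEN, VOCAB)
    return logits.argmax(-1)                  # greedy character ids per position

z = torch.randn(5, LATENT_DIM)                # sample from the standard-normal prior
candidates = decode(z)

# Latent interpolation between two sampled codes:
alphas = torch.linspace(0, 1, steps=4).unsqueeze(1)
related = decode(alphas * z[0] + (1 - alphas) * z[1])
```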
NLPGym -- A toolkit for evaluating RL agents on Natural Language Processing Tasks
Ramamurthy, Rajkumar, Sifa, Rafet, Bauckhage, Christian
Reinforcement learning (RL) has recently shown impressive performance in complex game AI and robotics tasks. To a large extent, this is thanks to the availability of simulated environments such as OpenAI Gym, the Arcade Learning Environment, or Malmo, which allow agents to learn complex tasks through interaction with virtual environments. While RL is also increasingly applied to natural language processing (NLP), there are no simulated textual environments that allow researchers to apply and consistently benchmark RL on NLP tasks. With the work reported here, we therefore release NLPGym, an open-source Python toolkit that provides interactive textual environments for standard NLP tasks such as sequence tagging, multi-label classification, and question answering. We also present experimental results for 6 tasks using different RL algorithms, which serve as baselines for further research. The toolkit is published at https://github.com/rajcscw/nlp-gym
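To illustrate the kind of interface such a toolkit exposes, here is a self-contained toy tagging environment in the familiar gym style; it is our own example, not the toolkit's API (see the repository above for the actual environments).

```python
# Toy illustration of an NLP task wrapped as a gym-style environment: one episode
# per sentence, the agent tags one token per step and receives a correctness reward.
import random

class ToyTaggingEnv:
    TAGS = ["NOUN", "VERB", "OTHER"]

    def __init__(self, sentence, gold_tags):
        self.sentence, self.gold = sentence, gold_tags

    def reset(self):
        self.t = 0
        return self.sentence[self.t]                      # observation: current token

    def step(self, action):
        reward = 1.0 if self.TAGS[action] == self.gold[self.t] else 0.0
        self.t += 1
        done = self.t == len(self.sentence)
        obs = None if done else self.sentence[self.t]
        return obs, reward, done, {}

env = ToyTaggingEnv(["dogs", "bark"], ["NOUN", "VERB"])
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done, _ = env.step(random.randrange(3))  # random baseline agent
    total += reward
print("episode return:", total)
```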
Matrix and Tensor Factorization Based Game Content Recommender Systems: A Bottom-Up Architecture and a Comparative Online Evaluation
Sifa, Rafet (Fraunhofer IAIS) | Yawar, Raheel (Flying Sheep Studios) | Ramamurthy, Rajkumar (Fraunhofer IAIS) | Bauckhage, Christian (Fraunhofer IAIS)
Players of digital games face numerous choices as to what kind of games to play and what kind of game content or in-game activities to opt for. Among these, game content plays an important role in keeping players engaged so as to increase revenues for the gaming industry. However, while nowadays a lot of game content is generated using procedural content generation, automatically determining the kind of content that suits players' skills still poses challenges to game developers. Addressing this challenge, we present matrix- and tensor-factorization-based game content recommender systems for recommending quests in a single-player role-playing game. We discuss the theory behind latent factor models for recommender systems and derive an algorithm for tensor factorizations to decompose collections of bipartite matrices. Extensive online bucket-type tests reveal that our novel recommender system retained more players and recommended more engaging quests than handcrafted content-based and previous collaborative filtering approaches.
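A minimal sketch of the underlying latent-factor idea: factor a player-by-quest engagement matrix and rank unseen quests by the reconstructed scores. The data and the plain SGD updates are illustrative; the paper additionally derives a tensor factorization over collections of such bipartite matrices.

```python
# Sketch of the latent-factor recommendation idea: R ~ P @ Q.T on observed entries,
# then recommend the unseen quest with the highest reconstructed score per player.
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1],       # toy player-by-quest engagement (0 = unobserved)
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
k, lr, reg = 2, 0.01, 0.05
P = rng.normal(scale=0.1, size=(R.shape[0], k))   # player factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))   # quest factors

for _ in range(2000):
    E = mask * (R - P @ Q.T)                      # error on observed entries only
    P += lr * (E @ Q - reg * P)
    Q += lr * (E.T @ P - reg * Q)

scores = P @ Q.T
recommended = np.where(~mask, scores, -np.inf).argmax(axis=1)  # best unseen quest per player
```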