AI-assisted German Employment Contract Review: A Benchmark Dataset

Oliver Wardas, Florian Matthes

arXiv.org Artificial Intelligence 

Despite growing academic interest in Legal NLP research in recent years, AI-assisted contract review, especially in languages other than English, has received little attention [KATZ 2023]. One major hurdle may be the scarcity of annotated training data. Semantic annotation of legal texts can only be done by legal experts, which makes it costly and leaves few datasets publicly available. The situation worsens when legal texts, such as employment contracts, include sensitive personal information. A partnership with a German law firm specializing in Economic Law now enables us to conduct more research in this area. As part of a collaborative project, we aim to design, implement, and evaluate a prototypical AI-based system for assisting in the review and correction of German employment contracts. To initiate our research efforts and encourage further investigations and experiments by other researchers, we release an anonymized and annotated dataset of clauses from German employment contracts (License: CC BY-NC 4.0), along with their respective legality and categorization labels. Additionally, we provide benchmarks for both open- and closed-source baseline models.