A Appendix

A.1 Benchmarks

Neural Information Processing Systems 

All hyperparameters used to create the first version of the benchmark are presented in Table 2.

Table 2: Hyperparameters for fine-tuning the language models.

Table 3: Accuracy of the evaluated models on the test subsets. Datasets that previously appeared in the KLEJ benchmark are marked with *.

Models: HerBERT (base, cased); HerBERT (large, cased); PolBERT (base, cased); PolBERT (base, uncased); XLM-RoBERTa (paraphrase)

CDSC-E* 94.02 0.33 93.