Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models