On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems