Benchmarking Defeasible Reasoning with Large Language Models -- Initial Experiments and Future Directions
