In Table A, we repeat our experiments on 5000 test examples for each dataset (or the