The Limitations of Standardized Science Tests as Benchmarks for Artificial Intelligence Research: Position Paper