These two new AI benchmarks could help make models less biased