These new AI benchmarks could help make models less biased

MIT Technology Review 

"When we are focused on treating everybody exactly the same, it can be overly stringent," says Angelina Wang, a postdoc at the Stanford Institute for Human-Centered AI and RegLab, who is the lead author of the paper. "It's forcing people to be treated the same even when there are legitimate differences." Ignoring differences between groups may in fact make AI systems less fair. "Sometimes being able to differentiate between groups is actually useful to treat the people from different groups more fairly," says Isabelle Augenstein, a computer science professor at the University of Copenhagen, who was not involved in the research. Wang and her colleagues created eight new benchmarks to evaluate AI systems along two different dimensions that the team devised: descriptive and normative.