
Collaborating Authors

 russakovsky


Men Also Do Laundry: Multi-Attribute Bias Amplification

Zhao, Dora, Andrews, Jerone T. A., Xiang, Alice

arXiv.org Artificial Intelligence

As computer vision systems become more widely deployed, there is increasing concern from both the research community and the public that these systems are not only reproducing but amplifying harmful social biases. The phenomenon of bias amplification, which is the focus of this work, refers to models amplifying inherent training set biases at test time. Existing metrics measure bias amplification with respect to single annotated attributes (e.g., $\texttt{computer}$). However, several visual datasets consist of images with multiple attribute annotations. We show models can learn to exploit correlations with respect to multiple attributes (e.g., {$\texttt{computer}$, $\texttt{keyboard}$}), which are not accounted for by current metrics. In addition, we show current metrics can give the erroneous impression that minimal or no bias amplification has occurred, as they involve aggregating over positive and negative values. Further, these metrics lack a clear desired value, making them difficult to interpret. To address these shortcomings, we propose a new metric: Multi-Attribute Bias Amplification. We validate our proposed metric through an analysis of gender bias amplification on the COCO and imSitu datasets. Finally, we benchmark bias mitigation methods using our proposed metric, suggesting possible avenues for future bias mitigation.
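The aggregation problem the abstract describes can be seen in a toy example. The numbers and the signed-versus-magnitude comparison below are a simplified sketch, not the paper's actual metric: when amplification is positive for one attribute and negative for another, averaging the signed values cancels to zero, while averaging magnitudes reveals the effect.

```python
# Toy illustration of bias amplification (hypothetical numbers, not the
# paper's formula). For each attribute, "bias" is the fraction of images
# containing that attribute which are annotated with a given group;
# amplification is the change in that fraction from training labels to
# model predictions.

def bias(counts):
    """counts: dict attribute -> (group_count, total_count)."""
    return {a: g / total for a, (g, total) in counts.items()}

train = {"computer": (60, 100), "keyboard": (55, 100)}  # training labels
preds = {"computer": (75, 100), "keyboard": (40, 100)}  # model predictions

amp = {a: bias(preds)[a] - bias(train)[a] for a in train}
# computer: +0.15 (amplified), keyboard: -0.15 (reversed)

# Signed mean cancels, suggesting no amplification occurred...
mean_signed = sum(amp.values()) / len(amp)              # 0.0
# ...while the mean magnitude exposes it.
mean_magnitude = sum(abs(v) for v in amp.values()) / len(amp)  # 0.15
```

Here both attributes shift by 0.15, yet the signed average reports zero, which is exactly the erroneous impression the authors warn aggregated metrics can give.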


'Learning to see and learning to read': Artificial intelligence enters a new era

#artificialintelligence

For artificial intelligence to realize its potential -- to relieve humans from mundane tasks, make life easier, and eventually invent entirely new solutions to our problems -- computers will need to surpass us at two things that we humans do pretty well: see the world around us and understand our language. "Learning to see and learning to read are the two main things we need for the computer to do to gain knowledge," said Jen Rexford, chair of Princeton's computer science department and the Gordon Y.S. Wu Professor in Engineering. "We call these fields computer vision and natural language processing. These two fields have evolved independently but our faculty are bringing them together in interesting ways." In recent years, researchers at Princeton and beyond have made major strides in these two fields, opening up rapid progress across a variety of applications.


Competition Makes Big Datasets the Winners

Communications of the ACM

If there is one dataset that has become practically synonymous with deep learning, it is ImageNet. So much so that dataset creators routinely tout their offerings as "the ImageNet of …" for everything from chunks of software source code, as in IBM's Project CodeNet, to MusicNet, the University of Washington's collection of labelled music recordings. The main aim of the team at Stanford University that created ImageNet was scale. The researchers recognized the tendency of machine learning models at that time to overfit relatively small training datasets, limiting their ability to handle real-world inputs well. Crowdsourcing the job by recruiting casual workers from Amazon's Mechanical Turk website delivered a much larger dataset.


Trouble at the Source

Communications of the ACM

Machine learning (ML) systems, especially deep neural networks, can find subtle patterns in large datasets that give them powerful capabilities in image classification, speech recognition, natural-language processing, and other tasks. Despite this power--or rather because of it--these systems can be led astray by hidden regularities in the datasets used to train them. Issues occur when the training data contains systematic flaws due to the origin of the data or the biases of those preparing it. Another hazard is "over-fitting," in which a model predicts the limited training data well, but errs when presented with new data, either similar test data or the less-controlled examples encountered in the real world. This discrepancy resembles the well-known statistical issue in which clinical trial data has high "internal validity" on carefully selected subjects, but may have lower "external validity" for real patients.
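The over-fitting hazard can be reproduced in a few lines. The sketch below uses made-up numbers: a model flexible enough to pass through every noisy training point exactly looks perfect "in sample" but fails on an unseen input.

```python
# Toy over-fitting demo: interpolate noisy points exactly, then extrapolate.

def lagrange_fit(xs, ys):
    """Return the unique degree-(n-1) polynomial through n points."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Underlying truth is y = x, but the training labels carry noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 1.3, 1.7, 3.4, 3.8]   # noisy observations of y = x

model = lagrange_fit(train_x, train_y)

# Perfect on the training data ("high internal validity")...
print(model(2.0))   # 1.7, exactly the noisy label
# ...but wild on unseen inputs ("low external validity").
print(model(5.0))   # roughly -4.5, far from the true value of 5
```

A degree-4 polynomial has enough capacity to memorize the five noisy labels, which is precisely what makes it unreliable off the training set.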


Researchers Blur Faces That Launched a Thousand Algorithms

WIRED

In 2012, artificial intelligence researchers engineered a big leap in computer vision thanks, in part, to an unusually large set of images--thousands of everyday objects, people, and scenes in photos that were scraped from the web and labeled by hand. That data set, known as ImageNet, is still used in thousands of AI research projects and experiments today. But last week every human face included in ImageNet suddenly disappeared--after the researchers who manage the data set decided to blur them. Just as ImageNet helped usher in a new age of AI, efforts to fix it reflect challenges that affect countless AI programs, data sets, and products. "We were concerned about the issue of privacy," says Olga Russakovsky, an assistant professor at Princeton University and one of those responsible for managing ImageNet.


Facebook's new AI teaches itself to see with less human help

#artificialintelligence

Most artificial intelligence is still built on a foundation of human toil. Peer inside an AI algorithm and you'll find something constructed using data that was curated and labeled by an army of human workers. Now, Facebook has shown how some AI algorithms can learn to do useful work with far less human help. The company built an algorithm that learned to recognize objects in images with little help from labels. The Facebook algorithm, called Seer (for SElf-supERvised), fed on more than a billion images scraped from Instagram, deciding for itself which objects look alike. Images with whiskers, fur, and pointy ears, for example, were collected into one pile.


When AI sees a man, it thinks "official." A woman? "Smile"

#artificialintelligence

Turns out, computers do too. When US and European researchers fed pictures of members of Congress to Google's cloud image recognition service, the service applied three times as many annotations related to physical appearance to photos of women as it did to men. The top labels applied to men were "official" and "businessperson"; for women they were "smile" and "chin." The researchers administered their machine vision test to Google's artificial intelligence image service and those of rivals Amazon and Microsoft. Crowdworkers were paid to review the annotations those services applied to official photos of lawmakers and images those lawmakers tweeted.


Tool helps clear biases from computer vision

#artificialintelligence

Researchers at Princeton University have developed a tool that flags potential biases in sets of images used to train artificial intelligence (AI) systems. The work is part of a larger effort to remedy and prevent the biases that have crept into AI systems that influence everything from credit services to courtroom sentencing programs. Although the sources of bias in AI systems are varied, one major cause is stereotypical images contained in large sets of images collected from online sources that engineers use to develop computer vision, a branch of AI that allows computers to recognize people, objects and actions. Because the foundation of computer vision is built on these data sets, images that reflect societal stereotypes and biases can unintentionally influence computer vision models. To help stem this problem at its source, researchers in the Princeton Visual AI Lab have developed an open-source tool that automatically uncovers potential biases in visual data sets.
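As an illustration of the kind of check such an auditing tool might perform (a hypothetical sketch with invented annotations, not the Visual AI Lab's actual implementation), one can count how often each object label co-occurs with each demographic group in a dataset and flag heavily skewed labels:

```python
# Hypothetical dataset-audit sketch: flag object labels whose images
# overwhelmingly co-occur with one demographic group.
from collections import Counter

# Each record: (object_label, demographic_group) -- invented example data.
annotations = [
    ("laundry", "woman"), ("laundry", "woman"), ("laundry", "woman"),
    ("laundry", "man"),
    ("computer", "man"), ("computer", "man"), ("computer", "woman"),
]

def flag_skews(records, threshold=0.7):
    """Return labels where one group's share of images meets the threshold."""
    by_object = {}
    for obj, group in records:
        by_object.setdefault(obj, Counter())[group] += 1
    flags = {}
    for obj, counts in by_object.items():
        total = sum(counts.values())
        group, n = counts.most_common(1)[0]
        if n / total >= threshold:
            flags[obj] = (group, n / total)
    return flags

print(flag_skews(annotations))
# {'laundry': ('woman', 0.75)} -- "computer" (2/3) falls below the threshold
```

A real tool would also need to handle sampling noise, intersectional groups, and missing annotations; this sketch only shows the basic co-occurrence counting idea.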


Tool helps clear biases from computer vision

#artificialintelligence

The tool allows data set creators and users to correct issues of underrepresentation or stereotypical portrayals before image collections are used to train computer vision models. In related work, members of the Visual AI Lab published a comparison of existing methods for preventing biases in computer vision models themselves, and proposed a new, more effective approach to bias mitigation.