Actively Estimating Crowd Annotation Consensus

Journal of Artificial Intelligence Research

The rapid growth of storage capacity and processing power has led machine learning applications to rely increasingly on immense amounts of labeled data. It has become more important than ever to have fast and inexpensive ways to annotate vast amounts of data. With the emergence of crowdsourcing services, research has gravitated toward putting the wisdom of crowds to better use. Unfortunately, spammers and inattentive annotators threaten the quality and trustworthiness of the consensus. High-quality consensus estimation from crowd-annotated data therefore requires a meticulous choice of the candidate annotator and of the sample in need of a new annotation. Due to time and budget limitations, it is of utmost importance that this choice be carried out while annotation collection is in progress; we call this process active crowd-labeling. To this end, we propose an approach for actively estimating consensus from continuous-valued crowd annotations. Our method is based on annotator models with unknown parameters, and Bayesian inference is employed to reach a consensus in the form of ordinal, binary, or continuous values. We introduce ranking functions for choosing the candidate annotator-sample pair for the next annotation request. In addition, we propose a penalizing method for preventing annotator domination, investigate the explore-exploit trade-off for incorporating new annotators into the system, and study the effects of a stopping criterion based on consensus quality. We also introduce the crowd-labeled Head Pose Annotations datasets. Experimental results on benchmark datasets from the literature and on the Head Pose Annotations datasets suggest that our method provides high-quality consensus using as few as one fifth of the annotations (~80% cost reduction), offering a budget- and time-sensitive solution to the crowd-labeling problem.
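
To make the pipeline concrete, here is a minimal sketch of such an active-labeling loop; it is not the paper's Bayesian model. The Gaussian annotator-noise assumption, the precision-weighted consensus, the weak default precision for new annotators, and the variance-based ranking of (sample, annotator) pairs are all illustrative stand-ins for the paper's annotator models, ranking functions, and explore-exploit handling.

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, n_annotators = 30, 6
    truth = rng.uniform(0.0, 1.0, n_samples)          # hidden continuous labels
    noise_std = rng.uniform(0.05, 0.5, n_annotators)  # hidden per-annotator noise

    labels = [[] for _ in range(n_samples)]           # labels[i]: list of (annotator, value)

    def est_precision(j, cons):
        # Precision (1/variance) of annotator j from residuals to the current
        # consensus; a weak default stands in for a prior over unseen annotators.
        r = [v - cons[i] for i, anns in enumerate(labels) for a, v in anns if a == j]
        return 1.0 / (np.var(r) + 1e-3) if len(r) >= 3 else 4.0

    def consensus_and_var():
        # Start from plain means, then re-weight each annotation by annotator precision.
        cons = np.array([np.mean([v for _, v in anns]) if anns else 0.5 for anns in labels])
        prec = np.array([est_precision(j, cons) for j in range(n_annotators)])
        var = np.empty(n_samples)
        for i, anns in enumerate(labels):
            w = np.array([prec[a] for a, _ in anns])
            var[i] = 1.0 / (1.0 + w.sum())            # unit-precision prior per sample
            if anns:
                cons[i] = (w * np.array([v for _, v in anns])).sum() / w.sum()
        return cons, var, prec

    budget = 3 * n_samples                            # half of the full sample-annotator grid
    for _ in range(budget):
        cons, var, prec = consensus_and_var()
        i = int(np.argmax(var))                       # sample with the least certain consensus
        seen = {a for a, _ in labels[i]}              # one annotation per annotator per sample
        cands = [j for j in range(n_annotators) if j not in seen]
        if not cands:
            continue
        j = max(cands, key=lambda j: prec[j])         # most reliable unused annotator
        labels[i].append((j, truth[i] + rng.normal(0.0, noise_std[j])))

    cons, var, _ = consensus_and_var()
    print("mean absolute consensus error:", float(np.abs(cons - truth).mean()))

A consensus-quality stopping rule of the kind the abstract mentions could replace the fixed budget here, for instance stopping once every sample's posterior variance falls below a threshold.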


Weston

AAAI Conferences

Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a strongly performing method that scales to such datasets by simultaneously learning to optimize precision at the top of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method, called Wsabie, outperforms several baseline methods while being faster and consuming less memory.
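
As a sketch of the joint-embedding idea: score an image against every annotation by a dot product in a shared low-dimensional space, and train with a WARP-style rank-weighted hinge update, the precision-at-top surrogate associated with Wsabie. The dimensions, learning rate, and the omission of Wsabie's norm constraints on the embedding vectors are simplifications for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    d_img, d_emb, n_labels = 128, 32, 1000

    V = rng.normal(0.0, 0.1, (d_emb, d_img))     # maps image features into the joint space
    W = rng.normal(0.0, 0.1, (n_labels, d_emb))  # one embedding per possible annotation

    def scores(x):
        # Rank all annotations for image features x by similarity in the joint space.
        return W @ (V @ x)

    def warp_step(x, y, lr=0.05, margin=1.0, max_tries=100):
        # Sample negative labels until one violates the margin against the true label y;
        # the number of tries gives a cheap rank estimate, which discounts the update.
        phi = V @ x
        s = W @ phi
        for tries in range(1, max_tries + 1):
            yb = int(rng.integers(n_labels))
            if yb != y and s[yb] + margin > s[y]:
                rank_est = (n_labels - 1) // tries   # fewer tries -> y ranked worse
                weight = lr * np.log1p(rank_est)
                diff = W[yb] - W[y]                  # copy taken before updating W
                W[y] += weight * phi
                W[yb] -= weight * phi
                V -= weight * np.outer(diff, x)
                return

    # Toy usage: one update, then the current top-5 annotations for the image.
    x, y = rng.normal(size=d_img), 7
    warp_step(x, y)
    print(np.argsort(-scores(x))[:5])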


jin-s13/COCO-WholeBody

#artificialintelligence

This is the official repository for the ECCV 2020 paper "Whole-Body Human Pose Estimation in the Wild". The repository contains the COCO-WholeBody annotations proposed in the paper. COCO-WholeBody is the first large-scale benchmark for whole-body pose estimation; it extends the COCO 2017 dataset and keeps the same train/val split. Here is an example of an annotated image.
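
A minimal sketch of reading the whole-body annotations with plain json. The file name is hypothetical, and the extra per-part keys (foot_kpts, face_kpts, lefthand_kpts, righthand_kpts) follow the repo's described extension of the COCO format but should be verified against the released files.

    import json

    # Hypothetical path; use the annotation file shipped with the COCO-WholeBody release.
    with open("coco_wholebody_val.json") as f:
        data = json.load(f)

    ann = data["annotations"][0]     # standard COCO fields plus whole-body extensions

    # 'keypoints' keeps the original 17 COCO body joints as (x, y, visibility) triplets;
    # the assumed extra parts are stored as flat lists in the same triplet format.
    body = ann["keypoints"]
    parts = {p: ann[p + "_kpts"] for p in ("foot", "face", "lefthand", "righthand")}

    for part, kpts in parts.items():
        visible = sum(1 for v in kpts[2::3] if v > 0)
        print(part, len(kpts) // 3, "joints,", visible, "labeled visible")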


Releasing the World's Largest Street-level Imagery Dataset for Teaching Machines to See

#artificialintelligence

Today we present the Mapillary Vistas Dataset, the world's largest and most diverse publicly available street-level imagery dataset with pixel-accurate and instance-specific annotations, for empowering autonomous mobility and transport at the global scale. Since we started our expedition to collaboratively visualize the world with street-level images, we have collected more than 130 million images from places all around the globe. While this number keeps growing at a frantic pace, we are putting serious effort into researching, implementing, publishing, and releasing smarter computer vision models that can help us understand the semantics within this data. As stated in an earlier blog post, we keep advancing supervised deep learning models as our primary workhorses to extract information that is valuable to our community and improve our products. However, such models are inherently hungry for data, and in particular for a lot of precisely annotated data.
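
For a sense of what instance-specific annotation means in practice, here is a sketch of decoding a per-pixel instance mask. The path is hypothetical, and the pixel_value = class_id * 256 + instance_id encoding for 16-bit instance PNGs is an assumption to verify against the downloaded Vistas data.

    import numpy as np
    from PIL import Image

    # Hypothetical path to one of the dataset's 16-bit instance-label PNGs.
    mask = np.array(Image.open("instances/example.png"), dtype=np.uint16)

    # Assumed encoding: pixel_value = class_id * 256 + instance_id.
    class_ids = mask // 256
    instance_ids = mask % 256

    print("classes present:", np.unique(class_ids))
    for c in np.unique(class_ids):
        n = len(np.unique(instance_ids[class_ids == c]))
        print("class", int(c), "has", n, "instance(s)")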