 Crowdsourcing


Erin Brockovich launches a crowdsourced AI data center map

Engadget

Most of the reports so far came from Texas. Erin Brockovich, the American environmental activist portrayed by Julia Roberts in the film named after her, has launched a new project that aims to give people a platform to speak up and voice concerns about AI data centers in their communities. The new Brockovich AI Data Center Reporting website centers on a map showing major operational AI data centers and facilities under construction in the US, along with projects reported by the community. Some of the reports could be for rumored or proposed projects, so not every dot on the map represents a data center that's already running. The website has received 2,716 reports so far, with the biggest chunk coming from Texas.


This viral Dutch Fish Doorbell is peak internet

PCWorld

The Dutch Fish Doorbell mixes livestreams, crowdsourcing, and conservation in all of the best ways. Every spring in the Dutch city of Utrecht, thousands of fish attempt to migrate through the city's canals to reach spawning grounds, but locked flood gates stay shut for long stretches to manage water levels. So the city came up with a weirdly charming solution: a fish doorbell. The site, called Visdeurbel (or Fish Doorbell), lets anyone in the world help the fish out.


SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data

Neural Information Processing Systems

Biodiversity is declining at an unprecedented rate, impacting ecosystem services necessary to ensure food, water, and human health and well-being. Understanding the distribution of species and their habitats is crucial for conservation policy planning. However, traditional methods in ecology for species distribution models (SDMs) generally focus either on narrow sets of species or narrow geographical areas and there remain significant knowledge gaps about the distribution of species. A major reason for this is the limited availability of data traditionally used, due to the prohibitive amount of effort and expertise required for traditional field monitoring. The wide availability of remote sensing data and the growing adoption of citizen science tools to collect species observation data at low cost offer an opportunity for improving biodiversity monitoring and enabling the modelling of complex ecosystems. We introduce a novel task for mapping bird species to their habitats by predicting species encounter rates from satellite images, and present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird, considering summer (breeding) and winter seasons. We also provide a dataset in Kenya representing low-data regimes. We additionally provide environmental data and species range maps for each location.


Triple Eagle: Simple, Fast and Practical Budget-Feasible Mechanisms

Neural Information Processing Systems

We revisit the classical problem of designing Budget-Feasible Mechanisms (BFMs) for submodular valuation functions, which has been extensively studied since the seminal paper of Singer [FOCS'10] due to its wide applications in crowdsourcing and social marketing. We propose TripleEagle, a novel algorithmic framework for designing BFMs, based on which we present several simple yet effective BFMs that achieve better approximation ratios than the state-of-the-art work for both monotone and non-monotone submodular valuation functions. Moreover, our BFMs are the first in the literature to achieve linear complexities while ensuring obvious strategyproofness, making them more practical than the previous BFMs. We conduct extensive experiments to evaluate the empirical performance of our BFMs, and the experimental results strongly demonstrate the efficiency and effectiveness of our approach.


experiments

Neural Information Processing Systems

A.1 Experimental design. Figure 1 summarizes the experimental design used for our experiments. The participants in our experiments are users of the online platform Amazon Mechanical Turk (AMT). On this platform, users stay anonymous; hence, we do not collect any sensitive personal information about them. We prioritized users with a Master qualification (a qualification attributed by AMT to users who have proven to be of excellent quality) or normal users with high qualifications (number of HIT completed = 10000 and HIT accepted > 98%). Before going through the experiment, participants are asked to read and agree to a consent form, which specifies the objective and procedure of the experiment, the expected time to completion (5-8 min) with the associated reward ($1.4), and finally the risks, benefits, and confidentiality of taking part in this study.





Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction

Neural Information Processing Systems

We consider a crowdsourcing model in which n workers are asked to rate the quality of n items previously generated by other workers. An unknown set of αn workers generate reliable ratings, while the remaining workers may behave arbitrarily and possibly adversarially. The manager of the experiment can also manually evaluate the quality of a small number of items, and wishes to curate together almost all of the high-quality items with at most an ε fraction of low-quality items.


Noisy Label Learning with Instance-Dependent Outliers: Identifiability via Crowd Wisdom

Neural Information Processing Systems

The generation of label noise is often modeled as a process involving a probability transition matrix (also interpreted as the confusion matrix) imposed onto the label distribution. Under this model, learning the "ground-truth classifier" (i.e., the classifier that could be learned if no noise were present) and the confusion matrix boils down to a model identification problem. Prior works along this line demonstrated appealing empirical performance, yet identifiability of the model was mostly established by assuming an instance-invariant confusion matrix. Having an (occasionally) instance-dependent confusion matrix across data samples is apparently more realistic, but inevitably introduces outliers to the model. Our interest lies in confusion matrix-based noisy label learning with such outliers taken into consideration. We begin by pointing out that under the model of interest, using labels produced by only one annotator is fundamentally insufficient to detect the outliers or identify the ground-truth classifier. Then, we prove that by employing a crowdsourcing strategy involving multiple annotators, a carefully designed loss function can establish the desired model identifiability under reasonable conditions. Our development builds upon a link between the noisy label model and a column-corrupted matrix factorization model, based on which we show that crowdsourced annotations distinguish nominal data and instance-dependent outliers using a low-dimensional subspace. Experiments show that our learning scheme substantially improves outlier detection and the classifier's testing accuracy.
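
As a rough illustration of the transition-matrix view of label noise described in this abstract, the sketch below applies a single instance-invariant matrix to a clean label distribution. It is not the paper's method: the function name, the 3-class matrix, and the clean distribution are all hypothetical, and the column-stochastic convention (column j holds the annotator's label distribution when the true class is j) is an assumption made for the example.

```python
def noisy_label_distribution(confusion, clean):
    """Return p(noisy = k) = sum_j confusion[k][j] * clean[j] for each class k."""
    return [
        sum(row[j] * clean[j] for j in range(len(clean)))
        for row in confusion
    ]


# Hypothetical 3-class annotator: column j is the distribution of the
# annotator's label given true class j, so every column sums to 1.
confusion = [
    [0.8, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.1, 0.1, 0.8],
]

# Hypothetical clean (ground-truth) label distribution for one sample.
clean = [0.5, 0.3, 0.2]

noisy = noisy_label_distribution(confusion, clean)
print(noisy)
```

In the instance-dependent setting the abstract is concerned with, some samples would not follow this single shared matrix, which is what makes them outliers relative to the model.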