imagemonkey
Seeing the Unseen: Errors and Bias in Visual Datasets
Introduction
From face recognition on smartphones to automatic routing in self-driving cars, machine vision algorithms lie at the core of many everyday features. These systems solve image-based tasks by identifying and understanding objects, then making decisions based on that information. Large sets of images in which the featured objects have been labelled, known as datasets, are commonly used to develop and enhance machine vision algorithms (Cox 2016). However, errors in datasets are usually reproduced or even magnified by the algorithms trained on them, at times resulting in issues such as recognising Black people as gorillas and misrepresenting ethnicities in search results (Nieva 2015; Prabhu and Birhane 2020). This essay traces the errors in datasets and their impacts, showing that a flawed dataset can result from limited categories, incomplete sourcing and poor classification.
[P] ImageMonkey - A public open source image database (x-post from /r/SideProject) • r/MachineLearning
For the last three weeks I have been working on ImageMonkey - check it out here: https://imagemonkey.io/. The idea originated while I was working on another project, where at some point I wanted to integrate Machine Learning into my application. With all the great Machine Learning frameworks out there, it's really easy to get your foot in the door quickly. But while I was playing around with the frameworks, I realized that it's really hard to get good training data. If you are lucky, there is some (annotated) training data online; if not... well, then you have to get your hands dirty and do the tedious work yourself.