Documentation for The Noisy Ostracods Dataset, He Wang

Neural Information Processing Systems 

The Noisy Ostracods dataset is a real-world taxonomy dataset characterized by various types of noise. It arose from our need for a clean taxonomy dataset and from the challenges we encountered while cleaning the data in our real use case. Our goal was to provide a benchmark for evaluating robust machine learning methods and label correction algorithms from a practical perspective. The imbalanced and fine-grained nature of the dataset poses additional challenges for these methods. This document was created by adapting the most relevant questions from Datasheets for Datasets [1] to the properties of our dataset.