




a7c4163b33286261b24c72fd3d1707c9-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems

These datasets enable large-scale study of abuse detection for these languages. Anonymized comments: To further address privacy concerns, we anonymize our dataset. We combine the hate and offensive categories in these datasets for training a binary classification model. We show the percentage (%) of emoticons present in our dataset MACD in Table 12. In future work, we will investigate in detail the impact of emoticons on abuse detection. However, due to the limited scale and diversity of abuse detection datasets in Indic languages, development of these models for Indic languages has been severely impeded.


Supplementary Material: CARLANE: A Lane Detection Benchmark for Unsupervised Domain Adaptation from Simulation to multiple Real-World Domains

Neural Information Processing Systems

Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set? If the dataset is a sample, then what is the larger set? Is the sample representative of the larger set (e.g., geographic coverage)? If so, please describe how this representativeness was validated/verified.


Supplementary Material - WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models

Neural Information Processing Systems

This has been addressed in prior work [4, 3] by finetuning VLMs on a given corpus for a given task [5] and conducting zero-shot evaluations on a new corpus [7]. However, the mere use of an unseen corpus for evaluation does not imply it is OOD. Q1 What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)? Please provide a description. (a) We provide 384k image-text pairs. Q3 Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set? If the dataset is a sample, then what is the larger set?


aa7ef4c0f4aaabf376088a1a74e09d4c-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems

Please provide a description. We want to provide an open-source large-scale music dataset for the research community. Such large datasets do not yet exist in this domain, and we believe they are needed to democratize innovation in music research and ML-assisted music creation.


PROSPECT: Labeled Tandem Mass Spectrometry Dataset for Machine Learning in Proteomics

Neural Information Processing Systems

PROSPECT provides value to proteomics and machine learning researchers by including several high-quality annotations and by being accessible in terms of format and structure for applying machine learning.


LIPS - Learning Industrial Physical Simulation benchmark suite - Appendix

Neural Information Processing Systems

For each benchmark, we generate three different training datasets. If the dataset is a sample, then what is the larger set? Is the sample representative of the larger set (e.g., geographic coverage)? The provided datasets are self-contained and will remain constant. However, more datasets could be generated using the proposed benchmarking platform.



ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild Appendices

Calanir Luthion

Neural Information Processing Systems

Is there anything a future user could do to mitigate these undesirable harms? Although ConfLab's long-term vision is towards developing technology to assist individuals in navigating social interactions, the data could also affect a community in unintended ways: for instance, cause worsened social satisfaction, a lack of agency, stereotype newcomers and veterans, or benefit only those members of the community who make use of resulting applications at the expense of the rest. More nefarious uses involve exploiting the data for developing methods that harmfully surveil or profile people.


EPIC-KITCHENS VISOR Benchmark VIdeo Segmentations and Object Relations - Appendix

Neural Information Processing Systems

Is it possible to identify individuals (i.e., one or more natural persons), either directly or indirectly (i.e., in combination with other data) from the dataset?