A Appendix
–Neural Information Processing Systems
We compare with Indic and non-Indic datasets.

A.1 Comparison with existing datasets

In this section, we compare our proposed MACD with existing datasets in detail in Table 10. We note that large-scale datasets containing more than 50K samples exist for some non-Indic languages such as English, Greek, and Turkish. These datasets enable large-scale study of abuse detection for these languages. However, for other languages, large-scale datasets are still lacking. Next, we compare with Indic datasets and note that they are small-scale compared to non-Indic datasets. This shows that there is an immediate need for a dataset like MACD to fill this gap and foster advancements in abuse detection in Indic languages. Both overall and at the language level, MACD is one of the largest datasets for studying Indic languages.

A.2 MACD dataset

Explicit warning: We urge the community to be mindful that our dataset MACD contains comments expressing abusive behaviour towards religion, region, gender, etc., which researchers may find abusive and depressing.
May-23-2025, 05:51:59 GMT