Maluku
Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions
Tasnim, Nazia, Gupta, Sujan Sen, Shihab, Md. Istiak Hossain, Juee, Fatiha Islam, Tahsin, Arunima, Ghum, Pritom, Fatema, Kanij, Haque, Marshia, Farzana, Wasema, Nasir, Prionti, KhudaBukhsh, Ashique, Sadeque, Farig, Sushmit, Asif
Communal violence in online forums has become extremely prevalent in South Asia, where many communities of different cultures coexist and share resources. These societies exhibit a phenomenon characterized by strong bonds within their own groups and animosity towards others, leading to conflicts that frequently escalate into violent confrontations. To address this issue, we have developed the first comprehensive framework for the automatic detection of communal violence markers in online Bangla content accompanying the largest collection (13K raw sentences) of social media interactions that fall under the definition of four major violence class and their 16 coarse expressions. Our workflow introduces a 7-step expert annotation process incorporating insights from social scientists, linguists, and psychologists. By presenting data statistics and benchmarking performance using this dataset, we have determined that, aside from the category of Non-communal violence, Religio-communal violence is particularly pervasive in Bangla text. Moreover, we have substantiated the effectiveness of fine-tuning language models in identifying violent comments by conducting preliminary benchmarking on the state-of-the-art Bangla deep learning model.
- Asia > Myanmar (0.04)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (16 more...)
- Media > News (1.00)
- Law (1.00)
- Health & Medicine (1.00)
- (2 more...)
$\mu$PLAN: Summarizing using a Content Plan as Cross-Lingual Bridge
Huot, Fantine, Maynez, Joshua, Alberti, Chris, Amplayo, Reinald Kim, Agrawal, Priyanka, Fierro, Constanza, Narayan, Shashi, Lapata, Mirella
Cross-lingual summarization consists of generating a summary in one language given an input document in a different language, allowing for the dissemination of relevant content across speakers of other languages. However, this task remains challenging, mainly because of the need for cross-lingual datasets and the compounded difficulty of summarizing and translating. This work presents $\mu$PLAN, an approach to cross-lingual summarization that uses an intermediate planning step as a cross-lingual bridge. We formulate the plan as a sequence of entities that captures the conceptualization of the summary, i.e. identifying the salient content and expressing in which order to present the information, separate from the surface form. Using a multilingual knowledge base, we align the entities to their canonical designation across languages. $\mu$PLAN models first learn to generate the plan and then continue generating the summary conditioned on the plan and the input. We evaluate our methodology on the XWikis dataset on cross-lingual pairs across four languages and demonstrate that this planning objective achieves state-of-the-art performance in terms of ROUGE and faithfulness scores. Moreover, this planning approach improves the zero-shot transfer to new cross-lingual language pairs compared to non-planning baselines.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California (0.14)
- North America > United States > Arkansas (0.05)
- (18 more...)
Viterbi Extraction tutorial with Hidden Markov Toolkit
Hatala, Zulkarnaen, Puturuhu, Victor
An algorithm used to extract HMM parameters is revisited. Most parts of the extraction process are taken from implemented Hidden Markov Toolkit (HTK) program under name HInit. The algorithm itself shows a few variations compared to another domain of implementations. The HMM model is introduced briefly based on the theory of Discrete Time Markov Chain. We schematically outline the Viterbi method implemented in HTK. Iterative definition of the method which is ready to be implemented in computer programs is reviewed. We also illustrate the method calculation precisely using manual calculation and extensive graphical illustration. The distribution of observation probability used is simply independent Gaussians r.v.s. The purpose of the content is not to justify the performance or accuracy of the method applied in a specific area. This writing merely to describe how the algorithm is performed. The whole content should enlighten the audience the insight of the Viterbi Extraction method used by HTK.