AITopics | banglabert

Collaborating Authors

banglabert

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Bangla Hate Speech Classification with Fine-tuned Transformer Models

Jafari, Yalda Keivan, Dey, Krishno

arXiv.org Artificial IntelligenceDec-3-2025

Hate speech recognition in low-resource languages remains a difficult problem due to insufficient datasets, orthographic heterogeneity, and linguistic variety. Bangla is spoken by more than 230 million people of Bangladesh and India (West Bengal). Despite the growing need for automated moderation on social media platforms, Bangla is significantly under-represented in computational resources. In this work, we study Subtask 1A and Subtask 1B of the BLP 2025 Shared Task on hate speech detection. We reproduce the official baselines (e.g., Majority, Random, Support Vector Machine) and also produce and consider Logistic Regression, Random Forest, and Decision Tree as baseline methods. We also utilized transformer-based models such as DistilBERT, BanglaBERT, m-BERT, and XLM-RoBERTa for hate speech classification. All the transformer-based models outperformed baseline methods for the subtasks, except for DistilBERT. Among the transformer-based models, BanglaBERT produces the best performance for both subtasks. Despite being smaller in size, BanglaBERT outperforms both m-BERT and XLM-RoBERTa, which suggests language-specific pre-training is very important. Our results highlight the potential and need for pre-trained language models for the low-resource Bangla language.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2512.02845

Country: Asia > India > West Bengal (0.24)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Accelerating Bangla NLP Tasks with Automatic Mixed Precision: Resource-Efficient Training Preserving Model Efficacy

Opi, Md Mehrab Hossain, Khan, Sumaiya, Rahman, Moshammad Farzana

arXiv.org Artificial IntelligenceDec-2-2025

Training models for Natural Language Processing (NLP) requires substantial computational resources and time, posing significant challenges, especially for NLP development in Bangla, where access to high-end hardware is often limited. In this work, we explore automatic mixed precision (AMP) training as a means to improve computational efficiency without sacrificing model performance. By leveraging a dynamic mix of 16-bit and 32-bit floating-point computations, AMP lowers GPU memory requirements and speeds up training without degrading model performance. We evaluate AMP across four standard Bangla NLP tasks, namely sentiment analysis, named entity recognition, error classification, and question answering, using four transformer-based models: BanglaBERT, BanglishBERT, XLM-R, and mBERT. Our results demonstrate that AMP accelerates training by 44.5% and reduces memory consumption by 17.6%, while maintaining F-1 score within 99.7% of the full-precision baselines. This empirical study highlights AMP's potential to democratize access to state-of-the-art NLP capabilities in hardware-constrained settings by lowering computational barriers.

bangla nlp task, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2512.00829

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.71)

Add feedback

Gradient Masters at BLP-2025 Task 1: Advancing Low-Resource NLP for Bengali using Ensemble-Based Adversarial Training for Hate Speech Detection

Hoque, Syed Mohaiminul, Rahman, Naimur, Hossain, Md Sakhawat

arXiv.org Artificial IntelligenceNov-25-2025

This paper introduces the approach of "Gradient Masters" for BLP-2025 Task 1: "Bangla Multitask Hate Speech Identification Shared Task". We present an ensemble-based fine-tuning strategy for addressing subtasks 1A (hate-type classification) and 1B (target group classification) in YouTube comments. We propose a hybrid approach on a Bangla Language Model, which outperformed the baseline models and secured the 6th position in subtask 1A with a micro F1 score of 73.23% and the third position in subtask 1B with 73.28%. We conducted extensive experiments that evaluated the robustness of the model throughout the development and evaluation phases, including comparisons with other Language Model variants, to measure generalization in low-resource Bangla hate speech scenarios and data set coverage. In addition, we provide a detailed analysis of our findings, exploring misclassification patterns in the detection of hate speech.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.18324

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.69)

Add feedback

Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis

Arpa, Zaara Zabeen, Apurbo, Sadnam Sakib, Oishee, Nazia Karim Khan, Abrar, Ajwad

arXiv.org Artificial IntelligenceNov-18-2025

Automatic Speech Recognition (ASR) is now integral to digital interaction, driving applications from virtual assistants to automated subtitles on platforms like YouTube ( Dykes et al. 2023). Despite its wide adoption and significant advancements, ASR performance remains imperfect, particularly in Large Vocabulary Continuous Speech Recognition (L VCSR) and under real-world conditions like background noise or diverse accents ( Errattahi et al. 2018; Romana et al. 2024). This often results in a significant Word Error Rate (WER). A major, persistent source of error stems from speech disfluencies, which are interruptions in the smooth flow of speech, including filled pauses, hesitations, self-corrections, and most relevantly, repetitions ( Romana et al. 2024). Disfluencies are natural and frequent; one study found a 50% probability of a disfluency in a 10-13 word sentence ( Jamshid Lou and Johnson 2020b). Their presence creates noisy transcripts that are difficult to read and detrimental to downstream Natural Language Processing (NLP) tasks such as machine translation or information extraction ( Romana et al. 2024; Jamshid Lou and Johnson 2020a).

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.13159

Country:

North America > United States (0.46)
Asia > Middle East > UAE (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

Retriv at BLP-2025 Task 1: A Transformer Ensemble and Multi-Task Learning Approach for Bangla Hate Speech Identification

Saha, Sourav, Asib, K M Nafi, Hoque, Mohammed Moshiul

arXiv.org Artificial IntelligenceNov-11-2025

This paper addresses the problem of Bangla hate speech identification, a socially impactful yet linguistically challenging task. As part of the "Bangla Multi-task Hate Speech Identification" shared task at the BLP Workshop, IJCNLP-AACL 2025, our team "Retriv" participated in all three subtasks: (1A) hate type classification, (1B) target group identification, and (1C) joint detection of type, severity, and target. For subtasks 1A and 1B, we employed a soft-voting ensemble of transformer models (BanglaBERT, MuRIL, IndicBERTv2). For subtask 1C, we trained three multitask variants and aggregated their predictions through a weighted voting ensemble. Our systems achieved micro-f1 scores of 72.75% (1A) and 72.69% (1B), and a weighted micro-f1 score of 72.62% (1C). On the shared task leaderboard, these corresponded to 9th, 10th, and 7th positions, respectively. These results highlight the promise of transformer ensembles and weighted multitask frameworks for advancing Bangla hate speech detection in low-resource contexts. We made experimental scripts publicly available for the community.

detection, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.07304

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Social Media Sentiments Analysis on the July Revolution in Bangladesh: A Hybrid Transformer Based Machine Learning Approach

Hossen, Md. Sabbir, Saiduzzaman, Md., Shaha, Pabon

arXiv.org Artificial IntelligenceAug-6-2025

The July Revolution in Bangladesh marked a significant student-led mass uprising, uniting people across the nation to demand justice, accountability, and systemic reform. Social media platforms played a pivotal role in amplifying public sentiment and shaping discourse during this historic mass uprising. In this study, we present a hybrid transformer-based sentiment analysis framework to decode public opinion expressed in social media comments during and after the revolution. We used a brand new dataset of 4,200 Bangla comments collected from social media. The framework employs advanced transformer-based feature extraction techniques, including BanglaBERT, mBERT, XLM-RoBERTa, and the proposed hybrid XMB-BERT, to capture nuanced patterns in textual data. Principle Component Analysis (PCA) were utilized for dimensionality reduction to enhance computational efficiency. We explored eleven traditional and advanced machine learning classifiers for identifying sentiments. The proposed hybrid XMB-BERT with the voting classifier achieved an exceptional accuracy of 83.7% and outperform other model classifier combinations. This study underscores the potential of machine learning techniques to analyze social sentiment in low-resource languages like Bangla.

machine learning, natural language, sentiment analysis, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ECAI65401.2025.11095452

2507.11084

Country: Asia > Bangladesh (0.72)

Genre: Research Report > New Finding (0.35)

Industry:

Information Technology (0.68)
Media > News (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How Effectively Can BERT Models Interpret Context and Detect Bengali Communal Violent Text?

Khondoker, Abdullah, Taufik, Enam Ahmed, Tashik, Md. Iftekhar Islam, Mahmud, S M Ishtiak, Sadeque, Farig

arXiv.org Artificial IntelligenceJun-25-2025

The spread of cyber hatred has led to communal violence, fueling aggression and conflicts between various religious, ethnic, and social groups, posing a significant threat to social harmony. Despite its critical importance, the classification of communal violent text remains an underexplored area in existing research. This study aims to enhance the accuracy of detecting text that incites communal violence, focusing specifically on Bengali textual data sourced from social media platforms. We introduce a fine-tuned BanglaBERT model tailored for this task, achieving a macro F1 score of 0.60. To address the issue of data imbalance, our dataset was expanded by adding 1,794 instances, which facilitated the development and evaluation of a fine-tuned ensemble model. This ensemble model demonstrated an improved performance, achieving a macro F1 score of 0.63, thus highlighting its effectiveness in this domain. In addition to quantitative performance metrics, qualitative analysis revealed instances where the models struggled with context understanding, leading to occasional misclassifications, even when predictions were made with high confidence. Through analyzing the cosine similarity between words, we identified certain limitations in the pre-trained BanglaBERT models, particularly in their ability to distinguish between closely related communal and non-communal terms. To further interpret the model's decisions, we applied LIME, which helped to uncover specific areas where the model struggled in understanding context, contributing to errors in classification. These findings highlight the promise of NLP and interpretability tools in reducing online communal violence. Our work contributes to the growing body of research in communal violence detection and offers a foundation for future studies aiming to refine these techniques for better accuracy and societal impact.

machine learning, natural language, violence, (20 more...)

arXiv.org Artificial Intelligence

2506.19831

Country: Asia > India (0.46)

Genre: Research Report > New Finding (0.93)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How do datasets, developers, and models affect biases in a low-resourced language?

Das, Dipto, Guha, Shion, Semaan, Bryan

arXiv.org Artificial IntelligenceJun-10-2025

Sociotechnical systems, such as language technologies, frequently exhibit identity-based biases. These biases exacerbate the experiences of historically marginalized communities and remain understudied in low-resource contexts. While models and datasets specific to a language or with multilingual support are commonly recommended to address these biases, this paper empirically tests the effectiveness of such approaches in the context of gender, religion, and nationality-based identities in Bengali, a widely spoken but low-resourced language. We conducted an algorithmic audit of sentiment analysis models built on mBERT and BanglaBERT, which were fine-tuned using all Bengali sentiment analysis (BSA) datasets from Google Dataset Search. Our analyses showed that BSA models exhibit biases across different identity categories despite having similar semantic content and structure. We also examined the inconsistencies and uncertainties arising from combining pre-trained models and datasets created by individuals from diverse demographic backgrounds. We connected these findings to the broader discussions on epistemic injustice, AI alignment, and methodological decisions in algorithmic audits.

artificial intelligence, natural language, social media, (19 more...)

arXiv.org Artificial Intelligence

2506.06816

Country:

Europe (1.00)
Asia (0.94)
North America > United States > Colorado (0.28)
North America > Canada > Ontario (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine (0.93)
Law > Civil Rights & Constitutional Law (0.93)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.67)
(2 more...)

Add feedback

CineXDrama: Relevance Detection and Sentiment Analysis of Bangla YouTube Comments on Movie-Drama using Transformers: Insights from Interpretability Tool

Rifa, Usafa Akther, Debnath, Pronay, Rafa, Busra Kamal, Hridi, Shamaun Safa, Rahman, Md. Aminur

arXiv.org Artificial IntelligenceDec-1-2024

In recent years, YouTube has become the leading platform for Bangla movies and dramas, where viewers express their opinions in comments that convey their sentiments about the content. However, not all comments are relevant for sentiment analysis, necessitating a filtering mechanism. We propose a system that first assesses the relevance of comments and then analyzes the sentiment of those deemed relevant. We introduce a dataset of 14,000 manually collected and preprocessed comments, annotated for relevance (relevant or irrelevant) and sentiment (positive or negative). Eight transformer models, including BanglaBERT, were used for classification tasks, with BanglaBERT achieving the highest accuracy (83.99% for relevance detection and 93.3% for sentiment analysis). The study also integrates LIME to interpret model decisions, enhancing transparency.

machine learning, natural language, sentiment analysis, (16 more...)

arXiv.org Artificial Intelligence

2411.06548

Country:

North America > United States > Washington > King County > Seattle (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (0.95)
Media > Film (0.94)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.97)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Enhancing Sentiment Analysis in Bengali Texts: A Hybrid Approach Using Lexicon-Based Algorithm and Pretrained Language Model Bangla-BERT

Mahmud, Hemal, Mahmud, Hasan

arXiv.org Artificial IntelligenceNov-29-2024

Sentiment analysis (SA) is a process of identifying the emotional tone or polarity within a given text and aims to uncover the user's complex emotions and inner feelings. While sentiment analysis has been extensively studied for languages like English, research in Bengali, remains limited, particularly for fine-grained sentiment categorization. This work aims to connect this gap by developing a novel approach that integrates rule-based algorithms with pre-trained language models. We developed a dataset from scratch, comprising over 15,000 manually labeled reviews. Next, we constructed a Lexicon Data Dictionary, assigning polarity scores to the reviews. We developed a novel rule based algorithm Bangla Sentiment Polarity Score (BSPS), an approach capable of generating sentiment scores and classifying reviews into nine distinct sentiment categories. To assess the performance of this method, we evaluated the classified sentiments using BanglaBERT, a pre-trained transformer-based language model. We also performed sentiment classification directly with BanglaBERT on the original data and evaluated this model's results. Our analysis revealed that the BSPS + BanglaBERT hybrid approach outperformed the standalone BanglaBERT model, achieving higher accuracy, precision, and nuanced classification across the nine sentiment categories. The results of our study emphasize the value and effectiveness of combining rule-based and pre-trained language model approaches for enhanced sentiment analysis in Bengali and suggest pathways for future research and application in languages with similar linguistic complexities.

machine learning, natural language, sentiment, (19 more...)

arXiv.org Artificial Intelligence

2411.19584

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.05)
Asia > Malaysia > Sabah > Kota Kinabalu (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback