AITopics | Hasan, Md Kamrul

Collaborating Authors

Hasan, Md Kamrul

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BDA: Bangla Text Data Augmentation Framework

Tariquzzaman, Md., Anam, Audwit Nafi, Haque, Naimul, Kabir, Mohsinul, Mahmud, Hasan, Hasan, Md Kamrul

arXiv.org Artificial IntelligenceDec-26-2024

Data augmentation involves generating synthetic samples that resemble those in a given dataset. In resource-limited fields where high-quality data is scarce, augmentation plays a crucial role in increasing the volume of training data. This paper introduces a Bangla Text Data Augmentation (BDA) Framework that uses both pre-trained models and rule-based methods to create new variants of the text. A filtering process is included to ensure that the new text keeps the same meaning as the original while also adding variety in the words used. We conduct a comprehensive evaluation of the framework's effectiveness in Bangla text classification tasks. Our framework achieved significant improvement in F1 scores across five distinct datasets, delivering performance equivalent to models trained on 100% of the data while utilizing only 50% of the training dataset. Additionally, we explore the impact of data scarcity by progressively reducing the training data and augmenting it through BDA, resulting in notable F1 score enhancements. The study offers a thorough examination of BDA's performance, identifying key factors for optimal results and addressing its limitations through detailed analysis.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.08753

Country:

Asia (0.93)
North America (0.67)
Europe (0.67)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)

Add feedback

Math Word Problem Solving by Generating Linguistic Variants of Problem Statements

Raiyan, Syed Rifat, Faiyaz, Md. Nafis, Kabir, Shah Md. Jawad, Kabir, Mohsinul, Mahmud, Hasan, Hasan, Md Kamrul

arXiv.org Artificial IntelligenceJun-24-2023

The art of mathematical reasoning stands as a fundamental pillar of intellectual progress and is a central catalyst in cultivating human ingenuity. Researchers have recently published a plethora of works centered around the task of solving Math Word Problems (MWP) $-$ a crucial stride towards general AI. These existing models are susceptible to dependency on shallow heuristics and spurious correlations to derive the solution expressions. In order to ameliorate this issue, in this paper, we propose a framework for MWP solvers based on the generation of linguistic variants of the problem text. The approach involves solving each of the variant problems and electing the predicted expression with the majority of the votes. We use DeBERTa (Decoding-enhanced BERT with disentangled attention) as the encoder to leverage its rich textual representations and enhanced mask decoder to construct the solution expressions. Furthermore, we introduce a challenging dataset, $\mathrm{P\small{ARA}\normalsize{MAWPS}}$, consisting of paraphrased, adversarial, and inverse variants of selectively sampled MWPs from the benchmark $\mathrm{M\small{AWPS}}$ dataset. We extensively experiment on this dataset along with other benchmark datasets using some baseline MWP solver models. We show that training on linguistic variants of problem statements and voting on candidate predictions improve the mathematical reasoning and robustness of the model. We make our code and data publicly available.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2306.13899

Country:

Asia > Bangladesh (0.14)
North America > United States (0.14)
Europe > Spain (0.14)
Asia > Japan (0.14)

Genre: Research Report (0.82)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews

Kabir, Mohsinul, Mahfuz, Obayed Bin, Raiyan, Syed Rifat, Mahmud, Hasan, Hasan, Md Kamrul

arXiv.org Artificial IntelligenceJun-8-2023

The analysis of consumer sentiment, as expressed through reviews, can provide a wealth of insight regarding the quality of a product. While the study of sentiment analysis has been widely explored in many popular languages, relatively less attention has been given to the Bangla language, mostly due to a lack of relevant data and cross-domain adaptability. To address this limitation, we present BanglaBook, a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral. We provide a detailed statistical analysis of the dataset and employ a range of machine learning models to establish baselines including SVM, LSTM, and Bangla-BERT. Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features, emphasizing the necessity for additional training resources in this domain. Additionally, we conduct an in-depth error analysis by examining sentiment unigrams, which may provide insight into common classification errors in under-resourced languages like Bangla. Our codes and data are publicly available at https://github.com/mohsinulkabir14/BanglaBook.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.06595

Country:

Asia > Bangladesh (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)

Add feedback

Venn Diagram Multi-label Class Interpretation of Diabetic Foot Ulcer with Color and Sharpness Enhancement

Hasan, Md Mahamudul, Yap, Moi Hoon, Hasan, Md Kamrul

arXiv.org Artificial IntelligenceMay-5-2023

DFU is a severe complication of diabetes that can lead to amputation of the lower limb if not treated properly. Inspired by the 2021 Diabetic Foot Ulcer Grand Challenge, researchers designed automated multi-class classification of DFU, including infection, ischaemia, both of these conditions, and none of these conditions. However, it remains a challenge as classification accuracy is still not satisfactory. This paper proposes a Venn Diagram interpretation of multi-label CNN-based method, utilizing different image enhancement strategies, to improve the multi-class DFU classification. We propose to reduce the four classes into two since both class wounds can be interpreted as the simultaneous occurrence of infection and ischaemia and none class wounds as the absence of infection and ischaemia. We introduce a novel Venn Diagram representation block in the classifier to interpret all four classes from these two classes. To make our model more resilient, we propose enhancing the perceptual quality of DFU images, particularly blurry or inconsistently lit DFU images, by performing color and sharpness enhancements on them. We also employ a fine-tuned optimization technique, adaptive sharpness aware minimization, to improve the CNN model generalization performance. The proposed method is evaluated on the test dataset of DFUC2021, containing 5,734 images and the results are compared with the top-3 winning entries of DFUC2021. Our proposed approach outperforms these existing approaches and achieves Macro-Average F1, Recall and Precision scores of 0.6592, 0.6593, and 0.6652, respectively.Additionally, We perform ablation studies and image quality measurements to further interpret our proposed method. This proposed method will benefit patients with DFUs since it tackles the inconsistencies in captured images and can be employed for a more robust remote DFU wound classification.

artificial intelligence, machine learning, venn diagram multi-label class interpretation, (2 more...)

arXiv.org Artificial Intelligence

2305.01044

Genre: Research Report (0.66)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.53)

Add feedback

TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in Pre-trained Language Models

Hasan, Md Kamrul, Islam, Md Saiful, Lee, Sangwu, Rahman, Wasifur, Naim, Iftekhar, Khan, Mohammed Ibrahim, Hoque, Ehsan

arXiv.org Artificial IntelligenceMar-29-2023

Pre-trained large language models have recently achieved ground-breaking performance in a wide variety of language understanding tasks. However, the same model can not be applied to multimodal behavior understanding tasks (e.g., video sentiment/humor detection) unless non-verbal features (e.g., acoustic and visual) can be integrated with language. Jointly modeling multiple modalities significantly increases the model complexity, and makes the training process data-hungry. While an enormous amount of text data is available via the web, collecting large-scale multimodal behavioral video datasets is extremely expensive, both in terms of time and money. In this paper, we investigate whether large language models alone can successfully incorporate non-verbal information when they are presented in textual form. We present a way to convert the acoustic and visual information into corresponding textual descriptions and concatenate them with the spoken text. We feed this augmented input to a pre-trained BERT model and fine-tune it on three downstream multimodal tasks: sentiment, humor, and sarcasm detection. Our approach, TextMI, significantly reduces model complexity, adds interpretability to the model's decision, and can be applied for a diverse set of tasks while achieving superior (multimodal sarcasm detection) or near SOTA (multimodal sentiment analysis and multimodal humor detection) performance. We propose TextMI as a general, competitive baseline for multimodal behavioral analysis tasks, particularly in a low-resource setting.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.1543

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

M-BERT: Injecting Multimodal Information in the BERT Structure

Rahman, Wasifur, Hasan, Md Kamrul, Zadeh, Amir, Morency, Louis-Philippe, Hoque, Mohammed Ehsan

arXiv.org Machine LearningAug-15-2019

Multimodal language analysis is an emerging research area in natural language processing that models language in a multimodal manner. It aims to understand language from the modalities of text, visual, and acoustic by modeling both intra-modal and cross-modal interactions. BERT (Bidirectional Encoder Representations from Transformers) provides strong contextual language representations after training on large-scale unlabeled corpora. Fine-tuning the vanilla BERT model has shown promising results in building state-of-the-art models for diverse NLP tasks like question answering and language inference. However, fine-tuning BERT in the presence of information from other modalities remains an open research problem. In this paper, we inject multimodal information within the input space of BERT network for modeling multimodal language. The proposed injection method allows BERT to reach a new state of the art of $84.38\%$ binary accuracy on CMU-MOSI dataset (multimodal sentiment analysis) with a gap of 5.98 percent to the previous state of the art and 1.02 percent to the text-only BERT.

artificial intelligence, louis-philippe morency, neural network, (17 more...)

arXiv.org Machine Learning

1908.05787

Country: North America > United States (0.48)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Causality-Guided Prediction of the TED Talk Ratings from the Speech-Transcripts using Neural Networks

Tanveer, Md Iftekhar, Hasan, Md Kamrul, Gildea, Daniel, Hoque, M. Ehsan

arXiv.org Machine LearningMay-20-2019

Automated prediction of public speaking performance enables novel systems for tutoring public speaking skills. We use the largest open repository---TED Talks---to predict the ratings provided by the online viewers. The dataset contains over 2200 talk transcripts and the associated meta information including over 5.5 million ratings from spontaneous visitors to the website. We carefully removed the bias present in the dataset (e.g., the speakers' reputations, popularity gained by publicity, etc.) by modeling the data generating process using a causal diagram. We use a word sequence based recurrent architecture and a dependency tree based recursive architecture as the neural networks for predicting the TED talk ratings. Our neural network models can predict the ratings with an average F-score of 0.77 which largely outperforms the competitive baseline method.

deep learning, educational technology, transcript, (23 more...)

arXiv.org Machine Learning

1905.08392

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Education > Educational Setting > Online (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

UR-FUNNY: A Multimodal Language Dataset for Understanding Humor

Hasan, Md Kamrul, Rahman, Wasifur, Zadeh, Amir, Zhong, Jianyuan, Tanveer, Md Iftekhar, Morency, Louis-Philippe, Mohammed, null, Hoque, null

arXiv.org Machine LearningApr-13-2019

Humor is a unique and creative communicative behavior displayed during social interactions. It is produced in a multimodal manner, through the usage of words (text), gestures (vision) and prosodic cues (acoustic). Understanding humor from these three modalities falls within boundaries of multimodal language; a recent research trend in natural language processing that models natural language as it happens in face-to-face communication. Although humor detection is an established research area in NLP, in a multimodal context it is an understudied area. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding multimodal language used in expressing humor. The dataset and accompanying studies, present a framework in multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research.

dataset, deep learning, neural network, (23 more...)

arXiv.org Machine Learning

1904.06618

Country:

North America > United States (0.28)
Asia > Middle East (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.46)

Add feedback