BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting
Basher, Mohammad Jahid Ibna, Kowsher, Md, Islam, Md Saiful, Nandi, Rabindra Nath, Prottasha, Nusrat Jahan, Menon, Mehadi Hasan, Muntasir, Tareq Al, Chowdhury, Shammur Absar, Alam, Firoj, Yousefi, Niloofar, Garibay, Ozlem Ozmen
This paper introduces BnTTS (Bangla Text-To-Speech), the first framework for Bangla speaker-adaptation-based TTS, designed to bridge the gap in Bangla speech synthesis using minimal training data. Building upon the XTTS architecture, our approach integrates Bangla into a multilingual TTS pipeline, with modifications to account for the phonetic and linguistic characteristics of the language. We pre-train BnTTS on a 3.85k-hour Bangla speech dataset with corresponding text labels and evaluate performance in both zero-shot and few-shot settings on our proposed test dataset. Empirical evaluations in few-shot settings show that BnTTS significantly improves the naturalness, intelligibility, and speaker fidelity of synthesized Bangla speech. Compared to state-of-the-art Bangla TTS systems, BnTTS exhibits superior performance in Subjective Mean Opinion Score (SMOS), Naturalness, and Clarity metrics.
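Since BnTTS builds on XTTS, its speaker-adaptation usage pattern can be pictured with the open-source Coqui TTS API; the sketch below is illustrative only, and the "bn" language code is an assumption (the stock XTTS v2 checkpoint does not ship Bangla support, and the BnTTS release details are not given here).

```python
# A sketch of XTTS-style speaker conditioning, the mechanism BnTTS
# builds on, using the open-source Coqui TTS API. The checkpoint and
# the "bn" language code are assumptions: stock XTTS v2 has no Bangla.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")  # base XTTS model
tts.tts_to_file(
    text="...",                     # Bangla input text goes here
    speaker_wav="speaker_ref.wav",  # a few seconds of reference speech
    language="bn",                  # assumed language code for Bangla
    file_path="out.wav",
)
```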
GenAI Content Detection Task 2: AI vs. Human -- Academic Essay Authenticity Challenge
Chowdhury, Shammur Absar, Almerekhi, Hind, Kutlu, Mucahid, Keles, Kaan Efe, Ahmad, Fatema, Mohiuddin, Tasnim, Mikros, George, Alam, Firoj
This paper presents a comprehensive overview of the first edition of the Academic Essay Authenticity Challenge, organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025. This challenge focuses on detecting machine-generated vs. human-authored essays for academic purposes. The task is defined as follows: "Given an essay, identify whether it is generated by a machine or authored by a human." The challenge involves two languages: English and Arabic. During the evaluation phase, 25 teams submitted systems for English and 21 teams for Arabic, reflecting substantial interest in the task. Finally, seven teams submitted system description papers. The majority of submissions utilized fine-tuned transformer-based models, with one team employing Large Language Models (LLMs) such as Llama 2 and Llama 3. This paper outlines the task formulation, details the dataset construction process, and explains the evaluation framework. Additionally, we present a summary of the approaches adopted by participating teams. Nearly all submitted systems outperformed the n-gram-based baseline, with the top-performing systems achieving F1 scores exceeding 0.98 for both languages, indicating significant progress in the detection of machine-generated text.
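As the overview notes, most submissions fine-tuned transformer-based classifiers. A minimal sketch of that dominant recipe follows; the xlm-roberta-base backbone, label convention, and placeholder data are illustrative assumptions, not any participating team's actual system.

```python
# Minimal sketch of the dominant approach reported in the shared task:
# fine-tuning a transformer for binary machine-vs-human classification.
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)
from datasets import Dataset

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)  # 0 = human, 1 = machine

train = Dataset.from_dict({"text": ["..."], "label": [0]})  # placeholder data
train = train.map(lambda b: tok(b["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="essay-detector", num_train_epochs=3),
    train_dataset=train,
    tokenizer=tok,  # enables padded batching during training
)
trainer.train()
```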
NativQA: Multilingual Culturally-Aligned Natural Query for LLMs
Hasan, Md. Arid, Hasanain, Maram, Ahmad, Fatema, Laskar, Sahinur Rahman, Upadhyay, Sunaya, Sukhadia, Vrunda N, Kutlu, Mucahid, Chowdhury, Shammur Absar, Alam, Firoj
Natural Question Answering (QA) datasets play a crucial role in developing and evaluating the capabilities of large language models (LLMs), ensuring their effective usage in real-world applications. Despite the numerous QA datasets that have been developed, there is a notable lack of region-specific datasets generated by native users in their own languages. This gap hinders the effective benchmarking of LLMs for regional and cultural specificities. In this study, we propose a scalable framework, NativQA, to seamlessly construct culturally and regionally aligned QA datasets in native languages, for LLM evaluation and tuning. Moreover, to demonstrate the efficacy of the proposed framework, we designed a multilingual natural QA dataset, MultiNativQA, consisting of ~72K QA pairs in seven languages, ranging from high to extremely low resource, based on queries from native speakers covering 18 topics. We benchmark the MultiNativQA dataset with open- and closed-source LLMs. We have made both the NativQA framework and the MultiNativQA dataset publicly available to the community (https://nativqa.gitlab.io).
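A benchmarking pass over MultiNativQA-style data might look like the sketch below; the file name, JSON fields, and model choice are assumptions made for illustration (the actual released format is documented at https://nativqa.gitlab.io).

```python
# Illustrative loop for benchmarking an LLM on MultiNativQA-style
# question-answer pairs. File name and JSON fields are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open("multinativqa_bn.json") as f:  # hypothetical per-language split
    qa_pairs = json.load(f)

for item in qa_pairs:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": item["question"]}],
    )
    prediction = resp.choices[0].message.content
    # Compare `prediction` against item["answer"] with the metric of choice.
```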
Children's Speech Recognition through Discrete Token Enhancement
Sukhadia, Vrunda N., Chowdhury, Shammur Absar
Children's speech recognition is considered a low-resource task, mainly due to the lack of publicly available data. There are several reasons for such data scarcity, including expensive data collection and annotation processes and data privacy, among others. Transforming speech signals into discrete tokens that do not carry sensitive information but capture both linguistic and acoustic information could be a solution to the privacy concerns. In this study, we investigate the integration of discrete speech tokens into children's speech recognition systems as input, without significantly degrading ASR performance. Additionally, we explored single-view and multi-view strategies for creating these discrete labels. Furthermore, we tested the models' generalization capabilities on unseen-domain and unseen-nativity datasets. Results reveal that the discrete-token ASR for children achieves nearly equivalent performance with an approximately 83% reduction in parameters.
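A common single-view recipe for producing such discrete labels quantizes self-supervised features with k-means; the sketch below illustrates the idea with HuBERT. The layer choice and codebook size are assumptions for illustration, not the paper's reported settings.

```python
# Sketch of single-view discretization: quantize HuBERT hidden states
# with k-means to obtain discrete tokens that drop speaker-sensitive
# detail while keeping linguistic/acoustic content. Values are toy choices.
import torch
from sklearn.cluster import MiniBatchKMeans
from transformers import AutoFeatureExtractor, HubertModel

fe = AutoFeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960")

wav = torch.randn(10 * 16000)  # stand-in for 10 s of 16 kHz child speech
inputs = fe(wav.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    feats = hubert(**inputs, output_hidden_states=True).hidden_states[6]

# In practice the codebook is fit on features from the whole corpus;
# 100 clusters is a toy size (systems often use 500-2000).
frames = feats.squeeze(0).numpy()
kmeans = MiniBatchKMeans(n_clusters=100).fit(frames)
tokens = kmeans.predict(frames)  # the discrete token sequence for the clip
```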
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition
Nandi, Rabindra Nath, Menon, Mehadi Hasan, Muntasir, Tareq Al, Sarker, Sagor, Muhtaseem, Quazi Sarwar, Islam, Md. Tariqul, Chowdhury, Shammur Absar, Alam, Firoj
One of the major challenges in developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hour labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data, among others. Our results demonstrate the efficacy of the model trained on pseudo-labeled data on the designed test set as well as on publicly available Bangla datasets. The experimental resources will be publicly available. (https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR)
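The core pseudo-labeling loop can be pictured as below: a seed ASR model transcribes unlabeled audio, and the transcripts become training labels. The seed checkpoint and the quality filter here are placeholders; the paper's actual pipeline (segmentation, filtering, domain coverage) is considerably more involved.

```python
# Sketch of the pseudo-labeling recipe: run a seed ASR model over
# unlabeled audio and keep its transcripts as training labels.
# The seed model and filtering rule are assumptions, not the paper's.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="openai/whisper-small")  # hypothetical seed model

unlabeled = ["clip_0001.wav", "clip_0002.wav"]  # unlabeled Bangla audio
pseudo_labeled = []
for path in unlabeled:
    text = asr(path)["text"]
    if text.strip():  # stand-in for a real confidence/quality filter
        pseudo_labeled.append({"audio": path, "text": text})
# `pseudo_labeled` then joins the supervised set for conformer training.
```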
Automatic Pronunciation Assessment -- A Review
Kheir, Yassine El, Ali, Ahmed, Chowdhury, Shammur Absar
Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth of language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment at both the phonemic and prosodic levels. We categorize the main challenges observed in prominent research trends and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work.
L1-aware Multilingual Mispronunciation Detection Framework
Kheir, Yassine El, Chowdhury, Shammur Absar, Ali, Ahmed
The phonological discrepancies between a speaker's native (L1) and non-native (L2) languages serve as a major factor in mispronunciation. This paper introduces a novel multilingual mispronunciation detection and diagnosis (MDD) architecture, L1-MultiMDD, enriched with L1-aware speech representation. An end-to-end speech encoder is trained on the input signal and its corresponding reference phoneme sequence. First, an attention mechanism is deployed to align the input audio with the reference phoneme sequence. Afterwards, L1-L2 speech embeddings are extracted from an auxiliary model, pretrained in a multi-task setup to identify the L1 and L2 languages, and are infused into the primary network. Finally, L1-MultiMDD is optimized for a unified multilingual phoneme recognition task using connectionist temporal classification (CTC) loss for the target languages: English, Arabic, and Mandarin. Our experiments demonstrate the effectiveness of the proposed L1-MultiMDD framework on both seen (L2-ARTIC, LATIC, and AraVoiceL2v2) and unseen (EpaDB and Speechocean762) datasets. The consistent gains in phoneme error rate (PER) and false rejection rate (FRR) across all target languages confirm our approach's robustness, efficacy, and generalizability.
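A toy PyTorch rendering of the fusion-plus-CTC idea is sketched below; all dimensions, the LSTM encoder, and the concatenation-based fusion are invented stand-ins, not the paper's actual architecture.

```python
# Toy sketch of the L1-MultiMDD idea: frame-level encoder outputs are
# conditioned on an utterance-level L1-L2 embedding and trained with
# CTC against the reference phoneme sequence. Dimensions are invented.
import torch
import torch.nn as nn

class ToyMDD(nn.Module):
    def __init__(self, feat_dim=80, l1_dim=256, hidden=512, n_phonemes=100):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fuse = nn.Linear(hidden + l1_dim, hidden)  # infuse L1-L2 embedding
        self.head = nn.Linear(hidden, n_phonemes + 1)   # +1 for the CTC blank

    def forward(self, speech, l1_emb):
        h, _ = self.encoder(speech)                      # (B, T, hidden)
        l1 = l1_emb.unsqueeze(1).expand(-1, h.size(1), -1)
        h = torch.tanh(self.fuse(torch.cat([h, l1], dim=-1)))
        return self.head(h).log_softmax(-1)              # CTC log-probs

model = ToyMDD()
logp = model(torch.randn(2, 120, 80), torch.randn(2, 256))
loss = nn.CTCLoss(blank=100)(logp.transpose(0, 1),          # (T, B, C)
                             torch.randint(0, 100, (2, 30)), # phoneme targets
                             torch.full((2,), 120),          # input lengths
                             torch.full((2,), 30))           # target lengths
```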
The complementary roles of non-verbal cues for Robust Pronunciation Assessment
Kheir, Yassine El, Chowdhury, Shammur Absar, Ali, Ahmed
Research on pronunciation assessment systems focuses on utilizing phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information hidden within the non-verbal cues. In this study, we proposed a novel pronunciation assessment framework, IntraVerbalPA.
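The kind of frame-level non-verbal cues involved (e.g., energy and pitch) can be extracted as in the sketch below; the librosa settings are arbitrary, and the paper's exact feature set is not reproduced here.

```python
# Sketch of extracting simple non-verbal cues (energy, pitch) of the
# kind a framework like IntraVerbalPA could fuse with verbal features.
# Frame settings are arbitrary illustrations, not the paper's choices.
import librosa
import numpy as np

wav, sr = librosa.load("utterance.wav", sr=16000)
energy = librosa.feature.rms(y=wav)[0]                       # frame energy
f0, voiced, _ = librosa.pyin(wav, fmin=65, fmax=400, sr=sr)  # pitch contour
f0 = np.nan_to_num(f0)                                       # unvoiced -> 0

n = min(len(energy), len(f0))          # align the two frame sequences
cues = np.stack([energy[:n], f0[:n]])  # (2, frames) non-verbal cue matrix
```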
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking
Dalvi, Fahim, Hasanain, Maram, Boughorbel, Sabri, Mousi, Basel, Abdaljalil, Samir, Nazar, Nizi, Abdelali, Ahmed, Chowdhury, Shammur Absar, Mubarak, Hamdy, Ali, Ahmed, Hawasly, Majd, Durrani, Nadir, Alam, Firoj
The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, customizing them for specific tasks and datasets remains complex for many users. In this study, we introduce the LLMeBench framework. Initially developed to evaluate Arabic NLP tasks using OpenAI's GPT and BLOOM models, it can be seamlessly customized for any NLP task and model, regardless of language. The framework also features zero- and few-shot learning settings. A new custom dataset can be added in less than 10 minutes, and users can use their own model API keys to evaluate the task at hand. The framework has already been tested on 31 unique NLP tasks using 53 publicly available datasets within 90 experimental setups, involving approximately 296K data points. We plan to open-source the framework for the community (https://github.com/qcri/LLMeBench/). A video demonstrating the framework is available online (https://youtu.be/FkQn4UjYA0s).
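The sketch below is not LLMeBench's actual API; it only illustrates the zero- vs. few-shot prompting pattern such a framework wraps (dataset, prompt, model API, metric), using a generic OpenAI client with a placeholder task.

```python
# Generic illustration of zero- vs. few-shot evaluation, the pattern
# LLMeBench supports. Not the framework's API; see the repository for
# real task and dataset asset definitions.
from openai import OpenAI

client = OpenAI()  # users supply their own model API keys

def classify(text, few_shot_examples=()):
    messages = [{"role": "system",
                 "content": "Label the sentiment as pos or neg."}]
    for ex_text, ex_label in few_shot_examples:  # empty tuple = zero-shot
        messages.append({"role": "user", "content": ex_text})
        messages.append({"role": "assistant", "content": ex_label})
    messages.append({"role": "user", "content": text})
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content
```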
MyVoice: Arabic Speech Resource Collaboration Platform
Elshahawy, Yousseif, Kheir, Yassine El, Chowdhury, Shammur Absar, Ali, Ahmed
We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. The platform offers an opportunity to design large dialectal speech datasets and make them publicly available. MyVoice allows contributors to select a city- or country-level fine-grained dialect and record the displayed utterances. Users can switch roles between contributor and annotator. The platform incorporates a quality assurance system that filters out low-quality and spurious recordings before sending them for validation. During the validation phase, contributors can assess the quality of recordings, annotate them, and provide feedback, which is then reviewed by administrators. Furthermore, the platform gives administrators the flexibility to add new data or tasks beyond dialectal speech and word collection, which are then displayed to contributors, enabling collaborative efforts in gathering diverse, large-scale Arabic speech data.