AITopics | Chadha, Aman

Collaborating Authors

Chadha, Aman


If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting

Wijesiriwardene, Thilini, Wickramarachchi, Ruwan, Vennam, Sreeram, Jain, Vinija, Chadha, Aman, Das, Amitava, Kumaraguru, Ponnurangam, Sheth, Amit

arXiv.org Artificial Intelligence, Dec-18-2024

Making analogies is fundamental to cognition. Proportional analogies, which consist of four terms, are often used to assess linguistic and cognitive abilities. For instance, completing analogies like "Oxygen is to Gas as ___ is to ___" requires identifying the semantic relationship (e.g., "type of") between the first pair of terms ("Oxygen" and "Gas") and finding a second pair that shares the same relationship (e.g., "Aluminum" and "Metal"). In this work, we introduce a 15K Multiple-Choice Question Answering (MCQA) dataset for proportional analogy completion and evaluate the performance of contemporary Large Language Models (LLMs) in various knowledge-enhanced prompt settings. Specifically, we augment prompts with three types of knowledge: exemplar, structured, and targeted. Our results show that despite extensive training data, solving proportional analogies remains challenging for current LLMs, with the best model achieving an accuracy of 55%. Notably, we find that providing targeted knowledge helps models complete proportional analogies better than providing exemplars or collections of structured knowledge. Our code and data are available at: https://github.com/Thiliniiw/KnowledgePrompts/
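To make the MCQA setup concrete, here is a minimal sketch of a knowledge-enhanced prompt for one analogy item, plus the accuracy metric used to report results like the 55% figure. The prompt wording, the `AnalogyItem` class, and the `targeted_knowledge` field are hypothetical illustrations, not the authors' templates; the actual prompts are in the linked repository.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AnalogyItem:
    # Hypothetical container for one MCQA item: the stem pair (A, B) and candidate (C, D) pairs.
    stem: Tuple[str, str]
    choices: List[Tuple[str, str]]
    # Hypothetical "targeted knowledge" hint, e.g. a statement of the A:B relation.
    targeted_knowledge: Optional[str] = None


def build_prompt(item: AnalogyItem) -> str:
    """Assemble a multiple-choice prompt, optionally prefixed with a knowledge hint."""
    a, b = item.stem
    lines = []
    if item.targeted_knowledge:
        lines.append(f"Knowledge: {item.targeted_knowledge}")
    lines.append(f'Complete the analogy: "{a}" is to "{b}" as ___ is to ___.')
    # Label up to five options A-E.
    for label, (c, d) in zip("ABCDE", item.choices):
        lines.append(f"{label}. {c} : {d}")
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


def accuracy(preds: List[str], golds: List[str]) -> float:
    """Fraction of predicted option letters that match the gold letters."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


if __name__ == "__main__":
    item = AnalogyItem(
        stem=("Oxygen", "Gas"),
        choices=[("Water", "Ice"), ("Aluminum", "Metal"), ("Iron", "Rust")],
        targeted_knowledge="Oxygen is a type of gas.",
    )
    print(build_prompt(item))
```

Dropping `targeted_knowledge` yields the plain zero-shot prompt, which is the natural baseline to compare against.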

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.00869

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.66)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Improving speaker verification robustness with synthetic emotional utterances

Koditala, Nikhil Kumar, Ju, Chelsea Jui-Ting, Li, Ruirui, Jin, Minho, Chadha, Aman, Stolcke, Andreas

arXiv.org Artificial Intelligence, Nov-29-2024

A speaker verification (SV) system offers an authentication service designed to confirm whether a given speech sample originates from a specific speaker. This technology has paved the way for various personalized applications that cater to individual preferences. A noteworthy challenge for SV systems is performing consistently across a range of emotional states. Most existing models exhibit higher error rates on emotional utterances than on neutral ones, so speech of interest is often missed. This issue primarily stems from the limited availability of labeled emotional speech data, which impedes the development of robust speaker representations that encompass diverse emotional states. To address this concern, we propose a novel approach that employs the CycleGAN framework as a data augmentation method. This technique synthesizes emotional speech segments for each specific speaker while preserving the unique vocal identity. Our experimental findings underscore the effectiveness of incorporating synthetic emotional data into the training process. The models trained using this augmented dataset consistently outperform the baseline models on the task of verifying speakers in emotional speech scenarios, reducing the equal error rate (EER) by as much as 3.64% relative.
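The equal error rate is the operating point where the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected), and "3.64% relative" means the reduction is measured as a fraction of the baseline EER, not in absolute percentage points. A minimal sketch of both computations follows; the threshold sweep over observed scores is a simple approximation, and the example scores are made up.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score >= threshold. Returns the mean of FAR and FRR
    at the threshold where the two rates are closest.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors wrongly accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine speakers wrongly rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_reduction(baseline: float, new: float) -> float:
    """Relative improvement: (baseline - new) / baseline."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Hypothetical similarity scores for same-speaker and different-speaker trials.
    eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75])
    print(f"EER = {eer:.2f}")
    # e.g. a baseline EER of 10.0% falling to 9.636% is a 3.64% relative reduction.
    print(f"relative reduction = {relative_reduction(10.0, 9.636):.4f}")
```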

artificial intelligence, machine learning, utterance, (20 more...)

arXiv.org Artificial Intelligence

2412.00319

Country: North America > United States (0.29)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Vayani, Ashmal, Dissanayake, Dinura, Watawana, Hasindri, Ahsan, Noor, Sasikumar, Nevasini, Thawakar, Omkar, Ademtew, Henok Biadglign, Hmaiti, Yahya, Kumar, Amandeep, Kuckreja, Kartik, Maslych, Mykola, Ghallabi, Wafa Al, Mihaylov, Mihail, Qin, Chao, Shaker, Abdelrahman M, Zhang, Mike, Ihsani, Mahardika Krisna, Esplana, Amiel, Gokani, Monil, Mirkin, Shachar, Singh, Harsh, Srivastava, Ashay, Hamerlik, Endre, Izzati, Fathinah Asma, Maani, Fadillah Adamsyah, Cavada, Sebastian, Chim, Jenny, Gupta, Rohit, Manjunath, Sanjay, Zhumakhanova, Kamila, Rabevohitra, Feno Heriniaina, Amirudin, Azril, Ridzuan, Muhammad, Kareem, Daniya, More, Ketan, Li, Kunyang, Shakya, Pramesh, Saad, Muhammad, Ghasemaghaei, Amirpouya, Djanibekov, Amirbek, Azizov, Dilshod, Jankovic, Branislava, Bhatia, Naman, Cabrera, Alvaro, Obando-Ceron, Johan, Otieno, Olympiah, Farestam, Fabian, Rabbani, Muztoba, Baliah, Sanoojan, Sanjeev, Santosh, Shtanchaev, Abduragim, Fatima, Maheen, Nguyen, Thao, Kareem, Amrin, Aremu, Toluwani, Xavier, Nathan, Bhatkal, Amit, Toyin, Hawau, Chadha, Aman, Cholakkal, Hisham, Anwer, Rao Muhammad, Felsberg, Michael, Laaksonen, Jorma, Solorio, Thamar, Choudhury, Monojit, Laptev, Ivan, Shah, Mubarak, Khan, Salman, Khan, Fahad

arXiv.org Artificial Intelligence, Nov-26-2024

Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short and long-answer categories. The ALM-bench design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.

category, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2411.16508

Country:

Europe > Spain (1.00)
Asia > India (0.93)
Asia > Middle East (0.67)
Africa > South Africa > Gauteng (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Television (0.46)
Leisure & Entertainment > Sports > Martial Arts (0.46)
Government > Military (0.46)
Media > Music (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Information Management (0.92)
(2 more...)


Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)

Imanpour, Nasrin, Bajpai, Shashwat, Ghosh, Subhankar, Sankepally, Sainath Reddy, Borah, Abhilekh, Abdullah, Hasnat Md, Kosaraju, Nishoak, Dixit, Shreyas, Aziz, Ashhar, Biswas, Shwetangshu, Jain, Vinija, Chadha, Aman, Sheth, Amit, Das, Amitava

arXiv.org Artificial Intelligence, Nov-24-2024

The proliferation of AI techniques for image generation, coupled with their increasing accessibility, has raised significant concerns about the potential misuse of these images to spread misinformation. Recent AI-generated image detection (AGID) methods include CNNDetection, NPR, DM Image Detection, Fake Image Detection, DIRE, LASTED, GAN Image Detection, AIDE, SSP, DRCT, RINE, OCC-CLIP, De-Fake, and Deep Fake Detection. However, we argue that the current state-of-the-art AGID techniques are inadequate for effectively detecting contemporary AI-generated images and advocate for a comprehensive reevaluation of these methods. We introduce the Visual Counter Turing Test (VCT^2), a benchmark comprising ~130K images generated by contemporary text-to-image models (Stable Diffusion 2.1, Stable Diffusion XL, Stable Diffusion 3, DALL-E 3, and Midjourney 6). VCT^2 includes two sets of prompts: tweets from the New York Times Twitter account and captions from the MS COCO dataset. We also evaluate the performance of the aforementioned AGID techniques on the VCT^2 benchmark, highlighting their ineffectiveness in detecting AI-generated images. As image-generative AI models continue to evolve, the need for a quantifiable framework to evaluate these models becomes increasingly critical. To meet this need, we propose the Visual AI Index (V_AI), which assesses generated images from various visual perspectives, including texture complexity and object coherence, setting a new standard for evaluating image-generative AI models. To foster research in this domain, we make our COCO_AI (https://huggingface.co/datasets/anonymous1233/COCO_AI) and twitter_AI (https://huggingface.co/datasets/anonymous1233/twitter_AI) datasets publicly available.

artificial intelligence, machine learning, midjourney 6, (16 more...)

arXiv.org Artificial Intelligence

2411.16754

Country:

North America > United States (0.94)
Europe (0.93)

Genre: Research Report > New Finding (0.68)

Industry:

Media > News (1.00)
Information Technology (1.00)
Government (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.93)


ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models

Rawte, Vipula, Jain, Sarthak, Sinha, Aarush, Kaushik, Garv, Bansal, Aman, Vishwanath, Prathiksha Rumale, Jain, Samyak Rajesh, Reganti, Aishwarya Naresh, Jain, Vinija, Chadha, Aman, Sheth, Amit P., Das, Amitava

arXiv.org Artificial Intelligence, Nov-16-2024

Recent developments in Large Multimodal Models (LMMs) have broadened their capabilities to include video understanding. Specifically, text-to-video (T2V) models have made significant progress in quality, comprehension, and duration, excelling at creating videos from simple textual prompts. Yet, they still frequently produce hallucinated content that clearly signals the video is AI-generated. We introduce ViBe: a large-scale Text-to-Video Benchmark of hallucinated videos from T2V models. We identify five major types of hallucination: Vanishing Subject, Numeric Variability, Temporal Dysmorphia, Omission Error, and Physical Incongruity. Using 10 open-source T2V models, we developed the first large-scale dataset of hallucinated videos, comprising 3,782 videos annotated by humans into these five categories. ViBe offers a unique resource for evaluating the reliability of T2V models and provides a foundation for improving hallucination detection and mitigation in video generation. We establish classification as a baseline and present various ensemble classifier configurations, with the TimeSFormer + CNN combination yielding the best performance, achieving 0.345 accuracy and 0.342 F1 score. This benchmark aims to drive the development of robust T2V models that produce videos more accurately aligned with input prompts.
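The baseline is scored as a five-way classification over the hallucination categories. The abstract does not say which F1 averaging was used, so the sketch below assumes macro-averaged F1 (the unweighted mean of per-class F1), computed alongside plain accuracy; the example labels are made up.

```python
from typing import List, Sequence

# The five ViBe hallucination categories named in the abstract.
CLASSES = [
    "Vanishing Subject",
    "Numeric Variability",
    "Temporal Dysmorphia",
    "Omission Error",
    "Physical Incongruity",
]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: List[str] = CLASSES) -> float:
    """Unweighted mean of per-class F1 (assumed averaging; classes with no hits score 0)."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    y_true = ["Vanishing Subject", "Omission Error"]
    y_pred = ["Vanishing Subject", "Vanishing Subject"]
    print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Macro averaging weights rare categories as heavily as common ones, which is why accuracy and F1 can diverge on an imbalanced label set.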

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.10867

Country: North America > United States (0.94)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision (0.89)


DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Ahasan, Md Mubtasim, Fahim, Md, Mohiuddin, Tasnim, Rahman, A K M Mahbubur, Chadha, Aman, Iqbal, Tariq, Amin, M Ashraful, Islam, Md Mofijul, Ali, Amin Ahsan

arXiv.org Artificial Intelligence, Oct-19-2024

Recent advancements in speech-language models have yielded significant improvements in speech tokenization and synthesis. However, effectively mapping the complex, multidimensional attributes of speech into discrete tokens remains challenging. Existing speech representations generally fall into two categories: acoustic tokens from audio codecs and semantic tokens from speech self-supervised learning models. Although recent efforts have unified acoustic and semantic tokens for improved performance, they overlook the crucial role of contextual representation in comprehensive speech modeling. Our empirical investigations reveal that the absence of contextual representations results in elevated Word Error Rate (WER) and Word Information Lost (WIL) scores in speech transcriptions. To address these limitations, we propose two novel distillation approaches: (1) a language model (LM)-guided distillation method that incorporates contextual information, and (2) a combined LM and self-supervised speech model (SM)-guided distillation technique that effectively distills multimodal representations (acoustic, semantic, and contextual) into a comprehensive speech tokenizer, termed DM-Codec. The DM-Codec architecture adopts a streamlined encoder-decoder framework with a Residual Vector Quantizer (RVQ) and incorporates the LM and SM during the training process. Experiments show DM-Codec significantly outperforms state-of-the-art speech tokenization models, reducing WER by up to 13.46%, WIL by 9.82%, and improving speech quality by 5.84% and intelligibility by 1.85% on the LibriSpeech benchmark dataset. In recent years, the advent of Large Language Models (LLMs) has revolutionized various domains, offering unprecedented advancements across a wide array of tasks (OpenAI, 2024). A critical component of this success has been the tokenization of input data, enabling vast amounts of information processing (Du et al., 2024; Rust et al., 2021).

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.15017

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Overview of Factify5WQA: Fact Verification through 5W Question-Answering

Suresh, Suryavardan, Rani, Anku, Patwa, Parth, Reganti, Aishwarya, Jain, Vinija, Chadha, Aman, Das, Amitava, Sheth, Amit, Ekbal, Asif

arXiv.org Artificial Intelligence, Oct-5-2024

Researchers have found that fake news spreads many times faster than real news [1]. This is a major problem, especially today, when social media is the key source of news for many younger people. Fact verification is therefore an important task, and many media sites contribute to it. Manual fact verification is tedious given the volume of fake news online. The Factify5WQA shared task aims to advance research on automated fake news detection by providing a dataset for an aspect-based, question-answering-based fact verification method. Each claim and its supporting document are associated with 5W questions that help compare the two information sources. Performance is measured first by scoring the answers against the references with BLEU, followed by an accuracy measure on the classification. Submissions used custom training setups and pre-trained language models, among other approaches. The best-performing team achieved an accuracy of 69.56%, a nearly 35% improvement over the baseline.
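Answer quality in the task is scored with BLEU. The abstract does not state the BLEU variant, n-gram order, tokenization, or smoothing, so the following is only a minimal unsmoothed sentence-level BLEU sketch (whitespace tokens, clipped n-gram precision, geometric mean, brevity penalty) with `max_n=2` chosen as an assumption because 5W answers tend to be short.

```python
import math
from collections import Counter
from typing import List


def _ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: str, candidate: str, max_n: int = 2) -> float:
    """Unsmoothed sentence BLEU with uniform weights over 1..max_n-grams."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = _ngrams(cand, n)
        ref_counts = _ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision += math.log(overlap / total) / max_n
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    print(sentence_bleu("the senate passed the bill", "the senate passed the bill"))
    print(sentence_bleu("the cat sat", "the cat"))
```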

machine learning, natural language, question answering, (15 more...)

arXiv.org Artificial Intelligence

2410.04236

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.65)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.61)


MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation

Kumar, Gurucharan Marthi Krishna, Chadha, Aman, Mendola, Janine, Shmuel, Amir

arXiv.org Artificial Intelligence, Oct-4-2024

Large Language Models (LLMs), known for their versatility in textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs
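The Dice score and Jaccard Index reported above are both overlap measures between a predicted mask and the ground-truth mask. A minimal sketch on flattened binary masks follows (the masks are made up); for a single mask pair the two are linked by J = D / (2 − D), so they always rank predictions the same way.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|); defined as 1.0 when both masks are empty."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2 * inter / total


def jaccard(pred: Sequence[int], target: Sequence[int]) -> float:
    """Jaccard (IoU) = |P ∩ T| / |P ∪ T|; defined as 1.0 when both masks are empty."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return 1.0 if union == 0 else inter / union


if __name__ == "__main__":
    pred = [1, 1, 0, 0]    # hypothetical flattened prediction
    target = [1, 0, 1, 0]  # hypothetical flattened ground truth
    d, j = dice(pred, target), jaccard(pred, target)
    print(d, j, d / (2 - d))
```

Note that the identity holds per mask pair; an average Dice over a dataset does not convert exactly into an average Jaccard.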

large language model, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2410.02458

Country: Europe (0.93)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions

Kharchenko, Julia, Roosta, Tanya, Chadha, Aman, Shah, Chirag

arXiv.org Artificial Intelligence, Jun-20-2024

Large Language Models (LLMs) attempt to imitate human behavior by responding to humans in a way that pleases them, including by adhering to their values. However, humans come from diverse cultures with different values. It is critical to understand whether LLMs present different values to a user based on the stereotypical values of the user's known country. We prompt different LLMs with a series of advice requests based on 5 Hofstede Cultural Dimensions, a quantifiable way of representing the values of a country. In each prompt, we incorporate personas representing 36 different countries and, separately, languages predominantly tied to each country to analyze the consistency of the LLMs' cultural understanding. Through our analysis of the responses, we found that LLMs can differentiate between one side of a value and another and understand that countries have differing values, but they do not always uphold those values when giving advice, and they fail to recognize the need to answer differently based on cultural values. Rooted in these findings, we present recommendations for training value-aligned and culturally sensitive LLMs. More importantly, the methodology and the framework developed here can help further understand and mitigate culture and language alignment issues with LLMs.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.14805

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California > Santa Clara County (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Investigating Annotator Bias in Large Language Models for Hate Speech Detection

Das, Amit, Zhang, Zheng, Jamshidi, Fatemeh, Jain, Vinija, Chadha, Aman, Raychawdhary, Nilanjana, Sandage, Mary, Pope, Lauramarie, Dozier, Gerry, Seals, Cheryl

arXiv.org Artificial Intelligence, Jun-18-2024

Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The emergence of sophisticated Large Language Models (LLMs), like ChatGPT, presents a unique opportunity to modernize and streamline this complex procedure. While existing research extensively evaluates the efficacy of LLMs as annotators, this paper examines the biases present in LLMs, specifically GPT-3.5 and GPT-4o, when annotating hate speech data. Our research contributes to understanding biases in four key categories: gender, race, religion, and disability. Specifically targeting highly vulnerable groups within these categories, we analyze annotator biases. Furthermore, we examine potential factors contributing to these biases by scrutinizing the annotated data. We introduce our custom hate speech detection dataset, HateSpeechCorpus, to conduct this research. For comparison, we also perform the same experiments on the ETHOS (Mollas et al., 2022) dataset. This paper serves as a resource guiding researchers and practitioners in harnessing LLMs for data annotation, thereby fostering advancements in this critical field. The HateSpeechCorpus dataset is available here: https://github.com/AmitDasRup123/HateSpeechCorpus
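One common way to probe annotator bias, offered here as an assumption rather than as this paper's stated method, is to compare LLM labels with human labels separately for each targeted group and look for groups where agreement drops. The sketch below computes Cohen's kappa overall and per group; the group names and labels are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    # If both annotators used a single identical label, kappa is undefined; treat as full agreement.
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


def kappa_by_group(groups: Sequence[str], human: Sequence[str], llm: Sequence[str]) -> Dict[str, float]:
    """Kappa between human and LLM labels within each (hypothetical) target group."""
    buckets: Dict[str, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    for g, h, m in zip(groups, human, llm):
        buckets[g][0].append(h)
        buckets[g][1].append(m)
    return {g: cohens_kappa(h, m) for g, (h, m) in buckets.items()}


if __name__ == "__main__":
    groups = ["gender", "gender", "race", "race"]
    human = ["hate", "hate", "none", "none"]
    llm = ["hate", "none", "none", "none"]
    print(cohens_kappa(human, llm), kappa_by_group(groups, human, llm))
```

A group whose kappa sits well below the overall value is a candidate for closer inspection, though small per-group samples make kappa noisy.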

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.11109

Country: North America > Canada (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
