AITopics | Yacoob, Yaser

Collaborating Authors

Yacoob, Yaser

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination & Visual Illusion in Large Vision-Language Models

Guan, Tianrui, Liu, Fuxiao, Wu, Xiyang, Xian, Ruiqi, Li, Zongxia, Liu, Xiaoyu, Wang, Xijun, Chen, Lichang, Huang, Furong, Yacoob, Yaser, Manocha, Dinesh, Zhou, Tianyi

arXiv.org Artificial IntelligenceNov-28-2023

We introduce HallusionBench, a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(Vision) and LLaVA-1.5, by emphasizing nuanced understanding and interpretation of visual data. The benchmark comprises 346 images paired with 1129 questions, all meticulously crafted by human experts. We introduce a novel structure for these visual questions designed to establish control groups. This structure enables us to conduct a quantitative analysis of the models' response tendencies, logical consistency, and various failure modes. In our evaluation on HallusionBench, we benchmarked 13 different models, highlighting a 31.42% question-pair accuracy achieved by the state-of-the-art GPT-4V. Notably, all other evaluated models achieve accuracy below 16%. Moreover, our analysis not only highlights the observed failure modes, including language hallucination and visual illusion, but also deepens an understanding of these pitfalls. Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs. Based on these insights, we suggest potential pathways for their future improvement. The benchmark and codebase can be accessed at https://github.com/tianyi-lab/HallusionBench.

large language model, llava-1, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2310.14566

Country:

Asia (1.00)
North America > United States > Maryland (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.92)
Leisure & Entertainment > Sports (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning

Liu, Fuxiao, Wang, Xiaoyang, Yao, Wenlin, Chen, Jianshu, Song, Kaiqiang, Cho, Sangwoo, Yacoob, Yaser, Yu, Dong

arXiv.org Artificial IntelligenceNov-15-2023

With the rapid development of large language models (LLMs) and their integration into large multimodal models (LMMs), there has been impressive progress in zero-shot completion of user-oriented vision-language tasks. However, a gap remains in the domain of chart image understanding due to the distinct abstract components in charts. To address this, we introduce a large-scale MultiModal Chart Instruction (MMC-Instruction) dataset comprising 600k instances supporting diverse tasks and chart types. Leveraging this data, we develop MultiModal Chart Assistant (MMCA), an LMM that achieves state-of-the-art performance on existing chart QA benchmarks. Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose a MultiModal Chart Benchmark (MMC-Benchmark), a comprehensive human-annotated benchmark with 9 distinct tasks evaluating reasoning capabilities over charts. Extensive experiments on MMC-Benchmark reveal the limitations of existing LMMs on correctly interpreting charts, even for the most recent GPT-4V model. Our work provides an instruction-tuning methodology and benchmark to advance multimodal understanding of charts.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2311.10774

Country:

Europe (0.68)
Asia (0.48)
North America > United States > Maryland (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.51)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Liu, Fuxiao, Lin, Kevin, Li, Linjie, Wang, Jianfeng, Yacoob, Yaser, Wang, Lijuan

arXiv.org Artificial IntelligenceSep-29-2023

Despite the promising progress in multi-modal tasks, current large multi-modal models (LMMs) are prone to hallucinating inconsistent descriptions with respect to the associated image and human instructions. This paper addresses this issue by introducing the first large and diverse visual instruction tuning dataset, named Large-scale Robust Visual (LRV)-Instruction. Our dataset comprises 400k visual instructions generated by GPT4, covering 16 vision-and-language tasks with open-ended instructions and answers. Unlike existing studies that primarily focus on positive instruction samples, we design LRV-Instruction to include both positive and negative instructions for more robust visual instruction tuning. Our negative instructions are designed at three semantic levels: (i) Nonexistent Object Manipulation, (ii) Existent Object Manipulation and (iii) Knowledge Manipulation. To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts. GAVIE does not require human-annotated groundtruth answers and can adapt to diverse instruction formats. We conduct comprehensive experiments to investigate the hallucination of LMMs. Our results demonstrate existing LMMs exhibit significant hallucinations when presented with our negative instructions, particularly Existent Object and Knowledge Manipulation instructions. Moreover, we successfully mitigate hallucination by finetuning MiniGPT4 and mPLUG-Owl on LRV-Instruction while improving performance on several public datasets compared to state-of-the-art methods. Additionally, we observed that a balanced ratio of positive and negative instances in the training data leads to a more robust model.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2306.14565

Country:

Asia (0.68)
North America > United States > Maryland (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.68)
Health & Medicine > Therapeutic Area (0.47)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

COVID-VTS: Fact Extraction and Verification on Short Video Platforms

Liu, Fuxiao, Yacoob, Yaser, Shrivastava, Abhinav

arXiv.org Artificial IntelligenceFeb-15-2023

We introduce a new benchmark, COVID-VTS, for fact-checking multi-modal information involving short-duration videos with COVID19- focused information from both the real world and machine generation. We propose, TwtrDetective, an effective model incorporating cross-media consistency checking to detect token-level malicious tampering in different modalities, and generate explanations. Due to the scarcity of training data, we also develop an efficient and scalable approach to automatically generate misleading video posts by event manipulation or adversarial matching. We investigate several state-of-the-art models and demonstrate the superiority of TwtrDetective.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2302.07919

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.97)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback