AITopics | coherence relation

Collaborating Authors

coherence relation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

IRONIC: Coherence-Aware Reasoning Chains for Multi-Modal Sarcasm Detection

Ramakrishnan, Aashish Anantha, Ramakrishnan, Aadarsh Anantha, Lee, Dongwon

arXiv.org Artificial IntelligenceAug-26-2025

Interpreting figurative language such as sarcasm across multi-modal inputs presents unique challenges, often requiring task-specific fine-tuning and extensive reasoning steps. However, current Chain-of-Thought approaches do not efficiently leverage the same cognitive processes that enable humans to identify sarcasm. We present IRONIC, an in-context learning framework that leverages Multi-modal Coherence Relations to analyze referential, analogical and pragmatic image-text linkages. Our experiments show that IRONIC achieves state-of-the-art performance on zero-shot Multi-modal Sarcasm Detection across different baselines. This demonstrates the need for incorporating linguistic and cognitive insights into the design of multi-modal reasoning strategies. Our code is available at: https://github.com/aashish2000/IRONIC

computational linguistic, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.16258

Country: North America > United States (0.48)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

Add feedback

RONA: Pragmatically Diverse Image Captioning with Coherence Relations

Ramakrishnan, Aashish Anantha, Ramakrishnan, Aadarsh Anantha, Lee, Dongwon

arXiv.org Artificial IntelligenceMar-13-2025

Writing Assistants (e.g., Grammarly, Microsoft Copilot) traditionally generate diverse image captions by employing syntactic and semantic variations to describe image components. However, human-written captions prioritize conveying a central message alongside visual descriptions using pragmatic cues. To enhance pragmatic diversity, it is essential to explore alternative ways of communicating these messages in conjunction with visual content. To address this challenge, we propose RONA, a novel prompting strategy for Multi-modal Large Language Models (MLLM) that leverages Coherence Relations as an axis for variation. We demonstrate that RONA generates captions with better overall diversity and ground-truth alignment, compared to MLLM baselines across multiple domains. Our code is available at: https://github.com/aashish2000/RONA

caption, computational linguistic, relation, (14 more...)

arXiv.org Artificial Intelligence

2503.10997

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?

Ramakrishnan, Aashish Anantha, Ramakrishnan, Aadarsh Anantha, Lee, Dongwon

arXiv.org Artificial IntelligenceFeb-16-2025

Multimodal Large Language Models (MLLMs) are renowned for their superior instruction-following and reasoning capabilities across diverse problem domains. However, existing benchmarks primarily focus on assessing factual and logical correctness in downstream tasks, with limited emphasis on evaluating MLLMs' ability to interpret pragmatic cues and intermodal relationships. To address this gap, we assess the competency of MLLMs in performing Multimodal Discourse Analysis (MDA) using Coherence Relations. Our benchmark, CORDIAL, encompasses a broad spectrum of Coherence Relations across 3 different discourse domains at varying levels of granularity. Through our experiments on 10+ MLLMs employing different prompting strategies, we show that even top models like Gemini 1.5 Pro and GPT-4o fail to match the performance of simple classifier-based baselines. This study emphasizes the need to move beyond similarity-based metrics and adopt a discourse-driven framework for evaluating MLLMs, providing a more nuanced assessment of their capabilities. The benchmark and code are available at: https://github.com/aashish2000/CORDIAL.

coherence relation, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2502.113

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents

Hassan, Sabit, Chung, Hye-Young, Tan, Xiang Zhi, Alikhani, Malihe

arXiv.org Artificial IntelligenceOct-17-2024

When assisting people in daily tasks, robots need to accurately interpret visual cues and respond effectively in diverse safety-critical situations, such as sharp objects on the floor. In this context, we present M-CoDAL, a multimodal-dialogue system specifically designed for embodied agents to better understand and communicate in safety-critical situations. The system leverages discourse coherence relations to enhance its contextual understanding and communication abilities. To train this system, we introduce a novel clustering-based active learning mechanism that utilizes an external Large Language Model (LLM) to identify informative instances. Our approach is evaluated using a newly created multimodal dataset comprising 1K safety violations extracted from 2K Reddit images. These violations are annotated using a Large Multimodal Model (LMM) and verified by human annotators. Results with this dataset demonstrate that our approach improves resolution of safety situations, user sentiment, as well as safety of the conversation. Next, we deploy our dialogue system on a Hello Robot Stretch robot and conduct a within-subject user study with real-world participants. In the study, participants role-play two safety scenarios with different levels of severity with the robot and receive interventions from our model and a baseline system powered by OpenAI's ChatGPT. The study results corroborate and extend the findings from automated evaluation, showing that our proposed system is more persuasive and competent in a real-world embodied agent setting.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2410.14141

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Indonesia > Bali (0.04)
(15 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

COSMic: A Coherence-Aware Generation Metric for Image Descriptions

İnan, Mert, Sharma, Piyush, Khalid, Baber, Soricut, Radu, Stone, Matthew, Alikhani, Malihe

arXiv.org Artificial IntelligenceSep-11-2021

Developers of text generation models rely on automated evaluation metrics as a stand-in for slow and expensive manual evaluations. However, image captioning metrics have struggled to give accurate learned estimates of the semantic and pragmatic success of output text. We address this weakness by introducing the first discourse-aware learned generation metric for evaluating image descriptions. Our approach is inspired by computational theories of discourse for capturing information goals using coherence. We present a dataset of image$\unicode{x2013}$description pairs annotated with coherence relations. We then train a coherence-aware metric on a subset of the Conceptual Captions dataset and measure its effectiveness$\unicode{x2014}$its ability to predict human ratings of output captions$\unicode{x2014}$on a test set composed of out-of-domain images. We demonstrate a higher Kendall Correlation Coefficient for our proposed metric with the human judgments for the results of a number of state-of-the-art coherence-aware caption generation models when compared to several other metrics including recently proposed learned metrics such as BLEURT and BERTScore.

caption, dataset, relation, (15 more...)

arXiv.org Artificial Intelligence

2109.05281

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(4 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Clue: Cross-modal Coherence Modeling for Caption Generation

Alikhani, Malihe, Sharma, Piyush, Li, Shengjie, Soricut, Radu, Stone, Matthew

arXiv.org Artificial IntelligenceMay-2-2020

We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning. Using an annotation protocol specifically devised for capturing image--caption coherence relations, we annotate 10,000 instances from publicly-available image--caption pairs. We introduce a new task for learning inferences in imagery and text, coherence relation prediction, and show that these coherence annotations can be exploited to learn relation classifiers as an intermediary step, and also train coherence-aware, controllable image captioning models. The results show a dramatic improvement in the consistency and quality of the generated captions with respect to information needs specified via coherence relations.

machine learning, natural language, relation, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2020.acl-main.583

2005.00908

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Towards Predicting Difficulty of Reading Comprehension Questions

Desai, Takshak (University of Texas at Dallas) | Moldovan, Dan I. (University of Texas at Dallas)

AAAI ConferencesMay-15-2019

We present a corpus and approach to deduce the difficulty of questions asked in a reading comprehension test. A feature-driven model is designed that associates each question with a difficulty level. This would eliminate the laborious task of manually annotating questions in a computerized testing environment. Experiments performed on our corpus show that our model can classify questions with a micro F-score of 0.68.

artificial intelligence, relation, text processing, (21 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country: North America > United States > Texas (0.14)

Industry:

Energy > Oil & Gas (1.00)
Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.46)

Add feedback

Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning

Jernite, Yacine, Bowman, Samuel R., Sontag, David

arXiv.org Machine LearningApr-23-2017

This work presents a novel objective function for the unsupervised training of neural network sentence encoders. It exploits signals from paragraph-level discourse coherence to train these models to understand text. Our objective is purely discriminative, allowing us to train models many times faster than was possible under prior methods, and it yields models which perform well in extrinsic evaluations.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1705.00557

Country: