AITopics | Ramakrishnan, Aashish Anantha

Collaborating Authors

Ramakrishnan, Aashish Anantha

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RONA: Pragmatically Diverse Image Captioning with Coherence Relations

Ramakrishnan, Aashish Anantha, Ramakrishnan, Aadarsh Anantha, Lee, Dongwon

arXiv.org Artificial IntelligenceMar-13-2025

Writing Assistants (e.g., Grammarly, Microsoft Copilot) traditionally generate diverse image captions by employing syntactic and semantic variations to describe image components. However, human-written captions prioritize conveying a central message alongside visual descriptions using pragmatic cues. To enhance pragmatic diversity, it is essential to explore alternative ways of communicating these messages in conjunction with visual content. To address this challenge, we propose RONA, a novel prompting strategy for Multi-modal Large Language Models (MLLM) that leverages Coherence Relations as an axis for variation. We demonstrate that RONA generates captions with better overall diversity and ground-truth alignment, compared to MLLM baselines across multiple domains. Our code is available at: https://github.com/aashish2000/RONA

caption, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.10997

Country:

North America > United States (0.48)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?

Ramakrishnan, Aashish Anantha, Ramakrishnan, Aadarsh Anantha, Lee, Dongwon

arXiv.org Artificial IntelligenceFeb-16-2025

Multimodal Large Language Models (MLLMs) are renowned for their superior instruction-following and reasoning capabilities across diverse problem domains. However, existing benchmarks primarily focus on assessing factual and logical correctness in downstream tasks, with limited emphasis on evaluating MLLMs' ability to interpret pragmatic cues and intermodal relationships. To address this gap, we assess the competency of MLLMs in performing Multimodal Discourse Analysis (MDA) using Coherence Relations. Our benchmark, CORDIAL, encompasses a broad spectrum of Coherence Relations across 3 different discourse domains at varying levels of granularity. Through our experiments on 10+ MLLMs employing different prompting strategies, we show that even top models like Gemini 1.5 Pro and GPT-4o fail to match the performance of simple classifier-based baselines. This study emphasizes the need to move beyond similarity-based metrics and adopt a discourse-driven framework for evaluating MLLMs, providing a more nuanced assessment of their capabilities. The benchmark and code are available at: https://github.com/aashish2000/CORDIAL.

coherence relation, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2502.113

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models

Lalai, Harsh Nishant, Ramakrishnan, Aashish Anantha, Shah, Raj Sanjay, Lee, Dongwon

arXiv.org Artificial IntelligenceJun-16-2024

With the rapid growth of Large Language Models (LLMs), safeguarding textual content against unauthorized use is crucial. Text watermarking offers a vital solution, protecting both - LLM-generated and plain text sources. This paper presents a unified overview of different perspectives behind designing watermarking techniques, through a comprehensive survey of the research literature. Our work has two key advantages, (1) we analyze research based on the specific intentions behind different watermarking techniques, evaluation datasets used, watermarking addition, and removal methods to construct a cohesive taxonomy. (2) We highlight the gaps and open challenges in text watermarking to promote research in protecting text authorship. This extensive coverage and detailed analysis sets our work apart, offering valuable insights into the evolving landscape of text watermarking in language models.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.11106

Country:

North America > United States > New York (0.14)
North America > United States > Maryland (0.14)

Genre:

Overview (1.00)
Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis

Ramakrishnan, Aashish Anantha, Huang, Sharon X., Lee, Dongwon

arXiv.org Artificial IntelligenceApr-15-2024

Text-to-Image (T2I) Synthesis has made tremendous strides in enhancing synthesized image quality, but current datasets evaluate model performance only on descriptive, instruction-based prompts. Real-world news image captions take a more pragmatic approach, providing high-level situational and Named-Entity (NE) information and limited physical object descriptions, making them abstractive. To evaluate the ability of T2I models to capture intended subjects from news captions, we introduce the Abstractive News Captions with High-level cOntext Representation (ANCHOR) dataset, containing 70K+ samples sourced from 5 different news media organizations. With Large Language Models (LLM) achieving success in language and commonsense reasoning tasks, we explore the ability of different LLMs to identify and understand key subjects from abstractive captions. Our proposed method Subject-Aware Finetuning (SAFE), selects and enhances the representation of key subjects in synthesized images by leveraging LLM-generated subject weights. It also adapts to the domain distribution of news images and captions through custom Domain Fine-tuning, outperforming current T2I baselines on ANCHOR. By launching the ANCHOR dataset, we hope to motivate research in furthering the Natural Language Understanding (NLU) capabilities of T2I models.

caption, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2404.10141

Country:

North America > United States > Pennsylvania (0.14)
North America > United States > Iowa (0.14)

Genre: Research Report (1.00)

Industry:

Government (0.67)
Media > News (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback