AITopics | Patnaik, Sohan

Collaborating Authors

Patnaik, Sohan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models

Kapadnis, Manav Nitin, Patnaik, Sohan, Nandy, Abhilash, Ray, Sourjyadip, Goyal, Pawan, Sheet, Debdoot

arXiv.org Artificial IntelligenceApr-27-2024

Radiology Report Generation (R2Gen) demonstrates how Multi-modal Large Language Models (MLLMs) can automate the creation of accurate and coherent radiological reports. Existing methods often hallucinate details in text-based reports that don't accurately reflect the image content. To mitigate this, we introduce a novel strategy, SERPENT-VLM (SElf Refining Radiology RePort GENeraTion using Vision Language Models), which improves the R2Gen task by integrating a self-refining mechanism into the MLLM framework. We employ a unique self-supervised loss that leverages similarity between pooled image representations and the contextual representations of the generated radiological text, alongside the standard Causal Language Modeling objective, to refine image-text representations. This allows the model to scrutinize and align the generated text through dynamic interaction between a given image and the generated text, therefore reducing hallucination and continuously enhancing nuanced report generation. SERPENT-VLM outperforms existing baselines such as LLaVA-Med, BiomedGPT, etc., achieving SoTA performance on the IU X-ray and Radiology Objects in COntext (ROCO) datasets, and also proves to be robust against noisy images. A qualitative case study emphasizes the significant advancements towards more sophisticated MLLM frameworks for R2Gen, opening paths for further research into self-supervised refinement in the medical imaging domain.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2404.17912

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

CABINET: Content Relevance based Noise Reduction for Table Question Answering

Patnaik, Sohan, Changwal, Heril, Aggarwal, Milan, Bhatia, Sumit, Kumar, Yaman, Krishnamurthy, Balaji

arXiv.org Artificial IntelligenceFeb-4-2024

Table understanding capability of Large Language Models (LLMs) has been extensively studied through the task of question-answering (QA) over tables. Typically, only a small part of the whole table is relevant to derive the answer for a given question. The irrelevant parts act as noise and are distracting information, resulting in sub-optimal performance due to the vulnerability of LLMs to noise. To mitigate this, we propose CABINET (Content RelevAnce-Based NoIse ReductioN for TablE QuesTion-Answering) - a framework to enable LLMs to focus on relevant tabular data by suppressing extraneous information. CABINET comprises an Unsupervised Relevance Scorer (URS), trained differentially with the QA LLM, that weighs the table content based on its relevance to the input question before feeding it to the question-answering LLM (QA LLM). To further aid the relevance scorer, CABINET employs a weakly supervised module that generates a parsing statement describing the criteria of rows and columns relevant to the question and highlights the content of corresponding table cells. CABINET significantly outperforms various tabular LLM baselines, as well as GPT3-based in-context learning methods, is more robust to noise, maintains outperformance on tables of varying sizes, and establishes new SoTA performance on WikiTQ, FeTaQA, and WikiSQL datasets. We release our code and datasets at https://github.com/Sohanpatnaik106/CABINET_QA.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2402.01155

Country:

Europe (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Industry:

Media (0.93)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

$FastDoc$: Domain-Specific Fast Pre-training Technique using Document-Level Metadata and Taxonomy

Nandy, Abhilash, Kapadnis, Manav Nitin, Patnaik, Sohan, Butala, Yash Parag, Goyal, Pawan, Ganguly, Niloy

arXiv.org Artificial IntelligenceNov-14-2023

As the demand for sophisticated Natural Language Processing (NLP) models continues to grow, so does the need for efficient pre-training techniques. Current NLP models undergo resource-intensive pre-training. In response, we introduce $FastDoc$ (Fast Pre-training Technique using Document-Level Metadata and Taxonomy), a novel approach designed to significantly reduce computational demands. $FastDoc$ leverages document metadata and domain-specific taxonomy as supervision signals. It involves continual pre-training of an open-domain transformer encoder using sentence-level embeddings, followed by fine-tuning using token-level embeddings. We evaluate $FastDoc$ on six tasks across nine datasets spanning three distinct domains. Remarkably, $FastDoc$ achieves remarkable compute reductions of approximately 1,000x, 4,500x, 500x compared to competitive approaches in Customer Support, Scientific, and Legal domains, respectively. Importantly, these efficiency gains do not compromise performance relative to competitive baselines. Furthermore, reduced pre-training data mitigates catastrophic forgetting, ensuring consistent performance in open-domain scenarios. $FastDoc$ offers a promising solution for resource-efficient pre-training, with potential applications spanning various domains.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2306.0619

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Promising Solution (0.54)

Industry:

Information Technology (0.93)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback