AITopics | Kapadnis, Manav Nitin

Collaborating Authors

Kapadnis, Manav Nitin

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models

Kapadnis, Manav Nitin, Patnaik, Sohan, Nandy, Abhilash, Ray, Sourjyadip, Goyal, Pawan, Sheet, Debdoot

arXiv.org Artificial Intelligence, Apr-27-2024

Radiology Report Generation (R2Gen) demonstrates how Multi-modal Large Language Models (MLLMs) can automate the creation of accurate and coherent radiological reports. Existing methods often hallucinate details in text-based reports that don't accurately reflect the image content. To mitigate this, we introduce a novel strategy, SERPENT-VLM (SElf Refining Radiology RePort GENeraTion using Vision Language Models), which improves the R2Gen task by integrating a self-refining mechanism into the MLLM framework. We employ a unique self-supervised loss that leverages similarity between pooled image representations and the contextual representations of the generated radiological text, alongside the standard Causal Language Modeling objective, to refine image-text representations. This allows the model to scrutinize and align the generated text through dynamic interaction between a given image and the generated text, therefore reducing hallucination and continuously enhancing nuanced report generation. SERPENT-VLM outperforms existing baselines such as LLaVA-Med, BiomedGPT, etc., achieving SoTA performance on the IU X-ray and Radiology Objects in COntext (ROCO) datasets, and also proves to be robust against noisy images. A qualitative case study emphasizes the significant advancements towards more sophisticated MLLM frameworks for R2Gen, opening paths for further research into self-supervised refinement in the medical imaging domain.
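
The abstract pairs the standard Causal Language Modeling (CLM) loss with a self-supervised term based on the similarity between pooled image representations and the contextual representations of the generated text. The Python sketch below shows one way such a combined objective could be computed. It is not the authors' implementation: mean pooling, cosine similarity, the `1 - cos` form of the alignment term, and the weight `alpha` are all assumptions, and every helper name is hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average equal-length embeddings (image patches or text tokens) into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, 1e-12)  # guard against zero vectors


def causal_lm_loss(target_token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of each ground-truth next token given its prefix."""
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


def self_refining_loss(image_patches: Sequence[Sequence[float]],
                       text_states: Sequence[Sequence[float]]) -> float:
    """Hypothetical alignment term: 1 - cos(pooled image, pooled generated text)."""
    return 1.0 - cosine_similarity(mean_pool(image_patches), mean_pool(text_states))


def total_loss(target_token_probs: Sequence[float],
               image_patches: Sequence[Sequence[float]],
               text_states: Sequence[Sequence[float]],
               alpha: float = 1.0) -> float:
    """CLM loss plus the alignment term; alpha is an assumed weighting."""
    return causal_lm_loss(target_token_probs) + alpha * self_refining_loss(image_patches, text_states)
```

With `alpha` set to 0 the objective reduces to plain CLM; larger values push the pooled text representation toward the pooled image representation, which is the intuition the abstract gives for reducing hallucinated details.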

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2404.17912

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

FastDoc: Domain-Specific Fast Pre-training Technique using Document-Level Metadata and Taxonomy

Nandy, Abhilash, Kapadnis, Manav Nitin, Patnaik, Sohan, Butala, Yash Parag, Goyal, Pawan, Ganguly, Niloy

arXiv.org Artificial Intelligence, Nov-14-2023

As the demand for sophisticated Natural Language Processing (NLP) models continues to grow, so does the need for efficient pre-training techniques. Current NLP models undergo resource-intensive pre-training. In response, we introduce FastDoc (Fast Pre-training Technique using Document-Level Metadata and Taxonomy), a novel approach designed to significantly reduce computational demands. FastDoc leverages document metadata and domain-specific taxonomy as supervision signals. It involves continual pre-training of an open-domain transformer encoder using sentence-level embeddings, followed by fine-tuning using token-level embeddings. We evaluate FastDoc on six tasks across nine datasets spanning three distinct domains. FastDoc achieves compute reductions of approximately 1,000x, 4,500x, and 500x compared to competitive approaches in the Customer Support, Scientific, and Legal domains, respectively. Importantly, these efficiency gains do not compromise performance relative to competitive baselines. Furthermore, the reduced volume of pre-training data mitigates catastrophic forgetting, ensuring consistent performance in open-domain scenarios. FastDoc offers a promising solution for resource-efficient pre-training, with potential applications spanning various domains.
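
The abstract states that FastDoc uses document metadata and a domain taxonomy as supervision, with sentence-level embeddings during continual pre-training. The Python sketch below shows one plausible way to derive such signals from a corpus; it is an illustration under assumptions, not the published FastDoc pipeline. The `Document` fields, averaging sentence embeddings into a document vector, treating documents that share a metadata key as positives, and emitting one label per taxonomy level are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Document:
    doc_id: str
    metadata_key: str                       # hypothetical: e.g. a product or category ID
    taxonomy_path: Tuple[str, ...]          # hypothetical: root-to-leaf taxonomy labels
    sentence_embeddings: List[List[float]]  # one vector per sentence


def document_embedding(doc: Document) -> List[float]:
    """Assumed pooling: average the document's sentence-level embeddings."""
    sents = doc.sentence_embeddings
    return [sum(col) / len(sents) for col in zip(*sents)]


def metadata_triplets(docs: Sequence[Document]) -> List[Tuple[str, str, str]]:
    """Build (anchor, positive, negative) IDs: positives share metadata, negatives do not."""
    by_key: Dict[str, List[Document]] = {}
    for d in docs:
        by_key.setdefault(d.metadata_key, []).append(d)
    triplets = []
    for anchor in docs:
        positives = [d for d in by_key[anchor.metadata_key] if d.doc_id != anchor.doc_id]
        negatives = [d for d in docs if d.metadata_key != anchor.metadata_key]
        if positives and negatives:
            triplets.append((anchor.doc_id, positives[0].doc_id, negatives[0].doc_id))
    return triplets


def taxonomy_labels(doc: Document) -> Dict[int, str]:
    """One classification target per taxonomy level (level 0 is the root)."""
    return dict(enumerate(doc.taxonomy_path))
```

Because these signals come from existing document-level annotations rather than token-level masking, each document contributes only a handful of training targets, which is consistent with the abstract's emphasis on reduced pre-training compute.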

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2306.0619

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Promising Solution (0.54)

Industry:

Information Technology (0.93)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CLMSM: A Multi-Task Learning Framework for Pre-training on Procedural Text

Nandy, Abhilash, Kapadnis, Manav Nitin, Goyal, Pawan, Ganguly, Niloy

arXiv.org Artificial Intelligence, Oct-22-2023

In this paper, we propose CLMSM, a domain-specific continual pre-training framework that learns from a large set of procedural recipes. CLMSM uses a Multi-Task Learning Framework to optimize two objectives: (a) Contrastive Learning using hard triplets to learn fine-grained differences across entities in the procedures, and (b) a novel Mask-Step Modelling objective to learn the step-wise context of a procedure. We test the performance of CLMSM on the downstream tasks of tracking entities and aligning actions between two procedures on three datasets, one of which is an open-domain dataset that does not match the domain of the pre-training data. We show that CLMSM not only outperforms baselines on recipes (in-domain) but also generalizes to open-domain procedural NLP tasks.
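
CLMSM combines a contrastive objective over hard triplets with Mask-Step Modelling. The Python sketch below shows the two ingredients in isolation: a triplet margin loss with hardest-negative selection, and masking of one whole procedure step. It is an illustration under assumptions rather than the authors' code. Euclidean distance, the margin value, reading "hard" as "closest to the anchor", the `[MASK]` token string, and the simple weighted sum of the two losses are all assumptions, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # assumed mask-token string


def triplet_margin_loss(anchor: Sequence[float], positive: Sequence[float],
                        negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge loss: the positive should be closer to the anchor than the negative by `margin`."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def hardest_negative(anchor: Sequence[float], candidates: Sequence[Sequence[float]]) -> int:
    """Index of the candidate negative closest to the anchor (a 'hard' negative)."""
    return min(range(len(candidates)), key=lambda i: math.dist(anchor, candidates[i]))


def mask_step(steps: Sequence[Sequence[str]], rng: random.Random) -> Tuple[List[List[str]], int]:
    """Replace every token of one randomly chosen step with MASK.

    Returns a masked copy (the input is not modified) and the masked step's index;
    a model would be trained to reconstruct that step from the surrounding steps.
    """
    idx = rng.randrange(len(steps))
    masked = [list(step) for step in steps]
    masked[idx] = [MASK] * len(steps[idx])
    return masked, idx


def multitask_loss(contrastive: float, mask_step_loss: float, weight: float = 1.0) -> float:
    """Assumed combination: weighted sum of the two per-batch losses."""
    return contrastive + weight * mask_step_loss
```

Masking an entire step, rather than scattered tokens, forces the model to infer an action from the steps before and after it, which matches the abstract's stated goal of learning step-wise context.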

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2310.14326

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
