Dasgupta, Subhrajyoti
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Chowdhury, Sanjoy, Nag, Sayan, Dasgupta, Subhrajyoti, Wang, Yaoting, Elhoseiny, Mohamed, Gao, Ruohan, Manocha, Dinesh
With the rapid advancement of Multi-modal Large Language Models (MLLMs), several diagnostic benchmarks have recently been developed to assess these models' multi-modal reasoning proficiency. However, these benchmarks are restricted primarily to the visual aspect and do not examine holistic audio-visual (AV) understanding. Moreover, there are currently no benchmarks that investigate the ability of audio-visual LLMs (AVLLMs) to calibrate their responses when presented with perturbed inputs. To this end, we introduce the Audio-Visual Trustworthiness assessment Benchmark (AVTrustBench), comprising 600K samples spanning 9 meticulously crafted tasks that evaluate the capabilities of AVLLMs across three distinct dimensions: Adversarial attack, Compositional reasoning, and Modality-specific dependency. Using our benchmark, we extensively evaluate 13 state-of-the-art AVLLMs. The findings reveal that the majority of existing models fall significantly short of human-like comprehension, offering valuable insights for future research directions. To alleviate these limitations, we further propose a robust, model-agnostic, calibrated audio-visual preference-optimization-based training strategy, CAVPref, obtaining gains of up to 30.19% across all 9 tasks. We will publicly release our code and benchmark to facilitate future research in this direction.
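The abstract names a calibrated audio-visual preference-optimization training strategy. As a rough, generic illustration of the underlying idea only, the sketch below shows a DPO-style preference loss with a per-sample calibration weight; it is not the authors' CAVPref, and names such as `preference_loss` and `calib_weight` are assumptions made for the example.

```python
# Generic DPO-style preference loss with a per-sample calibration weight.
# NOT the authors' CAVPref; the weighting scheme is an illustrative assumption.
import torch
import torch.nn.functional as F

def preference_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected,
                    calib_weight, beta=0.1):
    """logp_*: summed log-probs of chosen/rejected responses under the policy;
    ref_logp_*: the same under a frozen reference model;
    calib_weight: per-sample weight (e.g., reflecting audio-visual grounding)."""
    policy_margin = logp_chosen - logp_rejected
    ref_margin = ref_logp_chosen - ref_logp_rejected
    logits = beta * (policy_margin - ref_margin)
    loss = -calib_weight * F.logsigmoid(logits)
    return loss.mean()

# Toy usage with random log-probabilities for a batch of 4 samples.
lp_c, lp_r = torch.randn(4), torch.randn(4)
rlp_c, rlp_r = torch.randn(4), torch.randn(4)
weights = torch.ones(4)
print(preference_loss(lp_c, lp_r, rlp_c, rlp_r, weights))
```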
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Chowdhury, Sanjoy, Nag, Sayan, Dasgupta, Subhrajyoti, Chen, Jun, Elhoseiny, Mohamed, Gao, Ruohan, Manocha, Dinesh
Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities such as vision and audio. However, progress in these directions has mostly focused on tasks that require only a coarse-grained understanding of audio-visual semantics. We present Meerkat, an audio-visual LLM equipped with a fine-grained understanding of image and audio, both spatially and temporally. With a new modality-alignment module based on optimal transport and a cross-attention module that enforces audio-visual consistency, Meerkat can tackle challenging tasks such as audio-referred image grounding, image-guided audio temporal localization, and audio-visual fact-checking. Moreover, we carefully curate a large dataset, AVFIT, comprising 3M instruction-tuning samples collected from open-source datasets, and introduce MeerkatBench, which unifies five challenging audio-visual tasks. We achieve state-of-the-art performance on all these downstream tasks with a relative improvement of up to 37.12%.
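The abstract mentions a modality-alignment module based on optimal transport. As a rough illustration of that general idea only (not Meerkat's actual module), the sketch below computes an entropic (Sinkhorn) transport plan between audio and image token features; the feature dimensions, uniform marginals, and the `sinkhorn_plan` helper are assumptions made for the example.

```python
# Generic entropic optimal-transport (Sinkhorn) alignment between audio and
# image token features. Illustrative only; not Meerkat's alignment module.
import torch

def sinkhorn_plan(cost, eps=0.1, n_iters=50):
    """Entropic OT plan between two uniform marginals given an (n x m) cost matrix."""
    n, m = cost.shape
    K = torch.exp(-cost / eps)              # Gibbs kernel
    a = torch.full((n,), 1.0 / n)           # uniform source marginal
    b = torch.full((m,), 1.0 / m)           # uniform target marginal
    u, v = torch.ones(n), torch.ones(m)
    for _ in range(n_iters):                # alternating Sinkhorn updates
        u = a / (K @ v)
        v = b / (K.t() @ u)
    return torch.diag(u) @ K @ torch.diag(v)  # transport plan (n x m)

# Toy usage: 8 audio tokens vs. 16 image patch tokens with 64-d features.
audio = torch.randn(8, 64)
image = torch.randn(16, 64)
cost = torch.cdist(audio, image)                       # pairwise distances as cost
plan = sinkhorn_plan(cost)
audio_aligned = (plan / plan.sum(1, keepdim=True)) @ image  # soft-matched image features per audio token
print(plan.shape, audio_aligned.shape)
```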
UnShadowNet: Illumination Critic Guided Contrastive Learning For Shadow Removal
Dasgupta, Subhrajyoti, Das, Arindam, Yogamani, Senthil, Das, Sudip, Eising, Ciaran, Bursuc, Andrei, Bhattacharya, Ujjwal
Shadows are frequently encountered natural phenomena that significantly hinder the performance of computer vision perception systems in practical settings, e.g., autonomous driving. A solution would be to eliminate shadow regions from images before they are processed by the perception system. Yet, training such a solution requires pairs of aligned shadowed and non-shadowed images, which are difficult to obtain. We introduce UnShadowNet, a novel weakly supervised shadow removal framework trained using contrastive learning. It is composed of a DeShadower network, which removes the extracted shadow under the guidance of an Illumination network trained adversarially by an illumination critic, and a Refinement network that further removes artefacts. We show that UnShadowNet can be easily extended to a fully supervised setup to exploit ground truth when available. UnShadowNet outperforms existing state-of-the-art approaches on three publicly available shadow datasets (ISTD, adjusted ISTD, SRD) in both the weakly and fully supervised setups.
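The abstract states that the framework is trained using contrastive learning. As a generic illustration of such an objective only (not UnShadowNet's exact loss), the sketch below shows an InfoNCE-style contrastive loss; the feature shapes, temperature, and the interpretation of queries, positives, and negatives are assumptions made for the example.

```python
# Generic InfoNCE-style contrastive loss of the kind used in weakly supervised
# image-to-image training. Illustrative only; not UnShadowNet's exact objective.
import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, temperature=0.07):
    """query/positive: (B, D) feature vectors; negatives: (B, N, D)."""
    q = F.normalize(query, dim=-1)
    pos = F.normalize(positive, dim=-1)
    neg = F.normalize(negatives, dim=-1)
    l_pos = (q * pos).sum(-1, keepdim=True)              # (B, 1) positive similarity
    l_neg = torch.einsum('bd,bnd->bn', q, neg)           # (B, N) negative similarities
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)    # positive sits at index 0
    return F.cross_entropy(logits, labels)

# Toy usage: features from de-shadowed patches (queries), well-lit patches
# (positives), and shadowed patches (negatives).
q = torch.randn(4, 128)
p = torch.randn(4, 128)
n = torch.randn(4, 16, 128)
print(info_nce(q, p, n))
```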