AITopics | Chaubey, Ashutosh

Collaborating Authors

Chaubey, Ashutosh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising

Chaubey, Ashutosh, Agarwaal, Anoubhav, Roy, Sartaki Sinha, Agrawal, Aayush, Ghose, Susmita

arXiv.org Artificial IntelligenceNov-6-2024

Contextual advertising serves ads that are aligned to the content that the user is viewing. The rapid growth of video content on social platforms and streaming services, along with privacy concerns, has increased the need for contextual advertising. Placing the right ad in the right context creates a seamless and pleasant ad viewing experience, resulting in higher audience engagement and, ultimately, better ad monetization. From a technology standpoint, effective contextual advertising requires a video retrieval system capable of understanding complex video content at a very granular level. Current text-to-video retrieval models based on joint multimodal training demand large datasets and computational resources, limiting their practicality and lacking the key functionalities required for ad ecosystem integration. We introduce ContextIQ, a multimodal expert-based video retrieval system designed specifically for contextual advertising. ContextIQ utilizes modality-specific experts-video, audio, transcript (captions), and metadata such as objects, actions, emotion, etc.-to create semantically rich video representations. We show that our system, without joint training, achieves better or comparable results to state-of-the-art models and commercial solutions on multiple text-to-video retrieval benchmarks. Our ablation studies highlight the benefits of leveraging multiple modalities for enhanced video retrieval accuracy instead of using a vision-language model alone. Furthermore, we show how video retrieval systems such as ContextIQ can be used for contextual advertising in an ad ecosystem while also addressing concerns related to brand safety and filtering inappropriate content.

large language model, machine learning, natural language, (24 more...)

arXiv.org Artificial Intelligence

2410.22233

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (0.66)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition

Chaubey, Ashutosh, Sinha, Sparsh, Ghose, Susmita

arXiv.org Artificial IntelligenceSep-30-2023

Speaker identification systems are deployed in diverse environments, often different from the lab conditions on which they are trained and tested. In this paper, first, we show the problem of generalization using fixed thresholds (computed using EER metric) for imposter identification in unseen speaker recognition and then introduce a robust speaker-specific thresholding technique for better performance. Secondly, inspired by the recent use of meta-learning techniques in speaker verification, we propose an end-to-end meta-learning framework for imposter detection which decouples the problem of imposter detection from unseen speaker identification. Thus, unlike most prior works that use some heuristics to detect imposters, the proposed network learns to detect imposters by leveraging the utterances of the enrolled speakers. Furthermore, we show the efficacy of the proposed techniques on VoxCeleb1, VCTK and the FFSVC 2022 datasets, beating the baselines by up to 10%.

machine learning, pattern recognition, utterance, (18 more...)

arXiv.org Artificial Intelligence

2306.00952

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.97)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.63)

Add feedback

Improved Relation Networks for End-to-End Speaker Verification and Identification

Chaubey, Ashutosh, Sinha, Sparsh, Ghose, Susmita

arXiv.org Artificial IntelligenceJul-21-2022

Speaker identification systems in a real-world scenario are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples for each enrolled speaker. This paper demonstrates the effectiveness of meta-learning and relation networks for this use case. We propose improved relation networks for speaker verification and few-shot (unseen) speaker identification. The use of relation networks facilitates joint training of the frontend speaker encoder and the backend model. Inspired by the use of prototypical networks in speaker verification and to increase the discriminability of the speaker embeddings, we train the model to classify samples in the current episode amongst all speakers present in the training set. Furthermore, we propose a new training regime for faster model convergence by extracting more information from a given meta-learning episode with negligible extra computation. We evaluate the proposed techniques on VoxCeleb, SITW and VCTK datasets on the tasks of speaker verification and unseen speaker identification. The proposed approach outperforms the existing approaches consistently on both tasks.

artificial intelligence, speaker verification, speech recognition, (16 more...)

arXiv.org Artificial Intelligence

2203.17218

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback