Mishra, Abhijit
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting
Mishra, Abhijit, Noh, Richard, Fu, Hsiang, Li, Mingda, Kim, Minji
Efficient and privacy-preserving multimodal interaction is essential as AR, VR, and modern smartphones with powerful cameras become primary interfaces for human-computer communication. The powerful large vision-language models (VLMs) that enable such interaction often rely on cloud-based processing, raising significant concerns about (1) visual privacy, since sensitive vision data must be transmitted to servers, and (2) limited real-time, on-device usability. This paper explores Visual Instruction Rewriting, a novel approach that transforms multimodal instructions into text-only commands, allowing a lightweight on-device instruction-rewriter VLM (250M parameters) to integrate seamlessly with existing conversational AI systems while keeping vision data private. To achieve this, we present a dataset of over 39,000 examples across 14 domains and develop a compact VLM, pretrained on image-captioning datasets and fine-tuned for instruction rewriting. Experimental results, evaluated through NLG metrics such as BLEU, METEOR, and ROUGE, along with semantic parsing analysis, demonstrate that even a quantized version of the model (<500MB storage footprint) achieves effective instruction rewriting, enabling privacy-focused multimodal AI applications.
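To make the rewriting idea concrete, here is a minimal, hypothetical sketch of the on-device interface: deictic references in the spoken instruction are grounded in entities detected in the image, and only the resulting text-only command leaves the device. The detector and string-level rewriter below are toy stand-ins, not the paper's 250M-parameter VLM.

```python
# Toy sketch of the Visual Instruction Rewriting interface (Python 3.9+).
# detect_entities and rewrite_instruction are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class MultimodalRequest:
    image_path: str   # captured by the AR/VR device camera
    utterance: str    # spoken instruction with deictic references

def detect_entities(image_path: str) -> list[str]:
    # Placeholder for an on-device vision model; returns object labels.
    return ["margherita pizza"]

def rewrite_instruction(req: MultimodalRequest) -> str:
    # A real system would run a compact VLM here; this toy version just
    # grounds deictic phrases in the top detected entity label, yielding
    # a self-contained, text-only command.
    entity = detect_entities(req.image_path)[0]
    text = req.utterance
    for deictic in ("these", "this", "that", "it"):
        text = text.replace(deictic, entity)
    return text

req = MultimodalRequest("menu_photo.jpg", "order two of these")
command = rewrite_instruction(req)
print(command)  # -> "order two of margherita pizza"
# Only `command` (plain text) is sent onward; the image never leaves the device.
```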
A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond
Shukla, Shreya, Torres, Jose, Mishra, Abhijit, Gwizdka, Jacek, Roychowdhury, Shounak
The integration of Brain-Computer Interfaces (BCIs) and Generative Artificial Intelligence (GenAI) has opened new frontiers in brain signal decoding, enabling assistive communication, neural representation learning, and multimodal integration. BCIs, particularly those leveraging Electroencephalography (EEG), provide a non-invasive means of translating neural activity into meaningful outputs. Recent advances in deep learning, including Generative Adversarial Networks (GANs) and Transformer-based Large Language Models (LLMs), have significantly improved EEG-based generation of images, text, and speech. This paper provides a literature review of the state of the art in EEG-based multimodal generation, focusing on (i) EEG-to-image generation through GANs, Variational Autoencoders (VAEs), and Diffusion Models, and (ii) EEG-to-text generation leveraging Transformer-based language models and contrastive learning methods. Additionally, we discuss the emerging domain of EEG-to-speech synthesis, an evolving multimodal frontier. We highlight key datasets, use cases, challenges, and EEG feature-encoding methods that underpin generative approaches. By providing a structured overview of EEG-based generative AI, this survey aims to equip researchers and practitioners with insights to advance neural decoding, enhance assistive technologies, and expand the frontiers of brain-computer interaction.
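The contrastive learning methods surveyed here typically align an EEG encoder with a pretrained image (or text) embedding space. The sketch below illustrates that idea with a CLIP-style symmetric InfoNCE loss; the encoder architecture, tensor shapes, and temperature are illustrative assumptions, not any specific surveyed system.

```python
# Illustrative contrastive alignment of EEG and image embeddings (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGEncoder(nn.Module):
    def __init__(self, channels=128, dim=256):
        super().__init__()
        self.rnn = nn.GRU(channels, dim, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):              # x: (batch, time, channels)
        _, h = self.rnn(x)             # h: (num_layers, batch, dim)
        return F.normalize(self.proj(h[-1]), dim=-1)

def contrastive_loss(eeg_emb, img_emb, temperature=0.07):
    # Symmetric InfoNCE: matched EEG/image pairs share the same index.
    logits = eeg_emb @ img_emb.t() / temperature
    labels = torch.arange(len(logits))
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

encoder = EEGEncoder()
eeg = torch.randn(8, 440, 128)                        # 8 EEG trials
img_emb = F.normalize(torch.randn(8, 256), dim=-1)    # e.g. CLIP image features
loss = contrastive_loss(encoder(eeg), img_emb)
loss.backward()                                       # trains the EEG encoder
```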
ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
Maharaj, Kishan, Munigala, Vitobha, Tamilselvam, Srikanth G., Kumar, Prince, Sen, Sayandeep, Kodeswaran, Palani, Mishra, Abhijit, Bhattacharyya, Pushpak
Recent advancements in large language models (LLMs) have significantly enhanced their ability to understand both natural language and code, driving their use in tasks like natural language-to-code (NL2Code) and code summarization. However, LLMs are prone to hallucination: outputs that stray from intended meanings. Detecting hallucinations in code summarization is especially difficult due to the complex interplay between programming and natural languages. We introduce a first-of-its-kind dataset of ~10K samples, curated specifically for hallucination detection in code summarization. We further propose a novel Entity Tracing Framework (ETF) that (a) utilizes static program analysis to identify code entities from the program and (b) uses LLMs to map and verify these entities and their intents within generated code summaries. Our experimental analysis demonstrates the effectiveness of the framework, leading to an F1 score of 0.73. This approach provides an interpretable method for detecting hallucinations by grounding entities, allowing us to evaluate summary accuracy.
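A rough sketch of step (a), the static-analysis half of the framework: code entities (function names, parameters, callees) are extracted with Python's ast module and matched against the summary, leaving unmatched claims for an LLM verifier to judge. The example program and the word-level matching heuristic are illustrative, not the paper's implementation.

```python
# Extract code entities via static analysis, then ground summary words.
import ast

code = '''
def transfer(account, amount):
    validate(account)
    account.balance -= amount
'''

class EntityCollector(ast.NodeVisitor):
    def __init__(self):
        self.entities = set()

    def visit_FunctionDef(self, node):
        self.entities.add(node.name)                      # function name
        self.entities.update(a.arg for a in node.args.args)  # parameters
        self.generic_visit(node)

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name):
            self.entities.add(node.func.id)               # called functions
        self.generic_visit(node)

collector = EntityCollector()
collector.visit(ast.parse(code))
print(collector.entities)   # {'transfer', 'account', 'amount', 'validate'}

summary = "transfer validates the account and logs the transaction"
grounded = {w for w in summary.split() if w in collector.entities}
print(grounded)             # {'transfer', 'account'}
# "logs the transaction" matches no extracted entity; in ETF an LLM would
# be asked to verify that claim against the code (here it is unsupported).
```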
Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)
Mishra, Abhijit, Shukla, Shreya, Torres, Jose, Gwizdka, Jacek, Roychowdhury, Shounak
Decoding and expressing brain activity in a comprehensible form is a challenging frontier in AI. This paper presents Thought2Text, which uses instruction-tuned Large Language Models (LLMs) fine-tuned with EEG data to achieve this goal. The approach involves three stages: (1) training an EEG encoder for visual feature extraction, (2) fine-tuning LLMs on image and text data, enabling multimodal description generation, and (3) further fine-tuning on EEG embeddings to generate text directly from EEG during inference. Experiments on a public EEG dataset collected from six subjects viewing image stimuli demonstrate the efficacy of multimodal LLMs (LLaMa-v3, Mistral-v0.3, Qwen2.5), validated using traditional language generation evaluation metrics, GPT-4-based assessments, and evaluations by human experts. This approach marks a significant advancement towards portable, low-cost "thoughts-to-text" technology with potential applications in both neuroscience and natural language processing (NLP).
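Stage (3) hinges on feeding EEG-derived vectors to the LLM in place of, or alongside, token embeddings. One common way to do this, sketched below under assumed dimensions, is to project each EEG embedding to a few soft-prompt vectors in the LLM's embedding space; the projection design here is an illustration, not the paper's exact module.

```python
# Project an EEG embedding into an LLM's token-embedding space (PyTorch).
import torch
import torch.nn as nn

eeg_dim, llm_dim, n_prefix = 256, 4096, 4   # assumed sizes for illustration

class EEGToPrefix(nn.Module):
    def __init__(self):
        super().__init__()
        # Map one EEG embedding to n_prefix pseudo-token embeddings.
        self.proj = nn.Linear(eeg_dim, n_prefix * llm_dim)

    def forward(self, eeg_emb):                   # (batch, eeg_dim)
        out = self.proj(eeg_emb)
        return out.view(-1, n_prefix, llm_dim)    # (batch, n_prefix, llm_dim)

projector = EEGToPrefix()
eeg_emb = torch.randn(2, eeg_dim)                 # output of the EEG encoder
prefix = projector(eeg_emb)                       # soft-prompt vectors
token_emb = torch.randn(2, 10, llm_dim)           # embedded text prompt
inputs_embeds = torch.cat([prefix, token_emb], dim=1)
# With Hugging Face models, this can be fed as
# model(inputs_embeds=inputs_embeds, ...) during fine-tuning and inference.
print(inputs_embeds.shape)                        # torch.Size([2, 14, 4096])
```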
SentinelLMs: Encrypted Input Adaptation and Fine-tuning of Language Models for Private and Secure Inference
Mishra, Abhijit, Li, Mingda, Deo, Soham
This paper addresses the privacy and security concerns associated with deep neural language models, which serve as crucial components in various modern AI-based applications. These models are often used after being pre-trained and fine-tuned for specific tasks, with deployment on servers accessed through the internet. However, this introduces two fundamental risks: (a) the transmission of user inputs to the server via the network gives rise to interception vulnerabilities, and (b) privacy concerns emerge as organizations that deploy such models store user data with restricted context. To address this, we propose a novel method to adapt and fine-tune transformer-based language models on passkey-encrypted user-specific text. The original pre-trained language model first undergoes a quick adaptation (without any further pre-training) with a series of irreversible transformations applied to the tokenizer and token embeddings. This enables the model to perform inference on encrypted inputs while preventing reverse engineering of text from model parameters and intermediate outputs. After adaptation, models are fine-tuned on encrypted versions of existing training datasets. Experimental evaluation employing adapted versions of renowned models (e.g., BERT, RoBERTa) across established benchmark English and multilingual datasets for text classification and sequence labeling shows that encrypted models achieve performance parity with their original counterparts. This serves to safeguard performance, privacy, and security cohesively.
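One simple instance of an irreversible, passkey-keyed transformation of the tokenizer and embeddings is a secret permutation of the vocabulary, sketched below with random stand-in weights. The paper describes a series of transformations; this only demonstrates how a keyed remapping can preserve the model's computation while making raw token streams unintelligible without the passkey.

```python
# Passkey-derived vocabulary permutation with consistent embedding remap.
import hashlib
import numpy as np

vocab_size, dim = 50000, 768
embeddings = np.random.randn(vocab_size, dim)   # stand-in for real weights

def keyed_permutation(passkey: str, n: int) -> np.ndarray:
    # Seed a PRNG from the passkey so the permutation is reproducible
    # for the key holder but unguessable without the key.
    seed = int.from_bytes(hashlib.sha256(passkey.encode()).digest()[:8], "big")
    return np.random.default_rng(seed).permutation(n)

perm = keyed_permutation("user-secret-passkey", vocab_size)
inv = np.argsort(perm)                 # maps original id -> encrypted id

def encrypt_ids(token_ids):
    return [int(inv[t]) for t in token_ids]

adapted_embeddings = embeddings[perm]  # row i now holds original row perm[i]

# Consistency check: encrypted ids index the permuted table to exactly the
# vectors the original ids selected in the original table, so downstream
# layers compute identical representations.
ids = [5, 42, 1337]
assert np.allclose(adapted_embeddings[encrypt_ids(ids)], embeddings[ids])
```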
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?
Zhang, Jiawen, Mishra, Abhijit, S, Avinesh P. V., Patwardhan, Siddharth, Agarwal, Sachin
The task of Outside Knowledge Visual Question Answering (OKVQA) requires an automatic system to answer natural language questions about images using external knowledge. We observe that many visual questions, which contain deictic referential phrases referring to entities in the image, can be rewritten as "non-grounded" questions that existing text-based question answering systems can answer. This allows the reuse of existing text-based Open Domain Question Answering (QA) systems for visual question answering. In this work, we propose a potentially data-efficient approach that reuses existing systems for (a) image analysis, (b) question rewriting, and (c) text-based question answering to answer such visual questions. Given an image and a question pertaining to that image (a visual question), we first extract the entities present in the image using pre-trained object and scene classifiers. Using these detected entities, the visual questions can be rewritten so as to be answerable by open domain QA systems. We explore two rewriting strategies: (1) an unsupervised method using BERT for masking and rewriting, and (2) a weakly supervised approach that combines adaptive rewriting and reinforcement learning techniques to use the implicit feedback from the QA system. We test our strategies on the publicly available OKVQA dataset and obtain competitive performance with state-of-the-art models while using only 10% of the training data.
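Strategy (1) can be illustrated with an off-the-shelf fill-mask model: mask the deictic phrase, restrict BERT's candidate fillers to the detected entity labels, and keep the best-scoring one. The question, entity labels, and phrase matching below are toy assumptions; the snippet needs the transformers library and downloads bert-base-uncased on first use.

```python
# Unsupervised deictic-question rewriting via BERT fill-mask scoring.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

visual_question = "When was this invented?"
detected = ["television", "radio", "lamp"]     # from object/scene classifiers
masked = visual_question.replace("this", "the [MASK]")

# Restrict BERT's predictions to the detected entity labels and keep the
# highest-scoring one as the grounding for the deictic phrase.
candidates = unmasker(masked, targets=detected)
best = max(candidates, key=lambda c: c["score"])["token_str"]
rewritten = visual_question.replace("this", f"the {best}")
print(rewritten)   # e.g. "When was the television invented?"
# The rewritten question is now answerable by a text-only open-domain QA system.
```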
Template Controllable Keywords-to-Text Generation
Mishra, Abhijit, Chowdhury, Md Faisal Mahbub, Manohar, Sagar, Gutfreund, Dan, Sankaranarayanan, Karthik
This paper proposes a novel neural model for the understudied task of generating text from keywords. The model takes as input a set of unordered keywords and part-of-speech (POS) based template instructions, which makes it ideal for surface realization in any NLG setup. The framework is based on the encode-attend-decode paradigm, where keywords and templates are encoded first, and the decoder judiciously attends over the contexts derived from the encoded keywords and templates to generate the sentences. Training exploits weak supervision: the model trains on a large amount of weakly labeled data, with keywords and POS-based templates prepared through completely automatic means. Qualitative and quantitative performance analyses on publicly available test data in various domains reveal our system's superiority over baselines built using state-of-the-art neural machine translation and controllable transfer techniques. Our approach is indifferent to the order of input keywords.
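Because the weak supervision comes entirely from raw text, training pairs can be manufactured automatically: given any sentence, its content words become the unordered keyword set and its POS sequence becomes the template. The heuristics below (which tags count as content words, how keywords are normalized) are assumptions for illustration; the snippet requires nltk and its tokenizer/tagger models.

```python
# Automatic (keywords, POS template) -> sentence training-pair creation.
import nltk

for pkg in ("punkt", "averaged_perceptron_tagger"):
    nltk.download(pkg, quiet=True)

CONTENT_TAGS = ("NN", "VB", "JJ", "RB")   # nouns, verbs, adjectives, adverbs

def make_training_pair(sentence: str):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    template = [tag for _, tag in tagged]                # POS template
    keywords = sorted({w.lower() for w, tag in tagged    # unordered keywords
                       if tag.startswith(CONTENT_TAGS)})
    return keywords, template, sentence

kw, tpl, target = make_training_pair("The quick brown fox jumps over the lazy dog")
print(kw)    # ['brown', 'dog', 'fox', 'jumps', 'lazy', 'quick']
print(tpl)   # ['DT', 'JJ', 'JJ', 'NN', 'VBZ', 'IN', 'DT', 'JJ', 'NN']
```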
Modeling Topical Coherence in Discourse without Supervision
Shrivastava, Disha, Mishra, Abhijit, Sankaranarayanan, Karthik
Coherence of text is an important attribute to be measured for both manually and automatically generated discourse, but well-defined quantitative metrics for it are still elusive. In this paper, we present a metric for scoring topical coherence of an input paragraph on a real-valued scale by analyzing its underlying topical structure. We first extract all possible topics that the sentences of a paragraph of text are related to. Coherence of this text is then measured by computing: (a) the degree of uncertainty of the topics with respect to the paragraph, and (b) the relatedness between these topics. All components of our modular framework rely only on unlabeled data and WordNet, thus making it completely unsupervised, which is an important feature for general-purpose usage of any metric. Experiments are conducted on two datasets: a publicly available dataset for essay grading (representing human discourse), and a synthetic dataset constructed by mixing content from multiple paragraphs covering diverse topics. Our evaluation shows that the measured coherence scores are positively correlated with the ground truth for both datasets. Further validation of our coherence scores is provided by human evaluation on the synthetic data, showing a significant agreement of 79.3%.
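Components (a) and (b) can be made concrete as, for example, the entropy of a topic distribution and the mean pairwise WordNet similarity between topic words. The sketch below uses exactly those illustrative choices, not the paper's precise formulation; it requires nltk's WordNet corpus.

```python
# Illustrative topic-uncertainty and topic-relatedness computations.
import itertools
import math

import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def topic_uncertainty(topic_probs):
    # Entropy of the paragraph's topic distribution: higher entropy means
    # a more diffuse, hence less coherent, topical focus.
    return -sum(p * math.log(p) for p in topic_probs if p > 0)

def topic_relatedness(topic_words):
    # Mean pairwise path similarity between first WordNet senses.
    sims = []
    for a, b in itertools.combinations(topic_words, 2):
        sa, sb = wn.synsets(a), wn.synsets(b)
        if sa and sb:
            sims.append(sa[0].path_similarity(sb[0]) or 0.0)
    return sum(sims) / len(sims) if sims else 0.0

# Focused paragraph: few dominant, related topics.
print(topic_uncertainty([0.7, 0.2, 0.1]))           # low entropy
print(topic_relatedness(["dog", "cat", "pet"]))     # high relatedness
# Mixed paragraph: flat topic distribution, unrelated topics.
print(topic_uncertainty([0.25, 0.25, 0.25, 0.25]))  # high entropy
print(topic_relatedness(["dog", "finance", "galaxy"]))
```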
Cognition-Cognizant Sentiment Analysis With Multitask Subjectivity Summarization Based on Annotators' Gaze Behavior
Mishra, Abhijit (IBM Research AI) | Tamilselvam, Srikanth (IBM Research AI) | Dasgupta, Riddhiman (IBM Research AI) | Nagar, Seema (IBM Research AI) | Dey, Kuntal (IBM Research AI)
For document-level sentiment analysis (SA), Subjectivity Extraction, i.e., extracting the relevant subjective portions of the text that cover the overall sentiment expressed in the document, is an important step. Subjectivity Extraction, however, is a hard problem for systems, as it demands a great deal of world knowledge and reasoning. Humans, on the other hand, are good at extracting relevant subjective summaries from an opinionated document (say, a movie review) while inferring the sentiment expressed in it. This capability is manifested in their eye-movement behavior while reading: words pertaining to the subjective summary of the text attract far more attention in the form of gaze fixations and/or saccadic patterns. We propose a multi-task deep neural framework for document-level sentiment analysis that learns to predict the overall sentiment expressed in the given input document by simultaneously learning to predict human gaze behavior and auxiliary linguistic properties such as the part-of-speech and syntactic properties of words in the document. For this, we propose a multi-task learning algorithm based on a multi-layer shared LSTM augmented with task-specific classifiers. With this composite multi-task network, we obtain performance competitive with or better than state-of-the-art approaches in SA. Moreover, the availability of gaze predictions as an auxiliary output helps interpret the system better; for instance, gaze predictions reveal that the system indeed performs subjectivity extraction better, which accounts for the improvement in document-level sentiment analysis performance.
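The composite network can be pictured as one shared recurrent trunk with task-specific heads. The PyTorch sketch below, with assumed layer sizes, tag-set size, and loss weighting, shows the shape of such a model: a document-level sentiment head alongside per-token gaze and POS heads trained as auxiliary tasks.

```python
# Shared-LSTM multi-task model: sentiment + gaze + POS heads (PyTorch).
import torch
import torch.nn as nn

class MultitaskGazeSA(nn.Module):
    def __init__(self, vocab=10000, emb=100, hid=128, n_pos=17, n_sent=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.shared = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.sentiment = nn.Linear(2 * hid, n_sent)  # document-level class
        self.gaze = nn.Linear(2 * hid, 1)            # per-token fixation time
        self.pos = nn.Linear(2 * hid, n_pos)         # per-token POS tag

    def forward(self, tokens):                       # (batch, seq)
        h, _ = self.shared(self.emb(tokens))         # (batch, seq, 2*hid)
        return (self.sentiment(h.mean(dim=1)),       # pooled document state
                self.gaze(h).squeeze(-1),
                self.pos(h))

model = MultitaskGazeSA()
tokens = torch.randint(0, 10000, (4, 25))
sent_logits, gaze_pred, pos_logits = model(tokens)
print(sent_logits.shape, gaze_pred.shape, pos_logits.shape)
# Training would minimize a weighted sum, e.g. L = L_sent + a*L_gaze + b*L_pos;
# gaze is only an auxiliary target, so no eye-tracker is needed at test time.
```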
Scanpath Complexity: Modeling Reading Effort Using Gaze Information
Mishra, Abhijit (Indian Institute of Technology Bombay) | Kanojia, Diptesh (IITB-Monash Research Academy, India) | Nagar, Seema (IBM Research, India) | Dey, Kuntal (IBM Research, India) | Bhattacharyya, Pushpak (Indian Institute of Technology Bombay)
Measuring reading effort is useful for practical purposes such as designing learning material and personalizing text-comprehension environments. We propose a quantification of reading effort by measuring the complexity of eye-movement patterns of readers, a measure we call Scanpath Complexity. Scanpath complexity is modeled as a function of various properties of gaze fixations and saccades, the basic parameters of eye-movement behavior. We demonstrate the effectiveness of our scanpath complexity measure by showing that its correlation with different measures of lexical and syntactic complexity, as well as with standard readability metrics, is better than that of popular baseline measures based on fixations alone.
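As a toy illustration of building such a function from fixations and saccades, the snippet below derives a few standard scanpath features (fixation-duration statistics, saccade amplitudes, regression rate) and combines them linearly. Both the feature set and the weights are assumptions; the paper models the actual function rather than fixing it by hand.

```python
# Toy scanpath complexity from fixation and saccade properties.
import numpy as np

# One scanpath: (word position, fixation duration in ms) per fixation.
scanpath = np.array([(1, 210), (2, 180), (4, 250), (3, 400), (5, 190)])
positions, durations = scanpath[:, 0], scanpath[:, 1]

saccade_lengths = np.abs(np.diff(positions))     # amplitude of each jump
regressions = np.sum(np.diff(positions) < 0)     # backward (re-reading) moves

features = np.array([
    durations.mean() / 1000.0,           # average fixation duration (s)
    durations.std() / 1000.0,            # variability of fixation durations
    saccade_lengths.mean(),              # mean saccade amplitude (words)
    regressions / len(saccade_lengths),  # regression rate
])
weights = np.array([1.0, 0.5, 0.3, 2.0])         # illustrative weights only
print("scanpath complexity:", float(features @ weights))
```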