Collaborating Authors

 Roychowdhury, Shounak


A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond

arXiv.org Artificial Intelligence

The integration of Brain-Computer Interfaces (BCIs) and Generative Artificial Intelligence (GenAI) has opened new frontiers in brain signal decoding, enabling assistive communication, neural representation learning, and multimodal integration. BCIs, particularly those leveraging Electroencephalography (EEG), provide a non-invasive means of translating neural activity into meaningful outputs. Recent advances in deep learning, including Generative Adversarial Networks (GANs) and Transformer-based Large Language Models (LLMs), have significantly improved EEG-based generation of images, text, and speech. This paper provides a literature review of the state of the art in EEG-based multimodal generation, focusing on (i) EEG-to-image generation through GANs, Variational Autoencoders (VAEs), and Diffusion Models, and (ii) EEG-to-text generation leveraging Transformer-based language models and contrastive learning methods. Additionally, we discuss the emerging domain of EEG-to-speech synthesis, an evolving multimodal frontier. We highlight key datasets, use cases, challenges, and EEG feature encoding methods that underpin generative approaches. By providing a structured overview of EEG-based generative AI, this survey aims to equip researchers and practitioners with insights to advance neural decoding, enhance assistive technologies, and expand the frontiers of brain-computer interaction.


Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)

arXiv.org Artificial Intelligence

Decoding and expressing brain activity in a comprehensible form is a challenging frontier in AI. This paper presents Thought2Text, which uses instruction-tuned Large Language Models (LLMs) fine-tuned with EEG data to achieve this goal. The approach involves three stages: (1) training an EEG encoder for visual feature extraction, (2) fine-tuning LLMs on image and text data, enabling multimodal description generation, and (3) further fine-tuning on EEG embeddings to generate text directly from EEG during inference. Experiments on a public EEG dataset collected from six subjects with image stimuli demonstrate the efficacy of multimodal LLMs (LLaMa-v3, Mistral-v0.3, Qwen2.5), validated using traditional language generation evaluation metrics, GPT-4-based assessments, and human expert evaluations. This approach marks a significant advancement towards portable, low-cost "thoughts-to-text" technology with potential applications in both neuroscience and natural language processing (NLP).
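The three-stage dataflow described above can be sketched in miniature. This is a minimal illustration, not the paper's implementation: the dimensions, function names, and random linear maps are all placeholders standing in for the trained EEG encoder and the projection into the LLM's token-embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper).
EEG_DIM, VIS_DIM, LLM_DIM = 128, 512, 4096

# Stage 1 (sketch): an EEG encoder mapping EEG features into a
# visual-feature space; in the paper this is trained against image
# embeddings, here it is a random projection for shape illustration.
W_enc = rng.standard_normal((EEG_DIM, VIS_DIM)) * 0.01
def eeg_encoder(eeg: np.ndarray) -> np.ndarray:
    return np.tanh(eeg @ W_enc)

# Stages 2-3 (sketch): project the EEG embedding into the LLM's
# embedding space so a fine-tuned LLM can condition on it as a
# "soft prompt" vector and generate text from it at inference time.
W_proj = rng.standard_normal((VIS_DIM, LLM_DIM)) * 0.01
def to_llm_prefix(eeg: np.ndarray) -> np.ndarray:
    return eeg_encoder(eeg) @ W_proj

eeg_sample = rng.standard_normal(EEG_DIM)
prefix = to_llm_prefix(eeg_sample)
```

The point of the sketch is the shape of the pipeline: EEG features are first aligned with visual representations, and only then bridged into the language model, rather than decoded to text directly.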


Robust Laplacian Eigenmaps Using Global Information

AAAI Conferences

The Laplacian Eigenmap is a popular method for non-linear dimension reduction and data representation. This graph-based method uses a Graph Laplacian matrix that closely approximates the Laplace-Beltrami operator, whose properties help in learning the structure of data lying on Riemannian manifolds. However, the Graph Laplacian used in this method is derived from an intermediate graph built using only local neighborhood information. In this paper we show that it is possible to encapsulate global information, represented by a Minimum Spanning Tree (MST) on the data set, and use it for effective dimension reduction when local information is limited. The ability of MSTs to capture the intrinsic dimension and intrinsic entropy of manifolds has been shown in a recent study. Based on that result, we show that the combined use of the local neighborhood graph and the global MST can preserve the locality of the manifold. The experimental results validate the simultaneous use of local and global information for non-linear dimension reduction.
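The construction described above (a local kNN graph augmented with global MST edges, embedded via the bottom eigenvectors of the Graph Laplacian) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's code: it assumes Gaussian edge weights, an unnormalised Laplacian, and an illustrative function name and parameter set.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def mst_laplacian_eigenmap(X, k=5, n_components=2, sigma=1.0):
    """Embed X by combining a local kNN graph with a global MST,
    then taking the bottom non-trivial Laplacian eigenvectors."""
    n = X.shape[0]
    D = cdist(X, X)                                # pairwise distances
    gauss = np.exp(-D**2 / (2 * sigma**2))         # Gaussian edge weights

    # Local information: symmetric k-nearest-neighbour adjacency.
    W = np.zeros((n, n))
    nn = np.argsort(D, axis=1)[:, 1:k + 1]         # skip self at column 0
    for i in range(n):
        W[i, nn[i]] = gauss[i, nn[i]]
    W = np.maximum(W, W.T)

    # Global information: add the MST edges, which keep the graph
    # connected even where the local neighborhoods are sparse.
    T = minimum_spanning_tree(csr_matrix(D)).toarray()
    mst_mask = (T + T.T) > 0
    W[mst_mask] = np.maximum(W[mst_mask], gauss[mst_mask])

    # Unnormalised Graph Laplacian L = Deg - W; its smallest eigenvector
    # is the constant vector, so the embedding uses the next ones.
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = eigh(L)                              # eigenvalues ascending
    return vecs[:, 1:n_components + 1]

Y = mst_laplacian_eigenmap(np.random.default_rng(0).normal(size=(30, 5)),
                           k=4, n_components=2)
```

The MST step is the key departure from the standard Laplacian Eigenmap: when k is too small for the kNN graph to stay connected, the MST edges supply the global connectivity the Laplacian needs.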