Goto

Collaborating Authors

 Chen, Chi-Sheng


Quantum Generative Models for Image Generation: Insights from MNIST and MedMNIST

arXiv.org Artificial Intelligence

Quantum generative models offer a promising new direction in machine learning by leveraging quantum circuits to enhance data generation capabilities. In this study, we propose a hybrid quantum-classical image generation framework that integrates variational quantum circuits into a diffusion-based model. To improve training dynamics and generation quality, we introduce two novel noise strategies: intrinsic quantum-generated noise and a tailored noise scheduling mechanism. Our method is built upon a lightweight U-Net architecture, with the quantum layer embedded in the bottleneck module to isolate its effect. We evaluate our model on MNIST and MedMNIST datasets to examine its feasibility and performance. Notably, our results reveal that under limited data conditions (fewer than 100 training images), the quantum-enhanced model generates images with higher perceptual quality and distributional similarity than its classical counterpart using the same architecture. While the quantum model shows advantages on grayscale data such as MNIST, its performance is more nuanced on complex, color-rich datasets like PathMNIST. These findings highlight both the potential and current limitations of quantum generative models and lay the groundwork for future developments in low-resource and biomedical image generation.


Large Cognition Model: Towards Pretrained EEG Foundation Model

arXiv.org Artificial Intelligence

Electroencephalography provides a non-invasive window into brain activity, offering valuable insights for neurological research, brain-computer interfaces, and clinical diagnostics. However, the development of robust machine learning models for EEG analysis is hindered by the scarcity of large-scale, well-annotated datasets and the inherent variability of EEG signals across subjects and recording conditions. Inspired by the success of foundation models in natural language processing and computer vision, we propose the Large Cognition Model-a transformer-based foundation model designed to generalize across diverse EEG datasets and downstream tasks. Unlike traditional approaches, our proposed transformer-based architecture demonstrates strong generalization capabilities across datasets and tasks, even without pretraining, surpassing some existing EEG universal models on specific downstream applications. LCM leverages large-scale self-supervised learning techniques to capture universal EEG representations, enabling efficient fine-tuning for applications such as cognitive state decoding, disease classification, and neurofeedback systems. We introduce a novel architecture that integrates temporal and spectral attention mechanisms, optimizing the model's ability to extract meaningful features from raw EEG signals. Extensive evaluations demonstrate that LCM outperforms state-of-the-art approaches across multiple EEG benchmarks, exhibiting strong cross-subject and cross-task generalization. Our findings highlight the potential of pretrained EEG foundation models to accelerate advancements in neuroscience, personalized medicine, and BCI technology.


Optimizing Supply Chain Networks with the Power of Graph Neural Networks

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have emerged as transformative tools for modeling complex relational data, offering unprecedented capabilities in tasks like forecasting and optimization. This study investigates the application of GNNs to demand forecasting within supply chain networks using the SupplyGraph dataset, a benchmark for graph-based supply chain analysis. By leveraging advanced GNN methodologies, we enhance the accuracy of forecasting models, uncover latent dependencies, and address temporal complexities inherent in supply chain operations. Comparative analyses demonstrate that GNN-based models significantly outperform traditional approaches, including Multilayer Perceptrons (MLPs) and Graph Convolutional Networks (GCNs), particularly in single-node demand forecasting tasks. The integration of graph representation learning with temporal data highlights GNNs' potential to revolutionize predictive capabilities for inventory management, production scheduling, and logistics optimization. This work underscores the pivotal role of forecasting in supply chain management and provides a robust framework for advancing research and applications in this domain.


Psycho Gundam: Electroencephalography based real-time robotic control system with deep learning

arXiv.org Artificial Intelligence

The Psycho Frame, a sophisticated system primarily used in Universal Century (U.C.) series mobile suits for NEWTYPE pilots, has evolved as an integral component in harnessing the latent potential of mental energy. Its ability to amplify and resonate with the pilot's psyche enables real-time mental control, creating unique applications such as psychomagnetic fields and sensory-based weaponry. This paper presents the development of a novel robotic control system inspired by the Psycho Frame, combining electroencephalography (EEG) and deep learning for real-time control of robotic systems. By capturing and interpreting brainwave data through EEG, the system extends human cognitive commands to robotic actions, reflecting the seamless synchronization of thought and machine, much like the Psyco Frame's integration with a Newtype pilot's mental faculties. This research demonstrates how modern AI techniques can expand the limits of human-machine interaction, potentially transcending traditional input methods and enabling a deeper, more intuitive control of complex robotic systems.


NECOMIMI: Neural-Cognitive Multimodal EEG-informed Image Generation with Diffusion Models

arXiv.org Artificial Intelligence

NECOMIMI (NEural-COgnitive MultImodal EEG-Informed Image Generation with Diffusion Models) introduces a novel framework for generating images directly from EEG signals using advanced diffusion models. Unlike previous works that focused solely on EEG-image classification through contrastive learning, NECOMIMI extends this task to image generation. The proposed NERV EEG encoder demonstrates state-of-the-art (SoTA) performance across multiple zero-shot classification tasks, including 2-way, 4-way, and 200-way, and achieves top results in our newly proposed Category-based Assessment Table (CAT) Score, which evaluates the quality of EEG-generated images based on semantic concepts. A key discovery of this work is that the model tends to generate abstract or generalized images, such as landscapes, rather than specific objects, highlighting the inherent challenges of translating noisy and low-resolution EEG data into detailed visual outputs. Additionally, we introduce the CAT Score as a new metric tailored for EEG-to-image evaluation and establish a benchmark on the ThingsEEG dataset. This study underscores the potential of EEG-to-image generation while revealing the complexities and challenges that remain in bridging neural activity with visual representation.


Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning

arXiv.org Artificial Intelligence

Food classification is the foundation for developing food vision tasks and plays a key role in the burgeoning field of computational nutrition. Due to the complexity of food requiring fine-grained classification, recent academic research mainly modifies Convolutional Neural Networks (CNNs) and/or Vision Transformers (ViTs) to perform food category classification. However, to learn fine-grained features, the CNN backbone needs additional structural design, whereas ViT, containing the self-attention module, has increased computational complexity. In recent months, a new Sequence State Space (S4) model, through a Selection mechanism and computation with a Scan (S6), colloquially termed Mamba, has demonstrated superior performance and computation efficiency compared to the Transformer architecture. The VMamba model, which incorporates the Mamba mechanism into image tasks (such as classification), currently establishes the state-of-the-art (SOTA) on the ImageNet dataset. In this research, we introduce an academically underestimated food dataset CNFOOD-241, and pioneer the integration of a residual learning framework within the VMamba model to concurrently harness both global and local state features inherent in the original VMamba architectural design. The research results show that VMamba surpasses current SOTA models in fine-grained and food classification. The proposed Res-VMamba further improves the classification accuracy to 79.54\% without pretrained weight. Our findings elucidate that our proposed methodology establishes a new benchmark for SOTA performance in food recognition on the CNFOOD-241 dataset. The code can be obtained on GitHub: https://github.com/ChiShengChen/ResVMamba.