
Collaborating Authors

 Hu, Yong


SSVEP-BiMA: Bifocal Masking Attention Leveraging Native and Symmetric-Antisymmetric Components for Robust SSVEP Decoding

arXiv.org Artificial Intelligence

Brain-computer interfaces (BCIs) based on steady-state visual evoked potentials (SSVEP) are a popular paradigm owing to their simplicity and high information transfer rate (ITR). Accurate and fast SSVEP decoding is crucial for reliable BCI performance. However, conventional decoding methods demand long time windows, and deep learning models typically require subject-specific fine-tuning, leaving challenges in achieving optimal performance in cross-subject settings. This paper proposes a bifocal masking attention-based method (SSVEP-BiMA) that synergistically leverages the native and symmetric-antisymmetric components for decoding SSVEP. By utilizing multiple signal representations, the network integrates features from a wider range of sample perspectives, leading to more generalized and comprehensive feature learning and enhancing both prediction accuracy and robustness. We performed experiments on two public datasets, and the results demonstrate that our proposed method surpasses baseline approaches in both accuracy and ITR. We believe that this work will contribute to the development of more efficient SSVEP-based BCI systems.
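The symmetric-antisymmetric decomposition mentioned in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration under assumptions, not the authors' implementation: it assumes a hypothetical mirrored left/right electrode layout and simply averages and differences the paired channels to obtain the symmetric and antisymmetric components that could accompany the native signal.

```python
# Minimal sketch (not the SSVEP-BiMA code): derive symmetric and antisymmetric
# components from hypothetical mirrored left/right EEG electrode pairs.
import numpy as np

def symmetric_antisymmetric(eeg, left_idx, right_idx):
    """eeg: array of shape (n_channels, n_samples)."""
    left = eeg[left_idx]         # left-hemisphere electrodes
    right = eeg[right_idx]       # mirrored right-hemisphere electrodes
    sym = 0.5 * (left + right)   # symmetric (in-phase) component
    anti = 0.5 * (left - right)  # antisymmetric (out-of-phase) component
    return sym, anti

# Toy example: 8 channels, 250 samples; assumed mirrored pairs (0,1), (2,3), (4,5)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 250))
sym, anti = symmetric_antisymmetric(eeg, left_idx=[0, 2, 4], right_idx=[1, 3, 5])
print(sym.shape, anti.shape)  # (3, 250) (3, 250)
```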


Multi-modal Retrieval Augmented Multi-modal Generation: A Benchmark, Evaluate Metrics and Strong Baselines

arXiv.org Artificial Intelligence

This paper investigates an intriguing task of Multi-modal Retrieval Augmented Multi-modal Generation (M$^2$RAG). This task requires foundation models to browse multi-modal web pages, with mixed text and images, and generate multi-modal responses for solving user queries, which offers better information density and readability. Given the early research stage of the M$^2$RAG task, systematic studies and analyses are still lacking. To fill this gap, we construct a benchmark for the M$^2$RAG task, equipped with a suite of text-modal metrics and multi-modal metrics to analyze the capabilities of existing foundation models. In addition, we propose several effective methods for foundation models to accomplish this task, based on the comprehensive evaluation results on our benchmark. Extensive experimental results reveal several intriguing phenomena worth further research.
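As a rough picture of what a retrieval-augmented multi-modal prompt might look like, here is a minimal sketch under assumptions: the chunk structure and the `call_multimodal_model` function are hypothetical stand-ins, not the paper's pipeline or any specific model API. It only shows how retrieved text and image chunks could be interleaved into a single multi-modal query.

```python
# Minimal sketch (assumptions only): interleave retrieved text/image chunks from
# multi-modal web pages into one prompt for a multi-modal foundation model.
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    kind: str      # "text" or "image"
    content: str   # raw text, or an image URL

def build_m2rag_prompt(query: str, chunks: List[Chunk]) -> list:
    parts = [{"type": "text", "text": f"Answer the query using the context.\nQuery: {query}\nContext:"}]
    for c in chunks:
        if c.kind == "text":
            parts.append({"type": "text", "text": c.content})
        else:
            parts.append({"type": "image_url", "image_url": c.content})
    parts.append({"type": "text", "text": "Respond with interleaved text and image references."})
    return parts

# Hypothetical usage:
# chunks = [Chunk("text", "Retrieved passage ..."), Chunk("image", "https://example.com/fig.png")]
# response = call_multimodal_model(build_m2rag_prompt("How does X work?", chunks))
```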


Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated impressive generalization capabilities on specific tasks with human-written instruction data. However, the limited quantity, diversity, and professional expertise of such instruction data raise concerns about the performance of LLMs in psychotherapy tasks when provided with domain-specific instructions. To address this, we first propose Domain-Specific Assistant Instructions based on AlexanderStreet therapy, and second, we use an adaptation fine-tuning method and a retrieval-augmented generation method to improve pre-trained LLMs. Through quantitative evaluation of linguistic quality using automatic and human evaluation, we observe that LLMs fine-tuned on Psychotherapy Assistant Instructions outperform state-of-the-art LLM response baselines. Our Assistant-Instruction approach offers a half-annotation method to align pre-trained LLMs with instructions and to provide them with more psychotherapy knowledge.
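The retrieval-augmented generation step can be sketched in a few lines. The example below is a minimal illustration under assumptions (a toy TF-IDF retriever over three invented passages, not the AlexanderStreet corpus or the authors' retriever): relevant therapy passages are retrieved for a user message and prepended to the LLM prompt.

```python
# Minimal sketch (toy corpus, not the paper's data): TF-IDF retrieval of
# psychotherapy passages, prepended to the prompt for a pre-trained LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

corpus = [
    "Therapist discusses grounding techniques for acute anxiety.",
    "Session transcript on cognitive restructuring for negative self-talk.",
    "Notes on behavioural activation for low mood and withdrawal.",
]
vectorizer = TfidfVectorizer().fit(corpus)
doc_vectors = vectorizer.transform(corpus)

def retrieve(query: str, k: int = 2):
    scores = linear_kernel(vectorizer.transform([query]), doc_vectors).ravel()
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

user_msg = "I keep spiralling into negative thoughts about myself."
context = "\n".join(retrieve(user_msg))
prompt = f"Relevant psychotherapy knowledge:\n{context}\n\nClient: {user_msg}\nAssistant:"
print(prompt)
```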


Quantized Embedding Vectors for Controllable Diffusion Language Models

arXiv.org Artificial Intelligence

Improving the controllability, portability, and inference speed of diffusion language models (DLMs) is a key challenge in natural language generation. While recent research has shown significant success in complex text generation with language models, their memory and computational demands remain high, which limits portability and stability. To mitigate these issues, numerous well-established neural network quantization methods have been proposed. To further enhance portability for independent deployment and improve stability as measured by language perplexity, we propose a novel approach called the Quantized Embedding Controllable Diffusion Language Model (QE-CDLM). QE-CDLM builds upon recent successful controllable DLMs by remodeling the task-specific embedding space via quantization. This yields a gradient-based controller for the generation tasks and more stable intermediate latent variables, which naturally brings accelerated convergence as well as better controllability. Additionally, an adaptation fine-tuning method is employed to reduce the number of tunable weights. Experimental results on five challenging fine-grained control tasks demonstrate that QE-CDLM compares favorably to existing methods in terms of quality and feasibility, achieving better perplexity and lightweight fine-tuning.
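One common way to quantize an embedding space, shown below only as a minimal sketch of the general idea rather than the QE-CDLM implementation, is to learn a small codebook (here via k-means over a randomly initialized, hypothetical embedding table) and replace each token embedding with its nearest codebook vector, shrinking the space the diffusion model must denoise over.

```python
# Minimal sketch (not the QE-CDLM code): codebook quantization of a
# task-specific embedding table via k-means.
import numpy as np
from sklearn.cluster import KMeans

vocab_size, dim, codebook_size = 1000, 64, 128
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((vocab_size, dim)).astype(np.float32)  # toy embedding table

kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(embeddings)
codes = kmeans.predict(embeddings)            # nearest codebook index per token
quantized = kmeans.cluster_centers_[codes]    # quantized embedding table

print(quantized.shape)  # (1000, 64): same shape, but only 128 distinct vectors
```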


CSCD-IME: Correcting Spelling Errors Generated by Pinyin IME

arXiv.org Artificial Intelligence

Chinese Spelling Correction (CSC) is the task of detecting and correcting spelling mistakes in text. In practice, most Chinese text is entered with the pinyin input method (IME), so studying spelling errors arising in this process is especially practical and valuable. However, there is still no research dedicated to this essential scenario. In this paper, we first present a Chinese Spelling Correction Dataset for errors generated by pinyin IME (CSCD-IME), including 40,000 annotated sentences from real posts of official media on Sina Weibo. Furthermore, we propose a novel method to automatically construct large-scale, high-quality pseudo data by simulating the input process through a pinyin IME. A series of analyses and experiments on CSCD-IME show that spelling errors produced by pinyin IME follow a distinct distribution at both the pinyin and semantic levels and are sufficiently challenging. Meanwhile, our proposed pseudo-data construction method better fits this error distribution and improves the performance of CSC systems. Finally, we provide a practical guide to using pseudo data, covering the data scale, the data source, and the training strategy.
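The pseudo-data idea of simulating IME input can be sketched roughly as follows. This is a minimal illustration under assumptions, not the paper's construction pipeline: it relies on the `pypinyin` package for character-to-pinyin conversion and on a tiny hypothetical homophone table, and it injects errors by swapping a character for another one sharing the same pinyin, mimicking a wrong candidate choice in the IME.

```python
# Minimal sketch (hypothetical homophone table, not the paper's pipeline):
# inject pinyin-IME-style spelling errors into a sentence.
import random
from pypinyin import lazy_pinyin  # assumes the pypinyin package is installed

# Hypothetical homophone candidates keyed by pinyin syllable.
HOMOPHONES = {
    "shi": ["是", "事", "市", "式"],
    "zai": ["在", "再", "载"],
}

def inject_ime_error(sentence: str, p: float = 0.2) -> str:
    out = []
    for ch in sentence:
        py = lazy_pinyin(ch)[0]
        cands = [c for c in HOMOPHONES.get(py, []) if c != ch]
        if cands and random.random() < p:
            out.append(random.choice(cands))  # simulate picking the wrong IME candidate
        else:
            out.append(ch)
    return "".join(out)

print(inject_ime_error("我现在是学生"))
```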


giMLPs: Gate with Inhibition Mechanism in MLPs

arXiv.org Artificial Intelligence

This paper presents a new model architecture, the gate with inhibition MLP (giMLP). The gate with inhibition applied to CycleMLP (gi-CycleMLP) matches the original's performance on the ImageNet classification task, and it also improves BERT, RoBERTa, and DeBERTaV3, building on two novel techniques. The first is the gating MLP, in which matrix multiplication between the MLP branch and the trunk attention input further adjusts the model's adaptation. The second is inhibition, which suppresses or enhances the branch adjustment; as the inhibition level increases, it imposes stronger feature restriction on the model. We show that gi-CycleMLP with a lower inhibition level is competitive with the original CycleMLP in terms of ImageNet classification accuracy. In addition, we show through a comprehensive empirical study that these techniques significantly improve fine-tuning performance on downstream NLU tasks. For gate-with-inhibition fine-tuning on DeBERTa (giDeBERTa), we find it achieves appealing results on most NLU tasks without any extra pretraining. We also find that, when using gate with inhibition, the activation function should have a short and smooth negative tail, so that unimportant or harmful features can be moderately inhibited. Experiments on ImageNet and twelve downstream language tasks demonstrate the effectiveness of gate with inhibition, both for image classification and for enhancing natural language fine-tuning without any extra pretraining.
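To make the gating-with-inhibition idea concrete, here is a minimal PyTorch sketch. It is an interpretation of the abstract under assumptions (the module shape, the scalar inhibition level, and the GELU choice are all illustrative), not the released giMLP code: an MLP gate multiplies the trunk attention output element-wise, and the inhibition level scales down the branch adjustment before it is added back to the trunk.

```python
# Minimal sketch (an interpretation of the abstract, not the giMLP release):
# an MLP gate over the trunk attention output, damped by an inhibition level.
import torch
import torch.nn as nn

class GateWithInhibition(nn.Module):
    def __init__(self, dim: int, inhibition: float = 0.5):
        super().__init__()
        # GELU has the short, smooth negative tail the abstract favours.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.inhibition = inhibition  # higher value -> stronger restriction of the branch

    def forward(self, trunk_attn_out: torch.Tensor) -> torch.Tensor:
        branch = self.gate(trunk_attn_out) * trunk_attn_out    # element-wise gating of the trunk
        return trunk_attn_out + (1.0 - self.inhibition) * branch

x = torch.randn(2, 16, 64)               # (batch, tokens, dim)
print(GateWithInhibition(64)(x).shape)   # torch.Size([2, 16, 64])
```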


Principal Model Analysis Based on Partial Least Squares

arXiv.org Machine Learning

Motivated by the Bagging Partial Least Squares (PLS) and Principal Component Analysis (PCA) algorithms, we propose a Principal Model Analysis (PMA) method in this paper. The proposed PMA algorithm combines PCA and PLS. In the method, multiple PLS models are trained on sub-training sets drawn from the original training set by random sampling with replacement. The regression coefficients of all the sub-PLS models are fused into a joint regression coefficient matrix, and the final projection direction is then estimated by performing PCA on this joint matrix. The proposed PMA method is compared with other traditional dimension reduction methods, such as PLS, Bagging PLS, linear discriminant analysis (LDA), and PLS-LDA. Experimental results on six public datasets show that our proposed method achieves better classification performance and is usually more stable.
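The abstract describes the algorithm step by step, so it can be sketched directly. The following Python snippet is a plain re-implementation of that description under assumptions (synthetic data, arbitrary hyperparameters), not the authors' code: fit PLS models on bootstrap resamples, stack their regression coefficient vectors, and run PCA on the stacked matrix to obtain the final projection directions.

```python
# Minimal sketch of Principal Model Analysis (PMA) from the abstract's
# description (not the authors' implementation).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA

def pma_directions(X, y, n_models=20, n_pls_components=2, n_directions=2, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coef_rows = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)                      # bootstrap: sample with replacement
        pls = PLSRegression(n_components=n_pls_components).fit(X[idx], y[idx])
        coef_rows.append(pls.coef_.ravel())                   # one coefficient vector per sub-model
    joint = np.vstack(coef_rows)                              # joint coefficient matrix (n_models, n_features)
    pca = PCA(n_components=n_directions).fit(joint)
    return pca.components_.T                                  # (n_features, n_directions) projection

# Toy usage on synthetic data
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(100)
W = pma_directions(X, y)
print((X @ W).shape)  # projected data: (100, 2)
```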