Europe
Supplementary Material for DeWave: Discrete Encoding of EEGWaves for EEG to Text Translation
In this material, we will give more technical details as well as additional experiments to support the main paper. The overview of the proposed framework, DeWave, is illustrated in Figure 6. Ground Bush attended the University of Texas at Austin, where he graduated Phi Beta Kappa with a Truth Bachelor's degree in Latin American Studies in 1973, taking only two and a half years to complete his work, and obtaining generally excellent grades. Predictwas the University of California at Austin in where he studied in Beta Kappa in a degree of degree in history American Studies in 1975. ZuCo stands for Zurich Cognitive Language Processing Corpus (ZuCo), a dataset that includes both raw and preprocessed eye-tracking and electroencephalography (EEG) data. The data is collected by having human subjects read given text corpora while simultaneously recording both their eye-tracking signals and EEG waves.
DeWave: Discrete EEGWaves Encoding for Brain Dynamics to Text Translation
The translation of brain dynamics into natural language is pivotal for braincomputer interfaces (BCIs). With the swift advancement of large language models, such as ChatGPT, the need to bridge the gap between the brain and languages becomes increasingly pressing. Current methods, however, require eye-tracking fixations or event markers to segment brain dynamics into word-level features, which can restrict the practical application of these systems. To tackle these issues, we introduce a novel framework, DeWave, that integrates discrete encoding sequences into open-vocabulary EEG-to-text translation tasks. DeWave uses a quantized variational encoder to derive discrete codex encoding and align it with pre-trained language models. This discrete codex representation brings forth two advantages: 1) it realizes translation on raw waves without marker by introducing text-EEG contrastive alignment training, and 2) it alleviates the interference caused by individual differences in EEG waves through an invariant discrete codex with or without markers.
Ensembling Graph Predictions for AMRParsing
In many machine learning tasks, models are trained to predict structure data such as graphs. For example, in natural language processing, it is very common to parse texts into dependency trees or abstract meaning representation (AMR) graphs. On the other hand, ensemble methods combine predictions from multiple models to create a new one that is more robust and accurate than individual predictions. In the literature, there are many ensembling techniques proposed for classification or regression problems, however, ensemble graph prediction has not been studied thoroughly. In this work, we formalize this problem as mining the largest graph that is the most supported by a collection of graph predictions. As the problem is NP-Hard, we propose an efficient heuristic algorithm to approximate the optimal solution. To validate our approach, we carried out experiments in AMR parsing problems. The experimental results demonstrate that the proposed approach can combine the strength of state-of-the-art AMR parsers to create new predictions that are more accurate than any individual models in five standard benchmark datasets.
Ensembling Graph Predictions for AMRParsing
In many machine learning tasks, models are trained to predict structure data such as graphs. For example, in natural language processing, it is very common to parse texts into dependency trees or abstract meaning representation (AMR) graphs. On the other hand, ensemble methods combine predictions from multiple models to create a new one that is more robust and accurate than individual predictions. In the literature, there are many ensembling techniques proposed for classification or regression problems, however, ensemble graph prediction has not been studied thoroughly. In this work, we formalize this problem as mining the largest graph that is the most supported by a collection of graph predictions. As the problem is NP-Hard, we propose an efficient heuristic algorithm to approximate the optimal solution. To validate our approach, we carried out experiments in AMR parsing problems. The experimental results demonstrate that the proposed approach can combine the strength of state-of-the-art AMR parsers to create new predictions that are more accurate than any individual models in five standard benchmark datasets.
Active clustering for labeling training data
We also algorithm family, propose as a conjecture that they reach the minimum average items and analyze their complexity. In the second model, we analyze a specific the algorithms that minimize the average number of queries required to cluster the independently following a fixed distribution. In the first model, we characterize they form is drawn uniformly, the other one where each item chooses its class items, we consider two random models for the classes: one where the set partition classes (which can be labeled cheaply at the very end of the process). Given the cheap task of answering pairwise queries, and the computer groups the items into for training data gathering where the human experts perform the comparatively to see whether they belong to the same class. Thus motivated, we propose a setting determining the correct labels is much more expensive than comparing two items most practical cases rely on humans-in-the-loop to label the data. The process of has a high impact on the performance of the learned function.
Multilingual Pre-training with Universal Dependency Learning
The pre-trained language model (PrLM) demonstrates domination in downstream natural language processing tasks, in which multilingual PrLM takes advantage of language universality to alleviate the issue of limited resources for low-resource languages. Despite its successes, the performance of multilingual PrLM is still unsatisfactory, when multilingual PrLMs only focus on plain text and ignore obvious universal linguistic structure clues. Existing PrLMs have shown that monolingual linguistic structure knowledge may bring about better performance. Thus we propose a novel multilingual PrLM that supports both explicit universal dependency parsing and implicit language modeling. Syntax in terms of universal dependency parse serves as not only pre-training objective but also learned representation in our model, which brings unprecedented PrLM interpretability and convenience in downstream task use. Our model outperforms two popular multilingual PrLM, multilingual-BERT and XLM-R, on cross-lingual natural language understanding (NLU) benchmarks and linguistic structure parsing datasets, demonstrating the effectiveness and stronger cross-lingual modeling capabilities of our approach.