Fu, Li
Do self-supervised speech and language models extract similar representations as human brain?
Chen, Peili, He, Linyang, Fu, Li, Fan, Lu, Chang, Edward F., Li, Yuanning
Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether they correlate with the same aspects of neural activity. We directly address this question by evaluating the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2, designed for speech and language tasks, respectively. Our findings reveal that both models accurately predict speech responses in the auditory cortex, with a significant correlation between their brain predictions. Notably, the speech contextual information shared between Wav2Vec2.0 and GPT-2 accounts for the majority of the explained variance in brain activity, surpassing static semantic and lower-level acoustic-phonetic information. These results underscore the convergence of speech contextual representations in SSL models and their alignment with the neural network underlying speech perception, offering valuable insights into both SSL models and the neural basis of speech and language processing.
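As a rough illustration of the brain-prediction evaluation the abstract describes (a minimal sketch, not the authors' pipeline: the ridge-regression encoder, layer choice, regularization strength, and train/test split are all assumptions), a model-to-brain encoding analysis might look like:

```python
# Minimal sketch of a brain-encoding evaluation: predict neural responses
# from SSL-model embeddings with ridge regression and score each channel
# by held-out correlation. All data here are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features, n_channels = 1000, 768, 64

# Stand-ins for time-aligned data: X = embeddings from one model layer
# (e.g. Wav2Vec2.0 or GPT-2), Y = neural recordings (e.g. electrode
# responses in auditory cortex).
X = rng.standard_normal((n_samples, n_features))
Y = X @ rng.standard_normal((n_features, n_channels)) * 0.1 \
    + rng.standard_normal((n_samples, n_channels))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2,
                                          random_state=0)

encoder = Ridge(alpha=1.0).fit(X_tr, Y_tr)
Y_hat = encoder.predict(X_te)

# Brain prediction score: Pearson r between predicted and actual
# responses, computed per channel/electrode.
r = np.array([np.corrcoef(Y_hat[:, c], Y_te[:, c])[0, 1]
              for c in range(n_channels)])
print(f"mean prediction r across channels: {r.mean():.3f}")
```

Comparing such per-electrode scores across models, and partitioning the variance they explain jointly versus uniquely, is the kind of analysis the abstract refers to when it separates shared contextual information from static semantic and acoustic-phonetic components.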
Defining implication relation for classical logic
Fu, Li
In classical logic, "P implies Q" is equivalent to "not-P or Q". It is well known that this equivalence is problematic. From "P implies Q", "not-P or Q" can be inferred (Implication-to-disjunction is valid), whereas from "not-P or Q", "P implies Q" cannot be inferred in general (Disjunction-to-implication is not generally valid), so the equivalence between them is invalid in general. This work aims to remove exactly the incorrect Disjunction-to-implication from classical logic (CL). The paper proposes a logical system (IRL) with the expected properties: (1) CL is obtained simply by adding Disjunction-to-implication to IRL, and (2) Disjunction-to-implication is independent of IRL (neither Disjunction-to-implication nor its negation can be derived in IRL) in the general case. In other words, IRL is exactly the system obtained by removing Disjunction-to-implication from CL.
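To make the contrast concrete, the two directions of the equivalence can be written as inference rules (a plain LaTeX restatement of the abstract; the rule names are the abstract's own):

```latex
% Retained in IRL: from the implication, infer the disjunction.
\[
\frac{P \to Q}{\neg P \lor Q}
\qquad \text{(Implication-to-disjunction: valid)}
\]
% Removed from CL to obtain IRL: from the disjunction, infer the implication.
\[
\frac{\neg P \lor Q}{P \to Q}
\qquad \text{(Disjunction-to-implication: not generally valid)}
\]
```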
Incremental Learning for End-to-End Automatic Speech Recognition
Fu, Li, Li, Xiaoxiao, Zi, Libo
We propose a new incremental learning method for end-to-end Automatic Speech Recognition (ASR) that extends the model's capacity on a new task while retaining its performance on previous ones. The proposed method is effective without access to the old dataset, addressing the issues of high retraining cost and old-data unavailability. To achieve this, both attention distillation and knowledge distillation are applied to preserve the ability of the old model during progressive learning. With an ASR model pre-trained on 12,000 hours of Mandarin speech, we test our proposed method on a 300-hour new-scenario task and a 1-hour new-named-entities task. Experiments show that our method yields 3.25% and 0.88% absolute Character Error Rate (CER) reductions on the new scenario compared with the pre-trained model and the full-data retraining baseline, respectively. It even yields a surprising 0.37% absolute CER reduction on the new scenario over fine-tuning. For the new named-entities task, our method significantly improves accuracy compared with the pre-trained model, i.e., a 16.95% absolute CER reduction. For both new-task adaptations, the new models maintain the same accuracy as the retraining baseline on the old tasks.
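As a rough sketch of the distillation objective such a method might combine (an illustration, not the authors' exact formulation: the loss weights, temperature, and teacher-student attention pairing are assumptions), a PyTorch version could look like:

```python
# Minimal sketch: incremental-learning loss combining (1) the ASR task
# loss on the new data, (2) knowledge distillation on output
# distributions, and (3) attention distillation between the old
# (teacher) and new (student) models. lambda_kd, lambda_att, and the
# temperature T are illustrative choices, not the paper's settings.
import torch
import torch.nn.functional as F

def incremental_loss(task_loss, student_logits, teacher_logits,
                     student_attn, teacher_attn,
                     lambda_kd=1.0, lambda_att=1.0, T=2.0):
    # Knowledge distillation: match the student's output distribution
    # to the frozen teacher's softened distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Attention distillation: keep the student's attention maps close
    # to the teacher's, preserving how the old model attends to input.
    att = F.mse_loss(student_attn, teacher_attn.detach())

    return task_loss + lambda_kd * kd + lambda_att * att

if __name__ == "__main__":
    B, Tm, V, H = 4, 50, 1000, 8  # batch, time, vocab, heads
    loss = incremental_loss(
        task_loss=torch.tensor(1.5),
        student_logits=torch.randn(B, Tm, V),
        teacher_logits=torch.randn(B, Tm, V),
        student_attn=torch.rand(B, H, Tm, Tm),
        teacher_attn=torch.rand(B, H, Tm, Tm),
    )
    print(loss.item())
```

Detaching the teacher's outputs keeps the old model frozen, so only the new model is updated on the new task while the two distillation terms anchor it to the old model's behavior.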