Compact Speech Translation Models via Discrete Speech Units Pretraining

Lam, Tsz Kin, Birch, Alexandra, Haddow, Barry

arXiv.org Artificial Intelligence

We propose a pretraining method that uses a Self-Supervised Speech (SSS) model to create more compact Speech-to-text Translation models. In contrast to using the SSS model for initialization, our method is better suited to memory-constrained scenarios such as on-device deployment. Our method is based on Discrete Speech Units (DSU) extracted from the SSS model. In the first step, our method pretrains two smaller encoder-decoder models on 1) Filterbank-to-DSU (Fbk-to-DSU) and 2) DSU-to-Translation (DSU-to-Trl) data respectively. The DSU thus become the distillation inputs of the smaller models. Subsequently, the encoder from the Fbk-to-DSU model and the decoder from the DSU-to-Trl model are taken to initialise the compact model. Finally, the compact model is finetuned on the paired Fbk-Trl data. In addition to being compact, our method requires no transcripts, making it applicable to low-resource settings. It also avoids speech discretization at inference and is more robust to DSU tokenization. Evaluation on CoVoST-2 (X-En) shows that our method consistently improves over the baseline on three metrics while remaining compact, i.e., only half the size of the SSS model.
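The assembly step of the pipeline above can be sketched schematically. This is a minimal toy illustration, not the paper's implementation: the "models" are stand-in weight dictionaries, and all dimensions and names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(d_in, d_hid, d_out, rng):
    """Toy encoder-decoder as a dict of weight matrices (stands in for a trained model)."""
    return {
        "encoder": rng.standard_normal((d_in, d_hid)),
        "decoder": rng.standard_normal((d_hid, d_out)),
    }

# Step 1: pretrain two smaller encoder-decoder models (random weights stand in
# for the result of training; dimensions are illustrative assumptions).
fbk_to_dsu = init_model(80, 256, 1000, rng)     # filterbank frames -> DSU vocabulary
dsu_to_trl = init_model(1000, 256, 32000, rng)  # DSU tokens -> translation vocabulary

# Step 2: assemble the compact Fbk-to-Trl model from the two pretrained halves.
compact = {
    "encoder": fbk_to_dsu["encoder"],  # speech encoder taken from the Fbk-to-DSU model
    "decoder": dsu_to_trl["decoder"],  # text decoder taken from the DSU-to-Trl model
}

# Step 3 (not shown): finetune `compact` on paired Fbk-Trl data. At inference the
# compact model maps filterbanks to translations directly, with no DSU step.
```

Because the compact model consumes filterbanks directly, discretization happens only during pretraining, which is what makes inference robust to the DSU tokenization.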


DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

Shon, Suwon, Kim, Kwangyoun, Hsu, Yi-Te, Sridhar, Prashant, Watanabe, Shinji, Livescu, Karen

arXiv.org Artificial Intelligence

The integration of pre-trained text-based large language models (LLM) with speech input has enabled instruction-following capabilities for diverse speech tasks. This integration requires the use of a speech encoder, a speech adapter, and an LLM, trained on diverse tasks. We propose the use of discrete speech units (DSU), rather than continuous-valued speech encoder outputs, which are converted to the LLM token embedding space using the speech adapter. We generate DSU using a self-supervised speech encoder followed by k-means clustering. The proposed model shows robust performance on speech inputs from seen/unseen domains and instruction-following capability in spoken question answering. We also explore various types of DSU extracted from different layers of the self-supervised speech encoder, as well as Mel-frequency cepstral coefficients (MFCC). Our findings suggest that the ASR task and datasets are not crucial in instruction-tuning for spoken question answering tasks.
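The "self-supervised speech encoder followed by k-means clustering" recipe for generating DSU can be sketched at a toy scale. This is a plain NumPy k-means over frame-level features, assuming the frames stand in for outputs of some SSL encoder layer; real pipelines fit the codebook on a large corpus once and reuse it.

```python
import numpy as np

def fit_kmeans(features, k, iters=20, seed=0):
    """Plain k-means over frame features; the centroids act as the DSU codebook."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest centroid (squared Euclidean distance).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Move each centroid to the mean of its assigned frames.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = features[labels == j].mean(0)
    return centroids

def to_dsu(features, centroids):
    """Discretize: one unit index per frame, by nearest codebook entry."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return dists.argmin(1)

# Toy frame features standing in for SSL encoder outputs (e.g., a middle layer).
rng = np.random.default_rng(1)
frames = rng.standard_normal((200, 16))   # 200 frames, 16-dim features (assumed sizes)
codebook = fit_kmeans(frames, k=8)
dsu = to_dsu(frames, codebook)            # discrete speech unit sequence
```

The resulting integer sequence is what the speech adapter would map into the LLM token embedding space in place of continuous encoder outputs.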


Test-Time Style Shifting: Handling Arbitrary Styles in Domain Generalization

Park, Jungwuk, Han, Dong-Jun, Kim, Soyeong, Moon, Jaekyun

arXiv.org Artificial Intelligence

In domain generalization (DG), the target domain is unknown when the model is being trained, and the trained model should successfully work on an arbitrary (and possibly unseen) target domain during inference. This is a difficult problem, and despite active studies in recent years, it remains a great challenge. In this paper, we take a simple yet effective approach to tackle this issue. We propose test-time style shifting, which shifts the style of the test sample (that has a large style gap with the source domains) to the nearest source domain that the model is already familiar with, before making the prediction. This strategy enables the model to handle any target domain with arbitrary style statistics, without additional model updates at test time. Additionally, we propose style balancing, which provides a great platform for maximizing the advantage of test-time style shifting by handling the DG-specific imbalance issues. The proposed ideas are easy to implement and successfully work in conjunction with various other DG schemes. Experimental results on different datasets show the effectiveness of our methods.
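The core operation, shifting a test sample's style to the nearest source domain, can be sketched with the common convention that "style" means channel-wise feature statistics (as in AdaIN-style normalization). This is a hedged toy sketch, not the paper's implementation: the stored per-domain statistics, feature shapes, and the nearest-style distance are all assumptions.

```python
import numpy as np

def style_stats(x):
    """Channel-wise mean and std of a (C, H, W) feature map: the 'style'."""
    return x.mean(axis=(1, 2)), x.std(axis=(1, 2)) + 1e-5

def shift_style(x, mu_src, sigma_src):
    """Re-normalize x so it carries the chosen source-domain style statistics."""
    mu, sigma = style_stats(x)
    x_norm = (x - mu[:, None, None]) / sigma[:, None, None]
    return x_norm * sigma_src[:, None, None] + mu_src[:, None, None]

rng = np.random.default_rng(0)

# Hypothetical per-source-domain style statistics collected during training.
source_styles = [(rng.normal(size=4), np.abs(rng.normal(size=4)) + 0.5)
                 for _ in range(3)]

# A test-time feature map with a large style gap from the sources.
test_feat = rng.normal(loc=5.0, scale=3.0, size=(4, 8, 8))
mu_t, sig_t = style_stats(test_feat)

# Pick the nearest source style (distance in concatenated mean/std space; an
# assumed choice of metric) and shift the test sample toward it.
dists = [np.linalg.norm(np.concatenate([mu_t - m, sig_t - s]))
         for m, s in source_styles]
m_near, s_near = source_styles[int(np.argmin(dists))]
shifted = shift_style(test_feat, m_near, s_near)
```

After the shift, the model sees statistics it already encountered during training, which is why no weight update is needed at test time.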


DSU to offer artificial intelligence degrees

#artificialintelligence

A pair of artificial intelligence degrees are coming to Dakota State University. In March, the Board of Regents approved a bachelor of science degree in AI, which will be offered by the Beacom College of Computer and Cyber Sciences. Now another program, geared more toward the workplace, has the green light. The Board of Regents has given Dakota State University the okay to offer a bachelor of science in Artificial Intelligence in Organizations. Instead of focusing on the computer science side of AI, this degree will be offered through the College of Business and Information Systems.