Zhao, Yiwen
ESPnet-SpeechLM: An Open Speech Language Model Toolkit
Tian, Jinchuan, Shi, Jiatong, Chen, William, Arora, Siddhant, Masuyama, Yoshiki, Maekaku, Takashi, Wu, Yihan, Peng, Junyi, Bharadwaj, Shikhar, Zhao, Yiwen, Cornell, Samuele, Peng, Yifan, Yue, Xiang, Yang, Chao-Han Huck, Neubig, Graham, Watanabe, Shinji
We present ESPnet-SpeechLM, an open toolkit designed to democratize the development of speech language models (SpeechLMs) and voice-driven agentic applications. The toolkit standardizes speech processing tasks by framing them as universal sequential modeling problems, encompassing a cohesive workflow of data preprocessing, pre-training, inference, and task evaluation. With ESPnet-SpeechLM, users can easily define task templates and configure key settings, enabling seamless and streamlined SpeechLM development. The toolkit ensures flexibility, efficiency, and scalability by offering highly configurable modules for every stage of the workflow. To illustrate its capabilities, we provide multiple use cases demonstrating how competitive SpeechLMs can be constructed with ESPnet-SpeechLM, including a 1.7B-parameter model pre-trained on both text and speech tasks, across diverse benchmarks. The toolkit and its recipes are fully transparent and reproducible at: https://github.com/espnet/espnet/tree/speechlm.
Guided Time-optimal Model Predictive Control of a Multi-rotor
Zhang, Guangyu, Zheng, Yongjie, He, Yuqing, Yang, Liying, Nie, Hongyu, Huang, Chaoxiong, Zhao, Yiwen
Time-optimal control of a multi-rotor remains an open problem due to the under-actuation and nonlinearity of its dynamics, which make it difficult to solve this problem directly. In this paper, the time-optimal control problem of the multi-rotor is studied. Firstly, a thrust limit optimal decomposition method is proposed, which can reasonably decompose the limited thrust into three directions according to the current state and the target state. As a result, the thrust limit constraint is decomposed as a linear constraint. With the linear constraint and decoupled dynamics, a time-optimal guidance trajectory can be obtained. Then, a cost function is defined based on the time-optimal guidance trajectory, which has a quadratic form and can be used to evaluate the time-optimal performance of the system outputs. Finally, based on the cost function, the time-optimal control problem is reformulated as an MPC (Model Predictive Control) problem. The experimental results demonstrate the feasibility and validity of the proposed methods.