post-processing strategy
Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model
Shi, Xian, Chen, Yanni, Zhang, Shiliang, Yan, Zhijie
Conventional ASR systems use frame-level phoneme posteriors to conduct forced alignment (FA) and provide timestamps, while end-to-end ASR systems, especially AED-based ones, lack this ability. This paper proposes to perform timestamp prediction (TP) while recognizing by utilizing the continuous integrate-and-fire (CIF) mechanism in the non-autoregressive ASR model Paraformer. Focusing on the fire-place bias issue of CIF, we apply post-processing strategies including fire-delay and silence insertion. In addition, we propose scaled-CIF to smooth the weights of the CIF output, which proves beneficial for both the ASR and TP tasks. Accumulated averaging shift (AAS) and diarization error rate (DER) are adopted to measure timestamp quality, and we compare these metrics between the proposed system and a conventional hybrid forced-alignment system. Experimental results on a test set with manually marked timestamps show that the proposed optimization methods significantly improve the accuracy of CIF timestamps, reducing AAS and DER by 66.7% and 82.1%, respectively. Compared with Kaldi forced alignment trained on the same data, the optimized CIF timestamps achieve a 12.3% relative AAS reduction.
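As a rough illustration of how CIF fire positions can serve as timestamps, the following minimal sketch accumulates per-frame weights and records a fire whenever the running sum reaches a threshold. It is not the authors' implementation: the function names, the threshold of 1.0, and the 10 ms frame shift are assumptions made for the example, and the fire-delay, silence insertion, and scaled-CIF steps described in the abstract are not reproduced.

```python
from typing import List


def cif_fire_frames(alphas: List[float], threshold: float = 1.0) -> List[int]:
    """Return the frame indices at which the accumulated weight fires.

    Assumes each weight is below the threshold, so a frame fires at most once.
    (Hypothetical helper for illustration only.)
    """
    fire_frames = []
    accumulated = 0.0
    for frame_index, alpha in enumerate(alphas):
        accumulated += alpha
        if accumulated >= threshold:
            # One token boundary is emitted; the remainder carries over.
            fire_frames.append(frame_index)
            accumulated -= threshold
    return fire_frames


def frames_to_seconds(fire_frames: List[int], frame_shift_s: float = 0.01) -> List[float]:
    """Convert fire frame indices to end-of-frame times in seconds.

    The 10 ms frame shift is an assumed value, not taken from the paper.
    """
    return [(index + 1) * frame_shift_s for index in fire_frames]


if __name__ == "__main__":
    weights = [0.5, 0.5, 0.25, 0.75]
    frames = cif_fire_frames(weights)
    print(frames, frames_to_seconds(frames))
```

Each fire marks the frame where one output token is emitted, so consecutive fire times can be read as rough token end boundaries before any bias correction is applied.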
1Cademy @ Causal News Corpus 2022: Enhance Causal Span Detection via Beam-Search-based Position Selector
Chen, Xingran, Zhang, Ge, Nik, Adam, Li, Mingyu, Fu, Jie
In this paper, we present our approach and empirical observations for Cause-Effect Signal Span Detection, Subtask 2 of Shared Task 3 \cite{tan-etal-2022-event} at CASE 2022. The shared task aims to extract the cause, effect, and signal spans from a given causal sentence. We model the task as a reading comprehension (RC) problem and apply a token-level RC-based span prediction paradigm as the baseline. We explore different training objectives for fine-tuning the model, as well as language model (LM) based data augmentation (DA) techniques to improve performance. Additionally, we propose an efficient beam-search post-processing strategy to deal with the drawbacks of span detection and obtain a further performance gain. Our approach achieves an average F1 score of 54.15 and ranks 1st in the CASE competition. Our code is available at https://github.com/Gzhang-umich/1CademyTeamOfCASE.
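To make the idea of post-processing span predictions more concrete, here is a generic sketch of top-k candidate pairing for start and end positions, in the spirit of beam search. It is not the position selector from the paper above, whose details are not given in the abstract. The function name `select_span`, the `beam_size` and `max_span_len` parameters, and the additive scoring rule are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def select_span(
    start_scores: List[float],
    end_scores: List[float],
    beam_size: int = 3,
    max_span_len: int = 30,
) -> Optional[Tuple[int, int, float]]:
    """Pick the best valid (start, end) pair among the top-k candidates.

    Hypothetical helper: keeps the beam_size highest-scoring start and end
    positions, then returns the highest-scoring pair with start <= end.
    """
    positions = range(len(start_scores))
    top_starts = sorted(positions, key=lambda i: start_scores[i], reverse=True)[:beam_size]
    top_ends = sorted(range(len(end_scores)), key=lambda i: end_scores[i], reverse=True)[:beam_size]

    best: Optional[Tuple[int, int, float]] = None
    for start in top_starts:
        for end in top_ends:
            # Discard spans that end before they start or are too long.
            if start <= end < start + max_span_len:
                score = start_scores[start] + end_scores[end]
                if best is None or score > best[2]:
                    best = (start, end, score)
    return best


if __name__ == "__main__":
    starts = [0.1, 2.0, 0.3, 0.0]
    ends = [3.0, 0.2, 1.5, 0.1]
    print(select_span(starts, ends, beam_size=2))
```

In the usage example, taking the independent argmax of each score list would give start 1 and end 0, which is not a valid span. Keeping several candidates per position lets the selector fall back to the best consistent pair.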
Simultaneous Identification of Tweet Purpose and Position
Iyer, Rahul Radhakrishnan, Pei, Yulong, Sycara, Katia
Tweet classification has attracted considerable attention recently. Most existing work on tweet classification focuses on topic classification, which assigns tweets to several predefined categories, and sentiment classification, which labels tweets as positive, negative, or neutral. Because tweets differ from conventional text in that they are generally short and contain informal, irregular, or new words, it is difficult to determine a user's intention in publishing a tweet and the user's attitude towards a given topic. In this paper, we aim to simultaneously classify tweet purpose, i.e., the user's intention in publishing a tweet, and position, i.e., supporting, opposing, or being neutral towards a given topic. By transforming this problem into a multi-label classification problem, we propose a multi-label classification method with post-processing. Experiments on real-world data sets demonstrate the effectiveness of this method, and the results outperform those of the individual classification methods.