Learning second-order TVD flux limiters using differentiable solvers
Huang, Chenyang, Sebastian, Amal S., Viswanathan, Venkatasubramanian
This paper presents a data-driven framework for learning optimal second-order total variation diminishing (TVD) flux limiters via differentiable simulations. In our fully differentiable finite volume solvers, the limiter functions are replaced by neural networks. By representing the limiter as a pointwise convex combination of the Minmod and Superbee limiters, we enforce both second-order accuracy and the TVD constraint at all stages of training. Our approach leverages gradient-based optimization through automatic differentiation, allowing direct backpropagation of errors from numerical solutions to the limiter parameters. We demonstrate the effectiveness of this method on various hyperbolic conservation laws, including the linear advection equation, Burgers' equation, and the one-dimensional Euler equations. Remarkably, a limiter trained solely on linear advection exhibits strong generalizability, surpassing the accuracy of most classical flux limiters across a range of problems with shocks and discontinuities. The learned flux limiters can be readily integrated into existing computational fluid dynamics codes, and the proposed methodology also offers a flexible pathway to systematically develop and optimize flux limiters for complex flow problems.
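The convex-combination construction is concrete enough to sketch. Below is a minimal, hypothetical PyTorch rendering (the paper's solver and network architecture are not specified here): a small network outputs a pointwise weight alpha(r) in [0, 1], and the limiter is the convex combination of the classical Minmod and Superbee limiters. Since Minmod and Superbee bound the second-order TVD region from below and above, any such pointwise convex combination stays second-order and TVD by construction.

```python
import torch

def minmod(r):
    # Minmod limiter: phi(r) = max(0, min(1, r))
    return torch.clamp(r, min=0.0, max=1.0)

def superbee(r):
    # Superbee limiter: phi(r) = max(0, min(2r, 1), min(r, 2))
    zero = torch.zeros_like(r)
    one = torch.ones_like(r)
    return torch.maximum(zero, torch.maximum(
        torch.minimum(2 * r, one),
        torch.minimum(r, 2 * one)))

class LearnedLimiter(torch.nn.Module):
    """Pointwise convex combination of Minmod and Superbee.

    A small network maps the slope ratio r to alpha(r) in [0, 1]
    (via a sigmoid), so the learned limiter never leaves the
    second-order TVD region, at every stage of training.
    """
    def __init__(self, hidden=16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(1, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, 1), torch.nn.Sigmoid())

    def forward(self, r):
        alpha = self.net(r.unsqueeze(-1)).squeeze(-1)
        return (1 - alpha) * minmod(r) + alpha * superbee(r)
```

Because the limiter is differentiable end to end, plugging it into a differentiable finite volume solver lets solution errors backpropagate directly to the network weights, as the abstract describes.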
A Decoding Algorithm for Length-Control Summarization Based on Directed Acyclic Transformers
Huang, Chenyang, Zhou, Hao, Jen, Cameron, Zheng, Kangjie, Zaïane, Osmar R., Mou, Lili
Length-control summarization aims to condense long texts into short ones within a given length limit. Previous approaches often use autoregressive (AR) models and treat the length requirement as a soft constraint, which may not always be satisfied. In this study, we propose a novel length-control decoding algorithm based on the Directed Acyclic Transformer (DAT). Our approach allows for multiple plausible sequence fragments and predicts a "path" to connect them. In addition, we propose a Sequence Maximum a Posteriori (SeqMAP) decoding algorithm that marginalizes over different possible paths and finds the most probable summary satisfying the length budget. Our algorithm is based on beam search, which further facilitates the use of a reranker for performance improvement. Experimental results on the Gigaword and DUC2004 datasets demonstrate the state-of-the-art performance of our method for length-control summarization.
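To make the decoding idea concrete, here is a toy Python sketch of length-controlled beam search over a DAT lattice. The `trans` and `emit` structures are illustrative assumptions, and merging hypotheses that share a token prefix (by summing their probability mass over end vertices) is only a crude stand-in for SeqMAP's full path marginalization.

```python
from collections import defaultdict

def length_control_beam(trans, emit, target_len, beam_size=8):
    """Toy length-control decoding over a DAT lattice.

    trans[i]: dict mapping vertex i to {next vertex j: transition prob}
    emit[j]:  dict mapping vertex j to {token: emission prob}
    Hypotheses with the same token prefix are merged by summing their
    probability mass, roughly approximating path marginalization.
    """
    # state: token prefix -> {end vertex: accumulated probability mass}
    beams = {(): {0: 1.0}}
    for _ in range(target_len):
        merged = defaultdict(lambda: defaultdict(float))
        for prefix, ends in beams.items():
            for i, p in ends.items():
                for j, p_trans in trans[i].items():
                    for tok, p_emit in emit[j].items():
                        merged[prefix + (tok,)][j] += p * p_trans * p_emit
        # keep the top prefixes, scored by their marginal probability
        scored = sorted(merged.items(),
                        key=lambda kv: -sum(kv[1].values()))
        beams = {k: dict(v) for k, v in scored[:beam_size]}
    best = max(beams.items(), key=lambda kv: sum(kv[1].values()))
    return list(best[0])
```

Every surviving hypothesis has exactly `target_len` tokens, so the length budget is met by construction, a hard guarantee AR decoding with soft constraints cannot give.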
Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation
Huang, Chenyang, Huang, Fei, Zheng, Zaixiang, Zaïane, Osmar R., Zhou, Hao, Mou, Lili
Multilingual neural machine translation (MNMT) aims to use a single model for multiple translation directions. Recent work applies non-autoregressive Transformers to improve the efficiency of MNMT, but requires an expensive knowledge distillation (KD) process. To address this, we propose M-DAT, an approach to non-autoregressive multilingual machine translation. Our system leverages the recent directed acyclic Transformer (DAT), which does not require KD. We further propose a pivot back-translation (PivotBT) approach to improve generalization to unseen translation directions. Experiments show that our M-DAT achieves state-of-the-art performance in non-autoregressive MNMT.
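The pivot back-translation idea admits a short sketch. The `model.translate` interface below is a hypothetical placeholder, and the exact data flow is an assumption based on how pivot-based back-translation is commonly set up: synthetic training pairs for an unseen direction are built by routing through a high-resource pivot language.

```python
def pivot_back_translate(model, tgt_sentence, src_lang, tgt_lang, pivot="en"):
    """Hypothetical sketch of pivot back-translation (PivotBT).

    For an unseen direction src_lang -> tgt_lang, synthesize a source
    sentence for a monolingual target sentence by pivoting through a
    high-resource language, then train on (synthetic_src, tgt_sentence).
    """
    # tgt -> pivot is a supervised (seen) direction
    pivot_sentence = model.translate(tgt_sentence, src=tgt_lang, tgt=pivot)
    # pivot -> src is also supervised; composing the two covers the unseen pair
    synthetic_src = model.translate(pivot_sentence, src=pivot, tgt=src_lang)
    return synthetic_src, tgt_sentence
```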
OTTAWA: Optimal TransporT Adaptive Word Aligner for Hallucination and Omission Translation Errors Detection
Huang, Chenyang, Ghaddar, Abbas, Kobyzev, Ivan, Rezagholizadeh, Mehdi, Zaiane, Osmar R., Chen, Boxing
Recently, there has been considerable attention to detecting hallucinations and omissions in Machine Translation (MT) systems. The two dominant approaches to this task involve analyzing the MT system's internal states or relying on the output of external tools, such as sentence similarity or MT quality estimators. In this work, we introduce OTTAWA, a novel Optimal Transport (OT)-based word aligner specifically designed to enhance the detection of hallucinations and omissions in MT systems. Our approach explicitly models missing alignments by introducing a "null" vector, for which we propose a novel one-side constrained OT setting that allows adaptive null alignment. Our approach yields competitive results compared to state-of-the-art methods across 18 language pairs on the HalOmi benchmark. In addition, it shows promising features, such as the ability to distinguish between the two error types and to perform word-level detection without accessing the MT system's internal states.
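A toy rendering of the null-alignment idea is given below, using the POT library. It is inspired by, but not identical to, OTTAWA's one-side constrained OT: source marginals are enforced exactly, while target marginals (including a constant-cost "null" column) act only as upper bounds, implemented here via the classic dummy-row trick so that plain `ot.emd` can solve it. The `null_cost` parameter and the per-word capacities are illustrative assumptions.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

def null_aware_align(src_emb, tgt_emb, null_cost=0.5):
    """Toy sketch of null-aware word alignment via optimal transport.

    src_emb: (m, d) source word embeddings; tgt_emb: (n, d) target ones.
    Returns an (m, n + 1) transport plan whose last column holds the
    mass aligned to "null" (i.e., likely hallucinated source words).
    """
    m, n = len(src_emb), len(tgt_emb)
    # cosine distances, plus a constant-cost "null" column
    C = ot.dist(src_emb, tgt_emb, metric="cosine")
    C = np.hstack([C, np.full((m, 1), null_cost)])
    a = np.full(m, 1.0 / m)  # source mass, shipped in full
    # each target word absorbs at most one source word's mass (a
    # simplification); the null column has enough slack for everything
    b = np.concatenate([np.full(n, 1.0 / m), [1.0]])
    # a zero-cost dummy source row turns column inequalities into equalities
    a = np.concatenate([a, [b.sum() - a.sum()]])
    C = np.vstack([C, np.zeros((1, n + 1))])
    T = ot.emd(a, b, C)
    return T[:m]
```

Reversing the roles of source and target in the same construction would flag unmatched target words, i.e., omission candidates.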
Enhancing Argument Summarization: Prioritizing Exhaustiveness in Key Point Generation and Introducing an Automatic Coverage Evaluation Metric
Khosravani, Mohammad, Huang, Chenyang, Trabelsi, Amine
The proliferation of social media platforms has led to a growing volume of online debates and arguments. Automatic summarization methods for such debates are therefore needed; however, this area of summarization remains understudied. The Key Point Analysis (KPA) task formulates argument summarization as representing the summary of a large collection of arguments in the form of concise, bullet-style sentences called key points. A sub-task of KPA, called Key Point Generation (KPG), focuses on generating these key points from the arguments. This paper introduces a novel extractive approach for key point generation that outperforms previous state-of-the-art methods on the task. Our method uses an extractive, clustering-based approach that produces concise, high-quality key points with higher coverage of reference summaries and less redundant output. In addition, we show that existing evaluation metrics for summarization, such as ROUGE, are incapable of differentiating between generated key points of different qualities. To this end, we propose a new evaluation metric that assesses generated key points by their coverage. Our code is available online.
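An extractive, clustering-based KPG pipeline of the kind described typically looks like the minimal sketch below. This is an illustrative reconstruction, not the paper's exact method; the encoder name and the centroid-based selection rule are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

def extract_key_points(arguments, num_key_points=5):
    """Illustrative extractive KPG: embed arguments, cluster them, and
    pick the argument closest to each centroid as that cluster's key point.
    Extracting real sentences avoids hallucinated content and keeps the
    key points concise, while one pick per cluster limits redundancy."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder
    emb = encoder.encode(arguments)
    km = KMeans(n_clusters=num_key_points, n_init=10).fit(emb)
    key_points = []
    for c in range(num_key_points):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(emb[members] - km.cluster_centers_[c], axis=1)
        key_points.append(arguments[members[np.argmin(dists)]])
    return key_points
```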
EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation
Wen, Yuqiao, Shayegh, Behzad, Huang, Chenyang, Cao, Yanshuai, Mou, Lili
Machine translation is a widely applicable NLP task that translates a text from a source language to a target language (Brown et al., 1990; Bahdanau et al., 2015). The Transformer architecture (Vaswani et al., 2017) and pretrained large language models (Radford et al., 2019; Raffel et al., 2020; Lewis et al., 2020) have largely improved translation performance, especially in the supervised setting, where a model can learn from large volumes of parallel corpora. However, machine translation remains challenging for low-resource languages, because there is not enough data for large neural networks to learn these languages. We specifically focus on multilingual translation in the zero-shot setting, where the system is required to translate between unseen language pairs. Since collecting parallel data and training individual models for every translation pair are prohibitively expensive, it is common to build a single multilingual system (Johnson et al., 2017; Fan et al., 2021) that can perform translation for all language pairs, most of which are zero-shot directions, with a few exceptions (e.g., pairs involving English). These models work by prepending a language-indicator token to the input, and zero-shot ability emerges as the model generalizes from trained language pairs to unseen ones (Liu et al., 2021; Wicks and Duh, 2022).
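The language-indicator mechanism is simple enough to illustrate. The `<2xx>`-style token format below follows the convention of Johnson et al. (2017) and is illustrative only; real systems vary in the exact token and placement.

```python
def make_multilingual_example(src_text, tgt_lang):
    """Prepend a target-language token so a single multilingual model
    can be steered toward any translation direction at inference time."""
    return f"<2{tgt_lang}> {src_text}"

# e.g., make_multilingual_example("Wie geht es dir?", "fr")
# -> "<2fr> Wie geht es dir?"  (a zero-shot de->fr request)
```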
Basic and Depression Specific Emotion Identification in Tweets: Multi-label Classification Experiments
Farruque, Nawshad, Huang, Chenyang, Zaiane, Osmar, Goebel, Randy
We choose our basic emotions from a hybrid emotion model consisting of the common emotions in four highly regarded psychological models of emotion. Moreover, we augment that emotion model with new emotion categories because of their importance in the analysis of depression; most of these additional emotions have not been used in previous emotion-mining research. Our experimental analyses show that a cost-sensitive RankSVM algorithm and a Deep Learning model are both robust, as measured by both Macro and Micro F-measures. This suggests that these algorithms are well suited to addressing the widely known data-imbalance problem in multi-label learning. Moreover, our Deep Learning model performs best, giving it an edge in modeling the deep semantic features of our extended emotion categories.
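The two reported metrics differ in exactly how they treat label imbalance, which a short sketch makes clear. The function name and indicator-matrix setup are illustrative; the `f1_score` call is the standard scikit-learn API.

```python
from sklearn.metrics import f1_score

def report_f_measures(y_true, y_pred):
    """Macro vs. micro F-measure for multi-label emotion prediction.

    y_true / y_pred: binary indicator matrices of shape
    (n_samples, n_emotions). Macro-F averages per-label F1 scores
    equally, so rare emotion labels count as much as frequent ones;
    micro-F pools all label decisions first, favoring frequent labels.
    Reporting both is what exposes robustness to class imbalance.
    """
    return {
        "macro_f": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "micro_f": f1_score(y_true, y_pred, average="micro", zero_division=0),
    }
```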
Generating Responses Expressing Emotion in an Open-domain Dialogue System
Huang, Chenyang, Zaïane, Osmar R.
Neural network-based open-ended conversational agents automatically generate responses based on predictive models learned from a large number of utterance pairs. The generated responses are typically acceptable as sentences but are often dull, generic, and devoid of any emotion. In this paper, we present neural models that learn to express a given emotion in the generated response. We propose four models and evaluate them against three baselines. A model based on the encoder-decoder framework with multiple attention layers provides the best overall performance in expressing the required emotion. While it does not outperform the other models on all emotions, it presents promising results in most cases.
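One common way to condition a seq2seq decoder on a target emotion is sketched below. This is an illustrative pattern, not the paper's exact architecture: a learned emotion embedding is concatenated to every decoder input token embedding, so the requested emotion influences each generation step.

```python
import torch
import torch.nn as nn

class EmotionConditionedDecoderInput(nn.Module):
    """Illustrative emotion conditioning for a seq2seq decoder: concatenate
    a learned emotion embedding to each decoder input token embedding."""
    def __init__(self, vocab_size, num_emotions, tok_dim=256, emo_dim=32):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, tok_dim)
        self.emo_emb = nn.Embedding(num_emotions, emo_dim)

    def forward(self, tokens, emotion):
        # tokens: (batch, seq_len); emotion: (batch,)
        t = self.tok_emb(tokens)
        e = self.emo_emb(emotion).unsqueeze(1).expand(-1, tokens.size(1), -1)
        # (batch, seq_len, tok_dim + emo_dim), fed to the decoder/attention
        return torch.cat([t, e], dim=-1)
```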