Cheng, Lizhi
A Scope Sensitive and Result Attentive Model for Multi-Intent Spoken Language Understanding
Cheng, Lizhi, Yang, Wenmian, Jia, Weijia
Multi-Intent Spoken Language Understanding (SLU), a novel and more complex scenario of SLU, is attracting increasing attention. Unlike traditional SLU, each intent in this scenario has its own specific scope, and semantic information outside that scope can even hinder the prediction, which greatly increases the difficulty of intent detection. Worse still, guiding slot filling with these inaccurate intent labels suffers from error propagation, resulting in unsatisfactory overall performance. To address these challenges, in this paper we propose a novel Scope-Sensitive Result Attention Network (SSRAN) based on the Transformer, which contains a Scope Recognizer (SR) and a Result Attention Network (RAN). The Scope Recognizer assigns scope information to each token, reducing the distraction of out-of-scope tokens. The Result Attention Network effectively exploits the bidirectional interaction between the results of slot filling and intent detection, mitigating the error propagation problem. Experiments on two public datasets indicate that our model significantly improves SLU performance (by 5.4\% and 2.1\% in overall accuracy) over the state-of-the-art baseline.
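The abstract does not spell out the architecture, so the following is only a hypothetical PyTorch-style sketch of the bidirectional result interaction between slot filling and intent detection that RAN is described as exploiting; every module name, dimension, and design choice below is an assumption for illustration, not the authors' SSRAN code.

    import torch
    import torch.nn as nn

    class ResultInteraction(nn.Module):
        # Hypothetical result-interaction layer: preliminary intent and slot
        # results attend to each other before the final prediction.
        def __init__(self, hidden=256, n_intents=10, n_slots=50):
            super().__init__()
            self.intent_head = nn.Linear(hidden, n_intents)
            self.slot_head = nn.Linear(hidden, n_slots)
            # project preliminary label distributions back into the hidden space
            self.intent_embed = nn.Linear(n_intents, hidden)
            self.slot_embed = nn.Linear(n_slots, hidden)
            self.cross_attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)

        def forward(self, h):                      # h: (batch, seq_len, hidden) encoder states
            intent_logits = self.intent_head(h)    # preliminary intent results
            slot_logits = self.slot_head(h)        # preliminary slot results
            intent_repr = self.intent_embed(intent_logits.softmax(-1))
            slot_repr = self.slot_embed(slot_logits.softmax(-1))
            # each result stream attends to the other (bidirectional interaction)
            slot_ctx, _ = self.cross_attn(slot_repr, intent_repr, intent_repr)
            intent_ctx, _ = self.cross_attn(intent_repr, slot_repr, slot_repr)
            return self.intent_head(h + intent_ctx), self.slot_head(h + slot_ctx)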
An Effective Non-Autoregressive Model for Spoken Language Understanding
Cheng, Lizhi, Jia, Weijia, Yang, Wenmian
Spoken Language Understanding (SLU), a core component of task-oriented dialogue systems, is expected to have low inference latency because users are impatient. Non-autoregressive SLU models clearly increase inference speed but suffer from the uncoordinated-slot problem caused by the lack of sequential dependency information among slot chunks. To overcome this shortcoming, in this paper we propose a novel non-autoregressive SLU model named Layered-Refine Transformer, which contains a Slot Label Generation (SLG) task and a Layered Refine Mechanism (LRM). SLG is defined as generating the next slot label from the token sequence and the slot labels generated so far. With SLG, the non-autoregressive model can efficiently obtain dependency information during training while spending no extra time in inference. LRM predicts preliminary SLU results from the Transformer's middle states and utilizes them to guide the final prediction. Experiments on two public datasets indicate that our model significantly improves SLU performance (1.5\% on overall accuracy) while substantially speeding up inference (by more than 10 times) over the state-of-the-art baseline.
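As with the previous entry, the mechanism itself is only sketched in the abstract; a minimal, hypothetical PyTorch-style illustration of the layered-refine idea (predict preliminary slot labels from the middle Transformer states, re-embed them, and let them guide the remaining layers) might look as follows, with all names and sizes assumed for illustration rather than taken from the paper.

    import torch
    import torch.nn as nn

    class LayeredRefine(nn.Module):
        # Hypothetical sketch: a preliminary prediction from the middle of the
        # encoder stack is fed back to guide the upper layers (cf. LRM).
        def __init__(self, hidden=256, n_slots=50, n_layers=6):
            super().__init__()
            layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
            self.lower = nn.TransformerEncoder(layer, n_layers // 2)
            self.upper = nn.TransformerEncoder(layer, n_layers - n_layers // 2)
            self.mid_head = nn.Linear(hidden, n_slots)     # preliminary slot prediction
            self.label_embed = nn.Linear(n_slots, hidden)  # feed the result back
            self.final_head = nn.Linear(hidden, n_slots)

        def forward(self, x):                      # x: (batch, seq_len, hidden) token embeddings
            mid = self.lower(x)
            prelim = self.mid_head(mid)            # preliminary result from middle states
            refined = self.upper(mid + self.label_embed(prelim.softmax(-1)))
            return prelim, self.final_head(refined)   # both outputs can be supervised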
Training GANs with Centripetal Acceleration
Peng, Wei, Dai, Yuhong, Zhang, Hui, Cheng, Lizhi
Training generative adversarial networks (GANs) often suffers from cyclic behavior of the iterates. Based on the simple intuition that the centripetal acceleration of an object in uniform circular motion points toward the center of the circle, we present the Simultaneous Centripetal Acceleration (SCA) method and the Alternating Centripetal Acceleration (ACA) method to alleviate this cyclic behavior. Under suitable conditions, gradient descent methods with either SCA or ACA are shown to be linearly convergent for bilinear games. Numerical experiments, conducted by applying ACA to existing gradient-based algorithms in a GAN setting, demonstrate the superiority of ACA.
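For concreteness, here is a minimal NumPy sketch of simultaneous gradient play with a centripetal-style correction on the bilinear game f(x, y) = x^T A y; the coefficients and the exact form of the correction are illustrative assumptions and may differ from the SCA/ACA updates defined in the paper.

    import numpy as np

    def sca_bilinear(A, x, y, alpha=0.05, beta=0.3, steps=2000):
        # Simultaneous gradient play on f(x, y) = x^T A y (x minimizes, y maximizes)
        # with a centripetal-style correction beta * (g_k - g_{k-1}); illustrative only.
        gx_prev, gy_prev = A @ y, A.T @ x              # gradients at the starting point
        for _ in range(steps):
            gx, gy = A @ y, A.T @ x                    # grad_x f = A y,  grad_y f = A^T x
            x = x - alpha * gx - beta * (gx - gx_prev) # descent step for x
            y = y + alpha * gy + beta * (gy - gy_prev) # ascent step for y
            gx_prev, gy_prev = gx, gy
        return x, y

    # Plain simultaneous descent-ascent (beta = 0) spirals away from the equilibrium
    # on this game; with the correction the iterates contract toward (0, 0).
    x, y = sca_bilinear(np.eye(2), np.ones(2), np.ones(2))
    print(np.linalg.norm(x), np.linalg.norm(y))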
Iteratively Linearized Reweighted Alternating Direction Method of Multipliers for a Class of Nonconvex Problems
Sun, Tao, Jiang, Hao, Cheng, Lizhi, Zhu, Wei
In this paper, we consider solving a class of nonconvex and nonsmooth problems that frequently appear in signal processing and machine learning research. The traditional alternating direction method of multipliers runs into both theoretical and computational difficulties when solving the nonconvex and nonsmooth subproblem. In view of this, we propose a reweighted alternating direction method of multipliers. In this algorithm, all subproblems are convex and easy to solve. We also provide several convergence guarantees and prove that the algorithm globally converges to a critical point of an auxiliary function with the help of the Kurdyka-{\L}ojasiewicz property. Several numerical results are presented to demonstrate the efficiency of the proposed algorithm.
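To make the reweighting idea concrete, one common instance of such schemes, not necessarily the exact algorithm analyzed in the paper, handles $\min_{x,z} f(x) + \sum_i \varphi(|z_i|)$ subject to $Ax = z$ with a concave penalty $\varphi$ (e.g. $\varphi(t) = \log(1 + t/\varepsilon)$) by linearizing $\varphi$ at the current iterate, so that every subproblem of the augmented-Lagrangian loop is convex:

\begin{aligned}
w_i^{k} &= \varphi'\!\left(|z_i^{k}|\right),\\
x^{k+1} &= \operatorname*{arg\,min}_{x}\; f(x) + \tfrac{\rho}{2}\,\|Ax - z^{k} + u^{k}\|_2^2,\\
z^{k+1} &= \operatorname*{arg\,min}_{z}\; \textstyle\sum_i w_i^{k}\,|z_i| + \tfrac{\rho}{2}\,\|Ax^{k+1} - z + u^{k}\|_2^2,\\
u^{k+1} &= u^{k} + Ax^{k+1} - z^{k+1}.
\end{aligned}

The $z$-update is a weighted soft-thresholding step and the $x$-update is convex whenever $f$ is, which is the sense in which all subproblems become convex and easy to solve.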
A convergence frame for inexact nonconvex and nonsmooth algorithms and its applications to several iterations
Sun, Tao, Jiang, Hao, Cheng, Lizhi, Zhu, Wei
In this paper, we consider the convergence of an abstract inexact nonconvex and nonsmooth algorithm. We assume a pseudo sufficient descent condition and a pseudo relative error condition, both of which are related to an auxiliary sequence, together with a continuity condition. In fact, a wide class of classical inexact nonconvex and nonsmooth algorithms satisfies these three conditions. Under a finite-energy assumption on the auxiliary sequence, we prove that the sequence generated by the general algorithm converges to a critical point of the objective function, provided the objective satisfies the Kurdyka-{\L}ojasiewicz property. The core of the proofs lies in building a new Lyapunov function whose successive differences bound the successive differences of the points generated by the algorithm. We then apply our findings to several classical nonconvex iterative algorithms and derive corresponding convergence results.
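For orientation, the exact (non-pseudo) counterparts of these conditions, the standard template for Kurdyka-{\L}ojasiewicz-based convergence analysis, read as follows for iterates $(x^k)$, an objective $F$, and constants $a, b > 0$; the pseudo versions studied in the paper relax them with an auxiliary sequence, so the display below is only the classical reference point rather than the paper's precise assumptions.

\begin{aligned}
&\text{(H1) sufficient descent:} && F(x^{k+1}) + a\,\|x^{k+1} - x^{k}\|^2 \le F(x^{k}),\\
&\text{(H2) relative error:} && \exists\, w^{k+1} \in \partial F(x^{k+1}):\ \|w^{k+1}\| \le b\,\|x^{k+1} - x^{k}\|,\\
&\text{(H3) continuity:} && x^{k_j} \to \bar{x} \ \text{and}\ F(x^{k_j}) \to F(\bar{x}) \ \text{along a subsequence.}
\end{aligned}

A natural pseudo relaxation, consistent with (though not necessarily identical to) the conditions assumed here, allows an error sequence $(\varepsilon_k)$ of finite energy on the right-hand sides, e.g. $F(x^{k+1}) + a\,\|x^{k+1} - x^{k}\|^2 \le F(x^{k}) + \varepsilon_k$ with $\sum_k \varepsilon_k < \infty$.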