Grammars & Parsing
LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model
Fei, Hao, Wu, Shengqiong, Li, Jingye, Li, Bobo, Li, Fei, Qin, Libo, Zhang, Meishan, Zhang, Min, Chua, Tat-Seng
Universally modeling all typical information extraction tasks (UIE) with one generative language model (GLM) has revealed great potential by the latest study, where various IE predictions are unified into a linearized hierarchical expression under a GLM. Syntactic structure information, a type of effective feature which has been extensively utilized in IE community, should also be beneficial to UIE. In this work, we propose a novel structure-aware GLM, fully unleashing the power of syntactic knowledge for UIE. A heterogeneous structure inductor is explored to unsupervisedly induce rich heterogeneous structural representations by post-training an existing GLM. In particular, a structural broadcaster is devised to compact various latent trees into explicit high-order forests, helping to guide a better generation during decoding. We finally introduce a task-oriented structure fine-tuning mechanism, further adjusting the learned structures to most coincide with the end-task's need. Over 12 IE benchmarks across 7 tasks our system shows significant improvements over the baseline UIE system. Further in-depth analyses show that our GLM learns rich task-adaptive structural bias that greatly resolves the UIE crux, the long-range dependence issue and boundary identifying. Source codes are open at https://github.com/ChocoWu/LasUIE.
Innovations in Neural Data-to-text Generation: A Survey
Sharma, Mandar, Gogineni, Ajay, Ramakrishnan, Naren
The neural boom that has sparked natural language processing (NLP) research through the last decade has similarly led to significant innovations in data-to-text generation (DTG). This survey offers a consolidated view into the neural DTG paradigm with a structured examination of the approaches, benchmark datasets, and evaluation protocols. This survey draws boundaries separating DTG from the rest of the natural language generation (NLG) landscape, encompassing an up-to-date synthesis of the literature, and highlighting the stages of technological adoption from within and outside the greater NLG umbrella. With this holistic view, we highlight promising avenues for DTG research that not only focus on the design of linguistically capable systems but also systems that exhibit fairness and accountability.
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
Shi, Ensheng, Wang, Yanlin, Zhang, Hongyu, Du, Lun, Han, Shi, Zhang, Dongmei, Sun, Hongbin
Recently, fine-tuning pre-trained code models such as CodeBERT on downstream tasks has achieved great success in many software testing and analysis tasks. While effective and prevalent, fine-tuning the pre-trained parameters incurs a large computational cost. In this paper, we conduct an extensive experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning. We then propose efficient alternatives to fine-tune the large pre-trained code model based on the above findings. Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model. (2) The process of fine-tuning preserves most of the code properties. Specifically, the basic code properties captured by lower and intermediate layers are still preserved during fine-tuning. Furthermore, we find that only the representations of the top two layers change most during fine-tuning for various downstream tasks. (3) Based on the above findings, we propose Telly to efficiently fine-tune pre-trained code models via layer freezing. The extensive experimental results on five various downstream tasks demonstrate that training parameters and the corresponding time cost are greatly reduced, while performances are similar or better. Replication package including source code, datasets, and online Appendix is available at: \url{https://github.com/DeepSoftwareAnalytics/Telly}.
A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
Chen, Jiaao, Zhang, Aston, Li, Mu, Smola, Alex, Yang, Diyi
Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. Whereas, as a way inherently built for continuous data, existing diffusion models still have some limitations in modeling discrete data, e.g., languages. For example, the generally used Gaussian noise can not handle the discrete corruption well, and the objectives in continuous spaces fail to be stable for textual data in the diffusion process especially when the dimension is high. To alleviate these issues, we introduce a novel diffusion model for language modeling, Masked-Diffuse LM, with lower training cost and better performances, inspired by linguistic features in languages. Specifically, we design a linguistic-informed forward process which adds corruptions to the text through strategically soft-masking to better noise the textual data. Also, we directly predict the categorical distribution with cross-entropy loss function in every diffusion step to connect the continuous space and discrete space in a more efficient and straightforward way. Through experiments on 5 controlled generation tasks, we demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.
RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL
Li, Haoyang, Zhang, Jing, Li, Cuiping, Chen, Hong
One of the recent best attempts at Text-to-SQL is the pre-trained language model. Due to the structural property of the SQL queries, the seq2seq model takes the responsibility of parsing both the schema items (i.e., tables and columns) and the skeleton (i.e., SQL keywords). Such coupled targets increase the difficulty of parsing the correct SQL queries especially when they involve many schema items and logic operators. This paper proposes a ranking-enhanced encoding and skeleton-aware decoding framework to decouple the schema linking and the skeleton parsing. Specifically, for a seq2seq encoder-decode model, its encoder is injected by the most relevant schema items instead of the whole unordered ones, which could alleviate the schema linking effort during SQL parsing, and its decoder first generates the skeleton and then the actual SQL query, which could implicitly constrain the SQL parsing. We evaluate our proposed framework on Spider and its three robustness variants: Spider-DK, Spider-Syn, and Spider-Realistic. The experimental results show that our framework delivers promising performance and robustness. Our code is available at https://github.com/RUCKBReasoning/RESDSQL.
Scallop: A Language for Neurosymbolic Programming
Li, Ziyang, Huang, Jiani, Naik, Mayur
We present Scallop, a language which combines the benefits of deep learning and logical reasoning. Scallop enables users to write a wide range of neurosymbolic applications and train them in a data- and compute-efficient manner. It achieves these goals through three key features: 1) a flexible symbolic representation that is based on the relational data model; 2) a declarative logic programming language that is based on Datalog and supports recursion, aggregation, and negation; and 3) a framework for automatic and efficient differentiable reasoning that is based on the theory of provenance semirings. We evaluate Scallop on a suite of eight neurosymbolic applications from the literature. Our evaluation demonstrates that Scallop is capable of expressing algorithmic reasoning in diverse and challenging AI tasks, provides a succinct interface for machine learning programmers to integrate logical domain knowledge, and yields solutions that are comparable or superior to state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions outperform these models in aspects such as runtime and data efficiency, interpretability, and generalizability.
Incremental Parsing by Modular Recurrent Connectionist Networks
We present a novel, modular, recurrent connectionist network architec(cid:173) ture which learns to robustly perform incremental parsing of complex sentences. From sequential input, one word at a time, our networks learn to do semantic role assignment, noun phrase attachment, and clause structure recognition for sentences with passive constructions and center embedded clauses. The networks make syntactic and semantic predictions at every point in time, and previous predictions are revised as expectations are affirmed or violated with the arrival of new informa(cid:173) tion. Our networks induce their own "grammar rules" for dynamically transforming an input sequence of words into a syntactic/semantic in(cid:173) terpretation. These networks generalize and display tolerance to input which has been corrupted in ways common in spoken language.
Generalization Performance in PARSEC - A Structured Connectionist Parsing Architecture
This paper presents PARSEC-a system for generating connectionist parsing networks from example parses. PARSEC is not based on formal grammar systems and is geared toward spoken language tasks. PARSEC networks exhibit three strengths important for application to speech pro(cid:173) cessing: 1) they learn to parse, and generalize well compared to hand(cid:173) coded grammars; 2) they tolerate several types of noise; 3) they can learn to use multi-modal input. Presented are the PARSEC architecture and performance analyses along several dimensions that demonstrate PARSEC's features. PARSEC's performance is compared to that of tra(cid:173) ditional grammar-based parsing systems.
Grammar Learning by a Self-Organizing Network
This paper presents the design and simulation results of a self(cid:173) organizing neural network which induces a grammar from exam(cid:173) ple sentences. Input sentences are generated from a simple phrase structure grammar including number agreement, verb transitiv(cid:173) ity, and recursive noun phrase construction rules. The network induces a grammar explicitly in the form of symbol categorization rules and phrase structure rules. The purpose of this research is to show that a self-organizing network with a certain structure can acquire syntactic knowledge from only positive (i.e. There has been research on supervised neural network models of language acquisi(cid:173) tion tasks [Elman, 1991, Miikkulainen and Dyer, 1988, John and McClelland, 1988].
Harmony Networks Do Not Work
Harmony networks have been proposed as a means by which con(cid:173) nectionist models can perform symbolic computation. Indeed, pro(cid:173) ponents claim that a harmony network can be built that constructs parse trees for strings in a context free language. This paper shows that harmony networks do not work in the following sense: they construct many outputs that are not valid parse trees. In order to show that the notion of systematicity is compatible with connectionism, Paul Smolensky, Geraldine Legendre and Yoshiro Miyata (Smolensky, Legendre, and Miyata 1992; Smolen sky 1993; Smolen sky, Legendre, and Miyata 1994) pro(cid:173) posed a mechanism, "Harmony Theory," by which connectionist models purportedly perform structure sensitive operations without implementing classical algorithms. Harmony theory describes a "harmony network" which, in the course of reaching a stable equilibrium, apparently computes parse trees that are valid according to the rules of a particular context-free grammar.