AITopics | Grammars & Parsing

BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset

Akil, Ajwad, Sultana, Najrin, Bhattacharjee, Abhik, Shahriyar, Rifat

arXiv.org Artificial Intelligence Oct-10-2022

In this work, we present BanglaParaphrase, a high-quality synthetic Bangla Paraphrase dataset curated by a novel filtering pipeline. We aim to take a step towards alleviating the low resource status of the Bangla language in the NLP domain through the introduction of BanglaParaphrase, which ensures quality by preserving both semantics and diversity, making it particularly useful to enhance other Bangla datasets. We show a detailed comparative analysis between our dataset and models trained on it with other existing works to establish the viability of our synthetic paraphrase data generation pipeline. We are making the dataset and models publicly available at https://github.com/csebuetnlp/banglaparaphrase to further the state of Bangla NLP.

artificial intelligence, machine learning, natural language, (18 more...)

2210.05109

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Indonesia > Bali (0.04)
Asia > Bangladesh (0.04)
(17 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Breaking BERT: Evaluating and Optimizing Sparsified Attention

Brahma, Siddhartha, Zablotskaia, Polina, Mimno, David

arXiv.org Artificial Intelligence Oct-7-2022

Transformers allow attention between all pairs of tokens, but there is reason to believe that most of these connections - and their quadratic time and memory - may not be necessary. But which ones? We evaluate the impact of sparsification patterns with a series of ablation experiments. First, we compare masks based on syntax, lexical similarity, and token position to random connections, and measure which patterns reduce performance the least. We find that on three common finetuning tasks even using attention that is at least 78% sparse can have little effect on performance if applied at later transformer layers, but that applying sparsity throughout the network reduces performance significantly. Second, we vary the degree of sparsity for three patterns supported by previous work, and find that connections to neighboring tokens are the most significant. Finally, we treat sparsity as an optimizable parameter, and present an algorithm to learn degrees of neighboring connections that gives fine-grained control over the accuracy-sparsity trade-off while approaching the performance of existing methods.
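As a rough illustration of the neighboring-token pattern mentioned in the abstract, the sketch below builds a fixed local-window attention mask and measures its sparsity. It is a minimal sketch, not the paper's method: the function names `local_window_mask`, `sparsity`, and `masked_softmax` and the `window` parameter are assumptions made for this example, and the window here is fixed rather than learned.

```python
import math

def local_window_mask(seq_len, window):
    """mask[i][j] is True when token i may attend to token j, i.e. |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]

def sparsity(mask):
    """Fraction of token pairs whose attention connection is removed."""
    total = len(mask) * len(mask[0])
    kept = sum(sum(row) for row in mask)
    return 1.0 - kept / total

def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, with disallowed positions set to zero."""
    top = max(s for s, a in zip(scores, allowed) if a)  # subtract max for stability
    exps = [math.exp(s - top) if a else 0.0 for s, a in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

mask = local_window_mask(seq_len=8, window=1)
print(sparsity(mask))  # 0.65625: 22 of 64 connections are kept
```

Because the diagonal is always allowed, every row keeps at least one connection, so the masked softmax is well defined. Widening `window` trades sparsity for coverage, which is the kind of accuracy-sparsity trade-off the abstract refers to.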

machine learning, natural language, sparsity, (19 more...)

2210.03841

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Conversational Semantic Role Labeling with Predicate-Oriented Latent Graph

Fei, Hao, Wu, Shengqiong, Zhang, Meishan, Ren, Yafeng, Ji, Donghong

arXiv.org Artificial Intelligence Oct-6-2022

Conversational semantic role labeling (CSRL) is a newly proposed task that uncovers the shallow semantic structures in a dialogue text. Unfortunately, several important characteristics of the CSRL task have been overlooked by existing work, such as structural information integration and near-neighbor influence. In this work, we investigate the integration of a latent graph for CSRL. We propose to automatically induce a predicate-oriented latent graph (POLar) with a predicate-centered Gaussian mechanism, by which words that are nearer to the predicate and more informative are allocated more attention. The POLar structure is then dynamically pruned and refined so as to best fit the needs of the task. We additionally introduce an effective dialogue-level pre-trained language model, CoDiaBERT, to better support multiple utterance sentences and handle the speaker coreference issue in CSRL. Our system outperforms the best-performing baselines on three benchmark CSRL datasets by large margins, in particular achieving over 4% F1 score improvement on cross-utterance argument detection. Further analyses are presented to better understand the effectiveness of our proposed methods.
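The idea that words nearer to the predicate receive more attention can be sketched with a Gaussian over token positions centered on the predicate. This is only one plausible reading of a "predicate-centered Gaussian mechanism": the function `gaussian_position_weights`, the fixed `sigma`, and the normalization are assumptions for this example, and the sketch does not model word informativeness, graph induction, or pruning.

```python
import math

def gaussian_position_weights(num_tokens, predicate_index, sigma=2.0):
    """Normalized weights that decay with squared distance from the predicate position."""
    raw = [
        math.exp(-((i - predicate_index) ** 2) / (2.0 * sigma ** 2))
        for i in range(num_tokens)
    ]
    total = sum(raw)
    return [r / total for r in raw]

# Seven tokens with the predicate at position 3 (hypothetical example).
weights = gaussian_position_weights(num_tokens=7, predicate_index=3, sigma=1.5)
for position, weight in enumerate(weights):
    print(position, round(weight, 3))
```

A smaller `sigma` concentrates weight on the predicate's immediate neighbors, while a larger one spreads it across the utterance.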

artificial intelligence, natural language, text processing, (17 more...)

2210.03037

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.86)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.63)

SynKB: Semantic Search for Synthetic Procedures

Bai, Fan, Ritter, Alan, Madrid, Peter, Freitag, Dayne, Niekrasz, John

arXiv.org Artificial Intelligence Oct-6-2022

In this paper we present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols. Similar to proprietary chemistry databases such as Reaxys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures. By taking advantage of recent advances in natural language processing for procedural texts, SynKB supports more flexible queries about reaction conditions, and thus has the potential to help chemists search the literature for conditions used in relevant reactions as they design new synthetic routes. Using customized Transformer models to automatically extract information from 6 million synthesis procedures described in U.S. and EU patents, we show that for many queries, SynKB has higher recall than Reaxys, while maintaining high precision. We plan to make SynKB available as an open-source tool; in contrast, proprietary chemistry databases require costly subscriptions.

artificial intelligence, information retrieval, natural language, (20 more...)

2208.074

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Galicia > Madrid (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Government > Regional Government > North America Government > United States Government (0.71)
Materials > Chemicals > Commodity Chemicals (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Every word counts: A multilingual analysis of individual human alignment with model attention

Brandl, Stephanie, Hollenstein, Nora

arXiv.org Artificial Intelligence Oct-5-2022

We carry out this correlation analysis on the participants' respective native languages (L1) and data from an English experiment (L2) of the same participants. We analyse the influence of processing depth, i.e., quantifying the thoroughness of reading through the readers' skipping behaviour, part-of-speech (POS) tags, and vocabulary knowledge in the form of LexTALE ...

... reading (Morger et al., 2022; Eberle et al., 2022; Bensemann et al., 2022; Hollenstein and Beinborn, 2021; Sood et al., 2020). This approach serves as an interpretability tool and helps to quantify the cognitive plausibility of language models. However, what drives these correlations in terms of differences between individual readers has not been investigated.

artificial intelligence, correlation, natural language, (16 more...)

2210.04963

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
Europe > Denmark > Capital Region > Copenhagen (0.05)
(6 more...)

Genre: Research Report (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.35)

Neural-Symbolic Recursive Machine for Systematic Generalization

Li, Qing, Zhu, Yixin, Liang, Yitao, Wu, Ying Nian, Zhu, Song-Chun, Huang, Siyuan

arXiv.org Artificial Intelligence Oct-4-2022

Despite their tremendous success, existing machine learning models still fall short of human-like systematic generalization -- learning compositional rules from limited data and applying them to unseen combinations in various domains. We propose Neural-Symbolic Recursive Machine (NSR) to tackle this deficiency. The core representation of NSR is a Grounded Symbol System (GSS) with combinatorial syntax and semantics, which emerges entirely from training data. Akin to the neuroscience studies suggesting separate brain systems for perceptual, syntactic, and semantic processing, NSR implements analogous separate modules of neural perception, syntactic parsing, and semantic reasoning, which are jointly learned by a deduction-abduction algorithm. We prove that NSR is expressive enough to model various sequence-to-sequence tasks. Superior systematic generalization is achieved via the inductive biases of equivariance and recursiveness embedded in NSR. In experiments, NSR achieves state-of-the-art performance in three benchmarks from different domains: SCAN for semantic parsing, PCFG for string manipulation, and HINT for arithmetic reasoning. Specifically, NSR achieves 100% generalization accuracy on SCAN and PCFG and outperforms state-of-the-art models on HINT by about 23%. Our NSR demonstrates stronger generalization than pure neural networks due to its symbolic representation and inductive biases. NSR also demonstrates better transferability than existing neural-symbolic approaches because it requires less domain-specific knowledge.

artificial intelligence, machine learning, natural language, (19 more...)

2210.01603

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.70)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

CGELBank: CGEL as a Framework for English Syntax Annotation

Reynolds, Brett, Arora, Aryaman, Schneider, Nathan

arXiv.org Artificial Intelligence Oct-1-2022

We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project. We discuss some issues in linguistic analysis that arose in adapting the formalism to corpus annotation, followed by quantitative and qualitative comparisons with parallel UD and PTB treebanks. We argue that CGEL provides a good tradeoff between comprehensiveness of analysis and usability for annotation, which motivates expanding the treebank with automatic conversion in the future.

artificial intelligence, cgelbank, natural language, (16 more...)

2210.00394

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
Europe > Slovenia (0.04)
(9 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Emergence of order in random languages

De Giuli, Eric

arXiv.org Artificial Intelligence Sep-30-2022

We consider languages generated by weighted context-free grammars. It is shown that the behavior of large texts is controlled by saddle-point equations for an appropriate generating function. We then consider ensembles of grammars, in particular the Random Language Model of [1]. This model is solved in the replica-symmetric ansatz, which is valid in the high-temperature, disordered phase. It is shown that in the phase in which languages carry information, the replica symmetry must be broken. Keywords: context-free grammar, language, replicas. Note: The body of this work is as published in J. Phys.
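To make "languages generated by weighted context-free grammars" concrete, the sketch below samples sentences from a small weighted grammar by expanding nonterminals in proportion to their rule weights. The toy grammar, the name `sample_sentence`, and the choice to normalize weights per nonterminal are all assumptions made for illustration; this is not the Random Language Model itself, where the weights are drawn at random.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (weight, right-hand side).
# Any symbol that is not a key of the dictionary is treated as a terminal.
GRAMMAR = {
    "S": [(1.0, ("NP", "VP"))],
    "NP": [(0.8, ("the", "N")), (0.2, ("NP", "and", "NP"))],
    "VP": [(0.6, ("V", "NP")), (0.4, ("V",))],
    "N": [(0.5, ("parser",)), (0.5, ("grammar",))],
    "V": [(0.5, ("reads",)), (0.5, ("sleeps",))],
}

def sample_sentence(grammar, start="S", rng=None):
    """Expand the start symbol left to right, picking each rule with
    probability proportional to its weight."""
    rng = rng or random.Random()
    stack = [start]  # explicit stack avoids Python's recursion limit
    tokens = []
    while stack:
        symbol = stack.pop()
        if symbol not in grammar:
            tokens.append(symbol)  # terminal: emit it
            continue
        weights = [w for w, _ in grammar[symbol]]
        rules = [rhs for _, rhs in grammar[symbol]]
        rhs = rng.choices(rules, weights=weights, k=1)[0]
        stack.extend(reversed(rhs))  # leftmost symbol is expanded first
    return tokens

rng = random.Random(0)
for _ in range(3):
    print(" ".join(sample_sentence(GRAMMAR, rng=rng)))
```

The recursive rule for `NP` has weight 0.2, so each `NP` produces 0.4 further `NP` symbols on average and sampling terminates with probability one. Raising that weight toward 0.5 makes long sentences increasingly likely.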

artificial intelligence, grammar, natural language, (17 more...)

doi: 10.1088/1751-8121/ab293c

1902.07516

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)

Compositional Semantic Parsing with Large Language Models

Drozdov, Andrew, Schärli, Nathanael, Akyürek, Ekin, Scales, Nathan, Song, Xinying, Chen, Xinyun, Bousquet, Olivier, Zhou, Denny

arXiv.org Artificial Intelligence Sep-29-2022

Humans can reason compositionally when presented with new tasks. Previous research shows that appropriate prompting techniques enable large language models (LLMs) to solve artificial compositional generalization tasks such as SCAN. In this work, we identify additional challenges in more realistic semantic parsing tasks with larger vocabulary and refine these prompting techniques to address them. Our best method is based on least-to-most prompting: it decomposes the problem using prompting-based syntactic parsing, then uses this decomposition to select appropriate exemplars and to sequentially generate the semantic parse. This method allows us to set a new state of the art for CFQ while requiring only 1% of the training data used by traditional approaches. Due to the general nature of our approach, we expect similar efforts will lead to new results in other tasks and domains, especially for knowledge-intensive applications.

machine learning, natural language, parse, (17 more...)

2209.15003

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Dominican Republic (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
(6 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (0.65)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Generate-and-Retrieve: use your predictions to improve retrieval for semantic parsing

Zemlyanskiy, Yury, de Jong, Michiel, Ainslie, Joshua, Pasupat, Panupong, Shaw, Peter, Qiu, Linlu, Sanghai, Sumit, Sha, Fei

arXiv.org Artificial Intelligence Sep-29-2022

A common recent approach to semantic parsing augments sequence-to-sequence models by retrieving and appending a set of training samples, called exemplars. The effectiveness of this recipe is limited by the ability to retrieve informative exemplars that help produce the correct parse, which is especially challenging in low-resource settings. Existing retrieval is commonly based on similarity of query and exemplar inputs. We propose GandR, a retrieval procedure that retrieves exemplars for which outputs are also similar. GandR first generates a preliminary prediction with input-based retrieval. Then, it retrieves exemplars with outputs similar to the preliminary prediction, which are used to generate a final prediction. GandR sets the state of the art on multiple low-resource semantic parsing tasks.
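The two-stage procedure described in the abstract can be sketched as follows: retrieve by input similarity, produce a preliminary prediction, then retrieve again using similarity to that prediction. Everything concrete here is an assumption for illustration: the Jaccard token overlap, the mixing weight `alpha`, the toy training pairs, and `toy_model`, which simply copies the top exemplar's output in place of a real sequence-to-sequence model.

```python
def jaccard(a, b):
    """Token-set overlap between two whitespace-tokenized strings."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def retrieve(query_in, train, k, prelim_out=None, alpha=0.5):
    """Rank (input, output) exemplars; when a preliminary output is given,
    mix input similarity with output similarity to that prediction."""
    def score(example):
        s_in = jaccard(query_in, example[0])
        if prelim_out is None:
            return s_in
        return (1 - alpha) * s_in + alpha * jaccard(prelim_out, example[1])
    return sorted(train, key=score, reverse=True)[:k]

def toy_model(query_in, exemplars):
    """Hypothetical stand-in for a seq2seq parser: copy the top exemplar's output."""
    return exemplars[0][1]

def generate_and_retrieve(query_in, train, k=2):
    first = retrieve(query_in, train, k)                      # input-based retrieval
    prelim = toy_model(query_in, first)                       # preliminary prediction
    second = retrieve(query_in, train, k, prelim_out=prelim)  # output-aware retrieval
    return prelim, second, toy_model(query_in, second)        # final prediction

TRAIN = [
    ("play jazz music", "play_music genre = jazz"),
    ("play rock music", "play_music genre = rock"),
    ("set alarm for six", "set_alarm time = six"),
]

prelim, exemplars, final = generate_and_retrieve("play jazz music please", TRAIN)
print(final)  # play_music genre = jazz
```

With a real parser in place of `toy_model`, the second retrieval can surface exemplars whose parses resemble the preliminary prediction even when their inputs are worded differently.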

artificial intelligence, exemplar, natural language, (17 more...)

2209.14899

Country:

North America > United States > California (0.14)
North America > United States > New York (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
