AITopics | Grammars & Parsing

Grammars & Parsing

QASem Parsing: Text-to-text Modeling of QA-based Semantics

Klein, Ayal, Hirsch, Eran, Eliav, Ron, Pyatkin, Valentina, Caciularu, Avi, Dagan, Ido

arXiv.org Artificial Intelligence, Feb-14-2023

Several recent works have suggested representing semantic relations with questions and answers, decomposing textual information into separate interrogative natural language statements. In this paper, we consider three QA-based semantic tasks - namely, QA-SRL, QANom and QADiscourse, each targeting a certain type of predication - and propose to regard them as jointly providing a comprehensive representation of textual information. To promote this goal, we investigate how to best utilize the power of sequence-to-sequence (seq2seq) pre-trained language models, within the unique setup of semi-structured outputs, consisting of an unordered set of question-answer pairs. We examine different input and output linearization strategies, and assess the effect of multitask learning and of simple data augmentation techniques in the setting of imbalanced training data. Consequently, we release the first unified QASem parsing tool, practical for downstream applications that can benefit from an explicit, QA-based account of information units in a text.
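
The abstract mentions linearizing an unordered set of question-answer pairs into a seq2seq target string. A minimal sketch of what one such linearization could look like is given below. The separator strings, the ordering rule (by where the answer first appears in the sentence) and the function names are illustrative assumptions, not the format used by the authors.

```python
from typing import Iterable, Set, Tuple

QAPair = Tuple[str, str]

# Hypothetical separators; the real tool may use different markers.
PAIR_SEP = " <pair> "
ANS_SEP = " <ans> "


def linearize(sentence: str, qa_pairs: Iterable[QAPair]) -> str:
    """Turn an unordered set of QA pairs into one target string.

    Pairs are ordered by the first position of the answer in the sentence
    (an assumed strategy); answers not found in the sentence go last.
    """
    def position(pair: QAPair) -> int:
        idx = sentence.find(pair[1])
        return idx if idx >= 0 else len(sentence)

    ordered = sorted(qa_pairs, key=position)
    return PAIR_SEP.join(f"{q}{ANS_SEP}{a}" for q, a in ordered)


def delinearize(output: str) -> Set[QAPair]:
    """Recover the set of QA pairs from a linearized string."""
    pairs: Set[QAPair] = set()
    for chunk in output.split(PAIR_SEP):
        if ANS_SEP in chunk:
            question, answer = chunk.split(ANS_SEP, 1)
            pairs.add((question.strip(), answer.strip()))
    return pairs


sentence = "The doctor treated the patient yesterday"
qa_pairs = [
    ("When did someone treat someone?", "yesterday"),
    ("Who was treated?", "the patient"),
    ("Who treated someone?", "The doctor"),
]
target = linearize(sentence, qa_pairs)
```

Because the output is a set, any fixed ordering is a modelling choice; sorting by answer position simply gives the decoder a deterministic target, and `delinearize` discards the order again.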

computational linguistic, machine learning, natural language, (20 more...)

2205.11413

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > Scotland (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(11 more...)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.85)

The Question Answering concept part3 (Machine Learning)

#artificialintelligence, Feb-13-2023, 21:55:43 GMT

Abstract: In this paper, we are interested in developing semantic parsers which understand natural language questions embedded in a conversation with a user and ground them to formal queries over definitions in a general purpose knowledge graph (KG) with very large vocabularies (covering thousands of concept names and relations, and millions of entities). To this end, we develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to execution results thereof. We present two different semantic parsing approaches and highlight the challenges of the task: dealing with large vocabularies, modelling conversation context, predicting queries with multiple entities, and generalising to new questions at test time. We hope our dataset will serve as a useful testbed for the development of conversational semantic parsers.

concept part3, machine learning, semantic parser, (2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Why Can't Discourse Parsing Generalize? A Thorough Investigation of the Impact of Data Diversity

Liu, Yang Janet, Zeldes, Amir

arXiv.org Artificial Intelligence, Feb-13-2023

Recent advances in discourse parsing performance create the impression that, as in other NLP tasks, performance for high-resource languages such as English is finally becoming reliable. In this paper we demonstrate that this is not the case, and thoroughly investigate the impact of data diversity on RST parsing stability. We show that state-of-the-art architectures trained on the standard English newswire benchmark do not generalize well, even within the news domain. Using the two largest RST corpora of English with text from multiple genres, we quantify the impact of genre diversity in training data for achieving generalization to text types unseen during training. Our results show that a heterogeneous training regime is critical for stable and generalizable models, across parser architectures. We also provide error analyses of model outputs and out-of-domain performance. To our knowledge, this study is the first to fully evaluate cross-corpus RST parsing generalizability on complete trees, examine between-genre degradation within an RST corpus, and investigate the impact of genre diversity in training data composition.

computational linguistic, machine learning, natural language, (17 more...)

2302.06488

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(25 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)

An Extended Sequence Tagging Vocabulary for Grammatical Error Correction

Mesham, Stuart, Bryant, Christopher, Rei, Marek, Yuan, Zheng

arXiv.org Artificial Intelligence, Feb-12-2023

We extend a current sequence-tagging approach to Grammatical Error Correction (GEC) by introducing specialised tags for spelling correction and morphological inflection using the SymSpell and LemmInflect algorithms. Our approach improves generalisation: the proposed new tagset allows a smaller number of tags to correct a larger range of errors. Our results show a performance improvement both overall and in the targeted error categories. We further show that ensembles trained with our new tagset outperform those trained with the baseline tagset on the public BEA benchmark.
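
To make the sequence-tagging idea concrete, the sketch below applies one edit tag per token. The tag names (`$KEEP`, `$DELETE`, `$REPLACE_x`, `$APPEND_x`, `$SPELL`, `$INFLECT_x`) are assumptions chosen for illustration, and the two lookup tables are toy stand-ins for SymSpell and LemmInflect rather than calls into those libraries.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a spelling corrector such as SymSpell (hypothetical data).
SPELL_TABLE: Dict[str, str] = {"libary": "library", "recieve": "receive"}

# Toy stand-in for an inflector such as LemmInflect, keyed by
# (word, Penn Treebank tag) (hypothetical data).
INFLECT_TABLE: Dict[Tuple[str, str], str] = {
    ("go", "VBD"): "went",
    ("walk", "VBZ"): "walks",
}


def apply_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Apply one edit tag to each source token and return the corrected tokens."""
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")
    out: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(token)
        elif tag == "$DELETE":
            continue
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])
        elif tag.startswith("$APPEND_"):
            out.extend([token, tag[len("$APPEND_"):]])
        elif tag == "$SPELL":
            # One generic tag covers every misspelling the corrector knows.
            out.append(SPELL_TABLE.get(token, token))
        elif tag.startswith("$INFLECT_"):
            target_form = tag[len("$INFLECT_"):]
            out.append(INFLECT_TABLE.get((token, target_form), token))
        else:
            raise ValueError(f"unknown tag: {tag}")
    return out


tokens = ["Yesterday", "she", "go", "to", "to", "the", "libary"]
tags = ["$KEEP", "$KEEP", "$INFLECT_VBD", "$KEEP", "$DELETE", "$KEEP", "$SPELL"]
corrected = apply_tags(tokens, tags)
```

In this sketch a single `$SPELL` tag handles any entry in the spelling table, whereas a vocabulary built only from `$REPLACE_x` tags would need one tag per target word. That is the sense in which a smaller tagset can cover a larger range of errors.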

artificial intelligence, data quality, natural language, (19 more...)

2302.05913

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(13 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.72)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Data Science > Data Quality > Data Cleaning (0.62)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Metaphor Detection with Effective Context Denoising

Wang, Shun, Li, Yucheng, Lin, Chenghua, Barrault, Loïc, Guerin, Frank

arXiv.org Artificial Intelligence, Feb-11-2023

Metaphor is a pervasive linguistic device, which attracts attention from both the fields of psycholinguistics and computational linguistics due to the key role it plays in the cognitive and communicative functions of language (Wilks, 1978; Lakoff and Johnson, 1980; Lakoff, 1993). Linguistically, metaphor is defined as a figurative expression that uses one or several words to represent another concept given the context, rather than taking the literal ...

Some recent efforts (Le et al., 2020; Song et al., 2021a) attempt to improve context modelling by explicitly leveraging the syntactic structure (e.g., dependency parse tree) of a sentence in order to capture important context words, where the parse trees are typically encoded with graph convolutional neural networks. MelBERT (Choi et al., 2021) employs a simple chunking method which separates sub-sentences by commas.

artificial intelligence, metaphor detection, natural language, (15 more...)

2302.05611

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Surrey (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Bootstrapping Multilingual Semantic Parsers using Large Language Models

Awasthi, Abhijeet, Gupta, Nitish, Samanta, Bidisha, Dave, Shachi, Sarawagi, Sunita, Talukdar, Partha

arXiv.org Artificial Intelligence, Feb-11-2023

Despite cross-lingual generalization demonstrated by pre-trained multilingual models, the translate-train paradigm of transferring English datasets across multiple languages remains a key mechanism for training task-specific multilingual models. However, for many low-resource languages, the availability of a reliable translation service entails significant amounts of costly human-annotated translation pairs. Further, translation services may continue to be brittle due to domain mismatch between task-specific input text and general-purpose text used for training translation models. For multilingual semantic parsing, we demonstrate the effectiveness and flexibility offered by large language models (LLMs) for translating English datasets into several languages via few-shot prompting. Through extensive comparisons on two public datasets, MTOP and MASSIVE, spanning 50 languages and several domains, we show that our method of translating data using LLMs outperforms a strong translate-train baseline on 41 out of 50 languages. We study the key design choices that enable more effective multilingual data translation via prompted LLMs.

large language model, machine learning, translation, (16 more...)

2210.07313

Country:

Europe > Middle East > Cyprus > Nicosia > Nicosia (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions

Berger, Uri, Frermann, Lea, Stanovsky, Gabriel, Abend, Omri

arXiv.org Artificial Intelligence, Feb-9-2023

We present a large, multilingual study into how vision constrains linguistic choice, covering four languages and five linguistic properties, such as verb transitivity or use of numerals. We propose a novel method that leverages existing corpora of images with captions written by native speakers, and apply it to nine corpora, comprising 600k images and 3M captions. We study the relation between visual input and linguistic choices by training classifiers to predict the probability of expressing a property from raw images, and find evidence supporting the claim that linguistic properties are constrained by visual context across languages. We complement this investigation with a corpus study, taking the test case of numerals. Specifically, we use existing annotations (number or type of objects) to investigate the effect of different visual conditions on the use of numeral expressions in captions, and show that similar patterns emerge across languages. Our methods and findings both confirm and extend existing research in the cognitive literature. We additionally discuss possible applications for language generation.

caption, machine learning, natural language, (21 more...)

2302.04811

Country:

Europe > Germany > Berlin (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.46)
(2 more...)

Syntax-guided Neural Module Distillation to Probe Compositionality in Sentence Embeddings

Pandey, Rohan

arXiv.org Artificial Intelligence, Feb-8-2023

Past work probing compositionality in sentence embedding models faces issues determining the causal impact of implicit syntax representations. Given a sentence, we construct a neural module net based on its syntax parse and train it end-to-end to approximate the sentence's embedding generated by a transformer model. The distillability of a transformer to a Syntactic NeurAl Module Net (SynNaMoN) then captures whether syntax is a strong causal model of its compositional ability. Furthermore, we address questions about the geometry of semantic composition by specifying individual SynNaMoN modules' internal architecture & linearity. We find differences in the distillability of various sentence embedding models that broadly correlate with their performance, but observe that distillability doesn't considerably vary by model size. We also present preliminary evidence that much syntax-guided composition in sentence embedding models is linear, and that non-linearities may serve primarily to handle non-compositional phrases.

computational linguistic, machine learning, natural language, (17 more...)

2301.08998

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Jersey (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)

Learning of Structurally Unambiguous Probabilistic Grammars

Fisman, Dana, Nitay, Dolav, Ziv-Ukelson, Michal

arXiv.org Artificial Intelligence, Feb-7-2023

The problem of identifying a probabilistic context-free grammar has two aspects: the first is determining the grammar's topology (the rules of the grammar) and the second is estimating probabilistic weights for each rule. Given the hardness results for learning context-free grammars in general, and probabilistic grammars in particular, most of the literature has concentrated on the second problem. In this work we address the first problem. We restrict attention to structurally unambiguous weighted context-free grammars (SUWCFG) and provide a query learning algorithm for structurally unambiguous probabilistic context-free grammars (SUPCFG). We show that SUWCFG can be represented using co-linear multiplicity tree automata (CMTA), and provide a polynomial learning algorithm that learns CMTAs. We show that the learned CMTA can be converted into a probabilistic grammar, thus providing a complete algorithm for learning a structurally unambiguous probabilistic context-free grammar (both the grammar topology and the probabilistic weights) using structured membership queries and structured equivalence queries. A summarized version of this work was published at AAAI 21 [NFZ21].
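
The two aspects named in the abstract, topology and weights, can be illustrated with a toy grammar. In the sketch below the set of dictionary keys is the topology and the values are the rule weights; the grammar, the tree encoding and the function names are made up for illustration, and the code does not implement the paper's query learning algorithm.

```python
from collections import defaultdict
from math import isclose
from typing import Dict, Tuple

Rule = Tuple[str, Tuple[str, ...]]

# Keys give the topology (which rules exist); values give the weights.
# This toy grammar is hypothetical.
RULES: Dict[Rule, float] = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.6,
    ("NP", ("fish",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("sleeps",)): 0.3,
    ("V", ("eats",)): 1.0,
}


def is_normalized(rules: Dict[Rule, float]) -> bool:
    """Check that the weights of rules sharing a left-hand side sum to 1."""
    totals: Dict[str, float] = defaultdict(float)
    for (lhs, _), weight in rules.items():
        totals[lhs] += weight
    return all(isclose(total, 1.0) for total in totals.values())


def tree_probability(tree: tuple, rules: Dict[Rule, float]) -> float:
    """Probability of a derivation tree: the product of its rule weights.

    A tree is (label, child, ...) where each child is either a subtree
    tuple or a terminal string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    probability = rules[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            probability *= tree_probability(child, rules)
    return probability


tree = ("S", ("NP", "she"), ("VP", ("V", "eats"), ("NP", "fish")))
p = tree_probability(tree, RULES)  # 1.0 * 0.6 * 0.7 * 1.0 * 0.4
```

Estimating the numbers in `RULES` for a fixed key set is the weight problem; discovering the key set itself is the topology problem the abstract says it addresses.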

artificial intelligence, machine learning, natural language, (14 more...)

doi: 10.46298/lmcs-19(1:10)2023

2203.09441

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
(5 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.92)

Real-Word Error Correction with Trigrams: Correcting Multiple Errors in a Sentence

Dashti, Seyed MohammadSadegh

arXiv.org Artificial Intelligence, Feb-7-2023

Spelling correction is a fundamental task in Text Mining. In this study, we assess the real-word error correction model proposed by Mays, Damerau and Mercer and describe several drawbacks of the model. We propose a new variation which focuses on detecting and correcting multiple real-word errors in a sentence, by manipulating a Probabilistic Context-Free Grammar (PCFG) to discriminate between items in the search space. We test our approach on the Wall Street Journal corpus and show that it outperforms Hirst and Budanitsky's WordNet-based method and Wilcox-O'Hearn, Hirst, and Budanitsky's fixed windows size method.
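
For readers unfamiliar with trigram-based real-word correction, the sketch below shows the general noisy-channel idea in a single-substitution form: each candidate sentence is scored by a trigram language model plus a channel term, and the best-scoring candidate wins. The value of `ALPHA`, the confusion sets and the trigram probabilities are toy assumptions, and the code illustrates only this baseline idea, not the PCFG-based variation proposed in the paper.

```python
import math
from typing import Dict, List, Tuple

ALPHA = 0.9  # assumed probability that a typed word is the intended word

# Toy confusion sets of real words that are easily mistyped for each other.
CONFUSION: Dict[str, List[str]] = {"form": ["from"], "from": ["form"]}

# Toy trigram probabilities; unseen trigrams fall back to DEFAULT_P.
TRIGRAM_P: Dict[Tuple[str, str, str], float] = {
    ("i", "came", "from"): 0.2,
    ("came", "from", "home"): 0.1,
    ("from", "home", "</s>"): 0.3,
    ("i", "came", "form"): 0.0001,
}
DEFAULT_P = 1e-6


def sentence_logp(words: List[str]) -> float:
    """Trigram log-probability of a sentence with start and end padding."""
    padded = ["<s>", "<s>"] + words + ["</s>"]
    return sum(
        math.log(TRIGRAM_P.get(tuple(padded[i - 2:i + 1]), DEFAULT_P))
        for i in range(2, len(padded))
    )


def score(candidate: List[str], observed: List[str]) -> float:
    """Language-model score plus channel score of observing `observed`."""
    total = sentence_logp(candidate)
    for intended, typed in zip(candidate, observed):
        if intended == typed:
            total += math.log(ALPHA)
        else:
            total += math.log((1 - ALPHA) / len(CONFUSION[intended]))
    return total


def correct_one(observed: List[str]) -> List[str]:
    """Return the best sentence reachable by at most one substitution."""
    best, best_score = list(observed), score(observed, observed)
    for i, word in enumerate(observed):
        for alternative in CONFUSION.get(word, []):
            candidate = observed[:i] + [alternative] + observed[i + 1:]
            candidate_score = score(candidate, observed)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best


fixed = correct_one("i came form home".split())
```

Searching only single substitutions keeps the candidate space linear in sentence length; handling several real-word errors in one sentence, which the abstract targets, requires a larger search space.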

artificial intelligence, natural language, text processing, (18 more...)

doi: 10.1007/s10579-017-9397-4

2302.04096

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
