AITopics | Grammars & Parsing

Grammars & Parsing

Leveraging Linguistically Enhanced Embeddings for Open Information Extraction

Farooqui, Fauzan, Jayakumar, Thanmay, Mathur, Pulkit, Radke, Mansi

arXiv.org Artificial Intelligence | Mar-20-2024

Open Information Extraction (OIE) is a structured prediction (SP) task in Natural Language Processing (NLP) that aims to extract structured n-ary tuples, usually subject-relation-object triples, from free text. The word embeddings in the input text can be enhanced with linguistic features, usually Part-of-Speech (PoS) and Syntactic Dependency Parse (SynDP) labels. However, past enhancement techniques cannot leverage the power of pretrained language models (PLMs), which themselves have hardly been used for OIE. To bridge this gap, we are the first to leverage linguistic features with a Seq2Seq PLM for OIE. We do so by introducing two methods: Weighted Addition and Linearized Concatenation. Our work can give any neural OIE architecture the key performance boost from both PLMs and linguistic features in one go. In our settings, this shows wide improvements of up to 24.9%, 27.3% and 14.9% on Precision, Recall and F1 scores respectively over the baseline. Beyond this, we address other important challenges in the field: to reduce compute overheads with the features, we are the first to exploit Semantic Dependency Parse (SemDP) tags; to address flaws in current datasets, we create a clean synthetic dataset; finally, we contribute the first known study of OIE behaviour in SP models.
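
For readers unfamiliar with how Precision, Recall and F1 apply to extracted triples, here is a minimal sketch. It assumes exact matching of (subject, relation, object) triples against a gold set; OIE benchmarks often use softer matching, and the abstract does not say which scheme the authors use. The function name `prf1` and the example triples are illustrative only.

```python
def prf1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of triples (an assumed scheme)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, not taken from the paper.
gold = [
    ("Marie Curie", "won", "the Nobel Prize"),
    ("Marie Curie", "was born in", "Warsaw"),
]
pred = [
    ("Marie Curie", "won", "the Nobel Prize"),
    ("Curie", "born", "Poland"),
]
scores = prf1(pred, gold)
print(scores)
```

One of the two predicted triples matches one of the two gold triples, so all three scores come out equal in this toy case.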

computational linguistic, dataset, extraction, (15 more...)

2403.13903

Country:

North America > United States > Texas (0.04)
Asia > China > Hong Kong (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(14 more...)

Genre: Research Report (1.00)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.86)
Information Technology > Data Science > Data Mining > Text Mining (0.62)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.62)

eRST: A Signaled Graph Theory of Discourse Relations and Organization

Zeldes, Amir, Aoyama, Tatsuya, Liu, Yang Janet, Peng, Siyao, Das, Debopam, Gessler, Luke

arXiv.org Artificial Intelligence | Mar-20-2024

In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, nonprojective and concurrent relations, as well as implicit and explicit signals which give explainable rationales to our analyses. We survey shortcomings of RST and other existing frameworks, such as Segmented Discourse Representation Theory (SDRT), the Penn Discourse Treebank (PDTB) and Discourse Dependencies, and address these using constructs in the proposed theory. We provide annotation, search and visualization tools for data, and present and evaluate a freely available corpus of English annotated according to our framework, encompassing 12 spoken and written genres with over 200K tokens. Finally, we discuss automatic parsing, evaluation metrics and applications for data in our framework.

proceedings, relation, secondary edge, (15 more...)

2403.1356

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Ontario > Toronto (0.04)
(33 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Government (0.67)
Media > News (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Document Author Classification Using Parsed Language Structure

Moon, Todd K, Gunther, Jacob H.

arXiv.org Artificial Intelligence | Mar-19-2024

Over the years there has been ongoing interest in detecting authorship of a text based on statistical properties of the text, such as by using occurrence rates of noncontextual words. In previous work, these techniques have been used, for example, to determine authorship of all of The Federalist Papers. Such methods may be useful in more modern times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship using grammatical structural information extracted using a statistical natural language parser. This paper provides a proof of concept, testing author classification based on grammatical structure on a set of "proof texts," The Federalist Papers and Sanditon, which have been used as test cases in previous authorship detection studies. Several features extracted from the statistical natural language parser were explored: all subtrees of some depth from any level; rooted subtrees of some depth; part of speech; and part of speech by level in the parse tree. It was found to be helpful to project the features into a lower dimensional space. Statistical experiments on these documents demonstrate that information from a statistical parser can, in fact, assist in distinguishing authors.
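
To make the "part of speech by level in the parse tree" feature concrete, here is a minimal sketch. It assumes the parser output is a Penn-style bracketed string and that "level" means depth from the root; the abstract does not specify either, and the helper names `parse_bracketed` and `pos_by_level` are hypothetical.

```python
from collections import Counter


def parse_bracketed(s):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def pos_by_level(tree, depth=0, counts=None):
    """Count (depth, tag) pairs for preterminals, i.e. nodes whose children are all words."""
    if counts is None:
        counts = Counter()
    label, children = tree
    if all(isinstance(c, str) for c in children):
        counts[(depth, label)] += 1
    for child in children:
        if not isinstance(child, str):
            pos_by_level(child, depth + 1, counts)
    return counts


tree = parse_bracketed("(S (NP (DT The) (NN cat)) (VP (VBD sat)))")
features = pos_by_level(tree)
print(sorted(features.items()))
```

Counts like these, accumulated over a document, would form one feature vector per text; projecting them to a lower dimensional space, as the abstract mentions, is a separate step not shown here.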

artificial intelligence, feature vector, natural language, (15 more...)

2403.13253

Country:

North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
North America > United States > Utah > Cache County > Logan (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

A Closer Look at Claim Decomposition

Wanner, Miriam, Ebner, Seth, Jiang, Zhengping, Dredze, Mark, Van Durme, Benjamin

arXiv.org Artificial Intelligence | Mar-18-2024

As generated text becomes more commonplace, it is increasingly important to evaluate how well-supported such text is by external knowledge sources. Many approaches for evaluating textual support rely on some method for decomposing text into its individual subclaims which are scored against a trusted reference. We investigate how various methods of claim decomposition -- especially LLM-based methods -- affect the result of an evaluation approach such as the recently proposed FActScore, finding that it is sensitive to the decomposition method used. This sensitivity arises because such metrics attribute overall textual support to the model that generated the text even though error can also come from the metric's decomposition step. To measure decomposition quality, we introduce an adaptation of FActScore, which we call DecompScore. We then propose an LLM-based approach to generating decompositions inspired by Bertrand Russell's theory of logical atomism and neo-Davidsonian semantics and demonstrate its improved decomposition quality over previous methods.
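
The sensitivity to decomposition can be illustrated with a toy calculation. The sketch below assumes a FActScore-style score defined as the fraction of subclaims judged supported by a reference; the support check is a hypothetical exact-string lookup, and the claims are invented toy data, not drawn from the paper.

```python
def support_score(subclaims, is_supported):
    """Fraction of subclaims judged supported by the reference."""
    if not subclaims:
        return 0.0
    return sum(1 for c in subclaims if is_supported(c)) / len(subclaims)


# Hypothetical trusted reference, stored as lowercase strings.
reference = {"hitchcock directed psycho", "psycho was released in 1960"}


def is_supported(claim):
    # Stand-in for a real verifier: exact match against the reference.
    return claim.lower() in reference


# Two decompositions of the same generated sentence.
coarse = [
    "Hitchcock directed Psycho",
    "Psycho was released in 1960 and won Best Picture",
]
fine = [
    "Hitchcock directed Psycho",
    "Psycho was released in 1960",
    "Psycho won Best Picture",
]

coarse_score = support_score(coarse, is_supported)
fine_score = support_score(fine, is_supported)
print(coarse_score, fine_score)
```

The generated text is identical in both cases, yet the score differs because the coarse decomposition bundles a supported fact with an unsupported one. This is the kind of metric-side error the abstract describes.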

alfred hitchcock, decomposition, subclaim, (13 more...)

2403.11903

Country:

North America > United States > Minnesota (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Singapore (0.04)
(12 more...)

Genre: Research Report (0.50)

Industry:

Media > Film (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Media > Music (0.68)
Leisure & Entertainment > Sports > Football (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Image Parsing via Stochastic Scene Grammar

Neural Information Processing Systems | Mar-15-2024, 00:46:22 GMT

This paper proposes a parsing algorithm for scene understanding which includes four aspects: computing 3D scene layout, detecting 3D objects (e.g.

contextual relation, production rule, relation, (14 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Rules still work for Open Information Extraction

Hua, Jialin, Luo, Liangqing, Ping, Weiying, Liao, Yan, Tao, Chunhai, Lub, Xuewen

arXiv.org Artificial Intelligence | Mar-15-2024

Open information extraction (OIE) aims to extract surface relations and their corresponding arguments from natural language text, irrespective of domain. This paper presents an innovative OIE model, APRCOIE, tailored for Chinese text. Diverging from previous models, our model generates extraction patterns autonomously. The model defines a new pattern form for Chinese OIE and proposes an automated pattern generation methodology. In this way, the model can handle a wide array of complex and diverse Chinese grammatical phenomena. We design a preliminary filter based on tensor computing to conduct the extraction procedure efficiently. To train the model, we manually annotated a large-scale Chinese OIE dataset. In the comparative evaluation, we demonstrate that APRCOIE outperforms state-of-the-art Chinese OIE models and significantly expands the boundaries of achievable OIE performance. The code of APRCOIE and the annotated dataset are released on GitHub (https://github.com/jialin666/APRCOIE_v1).

extraction, information extraction, proceedings, (13 more...)

2403.10758

Country:

North America > United States (0.28)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
Asia > China > Jiangxi Province > Nanchang (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

A Multilingual Perspective on Probing Gender Bias

Stańczak, Karolina

arXiv.org Artificial Intelligence | Mar-15-2024

Gender bias represents a form of systematic negative treatment that targets individuals based on their gender. This discrimination can range from subtle sexist remarks and gendered stereotypes to outright hate speech. Prior research has revealed that ignoring online abuse not only affects the individuals targeted but also has broader societal implications. These consequences extend to the discouragement of women's engagement and visibility within public spheres, thereby reinforcing gender inequality. This thesis investigates the nuances of how gender bias is expressed through language and within language technologies. Significantly, this thesis expands research on gender bias to multilingual contexts, emphasising the importance of a multilingual and multicultural perspective in understanding societal biases. In this thesis, I adopt an interdisciplinary approach, bridging natural language processing with other disciplines such as political science and history, to probe gender bias in natural language and language models.

background information and human perception, gender inequality and discriminatory practice, multidimensional and multilingual approach, (16 more...)

2403.10699

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.27)
Asia > Middle East > Iraq (0.27)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(119 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(2 more...)

Industry:

Media > News (1.00)
Leisure & Entertainment (1.00)
Law > Civil Rights & Constitutional Law (1.00)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(9 more...)

MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank

Blaschke, Verena, Kovačić, Barbara, Peng, Siyao, Schütze, Hinrich, Plank, Barbara

arXiv.org Artificial Intelligence | Mar-15-2024

Despite the success of the Universal Dependencies (UD) project, exemplified by its impressive language breadth, there is still a lack of 'within-language breadth': most treebanks focus on standard languages. Even for German, the language with the most annotations in UD, so far no treebank exists for one of its language varieties spoken by over 10M people: Bavarian. To contribute to closing this gap, we present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in UD, covering multiple text genres (wiki, fiction, grammar examples, social, non-fiction). We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies. Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries. We provide baseline parsing and POS tagging results, which are lower than results obtained on German and vary substantially between different graph-based parsers. To support further research on Bavarian syntax, we make our dataset, language-specific guidelines and code publicly available.

bavarian, computational linguistic, treebank, (13 more...)

2403.10293

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Europe > Italy > Trentino-Alto Adige/Südtirol > South Tyrol (0.04)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
(24 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

Neural Information Processing Systems | Mar-14-2024, 14:40:18 GMT

We describe an approach to speed up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs, coupled with a tensor decomposition algorithm well known in the multilinear algebra literature. We also describe an error bound for this approximation, which gives guarantees showing that if the underlying tensors are well approximated, then the probability distribution over trees will also be well approximated. Empirical evaluation on real-world natural language parsing data demonstrates a significant speed-up at minimal cost to parsing performance.
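
A small sketch may help show why a tensor decomposition speeds up inference. It assumes a CP-style decomposition (a sum of R rank-one terms), which the abstract does not name, and it assumes the costly step is contracting a third-order tensor T with two vectors a and b. With T[i][j][k] = sum over r of U[r][i] * V[r][j] * W[r][k], the contraction costs O(R*m) instead of O(m^3). All names and numbers are illustrative.

```python
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def contract_dense(T, a, b):
    """out[i] = sum_{j,k} T[i][j][k] * a[j] * b[k], in O(m^3) time."""
    m = len(T)
    return [
        sum(T[i][j][k] * a[j] * b[k] for j in range(m) for k in range(m))
        for i in range(m)
    ]


def contract_cp(U, V, W, a, b):
    """Same contraction using rank-one factors, in O(R * m) time."""
    out = [0.0] * len(a)
    for u, v, w in zip(U, V, W):
        scale = dot(v, a) * dot(w, b)  # one scalar per rank-one term
        for i, ui in enumerate(u):
            out[i] += ui * scale
    return out


# Toy rank-2 factors for m = 2 (hypothetical values).
U = [[1.0, 0.0], [0.0, 2.0]]
V = [[1.0, 1.0], [1.0, -1.0]]
W = [[2.0, 0.0], [1.0, 1.0]]
m = 2

# Build the dense tensor exactly from the factors so both paths must agree.
T = [
    [[sum(u[i] * v[j] * w[k] for u, v, w in zip(U, V, W)) for k in range(m)]
     for j in range(m)]
    for i in range(m)
]

a, b = [1.0, 2.0], [3.0, 1.0]
dense = contract_dense(T, a, b)
fast = contract_cp(U, V, W, a, b)
print(dense, fast)
```

Here the tensor is exactly rank 2, so the two results coincide. In practice the decomposition is an approximation, which is why the abstract pairs it with an error bound.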

algorithm, decomposition, tensor, (14 more...)

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > New York > New York County > New York City (0.04)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Learned Prioritization for Trading Off Accuracy and Speed

Adam Teichert, Hal Daumé III

Neural Information Processing Systems | Mar-14-2024, 11:54:00 GMT

Users want inference to be both fast and accurate, but quality often comes at the cost of speed. The field has experimented with approximate inference algorithms that make different speed-accuracy tradeoffs (for particular problems and datasets). We aim to explore this space automatically, focusing here on the case of agenda-based syntactic parsing [12]. Unfortunately, off-the-shelf reinforcement learning techniques fail to learn good policies: the state space is simply too large to explore naively. An attempt to counteract this by applying imitation learning algorithms also fails: the "teacher" follows a far better policy than anything in our learner's policy space, free of the speed-accuracy tradeoff that arises when oracle information is unavailable, and thus largely insensitive to the known reward function. We propose a hybrid reinforcement/apprenticeship learning algorithm that learns to speed up an initial policy, trading off accuracy for speed according to various settings of a speed term in the loss function.
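
The role of a prioritization policy in agenda-based inference can be sketched with a generic best-first loop. This is a toy stand-in, not the authors' parser: items are plain integers, `expand`, `is_goal` and the two priority functions are hypothetical, and the number of pops is used as a rough proxy for running time.

```python
import heapq


def run_agenda(start_items, expand, is_goal, priority):
    """Pop items in priority order until a goal is popped; return (goal, pops)."""
    # heapq is a min-heap, so priorities are negated; n breaks ties by insertion order.
    agenda = [(-priority(x), n, x) for n, x in enumerate(start_items)]
    heapq.heapify(agenda)
    counter = len(agenda)
    seen = set()
    pops = 0
    while agenda:
        _, _, item = heapq.heappop(agenda)
        if item in seen:
            continue  # skip duplicates already processed
        seen.add(item)
        pops += 1
        if is_goal(item):
            return item, pops
        for nxt in expand(item):
            if nxt not in seen:
                heapq.heappush(agenda, (-priority(nxt), counter, nxt))
                counter += 1
    return None, pops


def expand(n):
    # Toy successor function: each item can grow by 1 or by 3.
    return [n + 1, n + 3] if n < 10 else []


def is_goal(n):
    return n == 9


greedy = run_agenda([0], expand, is_goal, priority=lambda n: n)
exhaustive = run_agenda([0], expand, is_goal, priority=lambda n: -n)
print(greedy, exhaustive)
```

Both runs reach the same goal, but the first priority function does so with far fewer pops. A learned policy would aim to choose priorities that keep this cost low without hurting the quality of the final result.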

accuracy, constituent, trajectory, (17 more...)

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.93)
