AITopics | Grammars & Parsing

Grammars & Parsing

Improving Unsupervised Constituency Parsing via Maximizing Semantic Information

Chen, Junjie, He, Xiangheng, Miyao, Yusuke, Bollegala, Danushka

arXiv.org Artificial Intelligence, Oct-3-2024

Unsupervised constituency parsers organize phrases within a sentence into a tree-shaped syntactic constituent structure that reflects the organization of sentence semantics. However, the traditional objective of maximizing sentence log-likelihood (LL) does not explicitly account for the close relationship between the constituent structure and the semantics, resulting in a weak correlation between LL values and parsing accuracy. In this paper, we introduce a novel objective for training unsupervised parsers: maximizing the information between constituent structures and sentence semantics (SemInfo). We introduce a bag-of-substrings model to represent the semantics and apply the probability-weighted information metric to estimate the SemInfo. Additionally, we develop a Tree Conditional Random Field (TreeCRF)-based model to apply the SemInfo maximization objective to Probabilistic Context-Free Grammar (PCFG) induction, the state-of-the-art method for unsupervised constituency parsing. Experiments demonstrate that SemInfo correlates more strongly with parsing accuracy than LL does. Our algorithm significantly improves parsing accuracy, by an average of 7.85 points across five PCFG variants and four languages, achieving new state-of-the-art results in three of the four languages.
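
For context on the log-likelihood (LL) baseline that SemInfo is compared against: a PCFG's sentence likelihood sums over all parse trees and is commonly computed with the inside algorithm. The minimal sketch below assumes a hypothetical toy grammar in Chomsky normal form (the rule tables `BINARY` and `LEXICAL` are invented for illustration). It is not the authors' SemInfo objective or their TreeCRF model; neural PCFG variants would typically replace the fixed rule probabilities with learned ones while keeping the same kind of dynamic program.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (for illustration only).
# Binary rules: (parent, left child, right child) -> probability.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Lexical rules: (preterminal, word) -> probability.
LEXICAL = {
    ("NP", "she"): 0.5,
    ("NP", "fish"): 0.5,
    ("V", "eats"): 1.0,
}


def inside_log_likelihood(words, start="S"):
    """Return log P(words) under the toy PCFG, summing over all parse trees."""
    n = len(words)
    # chart[i][j][A] = inside probability that nonterminal A yields words[i:j].
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                chart[i][i + 1][a] += p
    # Fill longer spans from shorter ones (CKY-style dynamic program).
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for (a, b, c), p in BINARY.items():
                    left = chart[i][k].get(b, 0.0)
                    right = chart[k][j].get(c, 0.0)
                    if left and right:
                        chart[i][j][a] += p * left * right
    total = chart[0][n].get(start, 0.0)
    return math.log(total) if total > 0 else float("-inf")


print(inside_log_likelihood("she eats fish".split()))  # equals math.log(0.25)
```

A sentence the toy grammar cannot derive, such as "eats she", gets a log-likelihood of negative infinity.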

computational linguistic, constituent structure, information

2410.02558

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Massachusetts > Middlesex County > Malden (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Make Compound Sentences Simple to Analyze: Learning to Split Sentences for Aspect-based Sentiment Analysis

Seo, Yongsik, Song, Sungwon, Heo, Ryang, Kim, Jieyong, Lee, Dongha

arXiv.org Artificial Intelligence, Oct-3-2024

In the domain of Aspect-Based Sentiment Analysis (ABSA), generative methods have shown promising results and achieved substantial advances. Despite these advances, however, extracting sentiment quadruplets, which capture the nuanced sentiment expressions within a sentence, remains a significant challenge. In particular, compound sentences can contain multiple quadruplets, making the extraction task increasingly difficult as sentence complexity grows. To address this issue, we focus on simplifying sentence structures so that these elements are easier to recognize, and on crafting a model that integrates seamlessly with various ABSA tasks. In this paper, we propose the Aspect Term Oriented Sentence Splitter (ATOSS), which simplifies compound sentences into simpler and clearer forms, thereby clarifying their structure and intent. As a plug-and-play module, this approach retains the parameters of the ABSA model while making it easier to identify the essential intent within input sentences. Extensive experimental results show that ATOSS outperforms existing methods on both the ASQP and ACOS tasks, which are the primary tasks for extracting sentiment quadruplets.

absa model, quadruplet, split sentence

2410.02297

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Consumer Products & Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.82)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.76)

Matrix and Relative Weak Crossover in Japanese: An Experimental Investigation

Fukushima, Haruka, Plesniak, Daniel, Bekki, Daisuke

arXiv.org Artificial Intelligence, Oct-2-2024

This paper provides evidence that weak crossover effects differ in nature between matrix and relative clauses. Fukushima et al. (2024) provided similar evidence, showing that, when various non-structural factors were eliminated, English speakers never accepted matrix weak crossover cases but often accepted relative weak crossover ones. Those results were limited, however, by English word order, which led to uncertainty as to whether this difference was due to the effects of linear precedence or of syntactic structure. In this paper, to distinguish between these two possibilities, we conduct an experiment using Japanese, which lacks the word-order confound that English had. We find results that are qualitatively in line with Fukushima et al. (2024), suggesting that the relevant distinction is structural and not based simply on precedence.

bva, interpretation, matrix and relative weak crossover

2410.02149

Country:

Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.48)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.64)

Industry: Automobiles & Trucks > Manufacturer (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.55)

Unifying the Scope of Bridging Anaphora Types in English: Bridging Annotations in ARRAU and GUM

Levine, Lauren, Zeldes, Amir

arXiv.org Artificial Intelligence, Oct-1-2024

Comparing bridging annotations across coreference resources is difficult, largely due to a lack of standardization across definitions and annotation schemas and the narrow coverage of disparate text domains across resources. To alleviate domain coverage issues and consolidate schemas, we compare guidelines and use interpretable predictive models to examine the bridging instances annotated in the GUM, GENTLE and ARRAU corpora. Examining these cases, we find large differences in the types of phenomena annotated as bridging. Beyond theoretical results, we release a harmonized, subcategorized version of the test sets of GUM, GENTLE and the ARRAU Wall Street Journal data to promote meaningful and reliable evaluation of bridging resolution across domains.

anaphor, annotation, classifier

2410.0117

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Dominican Republic (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Industry:

Law (0.46)
Media > News (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.52)

Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing

Kang, Deokhyung, Hwang, Seonjeong, Kim, Yunsu, Lee, Gary Geunbae

arXiv.org Artificial Intelligence, Oct-1-2024

Recent efforts have aimed to use multilingual pretrained language models (mPLMs) to extend semantic parsing (SP) across multiple languages without requiring extensive annotations. However, achieving zero-shot cross-lingual transfer for SP remains challenging, leading to a performance gap between source and target languages. In this study, we propose Cross-Lingual Back-Parsing (CBP), a novel data augmentation methodology designed to enhance cross-lingual transfer for SP. Leveraging the representation geometry of the mPLMs, CBP synthesizes target-language utterances from source meaning representations. Our methodology effectively performs cross-lingual data augmentation in challenging zero-resource settings by using only labeled data in the source language and monolingual corpora. Extensive experiments on two cross-lingual SP benchmarks (Mschema2QA and Xspider) demonstrate that CBP brings substantial gains in the target language. Further analysis of the synthesized utterances shows that our method successfully generates target-language utterances with high slot value alignment rates while preserving semantic integrity. Our code and data are publicly available at https://github.com/deokhk/CBP.

representation, target language, utterance

2410.00513

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Grammar Induction from Visual, Speech and Text

Zhao, Yu, Fei, Hao, Wu, Shengqiong, Zhang, Meishan, Zhang, Min, Chua, Tat-seng

arXiv.org Artificial Intelligence, Sep-30-2024

Grammar induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics. In this process, features from distinct modalities essentially play complementary roles. With this intuition, this work introduces a novel unsupervised visual-audio-text grammar induction task (named VAT-GI), which induces constituent grammar trees from parallel image, text, and speech inputs. Inspired by the fact that language grammar natively exists beyond text, we argue that text need not be the predominant modality in grammar induction. We therefore further introduce a textless setting of VAT-GI, in which the task relies solely on visual and auditory inputs. To approach the task, we propose a visual-audio-text inside-outside recursive autoencoder (VaTiora) framework, which leverages rich modality-specific and complementary features for effective grammar parsing. In addition, a more challenging benchmark dataset is constructed to assess the generalization ability of VAT-GI systems. Experiments on two benchmark datasets demonstrate that the proposed VaTiora system incorporates the various multimodal signals more effectively and achieves new state-of-the-art performance on VAT-GI.

artificial intelligence, natural language, template style

2410.03739

Country: North America > United States > District of Columbia > Washington (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.53)

Leveraging Surgical Activity Grammar for Primary Intention Prediction in Laparoscopy Procedures

Zhang, Jie, Zhou, Song, Wang, Yiwei, Wan, Chidan, Zhao, Huan, Cai, Xiong, Ding, Han

arXiv.org Artificial Intelligence, Sep-29-2024

Surgical procedures are inherently complex and dynamic, with intricate dependencies and various execution paths. Accurate identification of the intentions behind critical actions, referred to as Primary Intentions (PIs), is crucial to understanding and planning the procedure. This paper presents a novel framework that advances PI recognition in instructional videos by combining top-down grammatical structure with bottom-up visual cues. The grammatical structure is based on a rich corpus of surgical procedures, offering a hierarchical perspective on surgical activities. A grammar parser, utilizing the surgical activity grammar, processes visual data obtained from laparoscopic images through surgical action detectors, ensuring a more precise interpretation of the visual information. Experimental results on the benchmark dataset demonstrate that our method outperforms existing surgical activity detectors that rely solely on visual features. Our research provides a promising foundation for developing advanced robotic surgical systems with enhanced planning and automation capabilities.

grammar, procedure, recognition

2409.19579

Country:

North America > United States (0.14)
Asia > China > Hubei Province > Wuhan (0.05)
South America > Peru > Lima Department > Lima Province > Lima (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Surgery (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.98)

data2lang2vec: Data Driven Typological Features Completion

Amirzadeh, Hamidreza, Jafari, Sadegh, Harju, Anika, van der Goot, Rob

arXiv.org Artificial Intelligence, Sep-25-2024

Language typology databases enhance multilingual Natural Language Processing (NLP) by improving model adaptability to diverse linguistic structures. The widely used lang2vec toolkit integrates several such databases, but its coverage remains limited at 28.9%. Previous work on automatically increasing coverage predicts missing values based on features from other languages or focuses on single features; we instead propose to use textual data for better-informed feature prediction. To this end, we introduce a multilingual Part-of-Speech (POS) tagger, achieving over 70% accuracy across 1,749 languages, and experiment with external statistical features and a variety of machine learning algorithms. We also introduce a more realistic evaluation setup, focusing on typology features that are likely to be missing, and show that our approach outperforms previous work in both setups.
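
A coverage figure such as 28.9% is presumably the share of filled cells in a language-by-feature table. The minimal sketch below shows one way to compute such a ratio and to list the cells a completion model would have to predict. The table, the language codes, and the feature names (`word_order_SVO`, `word_order_SOV`) are hypothetical, and the helper functions are illustrative only; this is not the lang2vec API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical miniature typology table: language code -> feature -> value,
# where None marks a missing entry. Real lang2vec data is far larger.
Table = Dict[str, Dict[str, Optional[int]]]


def coverage(table: Table, features: List[str]) -> float:
    """Fraction of (language, feature) cells that have a known value."""
    total = len(table) * len(features)
    if total == 0:
        return 0.0
    known = sum(
        1
        for values in table.values()
        for f in features
        if values.get(f) is not None
    )
    return known / total


def missing_cells(table: Table, features: List[str]) -> List[Tuple[str, str]]:
    """(language, feature) pairs that a completion model would need to predict."""
    return [
        (lang, f)
        for lang, values in sorted(table.items())
        for f in features
        if values.get(f) is None
    ]


features = ["word_order_SVO", "word_order_SOV"]
table: Table = {
    "eng": {"word_order_SVO": 1, "word_order_SOV": 0},
    "jpn": {"word_order_SVO": 0, "word_order_SOV": None},
}
print(coverage(table, features))       # 0.75
print(missing_cells(table, features))  # [('jpn', 'word_order_SOV')]
```

A feature absent from a language's dictionary is treated the same as an explicit None, so both count as missing.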

classifier, computational linguistic, target feature

2409.17373

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Iran (0.05)
North America > United States > Hawaii (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.91)

Parse Trees Guided LLM Prompt Compression

Mao, Wenhao, Hou, Chengbin, Zhang, Tianyu, Lin, Xinyu, Tang, Ke, Lv, Hairong

arXiv.org Artificial Intelligence, Sep-23-2024

Offering rich contexts to Large Language Models (LLMs) has been shown to boost performance on various tasks, but the resulting longer prompts increase computational cost and may exceed the input limit of LLMs. Recently, several prompt compression methods have been proposed to shorten prompts, either by using language models to generate shorter prompts or by developing computational models to select the important parts of the original prompt. Generative compression methods can suffer from issues such as hallucination, while selective compression methods do not incorporate linguistic rules and overlook the global structure of the prompt. To this end, we propose a novel selective compression method called PartPrompt. It first obtains a parse tree for each sentence based on linguistic rules and calculates the local information entropy of each node in the parse tree. These local parse trees are then organized into a global tree according to the hierarchical structure of the text, such as the dependency of sentences, paragraphs, and sections. Next, root-ward propagation and leaf-ward propagation are applied to adjust node values over the global tree. Finally, a recursive algorithm prunes the global tree based on the adjusted node values. Experiments show that PartPrompt achieves state-of-the-art performance across various datasets, metrics, compression ratios, and target LLMs for inference. In-depth ablation studies confirm the effectiveness of PartPrompt's design choices, and additional experiments demonstrate its superiority in the coherence of compressed prompts and in extremely long prompt scenarios.
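
The abstract names the PartPrompt stages (node scoring, root-ward and leaf-ward propagation, recursive pruning) without giving their formulas. The sketch below only illustrates the general shape of such a pipeline on a toy tree. The precomputed node scores, the mean-of-children root-ward rule, the `alpha`-weighted leaf-ward rule, and the fixed pruning `threshold` are all assumptions for illustration, not the paper's definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str      # word for leaves; empty string for internal nodes
    value: float   # assumed precomputed local score (e.g. self-information)
    children: List["Node"] = field(default_factory=list)


def propagate_rootward(node: Node) -> float:
    """Assumed rule: add the mean of the children's updated values to each parent."""
    if node.children:
        child_values = [propagate_rootward(c) for c in node.children]
        node.value += sum(child_values) / len(child_values)
    return node.value


def propagate_leafward(node: Node, inherited: float = 0.0, alpha: float = 0.5) -> None:
    """Assumed rule: add alpha times the parent's adjusted value to each node."""
    node.value += alpha * inherited
    for c in node.children:
        propagate_leafward(c, node.value, alpha)


def prune(node: Node, threshold: float, is_root: bool = True) -> List[str]:
    """Recursively drop non-root subtrees whose adjusted value is below threshold."""
    if not is_root and node.value < threshold:
        return []
    if not node.children:
        return [node.text]
    kept: List[str] = []
    for c in node.children:
        kept.extend(prune(c, threshold, is_root=False))
    return kept


# Toy global tree: one sentence split into two phrases.
root = Node("", 0.0, [
    Node("", 0.0, [Node("the", 0.2), Node("cat", 2.0)]),
    Node("", 0.0, [Node("sat", 1.0), Node("quietly", 0.3)]),
])
propagate_rootward(root)
propagate_leafward(root)
print(prune(root, threshold=1.0))  # ['cat', 'sat']
```

Here the low-scoring words "the" and "quietly" fall below the threshold after propagation, while "cat" and "sat" survive. A budget-based variant would instead lower or raise the threshold until the kept text reaches a target compression ratio.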

node, parse tree, partprompt

2409.15395

Country:

Asia > Kazakhstan > Almaty Region > Almaty (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Belgium (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Bilingual Rhetorical Structure Parsing with Large Parallel Annotations

Chistova, Elena

arXiv.org Artificial Intelligence, Sep-23-2024

Discourse parsing is a crucial task in natural language processing that aims to reveal the higher-level relations in a text. Despite growing interest in cross-lingual discourse parsing, challenges persist due to limited parallel data and inconsistencies in how Rhetorical Structure Theory (RST) is applied across languages and corpora. To address this, we introduce a parallel Russian annotation for the large and diverse English GUM RST corpus. Leveraging recent advances, our end-to-end RST parser achieves state-of-the-art results on both English and Russian corpora. It is effective in both monolingual and bilingual settings, transferring successfully even with limited second-language annotation. To the best of our knowledge, this work is the first to evaluate the potential of cross-lingual end-to-end RST parsing on a manually annotated parallel corpus.

annotation, computational linguistic, proceedings

doi: 10.18653/v1/2024.findings-acl.577

2409.14969

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)