AITopics | Grammars & Parsing

P(Expression|Grammar): Probability of deriving an algebraic expression with a probabilistic context-free grammar

Primožič, Urh, Todorovski, Ljupčo, Petković, Matej

arXiv.org Artificial Intelligence, Dec-2-2022

Probabilistic context-free grammars have a long record of use as generative models in machine learning and symbolic regression. When used for symbolic regression, they generate algebraic expressions. We define the latter as equivalence classes of strings derived by a grammar and address the problem of calculating the probability of deriving a given expression with a given grammar. We show that the problem is undecidable in general. We then present specific grammars for generating linear, polynomial, and rational expressions, for which algorithms for calculating the probability of a given expression exist. For those grammars, we design algorithms for calculating the exact probability and for efficient approximation with arbitrary precision.
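
To make the abstract's definition concrete, here is a minimal sketch of the difference between the probability of a string and the probability of an expression. The grammar, its rule probabilities, and the function names are assumptions made for illustration and are not taken from the paper. The toy grammar is unambiguous, so each string has exactly one derivation, and the only equivalence considered is reordering of summands.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy grammar, chosen for illustration only:
#   E -> V "+" E   with probability P_EXTEND
#   E -> V         with probability 1 - P_EXTEND
#   V -> "x"       with probability P_X
#   V -> "y"       with probability 1 - P_X
P_EXTEND = Fraction(1, 3)
P_X = Fraction(1, 2)


def string_probability(terms):
    """Probability of deriving the string terms[0] + terms[1] + ... exactly."""
    if not terms:
        raise ValueError("a derived string contains at least one variable")
    prob = Fraction(1)
    last = len(terms) - 1
    for i, term in enumerate(terms):
        # Rule probability for the variable chosen at this position.
        if term == "x":
            prob *= P_X
        elif term == "y":
            prob *= 1 - P_X
        else:
            raise ValueError(f"unknown variable: {term!r}")
        # Rule probability for continuing the sum or stopping.
        prob *= (1 - P_EXTEND) if i == last else P_EXTEND
    return prob


def expression_probability(terms):
    """Sum the string probabilities over all distinct reorderings of the summands."""
    orderings = set(permutations(terms))
    return sum(string_probability(order) for order in orderings)
```

With these assumed probabilities, the string `x+y` has probability 1/18, while the expression it denotes has probability 1/9, because the string `y+x` belongs to the same equivalence class.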

artificial intelligence, grammar, natural language, (18 more...)

arXiv.org Artificial Intelligence

2212.00751

Country:

Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
Europe > Bulgaria > Varna Province > Varna (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Joint Chinese Word Segmentation and Span-based Constituency Parsing

Wang, Zhicheng, Shi, Tianyu, Liu, Cong

arXiv.org Artificial Intelligence, Nov-30-2022

In constituency parsing, span-based decoding is an important direction. For Chinese sentences, however, linguistic characteristics make it necessary to first perform word segmentation with a separate model, which introduces uncertainty and generally leads to errors in the constituency tree computed afterward. This work proposes a method for joint Chinese word segmentation and span-based constituency parsing that adds extra labels to individual Chinese characters in the parse trees. In experiments, the proposed algorithm outperforms recent models for joint segmentation and constituency parsing on CTB 5.1.
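
Span-based decoding is often implemented as a chart search: every span gets a score, and dynamic programming picks the binary tree whose spans have the highest total score. The sketch below shows that generic idea only. The function names and the scoring interface are hypothetical, and it is not the joint segmentation method the abstract describes.

```python
from functools import lru_cache


def best_tree(n, span_score):
    """Return (score, tree) for the best binary bracketing of n tokens.

    span_score(i, j) is an assumed callback giving the score of the span
    covering tokens i..j-1. A leaf is the tuple (i, j); an internal node
    is ((i, j), left_subtree, right_subtree).
    """
    if n < 1:
        raise ValueError("need at least one token")

    @lru_cache(maxsize=None)
    def solve(i, j):
        score = span_score(i, j)
        if j - i == 1:
            return score, (i, j)
        # Try every split point and keep the one with the best children.
        children_score, k = max(
            (solve(i, k)[0] + solve(k, j)[0], k) for k in range(i + 1, j)
        )
        return score + children_score, ((i, j), solve(i, k)[1], solve(k, j)[1])

    return solve(0, n)


# Usage: three tokens, where only the span over the first two is rewarded.
scores = {(0, 2): 1.0}
total, tree = best_tree(3, lambda i, j: scores.get((i, j), 0.0))
```

Memoizing `solve` keeps the search cubic in the sentence length, because each span is solved once and each solution scans its split points.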

artificial intelligence, natural language, word segmentation, (14 more...)

arXiv.org Artificial Intelligence

2211.01638

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

PIZZA: A new benchmark for complex end-to-end task-oriented parsing

Arkoudas, Konstantine, Mesnards, Nicolas Guenon des, Rubino, Melanie, Swamy, Sandesh, Khanna, Saarthak, Sun, Weiqi, Haidar, Khan

arXiv.org Artificial Intelligence, Nov-30-2022

Much recent work in task-oriented parsing has focused on finding a middle ground between flat slots and intents, which are inexpressive but easy to annotate, and powerful representations such as the lambda calculus, which are expressive but costly to annotate. This paper continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents. We perform an extensive evaluation of deep-learning techniques for task-oriented parsing on this dataset, including different flavors of seq2seq systems and RNNGs. The dataset comes in two main versions, one in a recently introduced utterance-level hierarchical notation that we call TOP, and one whose targets are executable representations (EXR). We demonstrate empirically that training the parser to directly generate EXR notation not only solves the problem of entity resolution in one fell swoop and overcomes a number of expressive limitations of TOP notation, but also results in significantly greater parsing accuracy.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2212.00265

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Pennsylvania (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Consumer Products & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lexicon-injected Semantic Parsing for Task-Oriented Dialog

Meng, Xiaojun, Dai, Wenlin, Wang, Yasheng, Wang, Baojun, Wu, Zhiyong, Jiang, Xin, Liu, Qun

arXiv.org Artificial Intelligence, Nov-26-2022

Recently, semantic parsing using hierarchical representations for dialog systems has captured substantial attention. Task-Oriented Parse (TOP), a tree representation with intents and slots as labels of nested tree nodes, has been proposed for parsing user utterances. Previous TOP parsing methods are limited in handling unseen dynamic slot values (e.g., newly added songs and locations), which is an urgent matter for real dialog systems. To mitigate this issue, we first propose a novel span-splitting representation for span-based parsers that outperforms existing methods. Then we present a novel lexicon-injected semantic parser, which collects the slot labels of the tree representation as a lexicon and injects lexical features into the parser's span representation. An additional slot disambiguation technique is used to remove inappropriate span matches from the lexicon. Our best parser produces a new state-of-the-art result (87.62%) on the TOP dataset and demonstrates its adaptability to frequently updated slot lexicon entries in real task-oriented dialog, without retraining.
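
The tree representation described above, with intents and slots labeling nested nodes, can be illustrated with a small reader for a bracketed string. This is a sketch under assumptions: the `IN:` and `SL:` prefixes follow the bracket notation commonly used for TOP-style data, while the example utterance, its labels, and the names `Node`, `parse_top`, and `slots` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                     # e.g. "IN:PLAY_MUSIC" or "SL:ARTIST"
    children: list = field(default_factory=list)   # word strings or nested Node objects


def parse_top(text):
    """Parse a whitespace-tokenized bracketed string into a Node tree."""
    root = None
    stack = []
    for token in text.split():
        if token.startswith("["):
            node = Node(token[1:])
            if stack:
                stack[-1].children.append(node)
            elif root is None:
                root = node
            else:
                raise ValueError("more than one root node")
            stack.append(node)
        elif token == "]":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            stack.pop()
        else:
            if not stack:
                raise ValueError("word outside of any node")
            stack[-1].children.append(token)
    if stack or root is None:
        raise ValueError("unbalanced or empty input")
    return root


def slots(node):
    """Collect (slot_name, text) pairs in left-to-right order."""
    found = []
    for child in node.children:
        if isinstance(child, Node):
            if child.label.startswith("SL:"):
                words = [c for c in child.children if isinstance(c, str)]
                found.append((child.label[3:], " ".join(words)))
            found.extend(slots(child))
    return found


tree = parse_top("[IN:PLAY_MUSIC play [SL:TRACK yesterday ] by [SL:ARTIST the beatles ] ]")
```

Here `slots(tree)` yields the slot values `yesterday` and `the beatles`, which are the kind of dynamic values a slot lexicon would need to cover.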

artificial intelligence, natural language, representation, (17 more...)

arXiv.org Artificial Intelligence

2211.14508

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > California > Monterey County > Pacific Grove (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Exposure and Emergence in Usage-Based Grammar: Computational Experiments in 35 Languages

Dunn, Jonathan

arXiv.org Artificial Intelligence, Nov-25-2022

This paper uses computational experiments to explore the role of exposure in the emergence of construction grammars. While usage-based grammars are hypothesized to depend on a learner's exposure to actual language use, the mechanisms of such exposure have only been studied in a few constructions in isolation. This paper experiments with (i) the growth rate of the constructicon, (ii) the convergence rate of grammars exposed to independent registers, and (iii) the rate at which constructions are forgotten when they have not been recently observed. These experiments show that the lexicon grows more quickly than the grammar and that the growth rate of the grammar is not dependent on the growth rate of the lexicon. At the same time, register-specific grammars converge onto more similar constructions as the amount of exposure increases. This means that the influence of specific registers becomes less important as exposure increases. Finally, the rate at which constructions are forgotten when they have not been recently observed mirrors the growth rate of the constructicon. This paper thus presents a computational model of usage-based grammar that includes both the emergence and the unentrenchment of constructions.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.18710/CES0L8

2211.1416

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Pacific Ocean (0.04)
Oceania > New Zealand > South Island > Canterbury Region > Christchurch (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.92)
Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

Transition-based Semantic Role Labeling with Pointer Networks

Fernández-González, Daniel

arXiv.org Artificial Intelligence, Nov-24-2022

Semantic role labeling (SRL) focuses on recognizing the predicate-argument structure of a sentence and plays a critical role in many natural language processing tasks such as machine translation and question answering. Practically all available methods do not perform full SRL, since they rely on pre-identified predicates, and most of them follow a pipeline strategy, using specific models for one or several SRL subtasks. In addition, previous approaches depend strongly on syntactic information to achieve state-of-the-art performance, even though syntactic trees are equally hard to produce. These simplifications and requirements make the majority of SRL systems impractical for real-world applications. In this article, we propose the first transition-based SRL approach that is capable of completely processing an input sentence in a single left-to-right pass, without leveraging syntactic information or resorting to additional modules. Thanks to our implementation based on Pointer Networks, full SRL can be accurately and efficiently done in $O(n^2)$, achieving the best performance to date on the majority of languages from the CoNLL-2009 shared task.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2205.10023

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Germany > Berlin (0.04)
(20 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Four Principles of Semantic Parsing - DataScienceCentral.com

#artificialintelligence, Nov-23-2022, 09:00:54 GMT

Over the last few decades, I have frequently heard vendors and developers talking about structured data, unstructured data, semi-structured data, and so forth. The arguments about what constitutes each of these categories get fairly vocal, particularly since everyone has a rough intuitive idea about what constitutes structure (tables) and what doesn't (text). However, I had an epiphany the other day that makes the distinction between the two obvious, and it has nothing to do with the amount of text a given "blob" of data contains. I'd argue that four principles dictate how data is structured. The first of these, the Parser Principle, describes what we mean by Structured Data: if a parser exists for identifying components within a block of text (a sequence of characters), then that text is structured.

datatype, information, parser, (13 more...)

#artificialintelligence

Country:

North America > United States > Washington > King County > Seattle (0.04)
Europe (0.04)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Automatic extraction of materials and properties from superconductors scientific literature

Foppiano, Luca, de Castro, Pedro Baptista, Suarez, Pedro Ortiz, Terashima, Kensei, Takano, Yoshihiko, Ishii, Masashi

arXiv.org Artificial Intelligence, Nov-22-2022

The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that supports input data as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40324 materials and properties records from 37700 papers. The material (or sample) information is represented by name, chemical formula, and material class, and is characterized by shape, doping, substitution variables for components, and substrate as adjoined information. The properties include the Tc superconducting critical temperature and, when available, applied pressure with the Tc measurement method.

information, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1080/27660400.2022.2153633

2210.156

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Netherlands > South Holland > Delft (0.05)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
(6 more...)

Genre:

Workflow (0.93)
Research Report (0.64)

Industry: Materials > Chemicals > Industrial Gases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.70)
(2 more...)

Leveraging Data Recasting to Enhance Tabular Reasoning

Jena, Aashna, Gupta, Vivek, Shrivastava, Manish, Eisenschlos, Julian Martin

arXiv.org Artificial Intelligence, Nov-22-2022

Creating challenging tabular inference data is essential for learning complex reasoning. Prior work has mostly relied on two data generation strategies. The first is human annotation, which yields linguistically diverse data but is difficult to scale. The second is synthetic generation, which is scalable and cost-effective but lacks inventiveness. In this research, we present a framework for semi-automatically recasting existing tabular data to combine the benefits of both approaches. We use our framework to build tabular NLI instances from five datasets that were initially intended for tasks like table2text creation, tabular Q/A, and semantic parsing. We demonstrate that recasted data can be used as evaluation benchmarks as well as augmentation data to enhance performance on tabular NLI tasks. Furthermore, we investigate the effectiveness of models trained on recasted data in the zero-shot scenario, and analyse trends in performance across different recasted dataset types.

artificial intelligence, computational linguistic, natural language, (17 more...)

arXiv.org Artificial Intelligence

2211.12641

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.05)
North America > United States > Utah (0.04)
(12 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Government > Regional Government > North America Government > United States Government (0.67)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.87)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

PESE: Event Structure Extraction using Pointer Network based Encoder-Decoder Architecture

Kuila, Alapan, Sarkar, Sudeshan

arXiv.org Artificial Intelligence, Nov-22-2022

The task of event extraction (EE) aims to find the events and event-related argument information in text and represent them in a structured format. Most previous works try to solve the problem by separately identifying multiple substructures and aggregating them to get the complete event structure. The problem with these methods is that they fail to identify all the interdependencies among the event participants (event triggers, arguments, and roles). In this paper, we represent each event record in a unique tuple format that contains the trigger phrase, trigger type, argument phrase, and corresponding role information. Our proposed pointer network-based encoder-decoder model generates an event tuple at each time step by exploiting the interactions among event participants, presenting a truly end-to-end solution to the EE task. We evaluate our model on the ACE2005 dataset, and experimental results demonstrate its effectiveness, with competitive performance compared to state-of-the-art methods.

argument, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2211.12157

Country:

Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.05)
Asia > Middle East > Palestine (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(9 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
