Grammars & Parsing
Supplementary Material: Strongly Incremental Constituency Parsing with Graph Neural Networks
The root node must be an internal node. Now we are ready to state and prove Theorem 1 and Theorem 2 in the main paper. We prove the correctness of Algorithm 1 by induction on the sentence length n. ... to denote the execution trace taking the ... Case 1-1: last_leaf has siblings, and last_subtree is the root node. We have last_subtree = last_leaf (the first conditional statement).
A Dynamic Programs For SSK Evaluations and Gradients. We now detail recursive calculation strategies for calculating k_n(a, b) and its gradients with O(nl
A recursive strategy can efficiently calculate the contributions of a particular substring by pre-calculating the contributions of the smaller substrings contained within the target string. Context-free grammars (CFGs) are 4-tuples G = (V, Σ, R, S), consisting of: a set of non-terminal symbols V, a set of terminal symbols Σ (also known as an alphabet), a set of production rules R, and a non-terminal starting symbol S from which all strings are generated. The CFG for the symbolic regression task of Section 5.3 is given by the following rules: S → S '+' T, S → S ' ' T, S → S '/' T, S → T, T → '(' S ')', T → 'sin(' S ')', T → 'exp(' S ')', T → 'x', T → '1', T → '2', T → '3'. We now provide implementation details for our GA acquisition function optimizers. The GA begins with a randomly sampled population and ends once the best string in the population stops improving between iterations (Algorithm 1). Although seemingly simple tasks, our synthetic string optimization tasks of Section 5.1 are deceptively ... We now provide comprehensive experimental results across the synthetic string optimization tasks.
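The 4-tuple definition and the symbolic regression rules quoted above can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' implementation: the names NONTERMINALS, TERMINALS, RULES, START, and sample are hypothetical, the depth cap max_depth is an assumption added so that sampling always terminates, and the one rule whose operator symbol is illegible in the text is left out rather than guessed.

```python
import random

# Grammar G = (V, Sigma, R, S), transcribed from the rules in the text.
# Assumption: the rule with the illegible operator is omitted.
NONTERMINALS = {"S", "T"}                                   # V
TERMINALS = {"+", "/", "(", ")", "sin(", "exp(",
             "x", "1", "2", "3"}                            # Sigma
RULES = {                                                   # R
    "S": [["S", "+", "T"], ["S", "/", "T"], ["T"]],
    "T": [["(", "S", ")"], ["sin(", "S", ")"], ["exp(", "S", ")"],
          ["x"], ["1"], ["2"], ["3"]],
}
START = "S"                                                 # S

def sample(symbol=START, rng=None, depth=0, max_depth=6):
    """Expand `symbol` into a list of terminal tokens by random derivation."""
    if rng is None:
        rng = random.Random(0)
    if symbol in TERMINALS:
        return [symbol]
    options = RULES[symbol]
    if depth >= max_depth:
        # Past the depth cap, keep only rules that cannot recurse further.
        if symbol == "S":
            options = [["T"]]
        else:
            options = [["x"], ["1"], ["2"], ["3"]]
    tokens = []
    for child in rng.choice(options):
        tokens.extend(sample(child, rng, depth + 1, max_depth))
    return tokens

if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print("".join(sample(rng=rng)))
```

Every string produced this way is derivable from S under the listed rules, so a sampler of this kind is one plausible way to draw a random initial population of valid expressions; whether the GA described in the text initialises its population this way is not stated in the source.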
Appendix: Structured Reordering for Modeling Latent Alignments in Sequence Transduction
WCFG to PCFG Conversion. The algorithm for converting a WCFG to its equivalent PCFG is shown in Algorithm 1. A full proof of this equivalence can be found in Smith and Johnson [1]. Proof of the Dynamic Programming for Marginal Inference. We prove the correctness of the dynamic programming algorithm for computing the marginal permutation matrix of separable permutations by induction as follows. As a base case, each word (i.e., segment with length 1) is associated with an identity permutation matrix 1. In the structured reordering module, we compute the scores for BTG production rules using span ... Figure 1: The detailed architecture of our seq2seq model for semantic parsing (view in color). First, the structured reordering module generates a (relaxed) permutation matrix given the input utterance.
From Surface to Semantics: Semantic Structure Parsing for Table-Centric Document Analysis
Li, Xuan, Dong, Jialiang, Wong, Raymond
Documents are core carriers of information and knowledge, with broad applications in finance, healthcare, and scientific research. Tables, as the main medium for structured data, encapsulate key information and are among the most critical document components. Existing studies largely focus on surface-level tasks such as layout analysis, table detection, and data extraction, lacking deep semantic parsing of tables and their contextual associations. This limits advanced tasks like cross-paragraph data interpretation and context-consistent analysis. To address this, we propose DOTABLER, a table-centric semantic document parsing framework designed to uncover deep semantic links between tables and their context. DOTABLER leverages a custom dataset and domain-specific fine-tuning of pre-trained models, integrating a complete parsing pipeline to identify context segments semantically tied to tables. Built on this semantic understanding, DOTABLER implements two core functionalities: table-centric document structure parsing and domain-specific table retrieval, delivering comprehensive table-anchored semantic analysis and precise extraction of semantically relevant tables. Evaluated on nearly 4,000 pages with over 1,000 tables from real-world PDFs, DOTABLER achieves over 90% Precision and F1 scores, demonstrating superior performance in table-context semantic analysis and deep document parsing compared to advanced models such as GPT-4o.
A Computational Approach to Analyzing Language Change and Variation in the Constructed Language Toki Pona
This study explores language change and variation in Toki Pona, a constructed language with approximately 120 core words. Taking a computational and corpus-based approach, the study examines features including fluid word classes and transitivity in order to examine (1) changes in preferences of content words for different syntactic positions over time and (2) variation in usage across different corpora. The results suggest that sociolinguistic factors influence Toki Pona in the same way as natural languages, and that even constructed linguistic systems naturally evolve as communities use them.
Appendices for Submission #2981. Below we include additional implementation details, experimental results, as well as findings and
Signature:
:param media_id:
:param self: bot
:param text: text of message
:param user_ids: list of user_ids for creating group or one user_id for send to one person
:param thread_id: thread_id
Multi-sentence
Assumed called on Travis, to prepare a package to be deployed. This method prints on stdout for Travis. Return is obj to pass to sys.exit() directly.
Noisy bandwidths are inaccurate, as we don't account for parallel transfers here.
Table 5: Example queries that were not included due to query parsing errors